string concatenation

Casper H.S. Dik

James Antill said:
All the documentation, including the above, is laughable compared to
normal std. C text. Which is probably why there are a couple of different
implementations and the two major non OpenBSD implementations are
_documented_ to act differently than the OpenBSD versions do.


Please cite the specific differences between the implementations;
I don't know of any.


Casper
 
James Antill

Please cite the specific differences between the implementations; I don't
know of any.

The "well known" one is the return value for strlcat(), see...

http://www.dwheeler.com/secure-programs/Secure-Programs-HOWTO/library-c.html#STRLCPY
http://docs.sun.com/db/doc/806-0627/6j9vhfn7h?a=view#indexterm-1028
http://docs.sun.com/db/doc/816-0213/6m6ne38cf?a=view#indexterm-1099

...the latter two links might mean that this has changed for 2.9, but
my reading of them doesn't suggest that. However, even though the glib
commits said they followed the Solaris spec, the code is basically the same
as that in OpenBSD, so I guess there's movement for Solaris to change.

Of course there's always...

http://www.oreillynet.com/pub/a/network/2003/05/20/secureprogckbk.html

...where the strlcat() function is just broken.
 
James Antill

They weren't implemented subtly different on Solaris; our implementation
is a straightforward clone from the original OpenBSD version and behaves
exactly the same.

That's what I would have expected, but I'd heard that they were
reimplemented to follow the "spec" and the return value was different in
the Solaris version[1].
I don't have access to a Solaris machine to check, and I apologize if
that wasn't the case.

[1]

http://www.dwheeler.com/secure-programs/Secure-Programs-HOWTO/library-c.html#STRLCPY
 
Casper H.S. Dik

James Antill said:
Please cite the specific differences between the implementations; I don't
know of any.
The "well known" one is the return value for strlcat(), see...

That is only for the bogus "size == 0" case; you can't have a string that
takes 0 bytes of storage. I don't consider that a serious impediment to
portable programming, because a strlcat call with a size == 0 argument is
something I do not expect to happen.

Any call of "strlcat(dst, src, len)" where strlen(dst) >= len is suspect
and I would call any results in that domain "unspecified" and certainly not
any obstacle to using the functions in portable programs. Certainly not
sufficient to give this fact that much attention.

Come to think of it, abort() is possibly the better way to handle this.
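
For callers who would rather test the suspect condition up front than rely on whatever a given strlcat does with it, a minimal sketch follows. The helper name terminated_within is made up for this illustration and is not part of any library; it only checks that dst has a NUL somewhere in its first sz bytes, which is the same as strlen(dst) < sz.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if dst holds a NUL within its first sz
 * bytes (i.e. strlen(dst) < sz), 0 otherwise. Never reads past dst[sz-1]. */
int terminated_within(const char *dst, size_t sz)
{
    return sz != 0 && memchr(dst, '\0', sz) != NULL;
}
```

A caller could assert or abort() when this returns 0 before calling strlcat(dst, src, sz), which keeps the "unspecified" domain out of the program entirely.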

Strange that the Solaris source does reference beyond the "len" in dest
because I distinctly remember telling Theo and Todd about the OpenBSD source
doing the same when we were about to implement it in Solaris.

You can find this fact in the CVS commit history at:

http://www.openbsd.org/cgi-bin/cvsweb/src/lib/libc/string/strlcat.c

rev 1.2

A case of "do as I say, not as I do", I guess; I'll see about
getting us to follow suit.

Casper
 
nrk

Richard said:
Not in ISO C, they haven't. Maybe in POSIX.

Richard

I was introduced to them by someone who worked on *BSD platforms quite a
bit, and ended up writing equivalents of my own. This being a boring
afternoon, here's my implementation of strlcpy and strlcat. Comments most
likely lifted from *BSD sources as I am always too lazy to write them
myself. As everyone knows, my code *never* has any bugs :)

strlcpy.c:
#include <stddef.h>

/*
 * Copies src to string dst of size sz. sz is the full size of dst
 * including the NUL character. At most sz-1 characters will be
 * copied. Always NUL terminates dst, and doesn't fill dst with NULs
 * like strncpy when sz > strlen(src). src *MUST* be a valid NUL
 * terminated string.
 *
 * Returns strlen(src).
 *
 * If retval >= sz, truncation occurred.
 */
size_t strlcpy(char *dst, const char *src, size_t sz) {
    register char *d = dst;
    register const char *s = src;
    register size_t n = sz;

    if ( n ) {
        --n;
        while ( n && *s ) *d++ = *s++, --n;
        *d = 0;
    }

    while ( *s++ );

    return s - src - 1;
}

strlcat.c:

#include <stddef.h>

/*
 * Appends src to string dst of size sz (unlike strncat, sz is the
 * full size of dst, not space left). At most sz-1 characters will be
 * copied. Always NUL terminates (unless sz <= strlen(dst)).
 *
 * Returns strlen(src) + MIN(sz, strlen(initial dst)).
 *
 * If retval >= sz, truncation occurred.
 */
size_t strlcat(char *dst, const char *src, size_t sz) {
    register char *d = dst;
    register const char *s = src;
    register size_t n = sz;

    if ( n ) {
        --n;
        while ( n && *d ) ++d, --n;
        if ( n ) {
            while ( n && *s ) *d++ = *s++, --n;
            *d = 0;
        }
        n = d - dst + (*d != 0);
    }

    src = s;
    while ( *s++ ) ;

    return n + (s - src - 1);
}

-nrk.
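
To show how the "retval >= sz means truncation" convention is meant to be used, here is a small sketch. The names ex_strlcpy, ex_strlcat and join_path are invented for this example; the first two are compact stand-ins written to the semantics described in the comments above, so that the snippet compiles even where libc does not provide the real functions. It is an illustration under those assumptions, not the OpenBSD code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in: copies at most sz-1 bytes, NUL terminates when
 * sz > 0, and returns strlen(src). */
static size_t ex_strlcpy(char *dst, const char *src, size_t sz)
{
    size_t len = strlen(src);
    if (sz != 0) {
        size_t n = len < sz - 1 ? len : sz - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}

/* Hypothetical stand-in: returns strlen(src) + MIN(sz, strlen(dst)). */
static size_t ex_strlcat(char *dst, const char *src, size_t sz)
{
    size_t dlen = 0;
    while (dlen < sz && dst[dlen] != '\0')
        dlen++;                      /* never reads past dst[sz-1] */
    if (dlen == sz)
        return sz + strlen(src);     /* no room: dst is left untouched */
    return dlen + ex_strlcpy(dst + dlen, src, sz - dlen);
}

/* Joins dir and name with '/'. Returns 0 on success, -1 if out was too
 * small; in that case out holds a truncated, NUL-terminated prefix
 * (provided sz > 0). */
static int join_path(char *out, size_t sz, const char *dir, const char *name)
{
    if (ex_strlcpy(out, dir, sz) >= sz) return -1;
    if (ex_strlcat(out, "/", sz) >= sz) return -1;
    if (ex_strlcat(out, name, sz) >= sz) return -1;
    return 0;
}
```

Every step uses the same test against the full buffer size, which is the main convenience over strncat, where the third argument is the space remaining.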
 
Sean Burke

James Antill said:
Not only are they not in std. C, they aren't available at all on any
of the major Linux variants. They also are/were implemented subtly
differently on Solaris.

You probably also want to read:

http://www.and.org/vstr/security.html#alloc

An excellent summary of the reasons why strlcpy/cat are valuable.
I don't think availability is much of a problem either, since the
OpenBSD implementations are:

- Free as in beer
- Free as in speech
- A whopping 30 lines of code apiece.

-SEan
 
Douglas A. Gwyn

nrk said:
* ... Always NUL terminates dst, ...

Not when sz==0.
* .... src *MUST* be a valid NUL terminated string.

It doesn't need to be. It must point to at least sz-1
bytes and is terminated by the first 0-valued byte.

If this stuff is supposed to be used for safety-critical
programming, its interface specification should be
accurate. Also, once you fix it up you should see that
both example functions have behavior that is relatively
complicated to describe, which makes it relatively
difficult to use in provably correct algorithms.

It would be much better if a function either completes
a simply described task or else does nothing and reports
failure.
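
For comparison, a copy routine in the all-or-nothing style described here might look like the sketch below. The name copy_or_fail and its argument order are assumptions made for this illustration; nothing like it is proposed elsewhere in the thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical all-or-nothing copy: if src (including its NUL) fits in
 * sz bytes, copies it and returns 0; otherwise returns -1 and leaves
 * dst untouched. src must be a NUL-terminated string. */
int copy_or_fail(char *dst, size_t sz, const char *src)
{
    size_t need = strlen(src) + 1;
    if (need > sz)
        return -1;
    memcpy(dst, src, need);
    return 0;
}
```

The contract is a single sentence, at the cost of giving the caller no truncated result to fall back on.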
 
nrk

Douglas said:
Not when sz==0.

When sz == 0, there is no space to put the terminating NUL. I think that's
sort of self-evident :)
It doesn't need to be. It must point to at least sz-1
bytes and is terminated by the first 0-valued byte.

From the relevant code:
register const char *s = src;
...
while ( *s++ );

Seems to me that nasal demons shall fly if src is not NUL terminated. Or
are you saying the specification needs to be changed? What am I missing?
The whole point is that you could send a src string that's larger than your
destination, in which case the copy is still safe, but the return value
tells you that truncation occurred.
If this stuff is supposed to be used for safety-critical
programming, its interface specification should be
accurate. Also, once you fix it up you should see that
both example functions have behavior that is relatively
complicated to describe, which makes it relatively
difficult to use in provably correct algorithms.

The only questionably difficult interface is strlcat. But the difficult
situation shouldn't arise in "well-behaved" programs (it happens when you
pass in a size for the destination that is <= strlen(destination)). As for
correctness, you can guarantee that the functions will:
a) Never write more than what you specify in the sz parameter
b) Result in NUL terminated dst (with the exception above)
c) Clearly report truncation whenever it occurs
It would be much better if a function either completes
a simply described task or else does nothing and reports
failure.

These are supposed to be analogous to strncpy and strncat. Under those
circumstances, I am not sure what failure means. My guess is that it's up
to the user to determine what constitutes failure and what actions to take.
Hence the design that says we'll do our best and tell the user that not
everything was OK. It's up to the user to decide then what to do with that
information. IMHO these functions (not particularly my implementations
though) are quite well thought out and well designed. They do report
"failure" (assuming that means truncation), simply and quite effectively.
I don't understand why you would bother with functional specifications of
what they do to dst when you are solely interested in distinguishing
"failure" and "success".

-nrk.
 
Dan Pop

Presumably in at least some cases they could be provided by the
implementation, and in such a case programs using them would not
invoke UB, correct?

Wrong. Calling such a function *unconditionally* invokes undefined
behaviour.
This could potentially raise some tricky
questions about what is and what is not part of the implementation, I
suppose.

For the purpose of our discussion, it is enough that the function in
question is not part of the standard C library.

An implementation is free to provide such a function as an extension
without having its conformance affected, but *any* code using it invokes
undefined behaviour.

Dan
 
James Kuyper

Presumably in at least some cases they could be provided by the
implementation, and in such a case programs using them would not
invoke UB, correct? This could potentially raise some tricky

You need to be careful when you read the standard; it contains many
pieces of technical jargon which are defined by the standard itself to
mean something not quite the same as the ordinary English definition.
This is one example. "undefined behavior" doesn't mean "behavior that
is undefined"; it's a piece of technical jargon used by the C
standard, which roughly means "behavior that is not defined by the C
standard".
The behavior of such code remains UB even if the implementation
chooses to provide its own definition. UB includes, as one of its
literal infinity of possibilities, behaving exactly as the
implementation has defined it to behave.
 
Michael Wojcik

You need to be careful when you read the standard; it contains many
pieces of technical jargon which are defined by the standard itself to
mean something not quite the same as the ordinary English definition.
This is one example. "undefined behavior" doesn't mean "behavior that
is undefined"; it's a piece of technical jargon used by the C
standard, which roughly means "behavior that is not defined by the C
standard".

Right. I understood that...
The behavior of such code remains UB even if the implementation
chooses to provide its own definition. UB includes, as one of its
literal infinity of possibilities, behaving exactly as the
implementation has defined it to behave.

...but missed this bit when I glanced over the Standard before posting
that message. I somehow got the impression that if the implementation
provided additional str* functions (with external linkage), a program
running under that implementation could invoke them without causing UB.
Which doesn't make much sense, now that I think about it.

(Thanks also to Dan for posting a similar correction.)

--
Michael Wojcik (e-mail address removed)

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
-- David Bonde
 
Sean Burke

Brian Inglis said:
You misspelled n as noted below.


You must be one of those Pascal types who like to access strings as
character arrays with indices, instead of pointers as K&R intended.
You wouldn't have liked BCPL, B, or NB either! ;^>

The BSD string functions strl{cat,cpy} are unlikely to be standardised
as they aren't supported by many libraries, and don't fit with any of
the current string handling functions, except strspn() and strcspn(),
which are also not heavily advertised or used, as their names are
confusing, nondescriptive, unobvious, and they too return counts
instead of pointers.

In any case, I prefer to check my string lengths before I do anything
with strcpy(); strncpy()'s handy for clearing previous junk from
buffers; never need strcat(), as I like to know where I am, and going
to be in a buffer. Avoids buffer overflows, heap and stack overwrites,
SIGSEGV, ACCVIO, and other unpleasantries, don't ya know.

I'm not sure why you've revived this long-dormant thread.
As far as factual matters go, I recall that the upshot of
the discussion was that:

1. strcpy/strcat have not been formally deprecated in the
C standard

2. strlcpy/strlcat have not been incorporated into the
C standard

As far as matters of preference go, if you find strncpy/ncat
adequate for your needs, that's great. I agree that strlen()
alone is sufficient to avoid overflows with strcpy/cat, as
long as you are careful.

I find that code using strlcpy/lcat is cleaner and less
susceptible to error, but YMMV.

Oh, and as for strspn, strcspn - I find these are very useful
functions, but it's another example of the power of idiom in
programming. It took me a long time to notice that I could
do this, for example:

if (strspn(argv, "-0123456789") != strlen(argv))
    fprintf(stderr, "The argument \"%s\" must be a decimal integer.\n", argv);

-SEan
 
CBFalconer

Brian said:
.... snip ...

The BSD string functions strl{cat,cpy} are unlikely to be
standardised as they aren't supported by many libraries, and don't
fit with any of the current string handling functions, except
strspn() and strcspn(), which are also not heavily advertised or
used, as their names are confusing, nondescriptive, unobvious, and
they too return counts instead of pointers.

Which (return values) is precisely why they should be used in
preference to the already available standard routines. They
greatly reduce the prevalence of errors. They are also easily
implemented with purely standard C and no library dependencies, so
there is no reason to eschew them. For one implementation (put in
public domain) see:

<http://cbfalconer.home.att.net/download/strlcpy.zip>

For justifications for their use, etc, see:

<http://www.courtesan.com/todd/papers/strlcpy.html>

If your system has these routines, congratulations. If not, feel
free to mount mine. In either case using them will reduce your
error rate.
 
Keith Thompson

Sean Burke said:
Oh, and as for strspn, strcspn - I find these are very useful
functions, but it's another example of the power of idiom in
programming. It took me a long time to notice that I could
do this, for example:

if (strspn(argv, "-0123456789") != strlen(argv))
fprintf(stderr,
"The argument \"%s\" must be a decimal integer.\n",
argv);


[code reformatted to fit screen]

That's fine if you consider "3-2", "---", and "" to be decimal integers.
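
The objection is easy to confirm with a few lines. The sketch below wraps the quoted test in a function; the name passes_digit_filter is made up for this illustration.

```c
#include <string.h>

/* Returns 1 if every character of s is '-' or a decimal digit, which is
 * exactly what the strspn test quoted above checks; 0 otherwise. */
int passes_digit_filter(const char *s)
{
    return strspn(s, "-0123456789") == strlen(s);
}
```

It returns 1 for "3-2", "---" and "" as well as for "-42", because the test constrains only the character set, not the position of the sign or the presence of any digits.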
 
Derk Gwen

# > if (strspn(argv, "-0123456789") != strlen(argv))
# > fprintf(stderr,
# > "The argument \"%s\" must be a decimal integer.\n",
# > argv);
#
# [code reformatted to fit screen]
#
# That's fine if you consider "3-2", "---", and "" to be decimal integers.

if (strtol(p,&q,0), p==q) not an integer
else if (*q) integer followed by something else
else an integer
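
Spelled out as compilable C, the three-way test above might look like the following sketch. The enum and the function name classify_integer are invented for this example. It uses base 10 rather than the base 0 of the outline, so octal and hex prefixes are not accepted, and it adds an ERANGE check the outline leaves out. Note that strtol itself skips leading whitespace and accepts a leading '+'.

```c
#include <errno.h>
#include <stdlib.h>

enum int_parse { NOT_AN_INTEGER, TRAILING_JUNK, OUT_OF_RANGE, IS_INTEGER };

/* Hypothetical helper: classifies s using strtol in base 10.
 * *out is written only when the result is IS_INTEGER. */
enum int_parse classify_integer(const char *s, long *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s)
        return NOT_AN_INTEGER;   /* no digits were consumed */
    if (*end != '\0')
        return TRAILING_JUNK;    /* integer followed by something else */
    if (errno == ERANGE)
        return OUT_OF_RANGE;     /* value does not fit in a long */
    *out = v;
    return IS_INTEGER;
}
```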
 
Sean Burke

Derk Gwen said:
# > if (strspn(argv, "-0123456789") != strlen(argv))
# > fprintf(stderr,
# > "The argument \"%s\" must be a decimal integer.\n",
# > argv);
#
# [code reformatted to fit screen]
#
# That's fine if you consider "3-2", "---", and "" to be decimal integers.

if (strtol(p,&q,0), p==q) not an integer
else if (*q) integer followed by something else
else an integer


Anyone who knows how to use strtol doesn't need advice from me. :)
But how many people just pass the argument to atoi() and let any
problems manifest themselves further along?

I think the method I illustrated has the virtue of being a
"pretty good" filter, while also being much more widely
applicable than strtol().

Suppose I want to accept an IP address, but I want to ensure
that it's not a domain name:

if (strspn(argv, ".0123456789") != strlen(argv))
    fprintf(stderr, "Arg \"%s\" must be a numeric IP address.\n", argv);

Sure, "...0" would pass this filter, but the chief goal is
to reject "foo.bar.com". Yes, I could use inet_addr() instead,
but what does inet_addr() do with "9.37.190.192.in-addr.arpa"?

Certainly, regcomp/regexec provide a filtering method that can
be both precise and very general. But strspn/cspn() provide a
useful balance of precision and convenience.

-SEan
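
Where a stricter IPv4 check is wanted and POSIX is available, inet_pton() is one option. It is not part of ISO C, so treat this as a sketch that assumes a POSIX system; the wrapper name is_ipv4_literal is made up for the example.

```c
#include <sys/socket.h>   /* AF_INET */
#include <arpa/inet.h>    /* inet_pton, struct in_addr */

/* Hypothetical helper: returns 1 if s is an IPv4 address in
 * dotted-decimal form (four numeric fields), 0 otherwise. */
int is_ipv4_literal(const char *s)
{
    struct in_addr addr;
    return inet_pton(AF_INET, s, &addr) == 1;
}
```

Unlike the strspn filter, this rejects "...0", and it also rejects "foo.bar.com" and "9.37.190.192.in-addr.arpa", since neither is a bare dotted quad.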
 
Valentin Nechayev

Sean Burke wrote:

SB> Sure, "...0" would pass this filter, but the chief goal is
SB> to reject "foo.bar.com". Yes, I could use inet_addr() instead,
SB> but what does inet_addr() do with "9.37.190.192.in-addr.arpa"?

$ host 127.0.0.1
1.0.0.127.IN-ADDR.ARPA domain name pointer localhost
$ ping 1.0.0.127.in-addr.arpa
ping: cannot resolve 1.0.0.127.in-addr.arpa: No address associated with name

Feel the difference between asking A and PTR ;)


-netch-
 
