How is strlen implemented?

CBFalconer

pete said:
Gregory said:
There has to be a null terminator somewhere.

Here's a short implementation:

#include <string.h>
size_t (strlen)(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

The ptrdiff_t type of (p - s) disqualifies this code
from being an example of portable C code.

If the following description of undefined behavior doesn't
apply to your code, then it doesn't apply to anything.

N869
6.5.6 Additive operators
[#9] When two pointers are subtracted, both shall point to
elements of the same array object, or one past the last
element of the array object; the result is the difference of
the subscripts of the two array elements. The size of the
result is implementation-defined, and its type (a signed
integer type) is ptrdiff_t defined in the <stddef.h> header.
If the result is not representable in an object of that
type, the behavior is undefined.

Huh? size_t and ptrdiff_t are both integral types, the first being
unsigned, and the second signed. The code above ensures that the
ptrdiff_t value is not negative. I fail to see anything undefined
if we ignore the fact that strlen can only be defined in the
implementation.
 
pete

CBFalconer said:
Gregory said:
There has to be a null terminator somewhere.

Here's a short implementation:

#include <string.h>
size_t (strlen)(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

The ptrdiff_t type of (p - s) disqualifies this code
from being an example of portable C code.

If the following description of undefined behavior doesn't
apply to your code, then it doesn't apply to anything.

N869
6.5.6 Additive operators
[#9] When two pointers are subtracted, both shall point to
elements of the same array object, or one past the last
element of the array object; the result is the difference of
the subscripts of the two array elements. The size of the
result is implementation-defined, and its type (a signed
integer type) is ptrdiff_t defined in the <stddef.h> header.
If the result is not representable in an object of that
type, the behavior is undefined.

Huh?

A string longer than PTRDIFF_MAX breaks the code.

It's supposed to be an example of a standard library function
written in portable C code, right?
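
As a rough sketch of that limit, one could test whether a given length is safe for the (p - s) method. The helper name length_fits_ptrdiff is made up for illustration, and PTRDIFF_MAX is assumed to come from C99's <stdint.h>.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the difference between the end and the
   start of a string of length len is representable in ptrdiff_t, else 0.
   Both sides are widened to uintmax_t so the comparison itself is safe. */
static int length_fits_ptrdiff(size_t len)
{
    return (uintmax_t)len <= (uintmax_t)PTRDIFF_MAX;
}
```

On an implementation where SIZE_MAX is larger than PTRDIFF_MAX, some valid object sizes fail this check, and for a string that long the subtraction (p - s) has undefined behavior.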
 
Joe Wright

pete said:
CBFalconer said:
pete said:
Gregory Pietsch wrote:

There has to be a null terminator somewhere.

Here's a short implementation:

#include <string.h>
size_t (strlen)(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

The ptrdiff_t type of (p - s) disqualifies this code
from being an example of portable C code.

If the following description of undefined behavior doesn't
apply to your code, then it doesn't apply to anything.

N869
6.5.6 Additive operators
[#9] When two pointers are subtracted, both shall point to
elements of the same array object, or one past the last
element of the array object; the result is the difference of
the subscripts of the two array elements. The size of the
result is implementation-defined, and its type (a signed
integer type) is ptrdiff_t defined in the <stddef.h> header.
If the result is not representable in an object of that
type, the behavior is undefined.

Huh?


A string longer than PTRDIFF_MAX breaks the code.

It's supposed to be an example of a standard library function
written in portable C code, right?

Assuming ptrdiff_t is long and 32 bits on a 32-bit machine, a string of
2,147,483,648 bytes will probably break lots of things before you ever
get to run strlen() on it.

Show us a case where (p - s) can be out-of-bounds.
 
pete

Joe said:
CBFalconer said:
pete wrote:

Gregory Pietsch wrote:

There has to be a null terminator somewhere.

Here's a short implementation:

#include <string.h>
size_t (strlen)(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

The ptrdiff_t type of (p - s) disqualifies this code
from being an example of portable C code.

If the following description of undefined behavior doesn't
apply to your code, then it doesn't apply to anything.

N869
6.5.6 Additive operators
[#9]
If the result is not representable in an object of that
type, the behavior is undefined.
Show us a case where (p - s) can be out-of-bounds.

What do you think that part of the standard means?
 
CBFalconer

pete said:
CBFalconer said:
pete said:
Gregory Pietsch wrote:

There has to be a null terminator somewhere.

Here's a short implementation:

#include <string.h>
size_t (strlen)(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

The ptrdiff_t type of (p - s) disqualifies this code
from being an example of portable C code.

If the following description of undefined behavior doesn't
apply to your code, then it doesn't apply to anything.

N869
6.5.6 Additive operators
[#9] When two pointers are subtracted, both shall point to
elements of the same array object, or one past the last
element of the array object; the result is the difference of
the subscripts of the two array elements. The size of the
result is implementation-defined, and its type (a signed
integer type) is ptrdiff_t defined in the <stddef.h> header.
If the result is not representable in an object of that
type, the behavior is undefined.

Huh?

A string longer than PTRDIFF_MAX breaks the code.

It's supposed to be an example of a standard library function
written in portable C code, right?

But the string exists, thus the ptrdiff_t value exists by
_definition_. The exit values of p and s are valid, point to the
same entity, so the difference exists.
 
Richard Tobin

6.5.6 Additive operators
[#9] When two pointers are subtracted, both shall point to
elements of the same array object, or one past the last
element of the array object; the result is the difference of
the subscripts of the two array elements. The size of the
result is implementation-defined, and its type (a signed
integer type) is ptrdiff_t defined in the <stddef.h> header.
If the result is not representable in an object of that
type, the behavior is undefined.
But the string exists, thus the ptrdiff_t value exists by
_definition_.

I thought the point of quoting the above paragraph was to show that
there can be cases where the difference between two pointers in the
same array *doesn't* exist, in that attempting to calculate it may
produce undefined behaviour. If no array can exist that's bigger than
the biggest ptrdiff_t value, what's the point of the last sentence of
the paragraph?

-- Richard
 
Stan Milam

pete said:
No, it wasn't.
Your posts in the "C FAQ 3.1" thread show that you don't see
the beauty of the concept of undefined behavior.

If you're going to write bad code,
then the C standard committee doesn't care about
what happens as a consequence.

This philosophy was in C originally,
and is maintained in the current C99 standard.

It's not that R was in too much of a hurry specifying C,
so that he didn't have enough time
to also specify what garbage code should do,
but rather it's the case that compiler writers
are in too much of a hurry writing compilers
to want to care about how to translate garbage code.

Wrong thread, Pete. Some people have no sense of humor, especially the
denizens of this newsgroup.
 
pete

Richard said:
6.5.6 Additive operators
[#9] When two pointers are subtracted, both shall point to
elements of the same array object, or one past the last
element of the array object; the result is the difference of
the subscripts of the two array elements. The size of the
result is implementation-defined, and its type (a signed
integer type) is ptrdiff_t defined in the <stddef.h> header.
If the result is not representable in an object of that
type, the behavior is undefined.
But the string exists, thus the ptrdiff_t value exists by
_definition_.

I thought the point of quoting the above paragraph was to show that
there can be cases where the difference between two pointers in the
same array *doesn't* exist, in that attempting to calculate it may
produce undefined behaviour. If no array can exist that's bigger than
the biggest ptrdiff_t value, what's the point of the last sentence of
the paragraph?

That's the point.
 
pete

CBFalconer said:
Gregory said:
There has to be a null terminator somewhere.

Here's a short implementation:

#include <string.h>
size_t (strlen)(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

The ptrdiff_t type of (p - s) disqualifies this code
from being an example of portable C code.

If the following description of undefined behavior doesn't
apply to your code, then it doesn't apply to anything.

N869
6.5.6 Additive operators
[#9] When two pointers are subtracted, both shall point to
elements of the same array object, or one past the last
element of the array object; the result is the difference of
the subscripts of the two array elements. The size of the
result is implementation-defined, and its type (a signed
integer type) is ptrdiff_t defined in the <stddef.h> header.
If the result is not representable in an object of that
type, the behavior is undefined.

Huh? size_t and ptrdiff_t are both integral types, the first being
unsigned, and the second signed. The code above ensures that the
ptrdiff_t value is not negative.

It makes no difference
that the code ensures that the ptrdiff_t value is not negative.

If a positive value is within the range of ptrdiff_t,
then the additive inverse of that value
must also be within the range of ptrdiff_t.
 
Lawrence Kirby

pete said:
CBFalconer said:
pete wrote:

Gregory Pietsch wrote:

There has to be a null terminator somewhere.

Here's a short implementation:

#include <string.h>
size_t (strlen)(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

The ptrdiff_t type of (p - s) disqualifies this code
from being an example of portable C code.

If the following description of undefined behavior doesn't
apply to your code, then it doesn't apply to anything.

N869
6.5.6 Additive operators
[#9] When two pointers are subtracted, both shall point to
elements of the same array object, or one past the last
element of the array object; the result is the difference of
the subscripts of the two array elements. The size of the
result is implementation-defined, and its type (a signed
integer type) is ptrdiff_t defined in the <stddef.h> header.
If the result is not representable in an object of that
type, the behavior is undefined.

Huh?


A string longer than PTRDIFF_MAX breaks the code.

It's supposed to be an example of a standard library function
written in portable C code, right?

Assuming ptrdiff_t is long and 32 bits on a 32-bit machine, a string of
2,147,483,648 bytes will probably break lots of things before you ever
get to run strlen() on it.

Show us a case where (p - s) can be out-of-bounds.

Instead think of a 16 bit system where size_t and ptrdiff_t are 16 bits
wide. It would be permissible for a string to be up to 65534 characters
plus the null character on that implementation but anything above 32767
can cause problems for ptrdiff_t. 32768 is long for a string but not
beyond the bounds of possibility.

size_t must be able to represent the size of any object (although some
debate is possible for calloc()). However C provides no corresponding
guarantee that ptrdiff_t can represent the difference of any 2 pointers to
elements of the same array.

Lawrence
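
To make the 16-bit arithmetic concrete, here is a small model of that hypothetical implementation. It assumes uint16_t stands in for size_t and int16_t for ptrdiff_t. Those stand-ins and the function name are illustrative only and do not describe any particular real system.

```c
#include <stdint.h>

/* Model of the hypothetical 16-bit implementation described above:
   lengths are held in uint16_t (maximum 65535), pointer differences
   in int16_t (maximum INT16_MAX, which is 32767).
   Returns 1 if a string of length len has a representable (p - s). */
static int diff_representable_16(uint16_t len)
{
    return len <= INT16_MAX;
}
```

In this model a 32767-character string is fine, while 32768 and the maximum of 65534 characters plus the null both fall outside what the signed type can hold.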
 
CBFalconer

Lawrence said:
... snip ...

Instead think of a 16 bit system where size_t and ptrdiff_t are
16 bits wide. It would be permissible for a string to be up to
65534 characters plus the null character on that implementation
but anything above 32767 can cause problems for ptrdiff_t. 32768
is long for a string but not beyond the bounds of possibility.

size_t must be able to represent the size of any object (although
some debate is possible for calloc()). However C provides no
corresponding guarantee that ptrdiff_t can represent the
difference of any 2 pointers to elements of the same array.

Alright, you have finally convinced me. So this means that strlen
actually has to be a system function.
 
Keith Thompson

CBFalconer said:
Alright, you have finally convinced me. So this means that strlen
actually has to be a system function.

No, it just means that the (p - s) method isn't strictly portable.

size_t strlen(const char *s)
{
    size_t result = 0;
    const char *p = s;
    while (*p++ != '\0') {
        result++;
    }
    return result;
}
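
For anyone who wants to try the counting approach in an ordinary program, here is a minimal sketch under a different name, since strlen itself is a reserved identifier outside the implementation. The names my_strlen and agrees_with_library are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Counting version under a non-reserved, hypothetical name.
   No pointer subtraction is involved, so ptrdiff_t never comes into play. */
static size_t my_strlen(const char *s)
{
    size_t result = 0;

    while (*s++ != '\0') {
        result++;
    }
    return result;
}

/* Returns 1 if my_strlen and the library strlen agree on s. */
static int agrees_with_library(const char *s)
{
    return my_strlen(s) == strlen(s);
}
```

The count lives in a size_t, which must be able to represent the size of any object, so the result is well defined for any valid string.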
 
pete

Keith said:
No, it just means that the (p - s) method isn't strictly portable.

size_t strlen(const char *s)
{
size_t result = 0;
const char *p = s;
while (*p++ != '\0') {
result ++;
}
return result;
}

One should bear in mind that we all know where to find
real strlen when we need it, and that these posted strlen
definitions aren't meant to be competitive in terms of performance.

Writing standard library functions in C,
and merely getting it right, brings up various C topics.

Do you not like to increment the s parameter directly?
Some people don't like to change the values of parameters.
I prefer to change them whenever it's handy.

size_t strlen(const char *s)
{
    size_t n;

    for (n = 0; *s != '\0'; ++s) {
        ++n;
    }
    return n;
}
 
Keith Thompson

pete said:
Keith Thompson wrote: [...]
size_t strlen(const char *s)
{
size_t result = 0;
const char *p = s;
while (*p++ != '\0') {
result ++;
}
return result;
}

One should bear in mind that we all know where to find
real strlen when we need it, and that these posted strlen
definitions aren't meant to be competitive in terms of performance.

Of course.
Writing standard library functions in C,
and merely getting it right, brings up various C topics.

Sure. The issue (or at least *an* issue) is why certain functions are
included in the C standard library. In many cases it's just arbitrary
historical precedent; a C library designed from scratch would probably
look very different from what we have now. Some functions are in the
C library because they can't be implemented portably (and library
implementers are not constrained to write portable code); the
offsetof() macro is a good example, as are most of the functions in
<stdio.h>. Other functions are in the standard library just because
they're convenient. Many of them *can* be implemented perfectly
portably, but it's nice that not every program has to provide its own
strlen() function -- and in some cases the implementer can provide a
non-portable version with improved performance.
Do you not like to increment the s parameter directly?
Some people don't like to change the values of parameters.
I prefer to change them whenever it's handy.

size_t strlen(const char *s)
{
size_t n;

for (n = 0; *s != '\0'; ++s) {
++n;
}
return n;
}

Yes, that's a good solution (probably a little better than mine).
 
CBFalconer

pete said:
... snip ...

One should bear in mind that we all know where to find real
strlen when we need it, and that these posted strlen definitions
aren't meant to be competitive in terms of performance.

Writing standard library functions in C,
and merely getting it right, brings up various C topics.

Do you not like to increment the s parameter directly?
Some people don't like to change the values of parameters.
I prefer to change them whenever it's handy.

size_t strlen(const char *s)
{
size_t n;

for (n = 0; *s != '\0'; ++s) {
++n;
}
return n;
}

Even so, I think I would prefer to write:

inline size_t strlen(const char *s)
{
    size_t n;

    for (n = 0; *s++;) ++n;
    return n;
}
 
Tim Rentsch

pete said:
Keith said:
[how might strlen be implemented portably...]

size_t strlen(const char *s)
{
size_t result = 0;
const char *p = s;
while (*p++ != '\0') {
result ++;
}
return result;
}

One should bear in mind that we all know where to find
real strlen when we need it, and that these posted strlen
definitions aren't meant to be competitive in terms of performance.

Writing standard library functions in C,
and merely getting it right, brings up various C topics.

Do you not like to increment the s parameter directly?
Some people don't like to change the values of parameters.
I prefer to change them whenever it's handy.

size_t strlen(const char *s)
{
size_t n;

for (n = 0; *s != '\0'; ++s) {
++n;
}
return n;
}

Normally I prefer parameters to retain their original values and
introduce new variables instead. I'm willing to break the rule
but in the absence of a compelling reason I usually don't.

In this case though it doesn't have to come up. Sometimes the
most straightforward code is best:

size_t
strlen( const char *s ){
    size_t n=0;

    while( s[n] ) n++;
    return n;
}

This implementation ran faster in my tests than any of the
pointer versions posted.
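
The test setup isn't given here, so the following is only a sketch of how such a comparison might be made with the standard clock() function. The names len_index, len_pointer, and time_len are hypothetical, and the repetition count is left to the caller.

```c
#include <stddef.h>
#include <time.h>

/* Indexed version, renamed to avoid the reserved identifier strlen. */
static size_t len_index(const char *s)
{
    size_t n = 0;

    while (s[n]) n++;
    return n;
}

/* One of the pointer versions, likewise renamed. */
static size_t len_pointer(const char *s)
{
    size_t n;

    for (n = 0; *s != '\0'; ++s) ++n;
    return n;
}

/* Calls fn on s reps times and returns the elapsed CPU time in seconds.
   The running total goes through a volatile object so the calls are
   less likely to be optimised away. */
static double time_len(size_t (*fn)(const char *), const char *s, int reps)
{
    volatile size_t sink = 0;
    clock_t start = clock();
    int i;

    for (i = 0; i < reps; i++) {
        sink += fn(s);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Which version comes out ahead is likely to depend on the compiler, the optimisation flags, and the hardware, so any numbers from a harness like this apply only to the setup that produced them.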
 
Flash Gordon

Tim Rentsch wrote:

In this case though it doesn't have to come up. Sometimes the
most straightforward code is best:

size_t
strlen( const char *s ){
size_t n=0;

while( s[n] ) n++;
return n;
}

This implementation ran faster in my tests than any of the
pointer versions posted.

Personally I would probably do the following, since there is
initialisation, condition and increment. Purely as a matter of taste,
not correctness.

size_t strlen( const char *s )
{
    size_t n;
    for (n=0; s[n]; n++)
        continue;
    return n;
}
 
Alberto Giménez

On Sat, 30 Apr 2005 17:03:47 +0100, Flash Gordon wrote:
size_t strlen( const char *s )
{
size_t n;
for (n=0; s[n]; n++)
continue;
return n;
}

I wonder if that 'continue' is required. Wouldn't a simple null
statement ';' do the work?

for (n=0; s[n]; n++)
    ; /* continue */


Greetings.
 
Keith Thompson

Alberto Giménez said:
On Sat, 30 Apr 2005 17:03:47 +0100, Flash Gordon wrote:
size_t strlen( const char *s )
{
size_t n;
for (n=0; s[n]; n++)
continue;
return n;
}

I wonder if that 'continue' is required. Wouldn't a simple null
statement ';' do the work?

for (n=0; s[n]; n++)
; /* continue */

Yes, in this context, an empty statement ";" is exactly equivalent to
"continue;". The "continue" statement is just more explicit; a lone
";" is easy to miss.

I note that you felt it necessary to add a comment; as long as you're
doing that, why not use the "continue" keyword?

I would probably have written it as:

for (n=0; s[n]; n++) {
    ;
}
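
Side by side, the three spellings of the empty loop body look like this. It is a sketch with hypothetical function names, and all three should return the same length for any string.

```c
#include <stddef.h>

/* Empty loop body written with an explicit continue. */
static size_t len_continue(const char *s)
{
    size_t n;

    for (n = 0; s[n]; n++)
        continue;
    return n;
}

/* Empty loop body written as a lone null statement. */
static size_t len_null_stmt(const char *s)
{
    size_t n;

    for (n = 0; s[n]; n++)
        ;   /* null statement */
    return n;
}

/* Empty loop body written as a null statement inside braces. */
static size_t len_braced(const char *s)
{
    size_t n;

    for (n = 0; s[n]; n++) {
        ;
    }
    return n;
}
```

The choice among them is about how visible the empty body is to a reader, not about behavior.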
 
Tim Rentsch

Flash Gordon said:
Tim Rentsch wrote:

In this case though it doesn't have to come up. Sometimes the
most straightforward code is best:

size_t
strlen( const char *s ){
size_t n=0;

while( s[n] ) n++;
return n;
}

This implementation ran faster in my tests than any of the
pointer versions posted.

Personally I would probably do the following, since there is
initialisation, condition and increment. Purely as a matter of taste,
not correctness.

size_t strlen( const char *s )
{
size_t n;
for (n=0; s[n]; n++)
continue;
return n;
}

Normally I expect 'for' statements are used when iterating over known
quantities; also they usually "do" something with each element
iterated over. Of course these conditions needn't be true but most
often they are. So the for loop here seems a little off.

On the other hand, 'while' statements are often used to establish
postconditions. The code

n = 0;
while( s[n] ) n++;

clearly establishes the postcondition

s[n] == 0 && s[k] != 0 for 0 <= k < n

which is more or less the definition for 'n' being the length of the
string 's'. (Initializing 'n' on its declaration is just a convenient
shortening of an initializing expression.)

Certainly you're right that operationally the two functions are
equivalent. It just seems to be a little more mental effort to be
sure that the 'for' code is doing the right thing - it's less clear
or less obvious or perhaps both. For these reasons I tend to favor
the 'while' form here.
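
The postcondition described above can also be checked at run time. This is a minimal sketch using assert from <assert.h>, and the name len_checked is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* The 'while' form, followed by run-time checks of its postcondition. */
static size_t len_checked(const char *s)
{
    size_t n = 0;
    size_t k;

    while (s[n]) n++;

    /* Postcondition: s[n] == 0 && s[k] != 0 for 0 <= k < n */
    assert(s[n] == 0);
    for (k = 0; k < n; k++) {
        assert(s[k] != 0);
    }
    return n;
}
```

The first assertion follows directly from the loop having exited. The second holds because n only advances past an index after the character there has been found to be nonzero.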
 
