How is strlen implemented?

roy

Hi,

I was wondering how strlen is implemented.
What if the input string doesn't have a null terminator, namely the
'\0'?
Thanks a lot
Roy
 
Chris McDonald

roy said:
I was wondering how strlen is implemented.
What if the input string doesn't have a null terminator, namely the
'\0'?

Without the null-byte terminator, it's not a string!
strlen() can then do whatever it wants.
 
Chris Torek

I was wondering how strlen is implemented.
What if the input string doesn't have a null terminator, namely the
'\0'?

Q: What if a tree growing in a forest is made of plastic?
A: Then it is not a tree, or at least, it is not growing.

If something someone else is calling a "string" does not have the
'\0' terminator, it is not a string, or at least, not a C string.
In C, the word "string" means "data structure consisting of zero
or more characters, followed by a '\0' terminator". No terminator,
no string.

Since strlen() requires a string, it may assume it gets one.

There are functions that work on "non-stringy arrays"; in particular,
the mem* functions -- memcpy(), memmove(), memcmp(), memset(),
memchr() -- but they take more than one argument. If you have an
array that always contains exactly 40 characters, and it is possible
that none of them is '\0' but you want to find out whether there
is a '\0' in those 40 characters, you can use memchr():

char *p = memchr(my_array, '\0', 40);

memchr() stops when it finds the first '\0' or has used up the
count, whichever occurs first. (It then returns a pointer to the
found character, or NULL if the count ran out.) The strlen()
function has an effect much like memchr() with an "infinite" count,
except that because the count is "infinite", it "always" finds the
'\0':

size_t much_like_strlen(const char *p) {
    /* pseudocode: INFINITY stands for "no limit on the count" */
    const char *q = memchr(p, '\0', INFINITY);
    return q - p;
}

except of course C does not really have a way to express "infinity"
here. (You can approximate it with (size_t)-1, though.)
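
For an array whose size is known, the memchr() idea above can be wrapped into a bounded length function. What follows is only a minimal sketch: the name bounded_length and its cap parameter are made up for illustration and are not part of the standard library (POSIX's strnlen() does a similar job).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the string stored in buf, looking at
 * no more than cap bytes.  Returns cap if no '\0' is found, which tells
 * the caller that buf[0..cap-1] does not contain a string. */
size_t bounded_length(const char *buf, size_t cap)
{
    const char *nul;

    assert(buf != NULL);
    nul = memchr(buf, '\0', cap);
    return nul != NULL ? (size_t)(nul - buf) : cap;
}
```

With char my_array[40], a result equal to sizeof my_array from bounded_length(my_array, sizeof my_array) means no terminator was found, so strlen() must not be called on that array.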
 
roy

Thanks. Maybe my question should be "what if the input is a char array
without a null terminator". But from my experimental results, it seems
that strlen can still return the number of characters of a char array.
I am just not sure whether I am just lucky or sth else happened inside
strlen.
 
Jason

roy said:
Hi,

I was wondering how strlen is implemented.
What if the input string doesn't have a null terminator, namely the
'\0'?
Thanks a lot
Roy

strlen will read from the char* until it finds a '\0' char. If your
string does not use the '\0' as a terminator, then you should avoid
most of the <string.h> functions.

-Jason
 
Chris McDonald

roy said:
Thanks. Maybe my question should be "what if the input is a char array
without a null terminator". But from my experimental results, it seems
that strlen can still return the number of characters of a char array.
I am just not sure whether I am just lucky or sth else happened inside
strlen.

You were just lucky.
 
Martin Ambuhl

roy said:
Hi,

I was wondering how strlen is implemented.

It could be implemented in several ways. The obvious one is to count
characters until a '\0' is encountered.

What if the input string doesn't have a null terminator, namely the
'\0'?

Then it isn't a string, which has such a terminator by definition.
 
Keith Thompson

roy said:
Thanks. Maybe my question should be "what if the input is a char array
without a null terminator". But from my experimental results, it seems
that strlen can still return the number of characters of a char array.
I am just not sure whether I am just lucky or sth else happened inside
strlen.

It's helpful to provide some context when you post a followup. I
happen to have read the previous articles just before I read this one,
but I could as easily have seen your followup first.

If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers.

As for your question, strlen()'s argument isn't a char array, it's a
pointer to a char. Normally the pointer should point to the first
element of a "string" (i.e., a sequence of characters marked by a '\0'
terminator). strlen() doesn't know how many characters are
actually in the array. By calling strlen(), you're promising that
there's a '\0' terminator somewhere within the array; if you break
that promise, there's no telling what will happen.
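
The pointer-versus-array point can be sketched in a few lines. Both function names below are hypothetical and exist only for this illustration; the second one shows one way a caller who still knows the array size could check the promise before making it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: called with an array, but the parameter is only
 * a pointer, so the array's size is not available in here.  All the
 * function can do is trust that a '\0' exists and search for it. */
size_t length_via_pointer(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Hypothetical helper: the caller passes the size explicitly (typically
 * sizeof array), so the search is limited to the array itself.
 * Returns 1 if a '\0' was found within size bytes, 0 otherwise. */
int holds_string(const char *arr, size_t size)
{
    assert(arr != NULL);
    return memchr(arr, '\0', size) != NULL;
}
```

Given char buf[8] = "hi"; the caller sees sizeof buf as 8, while length_via_pointer(buf) yields 2; the 8 never reaches the callee. A call such as holds_string(buf, sizeof buf) is a check that can be made while the size is still known.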

A typical implementation of strlen() will simply traverse the elements
of what it assumes to be your array until it finds a '\0' character.
If it doesn't find a '\0' character within the array, it has no way of
knowing it should stop searching, so it will just continue until it
finds a '\0'. As soon as it passes the end of the array, it invokes
undefined behavior. It might happen to find a '\0' character (which
is what happened in your case). Or it might run past the memory owned
by your program and trigger a segmentation fault or something similar.
Or, as far as the C standard is concerned, it might make demons fly
out your nose.

So don't do that.
 
Joe Wright

Jason said:
strlen will read from the char* until it finds a '\0' char. If your
string does not use the '\0' as a terminator, then you should avoid
most of the <string.h> functions.

-Jason

More precisely, if your char array does not have a 0 terminator, it is
not a string.
 
Richard Tobin

roy said:
Thanks. Maybe my question should be "what if the input is a char array
without a null terminator". But from my experimental results, it seems
that strlen can still return the number of characters of a char array.

Bear in mind that a char array usually *does* have a null terminator.

If it doesn't, it's quite likely to be followed in memory by a zero
byte, which is the representation of nul on almost all systems, so it
will often work by luck.
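
Whether a char array ends up with a terminator often depends on how it was initialized. The declarations below are a small sketch of the three common cases; the array names are invented for the example, and static_assert assumes a C11 (or later) compiler.

```c
#include <assert.h>   /* static_assert (C11) */
#include <string.h>

/* Sized by the initializer: 'a' 'b' 'c' '\0', so 4 bytes. */
const char with_nul[] = "abc";

/* Larger than the initializer: the remaining 5 bytes are zero. */
const char padded[8] = "abc";

/* Exactly 3 bytes: valid C (though not C++), but there is no room
 * left for the '\0', so this array does not hold a string. */
const char without_nul[3] = "abc";

static_assert(sizeof with_nul == 4, "three letters plus the terminator");
static_assert(sizeof padded == 8, "size comes from the declarator");
static_assert(sizeof without_nul == 3, "no room for a terminator");
```

Only the last array is unsafe to pass to strlen(); whatever result it appears to give depends on the bytes that happen to follow it in memory.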

Debugging systems often have an option to initialize variables to
non-zero values, precisely to stop this kind of "luck" from obscuring
real errors. Some readers will remember the many bugs that were
revealed when dynamic linking was added to SunOS, causing
uninitialized variables in main() to no longer be zero.

-- Richard
 
Gregory Pietsch

There has to be a null terminator somewhere.

Here's a short implementation:

#include <string.h>

size_t (strlen)(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Gregory Pietsch */
 
Joe Estock

Gregory said:
There has to be a null terminator somewhere.

Here's a short implementation:

#include <string.h>

size_t (strlen)(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Gregory Pietsch */

Interesting seeing \0 so widely in use. On most systems, NULL is defined
as \0, however there are a few special cases where it is not. Shouldn't
we be using NULL instead of \0?

Joe Estock
 
Joe Wright

Joe said:
Interesting seeing \0 so widely in use. On most systems, NULL is defined
as \0, however there are a few special cases where it is not. Shouldn't
we be using NULL instead of \0?

Joe Estock

No Joe, NULL is the 'null pointer constant' while '\0' is a constant
character (with int type) and value zero. This is often called the null
character or the NUL character. Never NULL character.
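
The distinction can be made concrete with a short sketch. The function name is_empty is hypothetical, and static_assert assumes a C11 (or later) compiler.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* NULL */

/* In C (unlike C++), a character constant such as '\0' has type int. */
static_assert(sizeof '\0' == sizeof(int), "'\\0' is an int in C");
static_assert('\0' == 0, "the null character has the value zero");

/* Hypothetical helper: 1 if s points to an empty string, 0 otherwise
 * (including when s is a null pointer).  The pointer is compared with
 * NULL; the character it points to is compared with '\0'. */
int is_empty(const char *s)
{
    return s != NULL && *s == '\0';
}
```

NULL is used where a pointer is tested and '\0' where a character is tested, even though both compare equal to zero.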
 
Minti

Chris said:
Q: What if a tree growing in a forest is made of plastic?
A: Then it is not a tree, or at least, it is not growing.

If something someone else is calling a "string" does not have the
'\0' terminator, it is not a string, or at least, not a C string.
In C, the word "string" means "data structure consisting of zero
or more characters, followed by a '\0' terminator". No terminator,
no string.

Since strlen() requires a string, it may assume it gets one.

There are functions that work on "non-stringy arrays"; in particular,
the mem* functions -- memcpy(), memmove(), memcmp(), memset(),
memchr() -- but they take more than one argument. If you have an
array that always contains exactly 40 characters, and it is possible
that none of them is '\0' but you want to find out whether there
is a '\0' in those 40 characters, you can use memchr():

char *p = memchr(my_array, '\0', 40);

memchr() stops when it finds the first '\0' or has used up the
count, whichever occurs first. (It then returns a pointer to the
found character, or NULL if the count ran out.) The strlen()
function has an effect much like memchr() with an "infinite" count,
except that because the count is "infinite", it "always" finds the
'\0':

size_t much_like_strlen(const char *p) {
    const char *q = memchr(p, '\0', INFINITY);
    return q - p;
}

except of course C does not really have a way to express "infinity"
here. (You can approximate it with (size_t)-1, though.)

Pardon me Chris, but I really don't get the drift of what you are
trying to convey. These strings are also "stringy", I don't see how
these are "non-stringy".

IOW you are assuming that these "non-stringy" arrays are also supposed
to end with a null character. "Stringy" I say.
 
Chris Torek

Pardon me Chris, but I really don't get the drift of what you are
trying to convey. These strings are also "stringy", I don't see how
these are "non-stringy".

If there is no '\0' byte in all 40 characters, it is not a string.
If there is a '\0' byte somewhere within those 40 characters, it
*is* a string -- and any characters after the first such '\0' are
not part of the string (but remain part of the array).

IOW you are assuming that these "non-stringy" arrays are also supposed
to end with a null character. "Stringy" I say.

In other words, I am saying that these arrays do not contain strings
if and only if they do not contain a '\0'. Note that strncpy()
sometimes makes such arrays (which is one reason some people invented
strlcpy()).
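
The strncpy() case is easy to reproduce: when the source is at least as long as the count, no '\0' is written. The wrapper below is only a sketch of one common workaround; the name copy_terminated is made up for this example and is neither strlcpy() nor any other library function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst (cap bytes, cap > 0) and make
 * sure dst ends up holding a string, truncating src if necessary.
 * Returns dst. */
char *copy_terminated(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL && cap > 0);
    strncpy(dst, src, cap - 1);  /* may leave dst without a '\0' ... */
    dst[cap - 1] = '\0';         /* ... so add one unconditionally */
    return dst;
}
```

With a 4-byte destination, strncpy(dst, "abcdef", 4) fills all four bytes with letters and leaves no terminator, whereas copy_terminated(dst, 4, "abcdef") leaves the string "abc".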

If I may draw an analogy: in mathematics, a statement is false if
there is even a single counterexample. Hence "x * (1/x) = 1" is
a false statement mathematically, because it does not hold for x=0.
(But note that if we limit it, "x * (1/x) = 1 provided x \ne 0",
the statement becomes true for x \elem real, while it remains false
for x \elem integer, and so on.) (Note that details like "x is a
real number" also matter in computing, where float and double do
not really give us "real numbers", but rather approximations.)
 
Keith Thompson

Gregory Pietsch said:
There has to be a null terminator somewhere.

To clarify: This doesn't mean that there's a guarantee that there will
be a null terminator somewhere. It means that if there isn't a null
terminator anyway, you must not call strlen(). The burden is on the
caller.

(I briefly read your statement the other way.)
 
Mark McIntyre

Thanks. Maybe my question should be "what if the input is a char array
without a null terminator".

Your question was already answered. However, a quote from the ISO
Standard may help:

7.21.6.3 The strlen function

3. The strlen function returns the number of characters that precede
the terminating null character.

Clearly, if there's no terminating null, this function can't return
anything meaningful. It may in fact not return at all, and it's not
uncommon for it to return absurd numbers such as 5678905 or -456.

But from my experimental results, it seems
that strlen can still return the number of characters of a char array.

How can it do that? It's /required/ to search for the terminating null.
Your compiler is either not standard compliant, or it's exhibiting
random behaviour.

I am just not sure whether I am just lucky or sth else happened inside
strlen.

lucky
 
Keith Thompson

Mark McIntyre said:
On 22 Apr 2005 20:59:49 -0700, in comp.lang.c , "roy"


How can it do that? It's /required/ to search for the terminating null.
Your compiler is either not standard compliant, or it's exhibiting
random behaviour.

strlen() is almost certainly finding a zero byte immediately after his
array. I'd expect that to be a very common manifestation of the
undefined behavior in this case.

No, if he'd been lucky it would have crashed the program (with a
meaningful diagnostic) rather than quietly returning a meaningless
result.
 
Stan Milam

roy said:
Hi,

I was wondering how strlen is implemented.
What if the input string doesn't have a null terminator, namely the
'\0'?
Thanks a lot
Roy

I found some C functions coded in assembler for the 8086 way back when.

;
; -------------------------------------------------------
; int strlen(s)
; char *s;
; Purpose: Returns the length of the string, not
; including the NULL character
; -------------------------------------------------------
;
ifndef pca
include macro2.asm
include libdef.asm
endif
;
idt strlen
def strlen
strlen: qenter bx,di
mov di,parm1[bx]
; cmp di,zero
; jz null
mov ax,ds
mov es,ax
mov cx,-1
xor al,al
cld
repnz scasb
not cx
dec cx
mov ax,cx
exitf
;null xor ax,ax
; exitf
modend strlen

I guess its C equivalent is:

unsigned
strlen( const char *string )
{
    unsigned rv = -1;    /* mirrors "mov cx,-1" */

    while( *string ) rv--, string++;

    rv = (-rv) - 1;
    return rv;
}

Of course, I'd just write it like this:

size_t
strlen( const char *string )
{
    size_t rv = 0;
    while ( *string++ ) rv++;
    return rv;
}
 
Stan Milam

Keith said:
Mark McIntyre said:
On 22 Apr 2005 20:59:49 -0700, in comp.lang.c , "roy"
[...]
But from my experimental results, it seems
that strlen can still return the number of characters of a char array.

How can it do that? It's /required/ to search for the terminating null.
Your compiler is either not standard compliant, or it's exhibiting
random behaviour.


strlen() is almost certainly finding a zero byte immediately after his
array. I'd expect that to be a very common manifestation of the
undefined behavior in this case.



No, if he'd been lucky it would have crashed the program (with a
meaningful diagnostic) rather than quietly returning a meaningless
result.

So, you are saying this is a poorly implemented compiler?
 
