How is strlen implemented?

roy · Apr 23, 2005

Hi,

I was wondering how strlen is implemented.
What if the input string doesn't have a null terminator, namely the
'\0'?
Thanks a lot
Roy

Chris McDonald · Apr 23, 2005

roy said:
Hi,

I was wondering how strlen is implemented.
What if the input string doesn't have a null terminator, namely the
'\0'?

Without the null-byte terminator, it's not a string!
strlen() can then do whatever it wants.

Chris Torek · Apr 23, 2005

I was wondering how strlen is implemented.
What if the input string doesn't have a null terminator, namely the
'\0'?

Q: What if a tree growing in a forest is made of plastic?
A: Then it is not a tree, or at least, it is not growing.

If something someone else is calling a "string" does not have the
'\0' terminator, it is not a string, or at least, not a C string.
In C, the word "string" means "data structure consisting of zero
or more characters, followed by a '\0' terminator". No terminator,
no string.

Since strlen() requires a string, it may assume it gets one.

There are functions that work on "non-stringy arrays"; in particular,
the mem* functions -- memcpy(), memmove(), memcmp(), memset(),
memchr() -- but they take more than one argument. If you have an
array that always contains exactly 40 characters, and it is possible
that none of them is '\0' but you want to find out whether there
is a '\0' in those 40 characters, you can use memchr():

char *p = memchr(my_array, '\0', 40);

memchr() stops when it finds the first '\0' or has used up the
count, whichever occurs first. (It then returns a pointer to the
found character, or NULL if the count ran out.) The strlen()
function has an effect much like memchr() with an "infinite" count,
except that because the count is "infinite", it "always" finds the
'\0':

size_t much_like_strlen(const char *p) {
    const char *q = memchr(p, '\0', INFINITY);
    return q - p;
}

except of course C does not really have a way to express "infinity"
here. (You can approximate it with (size_t)-1, though.)
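
A minimal sketch of the bounded search described above, assuming the caller knows the array size. The name bounded_len is made up for illustration. It returns the string length if a '\0' occurs within the first n bytes, and n otherwise (POSIX strnlen() behaves similarly).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the string stored in buf, or n if
   no '\0' occurs within the first n bytes. Never reads past buf[n-1]. */
size_t bounded_len(const char *buf, size_t n)
{
    const char *end = memchr(buf, '\0', n);

    return end != NULL ? (size_t)(end - buf) : n;
}
```

For the 40-character array above, bounded_len(my_array, 40) == 40 would mean the array holds no string.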

roy · Apr 23, 2005

Thanks. Maybe my question should be "what if the input is a char array
without a null terminator". But from my experimental results, it seems
that strlen can still return the number of characters of a char array.
I am just not sure whether I am just lucky or sth else happened inside
strlen.

Jason · Apr 23, 2005

roy said:
Hi,

I was wondering how strlen is implemented.
What if the input string doesn't have a null terminator, namely the
'\0'?
Thanks a lot
Roy

strlen will read from the char* until it finds a '\0' char. If your
string does not use the '\0' as a terminator, then you should avoid
most of the <string.h> functions.

-Jason

Chris McDonald · Apr 23, 2005

roy said:
Thanks. Maybe my question should be "what if the input is a char array
without a null terminator". But from my experimental results, it seems
that strlen can still return the number of characters of a char array.
I am just not sure whether I am just lucky or sth else happened inside
strlen.

You were just lucky.

Martin Ambuhl · Apr 23, 2005

roy said:
Hi,

I was wondering how strlen is implemented.

It could be implemented in several ways. The obvious one is to count
characters until a '\0' is encountered.

What if the input string doesn't have a null terminator, namely the
'\0'?

Then it isn't a string, which has such a terminator by definition.

Keith Thompson · Apr 23, 2005

roy said:
Thanks. Maybe my question should be "what if the input is a char array
without a null terminator". But from my experimental results, it seems
that strlen can still return the number of characters of a char array.
I am just not sure whether I am just lucky or sth else happened inside
strlen.

It's helpful to provide some context when you post a followup. I
happen to have read the previous articles just before I read this one,
but I could as easily have seen your followup first.

If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers.

As for your question, strlen()'s argument isn't a char array, it's a
pointer to a char. Normally the pointer should point to the first
element of a "string" (i.e., a sequence of characters marked by a '\0'
terminator). strlen() doesn't know how many characters are
actually in the array. By calling strlen(), you're promising that
there's a '\0' terminator somewhere within the array; if you break
that promise, there's no telling what will happen.

A typical implementation of strlen() will simply traverse the elements
of what it assumes to be your array until it finds a '\0' character.
If it doesn't find a '\0' character within the array, it has no way of
knowing it should stop searching, so it will just continue until it
finds a '\0'. As soon as it passes the end of the array, it invokes
undefined behavior. It might happen to find a '\0' character (which
is what happened in your case). Or it might run past the memory owned
by your program and trigger a segmentation fault or something similar.
Or, as far as the C standard is concerned, it might make demons fly
out your nose.

So don't do that.
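
One way to keep that promise, sketched under the assumption that the array's size is available at the call site: check for a '\0' inside the array before handing it to strlen(). The names no_nul, with_nul and holds_string are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Three elements, none of them '\0': an array, but not a string. */
static const char no_nul[3] = {'a', 'b', 'c'};

/* Four elements: 'a', 'b', 'c' and the '\0' added by the literal. */
static const char with_nul[] = "abc";

/* True if a '\0' occurs within the first size bytes of arr, i.e. it
   is safe to pass arr to strlen(). */
bool holds_string(const char *arr, size_t size)
{
    return memchr(arr, '\0', size) != NULL;
}
```

Here holds_string(no_nul, sizeof no_nul) is false, so calling strlen(no_nul) would read past the end of the array.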

Joe Wright · Apr 23, 2005

Jason said:
strlen will read from the char* until it finds a '\0' char. If your
string does not use the '\0' as a terminator, then you should avoid
most of the <string.h> functions.

-Jason

More precisely, if your char array does not have a 0 terminator, it is
not a string.

Richard Tobin · Apr 23, 2005

roy said:
Thanks. Maybe my question should be "what if the input is a char array
without a null terminator". But from my experimental results, it seems
that strlen can still return the number of characters of a char array.

Bear in mind that a char array usually *does* have a null terminator.

If it doesn't, it's quite likely to be followed in memory by a zero
byte, which is the representation of nul on almost all systems, so it
will often work by luck.

Debugging systems often have an option to initialize variables to
non-zero values, precisely to stop this kind of "luck" from obscuring
real errors. Some readers will remember the many bugs that were
revealed when dynamic linking was added to SunOS, causing
uninitialized variables in main() to no longer be zero.

-- Richard
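
The same idea can be tried by hand. This is only a sketch: the fill byte 0xAA and the name terminated_after_copy are arbitrary choices for illustration, not something a particular debugger does. The buffer is filled with a non-zero pattern first, so a missing terminator cannot be hidden by a stray zero byte inside the buffer.

```c
#include <stddef.h>
#include <string.h>

#define POISON 0xAA  /* arbitrary non-zero fill byte (assumption) */

/* Fill buf with POISON, copy n bytes from src into it, then report
   whether buf contains a '\0'. Returns 1 if terminated, 0 if not,
   and -1 if n does not fit in the buffer. */
int terminated_after_copy(char *buf, size_t buf_size,
                          const char *src, size_t n)
{
    if (n > buf_size)
        return -1;
    memset(buf, POISON, buf_size);
    memcpy(buf, src, n);
    return memchr(buf, '\0', buf_size) != NULL;
}
```

Copying 3 bytes of "abc" reports 0 (no terminator), while copying 4 bytes, which includes the '\0', reports 1.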

Gregory Pietsch · Apr 23, 2005

There has to be a null terminator somewhere.

Here's a short implementation:

#include <string.h>
size_t (strlen)(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Gregory Pietsch */

Joe Estock · Apr 23, 2005

Gregory said:
There has to be a null terminator somewhere.

Here's a short implementation:

#include <string.h>
size_t (strlen)(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Gregory Pietsch */

Interesting seeing \0 so widely in use. On most systems, NULL is defined
as \0, however there are a few special cases where it is not. Shouldn't
we be using NULL instead of \0?

Joe Estock

Joe Wright · Apr 23, 2005

Joe said:
Interesting seeing \0 so widely in use. On most systems, NULL is defined
as \0, however there are a few special cases where it is not. Shouldn't
we be using NULL instead of \0?

Joe Estock

No Joe, NULL is the 'null pointer constant' while '\0' is a constant
character (with int type) and value zero. This is often called the null
character or the NUL character. Never NULL character.
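
A small sketch of the distinction, with hypothetical function names: NULL is compared against the pointer, '\0' against the character it points to. The first function relies on the fact that in C (unlike C++) a character constant has type int.

```c
#include <stddef.h>

/* In C, the character constant '\0' has type int, so its size is
   that of an int, not that of a char. */
int nul_char_has_int_size(void)
{
    return sizeof '\0' == sizeof(int);
}

/* NULL tests the pointer itself; '\0' tests the character it
   points to. Returns 1 only for a valid pointer to an empty string. */
int is_empty_string(const char *s)
{
    return s != NULL && *s == '\0';
}
```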

Minti · Apr 23, 2005

Chris said:
Q: What if a tree growing in a forest is made of plastic?
A: Then it is not a tree, or at least, it is not growing.

If something someone else is calling a "string" does not have the
'\0' terminator, it is not a string, or at least, not a C string.
In C, the word "string" means "data structure consisting of zero
or more characters, followed by a '\0' terminator". No terminator,
no string.

Since strlen() requires a string, it may assume it gets one.

There are functions that work on "non-stringy arrays"; in particular,
the mem* functions -- memcpy(), memmove(), memcmp(), memset(),
memchr() -- but they take more than one argument. If you have an
array that always contains exactly 40 characters, and it is possible
that none of them is '\0' but you want to find out whether there
is a '\0' in those 40 characters, you can use memchr():

char *p = memchr(my_array, '\0', 40);

memchr() stops when it finds the first '\0' or has used up the
count, whichever occurs first. (It then returns a pointer to the
found character, or NULL if the count ran out.) The strlen()
function has an effect much like memchr() with an "infinite" count,
except that because the count is "infinite", it "always" finds the
'\0':

size_t much_like_strlen(const char *p) {
    const char *q = memchr(p, '\0', INFINITY);
    return q - p;
}

except of course C does not really have a way to express "infinity"
here. (You can approximate it with (size_t)-1, though.)

Pardon me Chris, but I really don't get the drift of what you are
trying to convey. These strings are also "stringy", I don't see how
these are "non-stringy".

IOW you are assuming that these "non-stringy" arrays are also supposed
to end with a null character. "Stringy" I say.

Chris Torek · Apr 23, 2005

Pardon me Chris, but I really don't get the drift of what you are
trying to convey. These strings are also "stringy", I don't see how
these are "non-stringy".

If there is no '\0' byte in all 40 characters, it is not a string.
If there is a '\0' byte somewhere within those 40 characters, it
*is* a string -- and any characters after the first such '\0' are
not part of the string (but remain part of the array).

IOW you are assuming that these "non-stringy" arrays are also supposed
to end with a null character. "Stringy" I say.

In other words, I am saying that these arrays do not contain strings
if and only if they do not contain a '\0'. Note that strncpy()
sometimes makes such arrays (which is one reason some people invented
strlcpy()).
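
A sketch of the strncpy() case mentioned above. The helper names are hypothetical. When the source has at least as many characters as the destination has room for, strncpy() fills the destination completely and writes no '\0'.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dst with strncpy() and report whether dst ends up
   holding a string: 1 if a '\0' was written within dst_size bytes,
   0 if the copy was truncated without a terminator. */
int strncpy_terminated(char *dst, size_t dst_size, const char *src)
{
    strncpy(dst, src, dst_size);
    return memchr(dst, '\0', dst_size) != NULL;
}

/* A common workaround: copy at most dst_size - 1 characters and
   terminate explicitly, so dst always holds a string. */
void copy_terminated(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return;
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}
```

With a 4-byte destination, copying "abcdef" through strncpy_terminated() yields 0, while copy_terminated() leaves the string "abc".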

If I may draw an analogy: in mathematics, a statement is false if
there is even a single counterexample. Hence "x * (1/x) = 1" is
a false statement mathematically, because it does not hold for x=0.
(But note that if we limit it, "x * (1/x) = 1 provided x \ne 0",
the statement becomes true for x \elem real, while it remains false
for x \elem integer, and so on.) (Note that details like "x is a
real number" also matter in computing, where float and double do
not really give us "real numbers", but rather approximations.)

Keith Thompson · Apr 23, 2005

Gregory Pietsch said:
There has to be a null terminator somewhere.

To clarify: This doesn't mean that there's a guarantee that there will
be a null terminator somewhere. It means that if there isn't a null
terminator anywhere, you must not call strlen(). The burden is on the
caller.

(I briefly read your statement the other way.)

Mark McIntyre · Apr 24, 2005

Thanks. Maybe my question should be "what if the input is a char array
without a null terminator".

Your question was already answered. However, a quote from the ISO
Standard may help:

7.21.6.3 The strlen function

3. The strlen function returns the number of characters that precede
the terminating null character.

Clearly if there's no terminating null, this function can't return
anything meaningful. It may in fact not return at all, and it's not
uncommon for it to return absurd numbers such as 5678905 or -456.

But from my experimental results, it seems
that strlen can still return the number of characters of a char array.

How can it do that? It's /required/ to search for the terminating null.
Your compiler is either not standard compliant, or it's exhibiting
random behaviour.

I am just not sure whether I am just lucky or sth else happened inside
strlen.

lucky

Keith Thompson · Apr 24, 2005

Mark McIntyre said:
On 22 Apr 2005 20:59:49 -0700, in comp.lang.c , "roy"

How can it do that? It's /required/ to search for the terminating null.
Your compiler is either not standard compliant, or it's exhibiting
random behaviour.

strlen() is almost certainly finding a zero byte immediately after his
array. I'd expect that to be a very common manifestation of the
undefined behavior in this case.

lucky

No, if he'd been lucky it would have crashed the program (with a
meaningful diagnostic) rather than quietly returning a meaningless
result.

Stan Milam · Apr 24, 2005

roy said:
Hi,

I was wondering how strlen is implemented.
What if the input string doesn't have a null terminator, namely the
'\0'?
Thanks a lot
Roy

I found some C functions coded in assembler for the 8086 way back when.

;
; -------------------------------------------------------
; int strlen(s)
; char *s;
; Purpose: Returns the length of the string, not
; including the NULL character
; -------------------------------------------------------
;
ifndef pca
include macro2.asm
include libdef.asm
endif
;
idt strlen
def strlen
strlen: qenter bx,di
mov di,parm1[bx]
; cmp di,zero
; jz null
mov ax,ds
mov es,ax
mov cx,-1
xor al,al
cld
repnz scasb
not cx
dec cx
mov ax,cx
exitf
;null xor ax,ax
; exitf
modend strlen

I guess its C equivalent is:

unsigned
strlen( const char *string )
{
    unsigned rv = -1;

    while ( *string ) rv--, string++;

    rv = (-rv) - 1;
    return rv;
}

of course I'd just write it like this:

size_t
strlen( const char *string )
{
    size_t rv = 0;
    while ( *string++ ) rv++;
    return rv;
}
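
For comparison, here is a sketch that follows the register arithmetic of the assembly more literally. The function name is hypothetical and uint16_t merely stands in for the 16-bit CX register. It assumes the string is shorter than 65535 bytes, and it does not model the fact that repnz also stops when CX reaches zero.

```c
#include <stdint.h>

/* Model of: mov cx,-1 / xor al,al / repnz scasb / not cx / dec cx */
uint16_t strlen_scasb_model(const char *s)
{
    uint16_t cx = 0xFFFF;        /* mov cx,-1 */

    for (;;) {
        cx--;                    /* repnz decrements cx for every byte scanned */
        if (*s++ == '\0')        /* scasb compares the byte with al (zero) */
            break;               /* the terminator itself has been counted */
    }
    cx = (uint16_t)~cx;          /* not cx: bytes scanned, including the '\0' */
    cx--;                        /* dec cx: drop the terminator from the count */
    return cx;
}
```

For "abc" the scan touches four bytes, so CX goes from 0xFFFF to 0xFFFB; inverting gives 4 and the final decrement gives 3.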

Stan Milam · Apr 24, 2005

Keith said:
Mark McIntyre said:

On 22 Apr 2005 20:59:49 -0700, in comp.lang.c , "roy"
[...]

But from my experimental results, it seems
that strlen can still return the number of characters of a char array.

How can it do that? It's /required/ to search for the terminating null.
Your compiler is either not standard compliant, or it's exhibiting
random behaviour.

strlen() is almost certainly finding a zero byte immediately after his
array. I'd expect that to be a very common manifestation of the
undefined behavior in this case.

lucky

No, if he'd been lucky it would have crashed the program (with a
meaningful diagnostic) rather than quietly returning a meaningless
result.

So, you are saying this is a poorly implemented compiler?
