Keith said:
John Kelly said:
There is only one argument. A pointer. It may point to a valid string,
or it may point to whatever garbage is in some region of memory.
I haven't looked at the C library code to see if that's true or not.
But I hate to think it is.
You snipped the part where I explained that invalid arguments cause
undefined behavior. It won't *necessarily* show up as an infinite loop;
that's just one of many possibilities. You don't need to look at the
code that implements strlen() (on a particular system) to understand
this.
Consider this program:
#include <string.h>
#include <stdio.h>
int main(void) {
    char not_a_string[5] = "hello";
    size_t len = strlen(not_a_string);
    printf("len = %zu\n", len);
    return 0;
}
How can the implementation of strlen() detect that there's a problem
and avoid undefined behavior?
On a computer there's no such thing as an object of infinite size. Your
address bits define a limit. Once you reach the limit, stop looking for
end of string. It's not there.
First off, there is no portable way to tell when you've run out of
address bits. The standard says very little about how addresses are
represented.
We do have the pigeonhole principle though, I think. 'CHAR_BIT * sizeof
(void *)' should give a reasonable maximum number of bits, shouldn't it?
If any pointer can be converted to a 'void *', assuming there isn't
any meta-data outside of the pointer's object representation, and
assuming an implementation must not allow two 'void *' values to
represent two separate objects, it seems like a fair upper bound.
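To put a number on that bound, here is a minimal sketch. The function name void_ptr_bits is made up for illustration, and the result is only an upper bound on the bits in a 'void *' object representation: some of those bits may be padding, and the figure says nothing about the size of any particular object.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* The standard guarantees at least 8 bits per byte. */
static_assert(CHAR_BIT >= 8, "CHAR_BIT is at least 8 in conforming C");

/* Hypothetical helper: number of bits in the object representation of
 * a 'void *'. At most 2 to this power distinct pointer values exist. */
static size_t void_ptr_bits(void)
{
    return (size_t)CHAR_BIT * sizeof(void *);
}
```

A scan capped at that many addresses would terminate, but it could still run far past the end of the object being examined.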
But OK, given that an address is 32 bits, you could stop looking after
2**32 bytes. But that will still take you beyond the bounds of the
object you're examining.
I would be very hopeful that an implementation that actually checks
bounds also offers a documented means for the programmer to access those
bounds. If an implementation does not satisfy this hope, that would be
unfortunate.
Look again at the strlen() example above. Suppose the array is
followed in the machine's address space by a chunk of memory that
your process doesn't own. How can strlen() or trim() detect this
and avoid blowing up?
Let alone anyone. If there's no bounds information anywhere, what
actually determines the bounds? Intention? If an implementation or
the environment has traps for such things, the information exists
somewhere. That somewhere might or might not be accessible to the
programmer, and even if accessible, might be a lot of work.
If you have a pointer to the beginning of a 100-byte array with a
'\0' in the last position, you must scan for 100 bytes. If you have
a pointer to a 10-byte array, not containing any '\0' characters,
immediately followed in the address space by memory not owned by
your process, you must not scan for more than 10 bytes. You cannot
tell the difference in any portable manner, and you very likely
cannot tell the difference even in some non-portable manner.
Well, is John making a string-trimming function or a 'char[]'-trimming
function? For the latter, passing a count or a size in bytes might be a
good idea.
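As a rough sketch of the 'char[]' flavor, assuming the caller can supply the array's size: the name bounded_len is hypothetical, and it is built on the standard memchr(). It never reads more than size bytes, so a missing terminator shows up as a return value equal to size rather than as undefined behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the index of the first '\0' within the
 * first 'size' bytes of buf, or 'size' itself if there is none.
 * The caller is trusted to pass a size no larger than the array. */
static size_t bounded_len(const char *buf, size_t size)
{
    assert(buf != NULL);
    const char *end = memchr(buf, '\0', size);
    return end ? (size_t)(end - buf) : size;
}
```

With the not_a_string array from the program above, bounded_len(not_a_string, sizeof not_a_string) yields 5, which equals the size and so signals "no terminator found". The function still cannot verify that the size it was given is honest.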
I want to prevent infinite loops and report errors. I'm not sure what
you want.
I want to explain to you how this stuff is actually defined.
[...]
Note that some languages treat arrays and/or strings as first-class
objects whose values carry their bounds with them. In such languages,
you can avoid these problems. For example, in Ada passing an array
parameter implicitly passes the array's bounds, which can be retrieved
from the parameter; in Perl, strings are scalar objects. In C, you
just have to be careful.
And if a pointer value includes bounds information, perhaps the
implementation would be kind enough to document either the
representation or how to access the information. Maybe not.
For full portability but great effort, John could force users of the
library to pass only references to objects that were themselves created
by John's other functions. One could provide macros instead of
declarations, or use allocated storage exclusively, or include signature
checks to ensure the programmer used the provided functions to create
their objects. Seems complex, though.
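A minimal sketch of the allocated-storage-plus-signature idea, under heavy assumptions: every name here (struct jstr, JSTR_MAGIC, jstr_new, jstr_trim_right, jstr_free) is invented for illustration and is not John's actual interface. The signature check catches honest mistakes such as passing a zeroed or freed-and-cleared object; reading the signature through a garbage pointer is still undefined behavior.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define JSTR_MAGIC 0x4A535452UL   /* hypothetical signature value */

struct jstr {
    unsigned long magic;  /* set only by jstr_new() */
    size_t len;           /* number of characters, excluding the '\0' */
    char *data;           /* always holds len + 1 bytes */
};

/* Create an object from the first 'len' bytes of src; NULL on failure. */
struct jstr *jstr_new(const char *src, size_t len)
{
    assert(src != NULL);
    if (len == SIZE_MAX)              /* len + 1 would wrap around */
        return NULL;
    struct jstr *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->data = malloc(len + 1);
    if (s->data == NULL) {
        free(s);
        return NULL;
    }
    memcpy(s->data, src, len);
    s->data[len] = '\0';
    s->len = len;
    s->magic = JSTR_MAGIC;
    return s;
}

/* Trim trailing whitespace. Returns 0 on success, -1 if the object
 * is null or does not carry the expected signature. */
int jstr_trim_right(struct jstr *s)
{
    if (s == NULL || s->magic != JSTR_MAGIC)
        return -1;
    while (s->len > 0 && isspace((unsigned char)s->data[s->len - 1]))
        s->len--;
    s->data[s->len] = '\0';
    return 0;
}

/* Release an object; the signature is cleared first so that a stale
 * copy of the struct would fail the check. */
void jstr_free(struct jstr *s)
{
    if (s != NULL && s->magic == JSTR_MAGIC) {
        s->magic = 0;
        free(s->data);
        free(s);
    }
}
```

The loop in jstr_trim_right() is bounded by the stored length, so it cannot run away, and the function can report an error instead of scanning blindly.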