Question about the clc string lib

Jordan Abel

Jeff said:
Jordan said:
boa wrote:
Jordan Abel wrote:
In the function below, can size ever be 0 (zero)?
I've never heard of the "clc string lib" - where can i find it?
http://libclc.sf.net

char *clc_strdup(const char * CLC_RESTRICT s)
{
size_t size;
char *p;

clc_assert_not_null(clc_strdup, s);
No idea how this would work in a way that needs the function pointer as
its first argument. Is it a macro that stringizes its first argument to
print an error?

size = strlen(s) + 1;
The result of this assignment cannot be zero.

if (size == 0)
p = NULL;
hold on - i take that back. size can be 0 if strlen returns SIZE_MAX.
size can never be SIZE_MAX as size doesn't include '\0'. A valid string
always has the '\0' and the max size of a buffer containing a string is
SIZE_MAX.

boa
One more try: strlen() can never return SIZE_MAX. A valid string
always has the '\0' and the max size of a buffer containing a string is
SIZE_MAX, so the max value strlen() can return is SIZE_MAX - 1.

boa
take SIZE_MAX 65535

char * foo = calloc(256,256);
memset(foo,'x',SIZE_MAX);
strlen(foo) == SIZE_MAX;

I don't think there's an explicit requirement in the standard that there
never be an object larger than SIZE_MAX. There may, however, be a
requirement that the product of the arguments to calloc be less than
SIZE_MAX. [that, i don't know.]

That's my point below. You can't allocate more than SIZE_MAX so the
strlen would be SIZE_MAX -1, so in the example above, size could not be
0 (zero). I may be missing something, but the check 'if (size == 0)'
seems pointless.

See my other post, the standard does not forbid Jordan's code as far as
I can see. So if the call to calloc succeeds you have *legally* created
an object larger than SIZE_MAX.

Yeah - it's not particularly _likely_ to succeed on any reasonable
implementation [relevant code from my implementation: if (size != 0 &&
SIZE_T_MAX / size < num) { errno = ENOMEM; return (NULL); }], but if it
does succeed, there's nothing wrong with the rest of the code, and
strlen()+1 will be 0. [with some work, it would be possible to make a
strdup that uses the same trickery to successfully duplicate such an
object]
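The two facts this argument rests on can be sketched in a few lines. This is only an illustration: product_overflows and dup_size_for are hypothetical names, and the check is modelled on the calloc test quoted above, using C99's SIZE_MAX from <stdint.h> in place of that implementation's SIZE_T_MAX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero when num * size cannot be represented
   in a size_t, the same test as the calloc check quoted above. */
static int product_overflows(size_t num, size_t size)
{
    return size != 0 && SIZE_MAX / size < num;
}

/* Hypothetical helper: the allocation size a strdup would compute for
   a string of the given length. Unsigned arithmetic wraps modulo
   SIZE_MAX + 1, so a length of SIZE_MAX yields 0. */
static size_t dup_size_for(size_t len)
{
    return len + 1;
}
```

A calloc that applies the first check refuses calloc(SIZE_MAX, 2) outright; one that does not, and succeeds, is what makes the size == 0 case reachable.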

Note that this strdup does have another problem with implementations
that allow an object larger than SIZE_MAX.

/* SIZE_MAX 65535 */
char *p = calloc(256,257);
memset(p,'a',256);
memset(p+256,'a',SIZE_MAX);
char *q = clc_strdup(p);

if everything succeeds (and strlen does what can reasonably be expected
- it's unclear, and thus probably undefined, what passing a string longer
than SIZE_MAX to strlen actually will result in), q points to an array
of 256 chars, each set to 'a', and not null-terminated.

Attempting to catch a corner case like a string that is exactly SIZE_MAX
long seems pointless when it ignores the possibility of a string longer
than SIZE_MAX.
 
aegis

Jordan said:
take SIZE_MAX 65535

char * foo = calloc(256,256);
memset(foo,'x',SIZE_MAX);

I'm pretty sure this is undefined behavior.
It parallels, I think, with an issue brought up
months ago(in comp.std.c) that labeled the
following as undefined behavior.

int array[10][10];

array[0][10] = 10;

and that each array object is not guaranteed to
be adjacent to the other.
strlen(foo) == SIZE_MAX;

I don't think there's an explicit requirement in the standard that there
never be an object larger than SIZE_MAX. There may, however, be a
requirement that the product of the arguments to calloc be less than
SIZE_MAX. [that, i don't know.]
 
Jordan Abel

I'm pretty sure this is undefined behavior. It parallels, I think,
with an issue brought up months ago(in comp.std.c) that labeled the
following as undefined behavior.

No it doesn't.
int array[10][10];

array[0][10] = 10;

and that each array object is not guaranteed to
be adjacent to the other.

Yes they are, but that's another discussion. Even if they weren't,
that doesn't apply to calloc - nothing in the standard provides for
there to be any padding "gaps" in the memory returned by calloc.

How would that even work?
 
Keith Thompson

aegis said:
Jordan Abel wrote: [...]
take SIZE_MAX 65535

char * foo = calloc(256,256);
memset(foo,'x',SIZE_MAX);

I'm pretty sure this is undefined behavior.

How so? The calloc() call can either succeed or fail; I don't see any
permission for it to do anything else. The memset() call is ok (if
calloc() succeeded). Note that it doesn't set the entire object; it
leaves the last byte as '\0'.
It parallels, I think, with an issue brought up
months ago(in comp.std.c) that labeled the
following as undefined behavior.

int array[10][10];

array[0][10] = 10;

and that each array object is not guaranteed to
be adjacent to the other.

As I understand the argument, the array elements are guaranteed to be
adjacent; the assignment invokes undefined behavior because an
implementation could do explicit bounds checking, not because the
address might be invalid. I don't see the connection between this and
the calloc() issue.
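The adjacency claim can be checked without ever evaluating array[0][10]. A minimal sketch, assuming a hypothetical helper named rows_are_adjacent; it only compares addresses and sizes, so nothing is read or written out of bounds.

```c
#include <assert.h>
#include <stddef.h>

/* Returns nonzero when row 1 starts exactly where row 0 ends and the
   whole array has no padding. Only addresses are compared. */
static int rows_are_adjacent(void)
{
    static int array[10][10];
    const char *base = (const char *)array;
    const char *row1 = (const char *)&array[1][0];

    return sizeof array == 100 * sizeof(int)
        && row1 == base + 10 * sizeof(int);
}
```

This says nothing about whether the assignment through array[0][10] is defined; it only shows that the address it would reach is the address of array[1][0].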
 
CBFalconer

Jeff said:
In the function below, can size ever be 0 (zero)?

char *clc_strdup(const char * CLC_RESTRICT s)
{
size_t size;
char *p;

clc_assert_not_null(clc_strdup, s);

size = strlen(s) + 1;
if (size == 0)
p = NULL;
else if ((p = malloc(size)) != NULL)
memcpy(p, s, size);

return p;
}

Very poor code. Apart from the missing definition of clc_assert...
and CLC_RESTRICT, strlen returns a size_t, which is unsigned. Thus
size can never be less than 1, and the "p = NULL" will never be
executed. Assuming CLC_RESTRICT has something to do with the
restrict qualifier, it is pointless because s is const. Better
code might be:

char *clc_strdup(const char *s) {
size_t size;
char *p;

if (!s) p = NULL;
else {
size = strlen(s) + 1;
if ((p = malloc(size)) != NULL) memcpy(p, s, size);
}
return p;
}

Some will object to the guard against s == NULL.
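For comparison, here is a sketch that keeps the NULL guard from the rewrite above and also keeps an explicit test for the wraparound case raised elsewhere in the thread. The name sketch_strdup is hypothetical and the function is not part of libclc; it assumes C99's SIZE_MAX from <stdint.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name; not part of libclc or any standard library. */
static char *sketch_strdup(const char *s)
{
    size_t len;
    char *p;

    if (s == NULL)
        return NULL;            /* guard, as in the rewrite above */

    len = strlen(s);
    if (len == SIZE_MAX)
        return NULL;            /* len + 1 would wrap around to 0 */

    p = malloc(len + 1);
    if (p != NULL)
        memcpy(p, s, len + 1);  /* copies the terminator as well */
    return p;
}
```

Testing len against SIZE_MAX before adding 1 states the intent more directly than testing the sum against 0, though both catch the same single case and neither helps with a string longer than SIZE_MAX.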

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
 
CBFalconer

boa said:
.... snip ...

size can never be SIZE_MAX as size doesn't include '\0'. A valid
string always has the '\0' and the max size of a buffer containing
a string is SIZE_MAX.

Fortunately, c.l.c contains one of the world's largest herds of free-
roaming pedants, thundering majestically across the virtual plains...

Goring all who stand in their path with their sharp horns, such as
the above claim that zero is a possible value. See, I can quote it
too :)

 
Jordan Abel

Very poor code. Apart from the missing definition of clc_assert...
and CLC_RESTRICT

in headers whose inclusion was not pasted. geez, next you'll complain
that there's no definition for main().
strlen returns a size_t, which is unsigned. Thus
size can never be less than 1, and the "p = NULL" will never be
executed. Assuming CLC_RESTRICT has something to do with the
restrict qualifier, it is pointless because s is const. Better
code might be:

char *clc_strdup(const char *s) {
size_t size;
char *p;

if (!s) p = NULL;
else {
size = strlen(s) + 1;
if ((p = malloc(size)) != NULL) memcpy(p, s, size);
}
return p;
}

Some will object to the guard against s == NULL.

It's an assertion, not a guard. like assert(), it is only used in debug
mode, outside of debug mode it expands to ((void)0).
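A macro of that kind might look roughly like the sketch below. This is not libclc's actual definition: SKETCH_ASSERT_NOT_NULL and sketch_null_failure are hypothetical names, and stringizing the first argument is only the guess made earlier in the thread about why a function name is passed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical failure handler: reports which function received a
   null pointer for which argument, then aborts. */
void sketch_null_failure(const char *func, const char *arg)
{
    fprintf(stderr, "%s: argument '%s' must not be NULL\n", func, arg);
    abort();
}

#ifdef NDEBUG
/* Outside debug mode the check disappears entirely. */
#define SKETCH_ASSERT_NOT_NULL(func, ptr) ((void)0)
#else
/* In debug mode, #func and #ptr turn both arguments into strings. */
#define SKETCH_ASSERT_NOT_NULL(func, ptr) \
    ((ptr) != NULL ? (void)0 : sketch_null_failure(#func, #ptr))
#endif
```

Because the release expansion is ((void)0), code after the macro cannot rely on the pointer having been checked, which is the difference from a guard.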
 
Jordan Abel

Goring all who stand in their path with their sharp horns, such as
the above claim that zero is a possible value. See, I can quote it
too :)

So what do you think of this?

char *p = calloc(SIZE_MAX,2), *q;
if(!p) exit(EXIT_FAILURE);
memset(p,'a',SIZE_MAX);
q = strdup(p); /* oops */
 
CBFalconer

Jordan said:
.... snip ...

It's an assertion, not a guard. like assert(), it is only used in
debug mode, outside of debug mode it expands to ((void)0).

Not in my code. It stands guard and returns sane values whenever
possible. It's called robustness.

 
Flash Gordon

Jordan Abel wrote:

Note that this strdup does have another problem with implementations
that allow an object larger than SIZE_MAX.

/* SIZE_MAX 65535 */
char *p = calloc(256,257);
memset(p,'a',256);
memset(p+256,'a',SIZE_MAX);
char *q = clc_strdup(p);

if everything succeeds (and strlen does what can reasonably be expected
- it's unclear, and thus probably undefined, what passing a string longer
than SIZE_MAX to strlen actually will result in), q points to an array
of 256 chars, each set to 'a', and not null-terminated.

I believe passing a string with length longer than SIZE_MAX invokes
undefined behaviour. My reasoning being as follows:
- The standard says strlen returns a result of type size_t.
- The standard says that strlen returns the number of characters before
  the terminating null character of a string.
- The standard does not say what strlen does if the result does not fit
  in size_t.
- The standard says that in any instance that it does not define the
  behaviour, the behaviour is undefined.
Attempting to catch a corner case like a string that is exactly SIZE_MAX
long seems pointless when it ignores the possibility of a string longer
than SIZE_MAX.

The string length being exactly SIZE_MAX is easy to catch, the string
length being longer is difficult to catch efficiently since you would
have to implement an error checking version of strlen as well.

Personally, when I wrote a strdup implementation (named ffstrdup) I
decided that anyone generating a string with length SIZE_MAX or longer
deserves to be shot, so I left it as undefined behaviour what the
library would do in such cases.
 
Jordan Abel

Jordan Abel wrote:



I believe passing a string with length longer than SIZE_MAX invokes
undefined behaviour. My reasoning being as follows:
The standard says strlen returns a result of type size_t
The standard says that strlen returns the number of characters before
the terminating null character of a string.
The standard does not say what strlen does if the result does not fit
in size_t
The standard says that in any instance that it does not define the
behaviour, the behaviour is undefined.

[size_t being an unsigned type and the fact that arithmetic involving
such types is always reduced modulo max+1 creates an argument that it is
not undefined]
The string length being exactly SIZE_MAX is easy to catch, the string
length being longer is difficult to catch efficiently since you would
have to implement an error checking version of strlen as well.

You could force the buffer to be null-terminated. or check for a null
terminator at the offset of what strlen has told you the length is.
(that won't tell you where it is, but it'll tell you where it's not)
Though, that's certainly not something you want to waste time doing
unless you're routinely passed huge strings.
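The second suggestion amounts to a one-line consistency check. A minimal sketch, assuming a hypothetical helper named terminator_at and assuming the caller knows that s[reported_len] is readable:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: nonzero when s has a null terminator exactly
   at the offset strlen reported. The caller must guarantee that
   s[reported_len] lies inside the object. */
static int terminator_at(const char *s, size_t reported_len)
{
    return s[reported_len] == '\0';
}
```

As noted, a zero result only shows that the reported length is wrong. It cannot locate the real terminator, and it presumes that the earlier strlen call returned at all.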
Personally, when I wrote a strdup implementation (named ffstrdup) I
decided that anyone generating a string with length SIZE_MAX or longer
deserves to be shot, so I left it as undefined behaviour what the
library would do in such cases.

my implementation's strdup also relies on strlen, but that's beside the
point - my implementation also doesn't have a way to malloc an object
let alone a string larger than SIZE_MAX [though, i believe it is
possible to create one with mmap and a carefully-constructed file, i'm
not sure if such calls actually succeed]
 
Flash Gordon

Jordan said:
Jordan Abel wrote:


I believe passing a string with length longer than SIZE_MAX invokes
undefined behaviour. My reasoning being as follows:
The standard says strlen returns a result of type size_t
The standard says that strlen returns the number of characters before
the terminating null character of a string.
The standard does not say what strlen does if the result does not fit
in size_t
The standard says that in any instance that it does not define the
behaviour, the behaviour is undefined.

[size_t being an unsigned type and the fact that arithmetic involving
such types is always reduced modulo max+1 creates an argument that it is
not undefined]

I'm aware of the behaviour of arithmetic on unsigned types and that
size_t is such a type, but that is not the issue here because you are
not doing unsigned arithmetic you are calling a library function. The
issue is as I stated purely one of the standard not defining what strlen
does if the result does not fit in size_t.

strlen does not have to be implemented in standard C. It could increment
a pointer to the end then use a non-standard method to subtract the
start pointer from the end pointer in a way that produces an unsigned
result in range of size_t if it fits and crashes the program if it doesn't.
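A portable analogue of that pointer-walking approach is sketched below. It swaps the non-standard subtraction for ordinary pointer subtraction, so it is not the implementation described above; walking_strlen is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pointer-walking strlen. */
static size_t walking_strlen(const char *s)
{
    const char *end = s;

    while (*end != '\0')
        end++;

    /* end - s has type ptrdiff_t. If the distance does not fit in
       ptrdiff_t, the subtraction itself is undefined, before any
       conversion to size_t takes place. */
    return (size_t)(end - s);
}
```

Even in standard C, then, the result for an over-long string is not simply the true length reduced modulo SIZE_MAX + 1; what happens depends on how the subtraction misbehaves.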
You could force the buffer to be null-terminated. or check for a null
terminator at the offset of what strlen has told you the length is.
(that won't tell you where it is, but it'll tell you where it's not)
Though, that's certainly not something you want to waste time doing
unless you're routinely passed huge strings.

It also won't work with the strlen implementation I suggested above
because the program will already have crashed.
Personally, when I wrote a strdup implementation (named ffstrdup) I
decided that anyone generating a string with length SIZE_MAX or longer
deserves to be shot, so I left it as undefined behaviour what the
library would do in such cases.

my implementation's strdup also relies on strlen, but that's beside the
point - my implementation also doesn't have a way to malloc an object
let alone a string larger than SIZE_MAX [though, i believe it is
possible to create one with mmap and a carefully-constructed file, i'm
not sure if such calls actually succeed]

I've no idea if it does either. I don't think it is tremendously important.
 
CBFalconer

Flash said:
Jordan said:
On 2006-01-27, Flash Gordon wrote:
.... snip ...
Personally, when I wrote a strdup implementation (named ffstrdup)
I decided that anyone generating a string with length SIZE_MAX or
longer deserves to be shot, so I left it as undefined behaviour
what the library would do in such cases.

my implementation's strdup also relies on strlen, but that's
beside the point - my implementation also doesn't have a way to
malloc an object let alone a string larger than SIZE_MAX [though,
i believe it is possible to create one with mmap and a carefully-
constructed file, i'm not sure if such calls actually succeed]

I've no idea if it does either. I don't think it is tremendously
important.

Strings of length approaching SIZE_MAX are so common in my code
that I worry about this possibility all the time. They are playing
havoc with my printer, and eating up the toner. It is especially
bad when I have to dump those strings out on a 110 baud ASR33
Teletype. Wears out the clutch and makes holes in the ribbon.

I challenge any c.l.c reader to provide any real working code that
uses a string of even SIZE_MAX / 2 length.

 
Jordan Abel

Strings of length approaching SIZE_MAX are so common in my code
that I worry about this possibility all the time. They are playing
havoc with my printer, and eating up the toner. It is especially
bad when I have to dump those strings out on a 110 baud ASR33
Teletype. Wears out the clutch and makes holes in the ribbon.

I challenge any c.l.c reader to provide any real working code that
uses a string of even SIZE_MAX / 2 length.

A medium-length text file embedded as a char[] object is about the only
way i see this making sense - and that's for a SIZE_MAX of 65536.
Assuming an average line length of 64, that's 1024 lines, or 17 pages at
60 lines per page.

[I assume an ASR33 can handle printing 17 pages of text, or half that
anyway, even if it will take a few... hour and a half or so]
 
Sjouke Burry

Jordan said:
Strings of length approaching SIZE_MAX are so common in my code
that I worry about this possibility all the time. They are playing
havoc with my printer, and eating up the toner. It is especially
bad when I have to dump those strings out on a 110 baud ASR33
Teletype. Wears out the clutch and makes holes in the ribbon.

I challenge any c.l.c reader to provide any real working code that
uses a string of even SIZE_MAX / 2 length.


A medium-length text file embedded as a char[] object is about the only
way i see this making sense - and that's for a SIZE_MAX of 65536.
Assuming an average line length of 64, that's 1024 lines, or 17 pages at
60 lines per page.

[I assume an ASR33 can handle printing 17 pages of text, or half that
anyway, even if it will take a few... hour and a half or so]
What makes you want a whole document inside a
single string??
If you approach programming that way, be prepared
for a lot of problems (especially those 4GB files).
 
Keith Thompson

Sjouke Burry said:
Jordan said:
Strings of length approaching SIZE_MAX are so common in my code
that I worry about this possibility all the time. They are playing
havoc with my printer, and eating up the toner. It is especially
bad when I have to dump those strings out on a 110 baud ASR33
Teletype. Wears out the clutch and makes holes in the ribbon.

I challenge any c.l.c reader to provide any real working code that
uses a string of even SIZE_MAX / 2 length.
A medium-length text file embedded as a char[] object is about the only
way i see this making sense - and that's for a SIZE_MAX of 65536.
Assuming an average line length of 64, that's 1024 lines, or 17 pages at
60 lines per page.
[I assume an ASR33 can handle printing 17 pages of text, or half that
anyway, even if it will take a few... hour and a half or so]
What makes you want a whole document inside a
single string??
If you approach programming that way, be prepared
for a lot of problems (especially those 4GB files).

Suppose you want to sort it. Having the whole thing in memory (if it
fits) is a *lot* more efficient than trying to sort it on disk.
 
Keith Thompson

Keith Thompson said:
Suppose you want to sort it. Having the whole thing in memory (if it
fits) is a *lot* more efficient than trying to sort it on disk.

Correction: it's *likely* to be a lot more efficient. (C doesn't say
anything about the relative efficiency of memory access vs. disk
access, and thrashing on a virtual memory system can slow things down
considerably.)
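One reading of "sort it" for a document held in a single buffer is to sort its lines in memory. A rough sketch under that assumption; sort_lines and compare_lines are hypothetical names, and the buffer is modified in place.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of char * elements. */
static int compare_lines(const void *a, const void *b)
{
    const char *const *la = a;
    const char *const *lb = b;
    return strcmp(*la, *lb);
}

/* Splits buf in place at each '\n', stores up to max_lines line
   starts in lines, sorts those pointers, and returns the count. */
static size_t sort_lines(char *buf, char **lines, size_t max_lines)
{
    size_t count = 0;
    char *p = buf;

    while (*p != '\0' && count < max_lines) {
        char *nl = strchr(p, '\n');
        lines[count++] = p;
        if (nl == NULL)
            break;              /* last line had no trailing newline */
        *nl = '\0';             /* terminate this line in place */
        p = nl + 1;
    }
    qsort(lines, count, sizeof lines[0], compare_lines);
    return count;
}
```

Only the pointers are rearranged; the text itself stays where it was read, which is part of why keeping it all in memory tends to be cheap.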
 
Jordan Abel

Jordan said:
Strings of length approaching SIZE_MAX are so common in my code
that I worry about this possibility all the time. They are playing
havoc with my printer, and eating up the toner. It is especially
bad when I have to dump those strings out on a 110 baud ASR33
Teletype. Wears out the clutch and makes holes in the ribbon.

I challenge any c.l.c reader to provide any real working code that
uses a string of even SIZE_MAX / 2 length.


A medium-length text file embedded as a char[] object is about the only
way i see this making sense - and that's for a SIZE_MAX of 65536.
Assuming an average line length of 64, that's 1024 lines, or 17 pages at
60 lines per page.

[I assume an ASR33 can handle printing 17 pages of text, or half that
anyway, even if it will take a few... hour and a half or so]
What makes you want a whole document inside a
single string??
If you approach programming that way, be prepared
for a lot of problems (especially those 4GB files).

How about it's your usage message? x11vnc, the longest i've seen,
clocks in at 74084. X has a relatively long one [but nowhere
approaching the min max for SIZE_MAX] at 6090.

[the more relevant question is why would you want _two_ of such a
thing - given that this is a strdup implementation we're talking
about]
 
Sjouke Burry

Keith said:
Sjouke Burry said:
Jordan said:
Strings of length approaching SIZE_MAX are so common in my code
that I worry about this possibility all the time. They are playing
havoc with my printer, and eating up the toner. It is especially
bad when I have to dump those strings out on a 110 baud ASR33
Teletype. Wears out the clutch and makes holes in the ribbon.

I challenge any c.l.c reader to provide any real working code that
uses a string of even SIZE_MAX / 2 length.

A medium-length text file embedded as a char[] object is about the only
way i see this making sense - and that's for a SIZE_MAX of 65536.
Assuming an average line length of 64, that's 1024 lines, or 17 pages at
60 lines per page.
[I assume an ASR33 can handle printing 17 pages of text, or half that
anyway, even if it will take a few... hour and a half or so]

What makes you want a whole document inside a
single string??
If you approach programming that way, be prepared
for a lot of problems (especially those 4GB files).


Suppose you want to sort it. Having the whole thing in memory (if it
fits) is a *lot* more efficient than trying to sort it on disk.
He was talking about everything in one string!!
I don't see how you are going to sort that.
Of course, if you read each item into a separate
string, you can sort it, but then you will not
run into SIZE_MAX.
In 99 out of 100 cases, if you run into SIZE_MAX,
your program has (or does) something wrong.
 
websnarf

CBFalconer said:
Flash said:
Jordan said:
Personally, when I wrote a strdup implementation (named ffstrdup)
I decided that anyone generating a string with length SIZE_MAX or
longer deserves to be shot, so I left it as undefined behaviour
what the library would do in such cases.

my implementation's strdup also relies on strlen, but that's
beside the point - my implementation also doesn't have a way to
malloc an object let alone a string larger than SIZE_MAX [though,
i believe it is possible to create one with mmap and a carefully-
constructed file, i'm not sure if such calls actually succeed]

I've no idea if it does either. I don't think it si tremendously
important.

Strings of length approaching SIZE_MAX are so common in my code
that I worry about this possibility all the time. They are playing
havoc with my printer, and eating up the toner. It is especially
bad when I have to dump those strings out on a 110 baud ASR33
Teletype. Wears out the clutch and makes holes in the ribbon.

I challenge any c.l.c reader to provide any real working code that
uses a string of even SIZE_MAX / 2 length.

In the tests for "The Better String Library", when running in a 16-bit
environment, the test strings exceed this value. (This is an important
part of the test, BTW.) This can be an issue if you have single string
objects (like the contents of a .HTML file for example) that exceeds
32K in Bstrlib. (Bstrlib correctly deals with all these cases as
detected errors -- you should generally use bstreams, not bstrings for
large entries like that.)

Obviously, if you were to try to write big string/text manipulation
programs in a 16-bit environments without Bstrlib, you would run into
at least those problems.
 
