Question about the clc string lib


Jeff

In the function below, can size ever be 0 (zero)?

char *clc_strdup(const char * CLC_RESTRICT s)
{
    size_t size;
    char *p;

    clc_assert_not_null(clc_strdup, s);

    size = strlen(s) + 1;
    if (size == 0)
        p = NULL;
    else if ((p = malloc(size)) != NULL)
        memcpy(p, s, size);

    return p;
}
 

Jordan Abel

In the function below, can size ever be 0 (zero)?

I've never heard of the "clc string lib"; where can I find it?
char *clc_strdup(const char * CLC_RESTRICT s)
{
size_t size;
char *p;

clc_assert_not_null(clc_strdup, s);

No idea how this would work in a way that needs the function pointer as
its first argument. Is it a macro that stringizes its first argument to
print an error?
size = strlen(s) + 1;

The result of this assignment cannot be zero.
if (size == 0)
p = NULL;

Hold on, I take that back: size can be 0 if strlen returns SIZE_MAX.
else if ((p = malloc(size)) != NULL)
memcpy(p, s, size);

return p;
}

Incidentally, speaking of prefixes, the str* prefix, and reserved
identifiers, is something along the lines of

#define strdup clc_strdup

legal? My reading of the standard says it is, but I thought I'd ask here.
 

pemo

Jeff said:
In the function below, can size ever be 0 (zero)?

char *clc_strdup(const char * CLC_RESTRICT s)
{
size_t size;
char *p;

clc_assert_not_null(clc_strdup, s);

size = strlen(s) + 1;
if (size == 0)
p = NULL;
else if ((p = malloc(size)) != NULL)
memcpy(p, s, size);

return p;
}

If s is a null-terminated string, then even when s[0] == '\0' (the empty
string), strlen(s) is 0. As you add one to that, size cannot be 0 in this
case.

If s is so long that its length overflows a size_t (on my system that's
SIZE_MAX = UINT32_MAX = 4294967295 [+ 1]), then the length will wrap to 0.
Again, as you add 1 to that, I can't see how size can ever be zero.
 

Jordan Abel

Jeff said:
In the function below, can size ever be 0 (zero)?

char *clc_strdup(const char * CLC_RESTRICT s)
{
size_t size;
char *p;

clc_assert_not_null(clc_strdup, s);

size = strlen(s) + 1;
if (size == 0)
p = NULL;
else if ((p = malloc(size)) != NULL)
memcpy(p, s, size);

return p;
}

If s is a null-terminated string, then even when s[0] == '\0' (the empty
string), strlen(s) is 0. As you add one to that, size cannot be 0 in this
case.

If s is so long that its length overflows a size_t (on my system that's
SIZE_MAX = UINT32_MAX = 4294967295 [+ 1]), then the length will wrap to 0.
Again, as you add 1 to that, I can't see how size can ever be zero.

Except, if the length of s _is_ SIZE_MAX, then subsequently adding 1
will "overflow".

Suppose we have an unrealistic system where SIZE_MAX is 63. [Clearly not
ISO compliant, but I don't want to type out 65535 characters.]

Then, the string
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789"

has a length of 63, and 63+1 will wrap to 0.
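Jordan's toy arithmetic can be checked directly. This is a minimal sketch, not anything from libclc: `wrap_add` is a hypothetical helper that emulates an unsigned size type only `bits` wide by masking (`bits == 6` gives his toy SIZE_MAX of 63), and `size_plus_one` performs the same `strlen(s) + 1` step with the real size_t, where unsigned arithmetic is reduced modulo SIZE_MAX + 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: emulate an unsigned size type that is only `bits`
   wide, so its maximum is 2^bits - 1 (bits == 6 gives the toy SIZE_MAX
   of 63). Assumes bits is smaller than the width of unsigned long. */
unsigned long wrap_add(unsigned long a, unsigned long b, unsigned bits)
{
    unsigned long max = (1UL << bits) - 1;  /* the toy SIZE_MAX */
    return (a + b) & max;                   /* reduce modulo max + 1 */
}

/* The same step clc_strdup performs, with the real size_t:
   unsigned arithmetic wraps, so SIZE_MAX + 1 yields 0. */
size_t size_plus_one(size_t len)
{
    return len + 1;
}
```

With `bits == 6`, `wrap_add(63, 1, 6)` is 0, which is exactly the case the `size == 0` test in clc_strdup guards against.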
 

pemo

Jordan said:
Jeff said:
In the function below, can size ever be 0 (zero)?

char *clc_strdup(const char * CLC_RESTRICT s)
{
size_t size;
char *p;

clc_assert_not_null(clc_strdup, s);

size = strlen(s) + 1;
if (size == 0)
p = NULL;
else if ((p = malloc(size)) != NULL)
memcpy(p, s, size);

return p;
}

If s is a null-terminated string, then even when s[0] == '\0' (the
empty string), strlen(s) is 0. As you add one to that, size cannot
be 0 in this case.

If s is so long that its length overflows a size_t (on my system
that's SIZE_MAX = UINT32_MAX = 4294967295 [+ 1]), then the length
will wrap to 0. Again, as you add 1 to that, I can't see how size
can ever be zero.

Except, if the length of s _is_ SIZE_MAX, then subsequently adding 1
will "overflow".

Suppose we have an unrealistic system where SIZE_MAX is 63. [Clearly
not ISO compliant, but I don't want to type out 65535 characters.]

Then, the string
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789"

has a length of 63, and 63+1 will wrap to 0.

Yup, agreed, if the string is exactly SIZE_MAX characters long. Thanks
for the correction.
 

Vladimir S. Oka

Jordan Abel wrote:

Incidentally, speaking of prefixes, the str* prefix, and reserved
identifiers, is something along the lines of

#define strdup clc_strdup

legal? My reading of the standard says it is, but I thought I'd ask
here.

I believe it is (and can't be asked to open the Standard right now).

In any case, by the time the compiler proper sees the code, all
instances of `strdup` will be replaced by the pre-processor with the
`clc_strdup` which does not violate the "str[lowercaseletter] is
reserved" requirement.

Cheers

Vladimir
 

boa

Jordan said:
I've never heard of the "clc string lib" - where can i find it?
http://libclc.sf.net


No idea how this would work in a way that needs the function pointer as
its first argument. Is it a macro that stringizes its first argument to
print an error?


The result of this assignment cannot be zero.


hold on - i take that back. size can be 0 if strlen returns SIZE_MAX.

size can never be SIZE_MAX as size doesn't include '\0'. A valid string
always has the '\0' and the max size of a buffer containing a string is
SIZE_MAX.

boa
 

boa

boa said:
size can never be SIZE_MAX as size doesn't include '\0'. A valid string
always has the '\0' and the max size of a buffer containing a string is
SIZE_MAX.

boa

One more try: strlen() can never return SIZE_MAX. A valid string
always has the '\0' and the max size of a buffer containing a string is
SIZE_MAX, so the max value strlen() can return is SIZE_MAX - 1.

boa
 

Jordan Abel

One more try: strlen() can never return SIZE_MAX. A valid string
always has the '\0' and the max size of a buffer containing a string is
SIZE_MAX, so the max value strlen() can return is SIZE_MAX - 1.

boa

Take SIZE_MAX == 65535:

char *foo = calloc(256, 256);   /* 65536 zero bytes */
memset(foo, 'x', SIZE_MAX);     /* the last byte stays '\0' */
assert(strlen(foo) == SIZE_MAX);

I don't think there's an explicit requirement in the standard that there
never be an object larger than SIZE_MAX. There may, however, be a
requirement that the product of the arguments to calloc be less than
SIZE_MAX. [that, i don't know.]
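A scaled-down version of Jordan's construction can actually be run. This sketch is an illustration under stated assumptions: `make_long_string` is a hypothetical name, and unlike the snippet above it checks calloc's result and refuses any request whose product does not fit in a size_t. That guard means it would reject calloc(256, 256) on Jordan's imaginary system (the product, 65536, is SIZE_MAX + 1 there); whether a real calloc may accept such a request is exactly the point under debate.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: calloc zero-fills the block, so writing 'x' into
   every byte except the last leaves a string of length nmemb * size - 1.
   Returns NULL if the request is empty, if nmemb * size would not fit
   in a size_t, or if calloc fails. */
char *make_long_string(size_t nmemb, size_t size)
{
    char *buf;

    if (nmemb == 0 || size == 0)
        return NULL;                 /* no room even for the '\0' */
    if (size > SIZE_MAX / nmemb)
        return NULL;                 /* nmemb * size would wrap */

    buf = calloc(nmemb, size);
    if (buf == NULL)
        return NULL;

    memset(buf, 'x', nmemb * size - 1);  /* final byte stays '\0' */
    return buf;
}
```

On an ordinary system `make_long_string(256, 256)` yields a string whose strlen is 65535, the value Jordan's example produces when SIZE_MAX is 65535.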
 

Jeff

Jordan said:
Jeff said:
In the function below, can size ever be 0 (zero)?

char *clc_strdup(const char * CLC_RESTRICT s)
{
size_t size;
char *p;

clc_assert_not_null(clc_strdup, s);

size = strlen(s) + 1;
if (size == 0)
p = NULL;
else if ((p = malloc(size)) != NULL)
memcpy(p, s, size);

return p;
}

If s is a null-terminated string, then even when s[0] == '\0' (the empty
string), strlen(s) is 0. As you add one to that, size cannot be 0 in this
case.

If s is so long that its length overflows a size_t (on my system that's
SIZE_MAX = UINT32_MAX = 4294967295 [+ 1]), then the length will wrap to 0.
Again, as you add 1 to that, I can't see how size can ever be zero.

Except, if the length of s _is_ SIZE_MAX, then subsequently adding 1
will "overflow".

Suppose we have an unrealistic system where SIZE_MAX is 63. [Clearly not
ISO compliant, but I don't want to type out 65535 characters.]

Then, the string
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789"

has a length of 63, and 63+1 will wrap to 0.

But if the most you can allocate is SIZE_MAX, then your string can only
be SIZE_MAX-1 if it's going to be null terminated. Therefore I don't
see how strlen can return SIZE_MAX.

Jeff
 

boa

Jordan said:
One more try: strlen() can never return SIZE_MAX. A valid string
always has the '\0' and the max size of a buffer containing a string is
SIZE_MAX, so the max value strlen() can return is SIZE_MAX - 1.

boa

Take SIZE_MAX == 65535:

char *foo = calloc(256, 256);   /* 65536 zero bytes */
memset(foo, 'x', SIZE_MAX);     /* the last byte stays '\0' */
assert(strlen(foo) == SIZE_MAX);

I don't think there's an explicit requirement in the standard that there
never be an object larger than SIZE_MAX. There may, however, be a
requirement that the product of the arguments to calloc be less than
SIZE_MAX. [that, i don't know.]

It goes without saying? ;-)

FWIW, I've looked through both the C99 standard, the C rationale and
TC1, looking for some description of the relationship between size_t and
SIZE_MAX, but found nothing. SIZE_T is only mentioned twice in C99.

So I really don't have much of a case here. One could argue that even if
it is OK to calloc() memory the way you do above, you don't allocate a
string, just 256 objects of 256 bytes so you cannot use it as an
argument to strlen().

The standard is vague on this issue, 7.21.1 says this:
The header <string.h> declares one type and several functions, and defines one
macro useful for manipulating arrays of character type and other objects treated as arrays
of character type.

Pretty clear that strlen() manipulates arrays of character type, but
what's "other objects"?

boa
 

Steve Summit

Jordan said:
I've never heard of the "clc string lib" -

Silly me, I hadn't either!
No idea how this would work in a way that needs the function pointer as
its first argument. Is it a macro that stringizes its first argument to
print an error?

Good question. But it could also just be used as a
guaranteed-unique, easy-to-produce token.

My suggestion, though, would be to omit that assertion entirely
and replace it with a simple

if(s == NULL)
return NULL;
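For illustration, here is roughly what clc_strdup might look like with Steve's suggestion folded in. It is a sketch, not libclc's code: CLC_RESTRICT is dropped because its definition isn't shown, and the `size == 0` test is kept so that a string of length SIZE_MAX makes the function return NULL instead of calling malloc(0) and handing back a zero-byte buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of clc_strdup with a null pointer reported by returning NULL
   rather than through clc_assert_not_null. */
char *clc_strdup(const char *s)
{
    size_t size;
    char *p;

    if (s == NULL)
        return NULL;

    size = strlen(s) + 1;
    if (size == 0)          /* strlen(s) == SIZE_MAX, so size wrapped */
        return NULL;

    p = malloc(size);
    if (p != NULL)
        memcpy(p, s, size); /* copies the terminating '\0' as well */
    return p;
}
```

One trade-off: a caller can no longer tell a null argument apart from an allocation failure, since both produce NULL.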
 

Michael Wojcik

I believe it is (and can't be asked to open the Standard right now).

If you can't be bothered to check the Standard, why reply? That's
a serious question. Jordan's question was about the Standard, not
about anyone else's belief.

I *am* looking at the Standard (I'll cite C99 here, but C90 has
equivalent, indeed mostly identical, language). My interpretation
of the Standard contradicts Jordan's.

Note first that macro names are identifiers (6.2.1).

7.1.3 ("Reserved identifiers"):

Each header declares or defines all identifiers listed in its
associated subclause, and optionally declares or defines identifiers
listed in its associated future library directions subclause and
identifiers which are always reserved either for any use or for use
as file scope identifiers.
...
Each identifier with file scope listed in any of the following
subclauses (including the future library directions) is reserved for
use as a macro name and as an identifier with file scope in the same
name space if any of its associated headers is included.

Note the "str..." identifiers in string.h are identifiers with file
scope.

7.26.10 ("General utilities <stdlib.h>"):

Function names that begin with str and a lowercase letter may be
added to the declarations in the <stdlib.h> header.

7.26.11 ("String handling <string.h>"):

Function names that begin with str, mem, or wcs and a lowercase
letter may be added to the declarations in the <string.h> header.


"strdup" as a macro name is an identifier that begins with "str"
and a lowercase letter. It has file scope, because it is a macro
name. That means it is covered by 7.26.10 and 7.26.11. By 7.1.3,
it is thus reserved if stdlib.h or string.h is included.

The only thing that making it a macro name rather than simply having
a function (with external linkage) named "strdup" gives you is
relief from 7.26:

7.26 ("Future library directions"):

All external names described below are reserved no matter what
headers are included by the program.

So calling the function "clc_strdup" and using a macro to refer to it
as "strdup" is legal provided stdlib.h and string.h are not included
- but that seems rather unlikely, and you could achieve the same thing
by giving "strdup" internal linkage (ie by declaring it static).
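If the goal is just a shorter spelling for clc_strdup, one way to sidestep the question entirely is an alias outside the space reserved for str, mem, or wcs followed by a lowercase letter. In this sketch the alias name `dupstr` is hypothetical, and the clc_strdup body is a minimal stand-in so the example compiles on its own.

```c
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for clc_strdup so this example is self-contained. */
char *clc_strdup(const char *s)
{
    size_t size = strlen(s) + 1;
    char *p;

    if (size == 0)              /* length was SIZE_MAX */
        return NULL;
    p = malloc(size);
    if (p != NULL)
        memcpy(p, s, size);
    return p;
}

/* Under Michael's reading, "#define strdup clc_strdup" uses a reserved
   name once <string.h> or <stdlib.h> is included. "dupstr" does not
   begin with str, mem or wcs, so it avoids that namespace. */
#define dupstr clc_strdup
```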

In any case, by the time the compiler proper sees the code, all
instances of `strdup` will be replaced by the pre-processor with the
`clc_strdup` which does not violate the "str[lowercaseletter] is
reserved" requirement.

This is mostly wrong. There is no "compiler proper" as far as the
Standard is concerned; the replacement of macro-name identifiers with
macro bodies is part of translation phase four, carried out by the
same notional "implementation" as all other translation phases. More
importantly, some of the restrictions on reserved identifiers, like
this one, apply to macro names. That the macro name is replaced with
its associated body is irrelevant in this case.

Now, Jordan certainly knows how preprocessing directives and macro
expansion work. His question had to do with what the Standard says
about reserved identifiers and whether an identifier of particular
type and name was reserved. This is a question which can only be
answered by recourse to the Standard, not by speculation about what
happens during compilation; it is a point of law, not a point of
fact. It is, in other words, a question of pedantry, and only
pedantry will satisfy it.

Fortunately, c.l.c contains one of the world's largest herds of free-
roaming pedants, thundering majestically across the virtual plains...
 

Keith Thompson

Jordan Abel said:
I've never heard of the "clc string lib" - where can i find it?


No idea how this would work in a way that needs the function pointer as
its first argument. Is it a macro that stringizes its first argument to
print an error?

That's the most likely case (but then why not just pass a string?),
but it *could* be a function that compares its first argument against
the address of each function that it knows about. Silly, but
possible.
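To make Jordan's guess concrete, here is one hypothetical way such a macro could work. libclc's real definition isn't quoted in this thread, so `report_null`, `null_failures`, and the message format are all assumptions. The `#func` operator turns the function name into a string literal, so the function is only spelled, never referenced. A real assertion would more likely call abort() than count failures.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical failure counter so the behaviour can be observed. */
static int null_failures = 0;

static void report_null(const char *func, const char *arg,
                        const char *file, int line)
{
    fprintf(stderr, "%s:%d: %s(): argument '%s' is NULL\n",
            file, line, func, arg);
    null_failures++;
}

/* #func and #arg stringize the macro arguments; the function named by
   func does not even need to be declared at the point of use. */
#define clc_assert_not_null(func, arg)                       \
    do {                                                     \
        if ((arg) == NULL)                                   \
            report_null(#func, #arg, __FILE__, __LINE__);    \
    } while (0)
```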
 

Keith Thompson

Fortunately, c.l.c contains one of the world's largest herds of free-
roaming pedants, thundering majestically across the virtual plains...

I just wanted to quote that sentence.
 

Jordan Abel

Each identifier with file scope listed in any of the following
subclauses (including the future library directions) is reserved for
use as a macro name and as an identifier with file scope in the same
name space if any of its associated headers is included.

Note the "str..." identifiers in string.h are identifiers with file
scope.

"strdup" as a macro name is an identifier that begins with "str"
and a lowercase letter. It has file scope, because it is a macro
name. That means it is covered by 7.26.10 and 7.26.11. By 7.1.3,
it is thus reserved if stdlib.h or string.h is included.

Can I #undef it and then #define my own? It seems like this would fall
into the same dubious area as redefining a keyword, but would still be
legal (just as it is to redefine a keyword).
 

Vladimir S. Oka

Michael said:
If you can't be bothered to check the Standard, why reply? That's
a serious question. Jordan's question was about the Standard, not
about anyone else's belief.

Your premise here seems to be that I have never actually opened the
Standard, and am in fact guessing. If you happened to agree with me (or
Jordan), I wonder whether you'd have bothered to chastise me. Instead,
you proceed believing (no pun intended) that your reading of the
Standard will justify the above paragraph, and the below statement.
I *am* looking at the Standard (I'll cite C99 here, but C90 has
equivalent, indeed mostly identical, language). My interpretation
of the Standard contradicts Jordan's.

But, let's see what your reading of the standard yields:
Note first that macro names are identifiers (6.2.1).

7.1.3 ("Reserved identifiers"):

Each header declares or defines all identifiers listed in its
associated subclause, and optionally declares or defines
identifiers listed in its associated future library directions
subclause and identifiers which are always reserved either for any
use or for use as file scope identifiers.
...
Each identifier with file scope listed in any of the following
subclauses (including the future library directions) is reserved
for use as a macro name and as an identifier with file scope in the
same name space if any of its associated headers is included.

Note the "str..." identifiers in string.h are identifiers with file
scope.

7.26.10 ("General utilities <stdlib.h>"):

Function names that begin with str and a lowercase letter may be
added to the declarations in the <stdlib.h> header.

7.26.11 ("String handling <string.h>"):

Function names that begin with str, mem, or wcs and a lowercase
letter may be added to the declarations in the <string.h> header.


"strdup" as a macro name is an identifier that begins with "str"
and a lowercase letter. It has file scope, because it is a macro
name. That means it is covered by 7.26.10 and 7.26.11. By 7.1.3,
it is thus reserved if stdlib.h or string.h is included.

The only thing that making it a macro name rather than simply having
a function (with external linkage) named "strdup" gives you is
relief from 7.26:

7.26 ("Future library directions"):

All external names described below are reserved no matter what
headers are included by the program.

So calling the function "clc_strdup" and using a macro to refer to it
as "strdup" is legal provided stdlib.h and string.h are not included
- but that seems rather unlikely, and you could achieve the same thing
by giving "strdup" internal linkage (ie by declaring it static).

In other words, there _are_ circumstances where it actually _is_ legal,
even according to your reading of the Standard. Why would it be
unlikely for a source file not to include certain standard headers?
I've seen quite a few.

So, your reading does not actually contradict Jordan's (or mine, for
that matter), at least not entirely.
In any case, by the time the compiler proper sees the code, all
instances of `strdup` will be replaced by the pre-processor with the
`clc_strdup` which does not violate the "str[lowercaseletter] is
reserved" requirement.

This is mostly wrong. There is no "compiler proper" as far as the
Standard is concerned; the replacement of macro-name identifiers with
macro bodies is part of translation phase four, carried out by the
same notional "implementation" as all other translation phases. More
importantly, some of the restrictions on reserved identifiers, like
this one, apply to macro names. That the macro name is replaced with
its associated body is irrelevant in this case.

I do agree that I got this bit wrong.
Now, Jordan certainly knows how preprocessing directives and macro
expansion work. His question had to do with what the Standard says
about reserved identifiers and whether an identifier of particular
type and name was reserved.

Thanks for clarifying this for me...
This is a question which can only be
answered by recourse to the Standard, not by speculation about what
happens during compilation; it is a point of law, not a point of
fact. It is, in other words, a question of pedantry, and only
pedantry will satisfy it.

Fortunately, c.l.c contains one of the world's largest herds of free-
roaming pedants, thundering majestically across the virtual plains...

And sometimes, just sometimes, being a bit too self-righteous... ;-)

Cheers

Vladimir
 

Jeff

Jordan said:
One more try: strlen() can never return SIZE_MAX. A valid string
always has the '\0' and the max size of a buffer containing a string is
SIZE_MAX, so the max value strlen() can return is SIZE_MAX - 1.

boa

Take SIZE_MAX == 65535:

char *foo = calloc(256, 256);   /* 65536 zero bytes */
memset(foo, 'x', SIZE_MAX);     /* the last byte stays '\0' */
assert(strlen(foo) == SIZE_MAX);

I don't think there's an explicit requirement in the standard that there
never be an object larger than SIZE_MAX. There may, however, be a
requirement that the product of the arguments to calloc be less than
SIZE_MAX. [that, i don't know.]

That's my point below. You can't allocate more than SIZE_MAX bytes, so
strlen could return at most SIZE_MAX - 1, and in clc_strdup size could
not be 0 (zero). I may be missing something, but the check
'if (size == 0)' seems pointless.

Jeff
 

Flash Gordon

This is a variation of something I suggested in another thread some time
ago. I don't remember anyone proving it invalid back then, so I'll see
if I can find the justification again.
I don't think there's an explicit requirement in the standard that there
never be an object larger than SIZE_MAX. There may, however, be a
requirement that the product of the arguments to calloc be less than
SIZE_MAX. [that, i don't know.]

It goes without saying? ;-)

The definition of calloc places no limit on what you can pass to it. Of
course, an implementation could legally always return NULL if you try to
allocate an object larger than SIZE_MAX.
FWIW, I've looked through both the C99 standard, the C rationale and
TC1, looking for some description of the relationship between size_t and
SIZE_MAX, but found nothing. SIZE_T is only mentioned twice in C99.

size_t is mentioned a lot more than twice in C99, although SIZE_T isn't
mentioned at all. SIZE_MAX is mentioned 3 times in C99, and one of those
defines it as the limit of size_t.
So I really don't have much of a case here.

I see nothing in the standard that forbids it, and definitely nothing
that forbids the implementation from successfully allocating such an object.
One could argue that even if
it is OK to calloc() memory the way you do above, you don't allocate a
string, just 256 objects of 256 bytes so you cannot use it as an
argument to strlen().

You allocate space for an array of objects, and that space must be
contiguous. Since the standard defines an object as, "region of data
storage in the execution environment, the contents of which can
represent values" the space allocated by calloc clearly meets the
definition of an object. The standard also states, "When a pointer to an
object is converted to a pointer to a character type, the result points
to the lowest addressed byte of the object. Successive increments of the
result, up to the size of the object, yield pointers to the remaining
bytes of the object." so it is clearly legal to increment a char pointer
over such an object created by a call to calloc, assuming the calloc
call succeeds. Now for the big one, section 7.1.1 of N1124:

| A string is a contiguous sequence of characters terminated by and
| including the first null character. The term multibyte string is
| sometimes used instead to emphasize special processing given to
| multibyte characters contained in the string or to avoid confusion
| with a wide string. A pointer to a string is a pointer to its initial
| (lowest addressed) character. The length of a string is the number of
| bytes preceding the null character and the value of a string is the
| sequence of the values of the contained characters, in order.

Nowhere does the above place any limitations on the length of the
string or how you create it.

So if the implementation allows the call to calloc to succeed, it is
perfectly legal to construct a string with a length of SIZE_MAX.
The standard is vague on this issue, 7.21.1 says this:

Pretty clear that strlen() manipulates arrays of character type, but
what's "other objects"?

You can treat any object as an array of type unsigned char. See the bit
about being allowed to convert any pointer to a character pointer. See
also http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_274.htm which
says that the string functions should treat the characters as type
unsigned char which definitely has no trap representations or padding bits.
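The access model Flash describes can be sketched as a strlen that reads the string through an unsigned char pointer. `uc_strlen` is a hypothetical name for illustration only, not a replacement for the library function. The counter is a size_t: if an object of SIZE_MAX + 1 bytes did exist with its only null byte at the end, the count would stop at exactly SIZE_MAX without overflowing, which is the value the thread is arguing about.

```c
#include <stddef.h>

/* Illustrative strlen: each character is read as unsigned char, and n
   counts the bytes that precede the first null character. */
size_t uc_strlen(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    while (p[n] != 0)
        n++;
    return n;
}
```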

So I believe it is technically possible to generate a string of length
SIZE_MAX without invoking undefined behaviour if the implementation
allows it, but the implementation is not required to allow you to do
this (the calloc call can fail).

Note that using a 2d array instead of calloc'd space still does not
force the implementation to allow you to do this since you would be
exceeding an environmental limit, this is covered by another DR.
 

Flash Gordon

Jeff said:
Jordan said:
boa wrote:
Jordan Abel wrote:
In the function below, can size ever be 0 (zero)?
I've never heard of the "clc string lib" - where can i find it?
http://libclc.sf.net

char *clc_strdup(const char * CLC_RESTRICT s)
{
size_t size;
char *p;

clc_assert_not_null(clc_strdup, s);
No idea how this would work in a way that needs the function pointer as
its first argument. Is it a macro that stringizes its first argument to
print an error?

size = strlen(s) + 1;
The result of this assignment cannot be zero.

if (size == 0)
p = NULL;
hold on - i take that back. size can be 0 if strlen returns SIZE_MAX.
size can never be SIZE_MAX as size doesn't include '\0'. A valid string
always has the '\0' and the max size of a buffer containing a string is
SIZE_MAX.

boa
One more try: strlen() can never return SIZE_MAX. A valid string
always has the '\0' and the max size of a buffer containing a string is
SIZE_MAX, so the max value strlen() can return is SIZE_MAX - 1.

boa
Take SIZE_MAX == 65535:

char *foo = calloc(256, 256);   /* 65536 zero bytes */
memset(foo, 'x', SIZE_MAX);     /* the last byte stays '\0' */
assert(strlen(foo) == SIZE_MAX);

I don't think there's an explicit requirement in the standard that there
never be an object larger than SIZE_MAX. There may, however, be a
requirement that the product of the arguments to calloc be less than
SIZE_MAX. [that, i don't know.]

That's my point below. You can't allocate more than SIZE_MAX bytes, so
strlen could return at most SIZE_MAX - 1, and in clc_strdup size could
not be 0 (zero). I may be missing something, but the check
'if (size == 0)' seems pointless.

See my other post, the standard does not forbid Jordan's code as far as
I can see. So if the call to calloc succeeds you have *legally* created
an object larger than SIZE_MAX.
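Whether the `if (size == 0)` test is dead code therefore hinges on whether an implementation can hand out an object larger than SIZE_MAX bytes. One defensive alternative, sketched here with the hypothetical helper `copy_size`, is to test the length before adding 1, so the check no longer depends on reasoning about wraparound.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compute the bytes needed to copy a string of
   length len, including its '\0'. Returns 0 (failure) when
   len == SIZE_MAX, the one length for which len + 1 wraps to 0;
   otherwise stores len + 1 in *out and returns 1. */
int copy_size(size_t len, size_t *out)
{
    if (len == SIZE_MAX)
        return 0;
    *out = len + 1;
    return 1;
}
```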
 
