Implicit init to 0

Noob

Hello,

I am aware that, on some platforms, the representation of NULL and 0.0
may not be all-bits-zero.

On such platforms, does setting all bits to 0 with memset invoke UB?

#include <stdio.h>
#include <string.h>

struct foo
{
    int i; int j; unsigned u; void *p; double d;
};

int main(void)
{
    struct foo bar;
    memset(&bar, 0, sizeof bar); /* UB? */
    printf("%d %d %u %p %f\n", bar.i, bar.j, bar.u, bar.p, bar.d);
    return 0;
}

Would u be correctly initialized? (I think so.)
Would i and j? (I think so, but am not sure.)

I now initialize bar differently...

#include <stdio.h>
#include <string.h>

struct foo
{
    int i; int j; unsigned u; void *p; double d;
};

int main(void)
{
    struct foo bar = { 0 };
    printf("%d %d %u %p %f\n", bar.i, bar.j, bar.u, bar.p, bar.d);
    return 0;
}

Will i,j,u,p,d all be correctly initialized to 0?

Does this mean that some compilers must generate different code for
"struct foo bar = { 0 }" than they do for "memset(&bar, 0, sizeof bar)" ?

On platforms where the two are equivalent, I imagine the compiler is free
to use whichever way is best. But on platforms where they are not, the
compiler must be careful?

Regards.
 
Ben Bacarisse

Noob said:
I am aware that, on some platforms, the representation of NULL and 0.0
may not be all-bits-zero.

On such platforms, does setting all bits to 0 with memset invoke UB?

I don't think so, but I may be wrong about that. Accessing the object
afterwards may be UB because all bits zero might be a trap
representation.

Of course, you can tell if your C implementation uses IEEE floating
point. If it does, all bits zero is 0.0.

#include <stdio.h>
#include <string.h>
struct foo
{
int i; int j; unsigned u; void *p; double d;
};

int main(void)
{
struct foo bar;
memset(&bar, 0, sizeof bar); /* UB? */
printf("%d %d %u %p %f\n", bar.i, bar.j, bar.u, bar.p, bar.d);
return 0;
}

Would u be correctly initialized? (I think so.)
Would i and j? (I think so, but am not sure.)

Yes. n1256.pdf says that all bits zero must be a zero value for all
integer types, though there are change bars against that text
suggesting that it might be the result of clarification.

I now initialize bar differently...

#include <stdio.h>
#include <string.h>
struct foo
{
int i; int j; unsigned u; void *p; double d;
};

int main(void)
{
struct foo bar = { 0 };
printf("%d %d %u %p %f\n", bar.i, bar.j, bar.u, bar.p, bar.d);
return 0;
}

Will i,j,u,p,d all be correctly initialized to 0?
Yes.

Does this mean that some compilers must generate different code for
"struct foo bar = { 0 }" than they do for "memset(&bar, 0, sizeof bar)" ?
Yes.

On platforms where the two are equivalent, I imagine the compiler is free
to use whichever way is best. But on platforms where they are not, the
compiler must be careful?

Yes.

Certain values can have multiple representations so it is
theoretically possible for both to work and to generate different,
correct, representations for some members of bar. In other words they
don't have to mean the same thing even when they both work.
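
To make the point about representations concrete, here is a minimal sketch (not from the thread) that compares the object representation of an ordinarily initialized null pointer and of 0.0 with an all-bits-zero buffer of the same size. The function names are hypothetical. On mainstream platforms both functions are expected to return 1, but nothing in the Standard promises that.

```c
#include <string.h>

/* Hypothetical helpers: each returns 1 if the value produced by ordinary
   initialization happens to have an all-bits-zero object representation
   on this implementation, and 0 otherwise. */

int null_pointer_is_all_bits_zero(void)
{
    void *p = 0;                       /* a genuine null pointer */
    unsigned char zeros[sizeof p];

    memset(zeros, 0, sizeof zeros);    /* what memset would have stored */
    return memcmp(&p, zeros, sizeof p) == 0;
}

int double_zero_is_all_bits_zero(void)
{
    double d = 0.0;                    /* a genuine floating-point zero */
    unsigned char zeros[sizeof d];

    memset(zeros, 0, sizeof zeros);
    return memcmp(&d, zeros, sizeof d) == 0;
}
```

A result of 0 from either function would identify a platform where "= { 0 }" and memset store different bytes for that member.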
 
Eric Sosman

Noob said:
Hello,

I am aware that, on some platforms, the representation of NULL and 0.0
may not be all-bits-zero.

Right. Such platforms seem to have become rarer than
the dodo, but there's always the chance. (Besides, we work
in a fashion-driven industry, and fashions change.)

On such platforms, does setting all bits to 0 with memset invoke UB?

No; you can always set any writeable byte to zero (or to
any other unsigned char value) without undefined behavior.
It's not the setting of the byte(s) that risks U.B., but the
attempt to use or even inspect the value of the bigger-than-byte
object afterwards.

#include <stdio.h>
#include <string.h>
struct foo
{
int i; int j; unsigned u; void *p; double d;
};

int main(void)
{
struct foo bar;
memset(&bar, 0, sizeof bar); /* UB? */

No U.B. here ...

printf("%d %d %u %p %f\n", bar.i, bar.j, bar.u, bar.p, bar.d);

... but potential U.B. here, when the five tainted values are
fetched from the all-bits-zero struct.

return 0;
}

Would u be correctly initialized? (I think so.)
Would i and j? (I think so, but am not sure.)

Under C99 rules, all the struct elements are suspect and
might have invalid or trap representations. I think I've read
that the committee plans to require all-bits-zero to be a valid
zero for integer types in the next version of the Standard (or
maybe they've already done so in a TC). As far as I know, there
are still no promises for pointers and floating-point types.

I now initialize bar differently...

#include <stdio.h>
#include <string.h>
struct foo
{
int i; int j; unsigned u; void *p; double d;
};

int main(void)
{
struct foo bar = { 0 };
printf("%d %d %u %p %f\n", bar.i, bar.j, bar.u, bar.p, bar.d);
return 0;
}

Will i,j,u,p,d all be correctly initialized to 0?

Yes.

Does this mean that some compilers must generate different code for
"struct foo bar = { 0 }" than they do for "memset(&bar, 0, sizeof bar)" ?

Yes. They might do so anyhow. (For example, the compiler
might exploit knowledge of bar's alignment to generate faster
code than alignment-blind memset() would use.)

On platforms where the two are equivalent, I imagine the compiler is free
to use whichever way is best. But on platforms where they are not, the
compiler must be careful?

Compilers must always be careful, doing "whatever it takes"
to get the code to behave as the Standard dictates. If you
write code to initialize a value to 0 and the compiler gives you
a 42 instead, it's the compiler's fault and not yours.
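
When an existing object has to be reset later, one way to keep the "= { 0 }" guarantees is to assign from a zero-initialized constant instead of calling memset. This is only a sketch: the helper name foo_reset is hypothetical, and the struct is the one from the original post.

```c
#include <stddef.h>

struct foo
{
    int i; int j; unsigned u; void *p; double d;
};

/* Hypothetical helper: reset *f to the values that
   "struct foo bar = { 0 };" would produce. */
void foo_reset(struct foo *f)
{
    /* The initializer makes every member a true zero of its type:
       integer 0, a null pointer, and 0.0, whatever bit patterns the
       implementation uses for them. */
    static const struct foo zero = { 0 };

    *f = zero;   /* struct assignment copies those member values */
}
```

After foo_reset(&bar), bar.p compares equal to NULL and bar.d compares equal to 0.0 on every conforming implementation, which memset alone does not promise.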
 
Noob

Eric said:
No U.B. here ...


... but potential U.B. here, when the five tainted values are
fetched from the all-bits-zero struct.

Are you saying even bar.u (the unsigned int) has an "unknown" value?

I thought unsigned integer types were "safe".

i.e. if struct foo were
{
unsigned char v[8]; unsigned int ui; unsigned long ul;
};
then I thought the memset would "safely" set them all to 0.

Am I mistaken?

Regards.
 
Ben Bacarisse

Noob said:
Eric said:
No U.B. here ...


... but potential U.B. here, when the five tainted values are
fetched from the all-bits-zero struct.

Are you saying even bar.u (the unsigned int) has an "unknown" value?

I thought unsigned integer types were "safe".

i.e. if struct foo were
{
unsigned char v[8]; unsigned int ui; unsigned long ul;
};
then I thought the memset would "safely" set them all to 0.

Am I mistaken?

No, you are right, but signed int is also OK provided you are using
modern C. C89 does not have that guarantee.

Eric may have been talking about C89 or about bar.p and bar.d which
are not guaranteed to be safe even in C99.
 
Noob

Ben said:
Noob said:
Eric said:
Noob wrote:

#include <stdio.h>
#include <string.h>
struct foo
{
int i; int j; unsigned u; void *p; double d;
};

int main(void)
{
struct foo bar;
memset(&bar, 0, sizeof bar); /* UB? */
No U.B. here ...

printf("%d %d %u %p %f\n", bar.i, bar.j, bar.u, bar.p, bar.d);
... but potential U.B. here, when the five tainted values are
fetched from the all-bits-zero struct.

Are you saying even bar.u (the unsigned int) has an "unknown" value?

I thought unsigned integer types were "safe".

i.e. if struct foo were
{
unsigned char v[8]; unsigned int ui; unsigned long ul;
};
then I thought the memset would "safely" set them all to 0.

Am I mistaken?

No, you are right, but signed int is also OK provided you are using
modern C. C89 does not have that guarantee.

Thanks for pointing this out. I mostly use "gnu89" which is described
as "GNU dialect of ISO C90 (including some C99 features)".

I'll see what this means in terms of initialization, but thanks to you
and Eric for making me aware that memset is "unsafe" in C89, even
for unsigned integer types.

Could you cite the relevant part of C99 that was added to make it safe?

Regards.
 
Eric Sosman

Ben said:
Noob said:
[...]
i.e. if struct foo were
{
unsigned char v[8]; unsigned int ui; unsigned long ul;
};
then I thought the memset would "safely" set them all to 0.

Am I mistaken?

No, you are right, but signed int is also OK provided you are using
modern C. C89 does not have that guarantee.

Eric may have been talking about C89 or about bar.p and bar.d which
are not guaranteed to be safe even in C99.

Under C99 (ISO/IEC 9899:1999) I believe that all the
elements of the original example's struct (int, unsigned,
void*, and double) could conceivably treat all-bits-zero as
trap representations. Ditto the unsigned int and unsigned
long elements of the revised struct (but not the array of
unsigned char).

As I mentioned, I think that this has changed (or is
in the process of being changed; not sure which) in an
amendment/TC/revision that came/will come after C99. My
understanding is that the change makes all-bits-zero a valid
representation of the zero value for all integer types (not
necessarily "the" representation of zero, but one that will
work). I'm not sure what the official status of the change
is: Whether it's under consideration or actually adopted.
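
The exception for unsigned char is what makes byte-wise inspection safe: unsigned char has no trap representations, so any object can be examined one byte at a time. A minimal sketch, with a hypothetical helper name:

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if every byte of the object at obj is
   zero, 0 otherwise. Reading through unsigned char is always allowed,
   even when the object as a whole holds a trap representation. */
int all_bytes_zero(const void *obj, size_t n)
{
    const unsigned char *bytes = obj;
    size_t k;

    for (k = 0; k < n; k++) {
        if (bytes[k] != 0)
            return 0;
    }
    return 1;
}
```

Calling all_bytes_zero(&bar, sizeof bar) after the memset is well defined; it is only reading bar.p or bar.d as a pointer or a double that is in question.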
 
Ben Bacarisse

Eric Sosman said:
Ben said:
Noob said:
[...]
i.e. if struct foo were
{
unsigned char v[8]; unsigned int ui; unsigned long ul;
};
then I thought the memset would "safely" set them all to 0.

Am I mistaken?

No, you are right, but signed int is also OK provided you are using
modern C. C89 does not have that guarantee.

Eric may have been talking about C89 or about bar.p and bar.d which
are not guaranteed to be safe even in C99.

Under C99 (ISO/IEC 9899:1999) I believe that all the
elements of the original example's struct (int, unsigned,
void*, and double) could conceivably treat all-bits-zero as
trap representations. Ditto the unsigned int and unsigned
long elements of the revised struct (but not the array of
unsigned char).

As I mentioned, I think that this has changed (or is
in the process of being changed; not sure which) in an
amendment/TC/revision that came/will come after C99. My
understanding is that the change makes all-bits-zero a valid
representation of the zero value for all integer types (not
necessarily "the" representation of zero, but one that will
work). I'm not sure what the official status of the change
is: Whether it's under consideration or actually adopted.

This is Defect Report 263. It seems to have been accepted by the
committee but after TC1 and TC2 were published so it has no official
status. I suppose it is possible it will be reversed, but given that
the committee's response was to accept Clive Feather's Suggested TC as
a (so far unpublished) TC, one can infer there was never any intent to
make all bits zero a trap for integer types.

Anyway, you are right, though I would have no hesitation in assuming
that memset(..., 0, ...) works since it seems to have committee
backing.
 
Keith Thompson

Noob said:
Ben said:
Noob said:
Eric Sosman wrote:

Noob wrote:

#include <stdio.h>
#include <string.h>
struct foo
{
int i; int j; unsigned u; void *p; double d;
};

int main(void)
{
struct foo bar;
memset(&bar, 0, sizeof bar); /* UB? */
No U.B. here ...

printf("%d %d %u %p %f\n", bar.i, bar.j, bar.u, bar.p, bar.d);
... but potential U.B. here, when the five tainted values are
fetched from the all-bits-zero struct.

Are you saying even bar.u (the unsigned int) has an "unknown" value?

I thought unsigned integer types were "safe".

i.e. if struct foo were
{
unsigned char v[8]; unsigned int ui; unsigned long ul;
};
then I thought the memset would "safely" set them all to 0.

Am I mistaken?

No, you are right, but signed int is also OK provided you are using
modern C. C89 does not have that guarantee.

Thanks for pointing this out. I mostly use "gnu89" which is described
as "GNU dialect of ISO C90 (including some C99 features)".

I'll see what this means in terms of initialization, but thanks to you
and Eric for making me aware that memset is "unsafe" in C89, even
for unsigned integer types.

Could you cite the relevant part of C99 that was added to make it safe?

It's not actually in the C99 standard itself. It was added in
Technical Corrigendum 2 in response to Defect Report #263
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_263.htm>.

The amended text is in N1256 6.2.6.2p5:

For any integer type, the object representation where all the bits
are zero shall be a representation of the value zero in that type.

It should be reasonably safe to depend on this even for older
implementations. I've never heard of an implementation that doesn't
satisfy this requirement, and now that it's carved in stone it's even
less likely that it will be a problem for any new implementations.
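
As an illustration of what 6.2.6.2p5 permits, the sketch below uses memset on a struct that contains only integer members. The struct name counters and the helper counters_clear are hypothetical; the members follow the revised struct from earlier in the thread, with a signed int added.

```c
#include <string.h>

struct counters
{
    unsigned char v[8]; unsigned int ui; unsigned long ul; int i;
};

/* Hypothetical helper: zero an integer-only struct with memset.
   Relies on C99 + TC2, 6.2.6.2p5: for any integer type, all bits zero
   is a representation of the value zero. */
void counters_clear(struct counters *c)
{
    memset(c, 0, sizeof *c);
}
```

Every member then reads back as 0. The same call on a struct that holds pointers or floating-point members would not carry that guarantee.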
 
Keith Thompson

Ben Bacarisse said:
This is Defect Report 263. It seems to have been accepted by the
committee but after TC1 and TC2 were published so it has no official
status.
[...]

No, it was published in TC2 and appears in N1256. It's official.
 
Ben Bacarisse

Noob said:
Ben said:
Noob said:
Eric Sosman wrote:

Noob wrote:

#include <stdio.h>
#include <string.h>
struct foo
{
int i; int j; unsigned u; void *p; double d;
};

int main(void)
{
struct foo bar;
memset(&bar, 0, sizeof bar); /* UB? */
No U.B. here ...

printf("%d %d %u %p %f\n", bar.i, bar.j, bar.u, bar.p, bar.d);
... but potential U.B. here, when the five tainted values are
fetched from the all-bits-zero struct.

Are you saying even bar.u (the unsigned int) has an "unknown" value?

I thought unsigned integer types were "safe".

i.e. if struct foo were
{
unsigned char v[8]; unsigned int ui; unsigned long ul;
};
then I thought the memset would "safely" set them all to 0.

Am I mistaken?

No, you are right, but signed int is also OK provided you are using
modern C. C89 does not have that guarantee.

Thanks for pointing this out. I mostly use "gnu89" which is described
as "GNU dialect of ISO C90 (including some C99 features)".

I'll see what this means in terms of initialization, but thanks to you
and Eric for making me aware that memset is "unsafe" in C89, even
for unsigned integer types.

Could you cite the relevant part of C99 that was added to make it
safe?

No because I am wrong as far as C99 is concerned! I wanted to avoid
the details by using a vague phrase like "modern C" because I was not
sure of the status of the phrase in question.

It seems that, after a defect report was filed (number 263), a
correction was accepted but it has not been published as a correction to
the standard nor, of course, has a new standard been issued.

All the drafts for the new standard have, in 6.2.6.2 p5:

The values of any padding bits are unspecified. A valid (non-trap)
object representation of a signed integer type where the sign bit is
zero is a valid object representation of the corresponding unsigned
type, and shall represent the same value. For any integer type, the
object representation where all the bits are zero shall be a
representation of the value zero in that type.

It is one of those things that I'd be perfectly happy to accept as
portable despite it being technically undefined.

I think the general rule is that change bars in n1256.pdf mark text
accepted after C99+TC1+TC2 were released.
 
Ben Bacarisse

Keith Thompson said:
Ben Bacarisse said:
This is Defect Report 263. It seems to have been accepted by the
committee but after TC1 and TC2 were published so it has no official
status.
[...]

No, it was published in TC2 and appears in N1256. It's official.

Oh dear. I'd better shut up. I looked in what I thought was TC2 but it
was TC2 to C90! I see it is there in

http://www.open-std.org/JTC1/SC22/WG14/www/docs/9899-1999_cor_2-2004.pdf

for all to see. I am happy that I was right to start with (in that it
is official) but sorry that I confused matters by trying to correct
myself.
 
Keith Thompson

Ben Bacarisse said:
No because I am wrong as far as C99 is concerned! I wanted to avoid
the details by using a vague phrase like "modern C" because I was not
sure of the status of the phrase in question.

It seems that, after a defect report was filed (number 263), a
correction was accepted but it has not been published as a correction to
the standard nor, of course, has a new standard been issued.

The change was published in TC2, and the TCs *are* corrections
to the standard. The current official C standard, as I
understand it, consists of C99 + TC1 + TC2 + TC3. (N1256 is an
almost-but-not-quite-official document containing the same content.)

See <http://www.open-std.org/jtc1/sc22/wg14/www/docs/summary.htm>,
and search for 263:

Defect Report #263 UK C Panel – Closed, published in TC 2
Q1: All-zero bits representations.

The page for the DR itself,
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_263.htm>,
doesn't mention TC 2, but it does have a "Technical Corrigendum"
section, indicating that an official change was made.

All the drafts for the new standard have, in 6.2.6.2 p5:

The values of any padding bits are unspecified. A valid (non-trap)
object representation of a signed integer type where the sign bit is
zero is a valid object representation of the corresponding unsigned
type, and shall represent the same value. For any integer type, the
object representation where all the bits are zero shall be a
representation of the value zero in that type.

And all but the last sentence appears in the C99 standard.

It is one of those things that I'd be perfectly happy to accept as
portable despite it being technically undefined.

I think the general rule is that change bars in n1256.pdf mark text
accepted after C99+TC1+TC2 were released.

I think n1256.pdf has change bars for everything not in C99 (TC1, TC2,
and TC3).
 
Phil Carmody

Noob said:
Hello,

I am aware that, on some platforms, the representation of NULL and 0.0
may not be all-bits-zero. ....
On platforms where the two are equivalent, I imagine the compiler is free
to use whichever way is best. But on platforms where they are not, the
compiler must be careful?

I think you'll find the compiler has to be careful about every
part of the compilation process anyway.

Phil
 
Michael Tsang

Hello,

I am aware that, on some platforms, the representation of NULL and 0.0
may not be all-bits-zero.

On such platforms, does setting all bits to 0 with memset invoke UB?

A null pointer can be converted to and from integer 0. However, its internal
representation may not be all bits zero. As using memset circumvents the
type system (treating everything as char[]), the operation itself is not UB.
However, after the memset, using that pointer value does invoke UB on such
implementations.
 
Ben Bacarisse

Michael Tsang said:
A null pointer can be converted to and from integer 0.

A null pointer is not guaranteed to convert to 0. The conversion is
implementation defined. Conversions from 0 are also implementation
defined unless the 0 is an integer constant expression. For example,
int null = 0; (void *)null == 0 is not guaranteed to be true (though
I'd lay odds on it being true if I had to bet).

<snip>
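
The difference between the two kinds of zero can be sketched as follows. The function names are hypothetical, and intptr_t (an optional type from <stdint.h>) stands in for the plain int of the example above so that the cast does not mix sizes on 64-bit targets; that substitution is an assumption of this sketch, not part of the original post.

```c
#include <stdint.h>

/* Guaranteed: the 0 here is an integer constant expression, so it is a
   null pointer constant and the result is a null pointer. */
void *null_from_constant(void)
{
    return 0;
}

/* Implementation-defined: zero is a run-time value, not a constant
   expression, so the result of the conversion need not be a null
   pointer. */
void *pointer_from_runtime_zero(void)
{
    volatile intptr_t zero = 0;   /* volatile keeps it a run-time value */

    return (void *)zero;
}

/* Returns 1 if the two conversions agree on this implementation. */
int runtime_zero_converts_to_null(void)
{
    return pointer_from_runtime_zero() == null_from_constant();
}
```

On mainstream platforms runtime_zero_converts_to_null() is expected to return 1, in line with the odds offered above, but only null_from_constant() is required to yield a null pointer.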
 
Michael Tsang

Ben said:
A null pointer is not guaranteed to convert to 0. The conversion is
implementation defined. Conversions from 0 are also implementation
defined unless the 0 is an integer constant expression. For example,
int null = 0; (void *)null == 0 is not guaranteed to be true (though
I'd lay odds on it being true if I had to bet).

<snip>

Sorry, I had the wrong idea: I thought that because "void *p = 0;" makes p
a null pointer, 0 and a null pointer were interchangeable.
 
