Fao said:
Actually, let me see if I can figure this out for myself.
The sizeof operator results in something of type size_t, which I assume
on some implementations *could* be an unsigned long. If that's correct,
why not unsigned long long?
But this doesn't answer my question as to where the UB could come into
play. Obviously, the original code with a %d could not display any
number over INT_MAX; however, the result would still be defined (at
least I _think_ rolling over is defined).
Am I missing something?
Yes, you're missing the fact that printf() has no way of knowing the
actual types of its arguments other than what the format string tells it.
Let's take a simple example:
#include <stdio.h>

int main(void)
{
    long int n = 42;
    printf("n = %d\n", n);
    return 0;
}
By using "%d" in the format string, you're promising that the
corresponding argument is going to be of type int. The compiler can't
check that you've kept your promise; the format string could be a
variable, so the compiler has no way of knowing what's in it.
(Actually, some compilers can and do perform such checks if the format
is a literal, but the standard doesn't require it. "gcc -Wall" prints
a warning for mismatched printf formats.) The long int value is not
converted to int; you've told the compiler to assume that it's
*already* of type int. In other words, you've lied to the compiler,
and it will get its revenge.
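For example, if the format string is a variable, even a compiler that
checks literal formats has nothing to go on. Here's a minimal sketch
of the same mismatch as above, now invisible to the compiler:

#include <stdio.h>

int main(void)
{
    long int n = 42;
    const char *fmt = "n = %d\n"; /* the compiler can't see inside fmt */
    printf(fmt, n);               /* same lie, but no diagnostic */
    return 0;
}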
What actually happens is going to depend on a number of things, such
as the parameter-passing convention. One possibility is that the
caller will push a long int (the value of n) onto the stack, and
printf() will pop an int from the stack, because you told it to expect
an int. In many cases, this will happen to work (because int and long
int are the same size, or because an int argument is passed the same
way as a long int argument, or because you just got lucky). In other
cases, it could corrupt the stack pointer, causing subsequent code to
lose track of which local variables are stored where (this is unlikely
for historical reasons, but the standard allows it). The bottom line
is that it's undefined behavior, meaning that the standard places no
constraints on what could happen, from behaving as you expect to
making demons fly out of your nose. (Behaving as you expect is actually
the worst outcome, since it prevents you from finding the error until
you port the code to another platform, and it fails subtly or
spectacularly at the most inconvenient possible moment.)
By contrast, consider this example:
#include <stdio.h>

static void print_int(int x)
{
    printf("%d", x);
}

int main(void)
{
    long int n = 42;
    printf("n = ");
    print_int(n);
    printf("\n");
    return 0;
}
This doesn't invoke undefined behavior; it's guaranteed to print
    n = 42
(assuming there are no problems writing to stdout). The function
print_int() expects an argument of type int, but you're passing it an
argument of type long int, so the argument is implicitly converted
from long int to int before being passed. Since the value 42 is
guaranteed to fit into an int, overflow is not an issue.
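In other words, the call behaves as if you had written the conversion
yourself:

print_int((int)n);  /* what the implicit conversion amounts to */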
So why is the implicit conversion performed on the call to print_int()
but not on the call to printf()? Because for print_int(), there's a
prototype that tells the compiler that it expects an argument of type
int. The compiler knows it's going to need to generate an implicit
conversion, and the standard requires it to do so. For printf(),
there's also a prototype (assuming you've remembered the
"#include <stdio.h>"; if not, any call to printf() invokes undefined
behavior) -- but the prototype looks like this:
int printf(const char * restrict format, ...);
(The "restrict" keyword was added in C99; don't worry about the
"const" or "restrict" keywords for now.) The point is that the
compiler, given this prototype, has no way of knowing that the second
argument in printf("%d", x) is supposed to be an int.
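You can see the same limitation in any variadic function you write
yourself. In this sketch (print_ints() is a made-up name, not a
standard function), va_arg() simply trusts the caller, exactly as
printf() trusts the format string:

#include <stdarg.h>
#include <stdio.h>

static void print_ints(int count, ...)
{
    va_list ap;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; i++) {
        /* va_arg() takes our word for it; if the caller actually
           passed a long int here, the behavior is undefined, just
           as with printf("%d", some_long_int) */
        printf("%d ", va_arg(ap, int));
    }
    va_end(ap);
    putchar('\n');
}

int main(void)
{
    print_ints(3, 10, 20, 30);
    return 0;
}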
Getting back to the original example:
printf("%d %d\n",sizeof(xyz),sizeof(abc));
Given the format string, printf() assumes (because you promised it)
that the second and third arguments are going to be of type int. The
compiler doesn't know about this promise, so it passes arguments of
type size_t. Undefined behavior.
One solution is to use *explicit* conversions, so you know that you're
passing arguments of the right type:
printf("%d %d\n", (int)sizeof(xyz), (int)sizeof(abc));
If you happen to know that sizeof(xyz) and sizeof(abc) will both fit
into an int, this is fine. If not, use a bigger type (probably an
unsigned one, since size_t is unsigned):
printf("%lu %lu\n",
(unsigned long)sizeof(xyz),
(unsigned long)sizeof(abc));
Or, if you happen to have a C99-compliant implementation, you can use
the new 'z' length modifier, which specifically applies to size_t
arguments:
printf("%zu %zu\n", sizeof(xyz), sizeof(abc));
but that gives you undefined behavior in a C90 implementation.
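If the same code has to compile as both C90 and C99, one way -- a
sketch, not the only approach; xyz here is just a placeholder array --
is to select the form at compile time. __STDC_VERSION__ expands to
199901L or greater on a C99 implementation, and it isn't defined at
all in C90, so the fallback branch is taken there:

#include <stdio.h>

int main(void)
{
    char xyz[37];

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
    printf("%zu\n", sizeof(xyz));                /* C99: %zu matches size_t */
#else
    printf("%lu\n", (unsigned long)sizeof(xyz)); /* C90: cast and use %lu */
#endif
    return 0;
}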