And that may be the central point of disagreement between me and you.
Agreed, types (typically) have no existence in object modules. I don't
think that implies that they're purely nomenclature. After all, objects
don't (necessarily) exist in object modules either.
Given:
int arr[10];
int *ptr = arr+3;
arr[3], *(arr+3), and *ptr are exactly the same object, referred to
by different names. Given:
typedef int word;
int and word are exactly the same type, referred to by different names.
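To make that concrete, here's a small runnable sketch of both halves of that
analogy (just an illustration; the names are arbitrary):

#include <stdio.h>

typedef int word;                /* "word" and "int" name the same type */

int main(void) {
    int arr[10] = {0};
    int *ptr = arr + 3;

    *ptr = 42;                   /* store through one name of the object... */
    printf("%d\n", arr[3]);      /* ...read it back through another: prints 42 */

    word w = arr[3];             /* no conversion happens here... */
    int *wp = &w;                /* ...and no cast is needed here, because
                                    word and int are one and the same type */
    printf("%d\n", *wp);         /* 42 again */
    return 0;
}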
True. But consider:
#include <limits.h>

#if INT_MAX > 65536
typedef int word;
#else
typedef long word;
#endif
Now... After this, I think it's quite reasonable for someone to say that
"word" is not the same type as int or long, but rather, is the same type
as one of them on any given system but possibly not on other systems. At
this point, "word" has acquired a meaning which is definitely different
from the definitions of either int or long. And if you're talking about
the program as a C program, rather than as a program built with a specific
implementation (including compiler options), it is incorrect to claim that
it is int, and also incorrect to claim that it is long.
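A sketch of the consequence, as I understand it: the last line below is
accepted on an implementation where word ends up being long, and is a
constraint violation (diagnostic required) on one where it ends up being int,
so portable code can't assert either identity.

#include <limits.h>

#if INT_MAX > 65536
typedef int word;
#else
typedef long word;
#endif

word w = 0;
long *lp = &w;   /* fine where word is long; constraint violation where
                    word is int, since int * and long * are not compatible */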
It gets weirder. So far as I can tell, given:
foo.c:
typedef struct { int a; } word;
extern word x;
bar.c:
typedef struct { int a; } word;
word x = { 1 };
The two declarations of x are of DIFFERENT types. Which happen to be
compatible.
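Spelled out as a complete pair of files (the main is just something added
here to make it linkable), this builds and prints 1 precisely because the
two distinct types are compatible:

/* foo.c */
#include <stdio.h>
typedef struct { int a; } word;
extern word x;

int main(void) {
    printf("%d\n", x.a);   /* both declarations designate the same object */
    return 0;
}

/* bar.c */
typedef struct { int a; } word;
word x = { 1 };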
There is a coherent concept of "type". (I don't think the C Standard
has a single definition of the term, but one of the normative
references probably does.) I assert that the name is not part of the
type, and that two different names can refer to the same type, just
as (to be a bit more precise than I was in the previous paragraph)
two different lvalues can designate the same object. A "type" is
an abstract concept, but it exists independently of its name.
In trivial cases, yes. In more complex cases, though, you have a name
which maps onto some "type" in that sense, but you don't know which one,
and you also don't need to, because you can just treat it as being its
own type.
I think the model of viewing the alias as "merely" an alias makes it harder
to work effectively with such a type.
As for many other concepts, we often gloss over the distinction between
a name and the entity that the name refers to. But the distinction is
still there.
Agreed.
But when you are looking at logical types, like size_t, it turns out that
you get a much better model of their behavior by glossing over the
distinction, I think.
I agree. It's not the terminology I would have chosen, but it requires
a more strained reading to conclude that a typedef doesn't "define a
type" than to say that it does. Which, as I've said, leaves me with the
conclusion that "defining a type" does not *create* a type.
Hmm.
Okay, let's try an experiment.
Let us label terms:
type1: The underlying logical mapping of storage space to interpretation,
plus any magic flags that allow you to distinguish, e.g., between char and
signed/unsigned char, things like that.
type2: The mapping from a name to a type1.
type3: The name that is mapped to a type1 by a type2.
With these, we can now say:
* typedef defines type3s, and creates type2s.
* typedef does not create a type1.
* typedef does not define a type1.
* "size_t" is a type3.
* <stddef.h> defines a type2 named "size_t".
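For instance (with a made-up alias name), the reason to say typedef creates
a type2 but no new type1 is that nothing ever needs a conversion or a cast
between the alias and the type it names:

typedef unsigned long ticks;   /* new type3 ("ticks"), new type2 (the mapping),
                                  no new type1 */

int main(void) {
    unsigned long raw = 5;
    ticks *tp = &raw;          /* no cast, no diagnostic: ticks and
                                  unsigned long are the same type1 */
    return (*tp == 5UL) ? 0 : 1;
}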
I think that as soon as we have the distinction between the different
senses in which people talk about types, it's easy to agree on everything.
I'm still wrestling with what the standard really means by "define"; I
suspect that in at least some cases the standard is just a bit sloppily
worded.
I think so.
A puzzle for you: Can you propose a strictly conforming program which can
determine which standard unsigned integer type, if any, size_t is?
If not, I think that explains why the standard is loose with the terminology
here; *for the purposes of strictly conforming programs*, size_t is its own
type, and you can never have a strictly conforming program which assumes that
it is the same as any other type. You can *know* that it must be the same
as some other type, but since you don't know which one, you can't write a
reasonably sane C program which uses that information in any way.
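The closest I can come up with is a sketch like the one below, and it rather
gives the game away: the moment its output depends on the answer, it's no
longer strictly conforming, and even then it can only establish "same size
and range", never "same type".

#include <stddef.h>
#include <stdio.h>

int main(void) {
    if (sizeof(size_t) == sizeof(unsigned long)
        && (size_t)-1 == (unsigned long)-1)
        printf("size_t has the same size and range as unsigned long\n");
    else
        printf("size_t does not match unsigned long here\n");
    return 0;
}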
Knowing that typedef "does not create a new type" is perhaps important when
you're thinking about typedefs *you* create, because it's why you know
that you can do:
typedef int banana;
banana main(banana argc, char **argv) {
    return (banana) 0;
}
(Okay, maybe that SPECIFIC example isn't important, but the principle is.)
But when it comes to typedefs provided by the language, so far as I can tell,
it's *not* important to know that they don't create new types, because there's
no way for portable code to rely on or make use of that information.
But yet again, char and signed char are distinct in a very real
way that size_t and unsigned long are not (replace signed char and
unsigned long by the appropriate types for a given implementation).
I don't see how your type model recognizes this distinction.
I think it's that parenthetical that matters. Since my model of C is,
for the most part, abstracted away from any specific implementation, even
though I know that size_t is almost certainly the same as a standard
unsigned integer type, there is no code I can ever reasonably write which
is affected by this information.
So the weakness of my model is that it doesn't help me much when I want to
write code that is specific to a given implementation and that I deliberately
plan to keep people from porting.
And no,
it's not just a matter of either the language or any compiler
not being clever enough to tell things apart; it's about whether
they're defined *by the language* to be the same thing. It's not
a distinction that should necessarily affect how you write code,
but it's a very real distinction in terms of how the language is
defined, and how the standard uses the word "type".
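A sketch of that distinction in code, assuming (for the second half) an
implementation where size_t happens to be unsigned long: the first
initialization requires a diagnostic no matter how char is signed, while the
second is accepted there because no distinct type was ever created.

#include <stddef.h>

signed char sc;
char *cp = &sc;            /* constraint violation: char and signed char are
                              distinct types even where char is signed */

size_t n;
unsigned long *ulp = &n;   /* accepted without a diagnostic on an implementation
                              where size_t is unsigned long, because there the
                              two names refer to one type */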
Lemme give you another example. I firmly believe that, intentionally
or otherwise, C89 invoked undefined behavior if you accessed an uninitialized
object of type 'unsigned char', because all access to uninitialized
objects was undefined behavior. (I'm also pretty sure this was not
intentional.)
In C99, that's gone, because the undefined behavior from accessing
uninitialized objects is now handled through trap representations (and
let me say, I think that it's a beautiful solution). And unsigned char
doesn't *have* any trap representations.
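(That exemption is, as far as I can tell, exactly what makes byte-wise
inspection of arbitrary objects well-defined; a quick sketch:)

#include <stdio.h>

int main(void) {
    double d = 1.0;
    unsigned char *p = (unsigned char *) &d;
    size_t i;

    /* Reading any object's bytes as unsigned char is well-defined:
       no bit pattern is a trap representation for unsigned char. */
    for (i = 0; i < sizeof d; i++)
        printf("%02x ", (unsigned) p[i]);
    putchar('\n');
    return 0;
}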
That said:
#include <stdio.h>
int main(void) {
    unsigned char u;
    printf("%d\n", (int) u);
    return 0;
}
This code still looks wrong to me. I can show you chapter and verse to prove
that this code does NOT invoke undefined behavior, only unspecified
behavior. Assuming that INT_MAX is greater than UCHAR_MAX, I can even
tell you that it's guaranteed to print a number from 0 to UCHAR_MAX
(inclusive).
But my primary working model of C ignores this in favor of saying "that
accesses an uninitialized value, and it's wrong."
Which is to say... My primary working model of C is noticeably more
conservative than the language spec. I avoid writing some lines of code
which are technically not constraint violations, *as if* they were in
fact constraint violations, because it produces better code, and because
I consider it a mere artifact of circumstance or necessity that the
compiler won't catch and flag those things.
Hmm. I suppose the idea of two types being the same (or of two type
names referring to the same type) is limited in scope to a single
translation unit.
I think it's slightly fancier. I think it's a single translation unit *or*
the built-in types. I am pretty sure that "int" is the same type in every
translation unit. And so are all the aliases of the built-in types, I think.
I think structures, unions, enums, and functions (and function pointers)
have unique types in each translation unit, but that arrays and basic types
don't. I am not, however, totally sure of this.
-s