Malcolm said:
Time of writing != compile time.
Let's say we want to calculate a standard deviation. The prototype is
double stdev(double *x, N);
Technically, x is a pointer, not an array. Three likely possibilities...
[1.]
double foo, bar[] = { /* ... */ };
foo = stdev(bar, sizeof bar / sizeof bar[0]);
Here, `N' is the result of sizeof, a size_t.
[2a.]
#define NUMBARS 100
double foo, bar[NUMBARS];
foo = stdev(bar, NUMBARS);
`N' is of type int, implicitly converted to size_t.
[2b.]
#define NUMBARS 100
double foo, bar[NUMBARS];
foo = stdev(bar, sizeof bar / sizeof bar[0]);
`N' is the result of sizeof, a size_t.
[3.]
#define NUMBARS 100
double foo, *bar;
bar = malloc(NUMBARS * sizeof *bar);
foo = stdev(bar, NUMBARS);
`N' is closely related to the allocation size in the malloc() call,
which is a size_t.
What type should N be? If you don't know how big the maximum array is
going to be, which you don't for this function - except that it fits in
memory - it must be a size_t.
You make it sound like that's a bad thing. In the hypothetical (but
possible) case that CHAR_BIT is sufficiently large and
sizeof(double) == 1, then yes, N can be as big as the largest
allocatable object, which is exactly the range a variable of type
size_t can hold.
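For concreteness, stdev() itself might be no more than this sketch
(two-pass population formula, assuming N > 0; purely illustrative):

#include <stddef.h>
#include <math.h>

double stdev(double *x, size_t N)
{
    double mean = 0.0, sum = 0.0;
    size_t i;

    for (i = 0; i < N; i++)      /* first pass: the mean            */
        mean += x[i];
    mean /= N;

    for (i = 0; i < N; i++)      /* second pass: squared deviations */
        sum += (x[i] - mean) * (x[i] - mean);

    return sqrt(sum / N);
}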
Thus we must write
size_t i;
for(i=0;i<N;i++)
{
}
Of course that is misleading, because i is not in any shape or form a size
type. It is an index counter. The implications of introducing size_t
simply weren't thought through.
Think of it this way -- `i' is a variable that has to be able to
represent the same set of values as `N'; the name of its type is
the least important here.
You find that functions like stdev() are by no means uncommon. Very
frequently you will not hard-code the size of an array, except maybe in
the very top layer of code.
If stdev() is designed so it can take the maximum possible number
of input values only restricted by continuous addressable memory,
what other type could N be? The calling code, on the other hand,
may pass a narrower type in the second argument which will be
implicitly converted to the wider type. Different layers of code
may hold the same value in different types.
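For example (a hypothetical caller, reusing `bar' and NUMBARS from
case 3 above):

int nsamples = NUMBARS;   /* this layer happens to use int */

/* nsamples is non-negative, so the implicit int -> size_t
   conversion at the call loses nothing */
foo = stdev(bar, nsamples);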
The worse problem is that, frequently, you don't know the exact size of an
array but you know that it will be small. For instance the number of
children in a class. Should that be a size_t or not? If we sort the class
by grade, qsort() takes two size_ts. However people will naturally jib at
using a size_t when an int, realistically, is going to be enough. So you
get inconsistency.
Same thing here. qsort() is a general purpose function that is able
to deal with a wider range of inputs than this example uses. Since
neither of the two size_t arguments can be negative, I'd suggest
unsigned int, but since both types (assuming non-negative values)
convert losslessly to a size_t, what is the exact problem? Different
layers of code can hold the same value in different types.
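Something like this sketch, say (the struct, names and comparison
function are made up for the example):

#include <stdlib.h>

struct pupil { const char *name; int grade; };

static int cmp_grade(const void *a, const void *b)
{
    const struct pupil *pa = a, *pb = b;
    return (pa->grade > pb->grade) - (pa->grade < pb->grade);
}

void sort_class(struct pupil *class, int nchildren)
{
    /* nchildren is an int; assuming it is non-negative it
       converts losslessly to qsort()'s size_t parameters */
    qsort(class, nchildren, sizeof *class, cmp_grade);
}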
Most integers are ultimately used as index variables.
Maybe, maybe not, depends. Does not really matter, either; narrower
integers can be index variables, too, as long as the expected set
of values fits into the smaller type.
[some compilers/assemblers on some CPUs will even produce shorter
opcodes for small index types and/or small constant indices]
Not every integer,
of course, for instance if you're dealing with amounts of money you may
choose to represent the sums by integers.
Presumably this refers to the data type used for array elements and
their sum; unrelated.
But every time you add up a list
of amounts of money, you will have one index integer to iterate through
the array and another to count it.
While both, index and count, should have the same (preferably unsigned)
type, they don't have to be size_t.
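For instance (a sketch; the amounts are kept in pence so the element
type is an integer, and both the index and the count are plain
unsigned int):

#define NPAYMENTS 50

long pence[NPAYMENTS];         /* the sums themselves, filled in elsewhere */
unsigned int i, count = NPAYMENTS;
long total = 0;

for (i = 0; i < count; i++)    /* index and count share one type,
                                  and neither needs to be size_t   */
    total += pence[i];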
Programs don't spend their time doing
calculations, but on moving data from one place to another. Something like
20% of all mainframe cycles are used in sorting, for instance.
I'll pretend I have no opinion.
Even if an integer is a type field, typically that is used as an array
index.
Maybe, maybe not, depends, does not really matter, ...
For instance if we have an enum {MR, MRS, MISS, MS, DR, REV, PROF,
LORD} we will probably have an array of strings we index into to help us
construct letters.
....the number of enumeration constants in said enum cannot portably
be larger than 1023; assuming it starts at 0 and has no gaps in it,
a variable of that type can indeed be used for such purposes.
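Presumably something like this (the string table and the function
are of course hypothetical):

#include <stdio.h>

enum title { MR, MRS, MISS, MS, DR, REV, PROF, LORD };

static const char *salutation[] = {
    "Mr", "Mrs", "Miss", "Ms", "Dr", "Rev", "Prof", "Lord"
};

void greet(enum title t, const char *surname)
{
    /* the enum value indexes the table directly; no size_t needed */
    printf("Dear %s %s,\n", salutation[t], surname);
}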
Where does size_t come into play here?
That's the problem. Really, virtually every integer in the program should
be a size_t, because they will almost all end up being used to derive
index calculations.
Whether that's true or not (I have no opinion)... why do they have
to be size_t? As far as I can tell, in
foo_expression '[' bar_expression ']'
"One of the expressions shall have type ``pointer to object type'',
the other expression shall have integer type, and the result has
type ``[object] type''." [6.5.2.1.1]
Nowhere can I find it mentioned that an index must be size_t,
nor that any differently typed index is converted to size_t, or may
result in any other form of computational overhead. In fact, as
already mentioned above, a smaller index type may produce *shorter*
code on some architectures.
But that is unlikely to be accepted, partly because of
the unsignedness and efficiency considerations, but mainly because to type
"size_t i;" is so counter-intutitive.
Personally, I think the set of storable values is more important
than the name of a type; *iff* the full range of values a size_t can
hold is even needed in any particular place.
That's why I think size_t will ultimately have to go, and the
introduction of 64-bit types on the desktop
"Desktops" aren't everything.
will be the catalyst for this,
because it will no longer be true that int can index an arbitrary array.
Just curious, did anyone suggest this change to the standards body?
What did they respond?