type of array index?

C

CBFalconer

Andrey said:
Stephen Sprunk wrote:
.... snip ...

Well, just as easily you can get into a situation where 'size_t'
overflows and a dedicated application-specific type works. One
day one of those other coders decides to replace the built-in
array with a custom implementation of some "disjoint" array,
like 'deque', for example. Size of that container is no longer
limited by the range of 'size_t'.

All of which leads to easy generation of buffer overflows. What is
really needed is an index type to match the array, i.e. a
subrange. Unfortunately C doesn't have such a thing. Even an
enumeration won't work. An enhancement of typedef would allow
stronger typing in future, but probably won't happen. I.E. let
typedef create a real type:

typedef 0U ... 10U myindex; /* reusing the ... varargs token */

T arraything[tmax(myindex)];

for (myindex i = 0; i < tmax(myindex); i++) ...

using tmax (and tmin) as operators, similar to sizeof.

crossed to comp.std.c in case it stirs some thoughts.
 
A

Andrey Tarasevich

Stephen said:
Per the standard, size_t CANNOT overflow when used as an array index.

That's exactly what I said in my original message (the one you initially
replied to). You omitted that portion from the quote. What's the point
of repeating it here?
One day one of those other coders decides to replace the built-in
array with a custom implementation of some "disjoint" array, like
'deque', for example. Size of that container is no longer limited by
the range of 'size_t'.

A 'deque' is not an array and therefore any problem indexing them with
size_t is moot.
[ Rest of post snipped because it doesn't apply to arrays. ]

Quite the opposite. Within this context anything that applies
specifically to indexing of arrays is moot. This whole branch of this
thread doesn't apply to arrays. This whole branch is started by your
reply to the "generic" portion of my message (quoted at the very top).
And I clearly and repeatedly explained that issue I'm talking about is
the _conceptual_ issue of using 'size_t' as an index type for a generic
indexed container.
The OP specifically asked about the type to use for an array index.
Hijacking the thread to generalize to other non-standard container types is
irrelevant at best.

You did not reply to the OP. You replied to me. You replied to something
that was noting more than a side note in my response to the OP. You
extracted this side note from my message, robbing it of its original
context, and tried to misrepresent it as thread hijacking? Sorry, but
that's you who hijacked this thread. If you did this unintentionally,
the I can only suggest that in the future, when you reply to a
particular portions of the people messages, make sure you understand
what they are talking about.
 
A

Andrey Tarasevich

CBFalconer said:
Well, just as easily you can get into a situation where 'size_t'
overflows and a dedicated application-specific type works. One
day one of those other coders decides to replace the built-in
array with a custom implementation of some "disjoint" array,
like 'deque', for example. Size of that container is no longer
limited by the range of 'size_t'.

All of which leads to easy generation of buffer overflows. What is
really needed is an index type to match the array, i.e. a
subrange. Unfortunately C doesn't have such a thing. Even an
enumeration won't work. An enhancement of typedef would allow
stronger typing in future, but probably won't happen. I.E. let
typedef create a real type:

typedef 0U ... 10U myindex; /* reusing the ... varargs token */

T arraything[tmax(myindex)];

for (myindex i = 0; i < tmax(myindex); i++) ...

using tmax (and tmin) as operators, similar to sizeof.

crossed to comp.std.c in case it stirs some thoughts.
...

That's true, but again, at application level the issue of choosing the
index type for an array is not really related to arrays in any way,
hoiwever strange it might sound. In other words, at application level
the original issue does not exist at all. The very fact that someone is
asking about the choice of type for array index already indicates that
there's a more generic problem with the way that someone designs its
code or with his way of thinking about the issue. Let me explain once again.

In generic (library-level) code, i.e. code that works with generic
arrays, the choice of the index type is obvious - it's 'size_t'
(assuming that we are not considering negative indexing). It looks like
nobody is arguing with that. 'size_t' is also the correct choice of
index type for indexing strings for pretty much the same reasons.

However, in specific (application-level) code the choice of index type
immediately follows from the choice of the corresponding "quantity"
type. Whenever there is a need to store an application-specific
"quantity" in the program (number of cars on the lot, number of
employees in the company, number of lines in the file etc.) there's an
issue of choosing a type to represent that quantity. Assuming that we
are choosing from the set of built-in types, there's will always be a
limitation on the maximum quantity we can handle in the program. This
limitation will become a part of the program's specification. It is the
program's responsibility to observe that this part of the specification
is met. Otherwise, the chosen quantity type will overflow with
disastrous results. Your suggestion concerning "range types" is directly
relevant to this particular issue - the issue of choosing (or creating)
a type for representing an application specific "quantity" (although the
need to watch for overflows will always be there, as long as we are
using a fixed-range type). Note, that so far I never mentioned any
arrays or any other containers. There might be no containers in the
program at all. The issue of choosing the "quantity" type, however,
still exists.

Now, let's assume that one needs an array in the program and needs to
choose the index type. The first question one should ask himself is what
this array is going to store, what is the number of elements in that
array and (!) what type is _currently_ used to represent that quantity.
Note, that by the time one needs an array the choice of the "quantity"
type has _already_ been made. _That_ type is the type that should be
used for array indexing, period. Someone here suggested that if array
index type might overflow, if it is not 'size_t'. Wrong. As long as the
same type is used to represent the number of elements in the array and
the index (regardless of the concrete type, could be 'unsigned char' for
example) no overflow can ever occur. More precisely, at that stage
there's no issue of index type overflow. What _can_ overflow is the
originally chosen "quantity" type, but this issue has absolutely nothing
to do with array indexing. The overflow will happen long before one even
gets to any array indexing. The only thing that overflow will mean is
that the author of the code made a bad choice of "quantity" type (not
index type) or failed to enforce the program specification.

The suggestion to _unconditionally_ use 'size_t' for array indexing
because it "can never overflow" is one of those "fake wisdoms" that
float around the net in great numbers. It is as correct in theory as it
is useless and irrelevant in practice. Something from the area of "Never
use division, because there's a danger of dividing something by zero".

As a disclaimer: I said that many times already, but just in case, for
"I'm not a reader, I'm a writer" type of posters - 'size_t' is a correct
choice of index type for generic non-negative array indexing. Generic
array processing functions should always use this particular type for
representing array sizes and array indices (note, BTW, that in this case
the latter follows form the former too, i.e. the index type is
determined by the corresponding "quantity" type).
 
P

pete

Andrey Tarasevich wrote:
In generic (library-level) code, i.e. code that works with generic
arrays, the choice of the index type is obvious - it's 'size_t'
(assuming that we are not considering negative indexing).
It looks like
nobody is arguing with that. 'size_t' is also the correct choice of
index type for indexing strings for pretty much the same reasons.

What type do you like for counting list nodes?
 
A

Andrey Tarasevich

pete said:
...

What type do you like for counting list nodes?
...

At what level?

In generic library-level code I'd use 'unsigned long'. Or, maybe, I'd
provide the client with a compile-time choice between 'unsigned long'
and 'unsigned', with the default setting being 'unsigned long'.

In application-level code the choice is dictated by the application
area, just like with arrays.
 
K

Keith Thompson

Andrey Tarasevich said:
At what level?

In generic library-level code I'd use 'unsigned long'. Or, maybe, I'd
provide the client with a compile-time choice between 'unsigned long'
and 'unsigned', with the default setting being 'unsigned long'.

And what if the system has a 32-bit "unsigned long" and a 64-bit
size_t? This is possible in C99 (though a number of people have
argued that it shouldn't be). On a 64-bit system, a large linked list
could conceivably have more than 2**32 nodes. (More realistically, a
data structure that allows access faster than O(N) could have more
than 2**32 nodes; you wouldn't want to traverse a list that long.)

Even in generic library-level code, I'd probably use size_t, because
it provides some guarantees that unsigned long doesn't.

On the other hand, even size_t might not be sufficient. It can
represent the size of any single object, but it may not be able to
represent a count of distinct objects. I'd be surprised to see a
system that can allocate more than SIZE_MAX objects, but the standard
doesn't forbid it.
 
A

Andrey Tarasevich

Keith said:
And what if the system has a 32-bit "unsigned long" and a 64-bit
size_t? This is possible in C99 (though a number of people have
argued that it shouldn't be). On a 64-bit system, a large linked list
could conceivably have more than 2**32 nodes. (More realistically, a
data structure that allows access faster than O(N) could have more
than 2**32 nodes; you wouldn't want to traverse a list that long.)

If faced with a task of implementing a portable generic list library I'd
still never use 'size_t' to represent list element count. Platforms that
have 'size_t's range narrower than that of 'unsigned long' are also
possible (moreover, I think they are far less exotic than the one you
mention). Range of the type 'size_t' is too "volatile" to be used for
this purpose. And the conceptual reason for that "volatility" is the one
I already mentioned - 'size_t' is directly related to the concept of
"object size" but completely unrelated to the concept of "object count".

Also, consider the applications that would use this library. What are
they going to use to represent application-specific quantities on the
platform you describe? If they use 'int' or 'long' or 'long long' types,
then the issue doesn't exist - they should simply never have lists that
are longer than the ranges of those types. Otherwise, I simply don't see
any good solution for this situation (besides the one described in the
next paragraph).

In situations when providing the source code of the library to the
client is an option, a possible solution would be to implement it in
template-like manner, i.e. make the source code to depend on an
externally '#define'd parameter and give the client the responsibility
to '#define' it and "instantiate" the code with whatever count/index
type he/she prefers. This is, of course, not always an option.
Even in generic library-level code, I'd probably use size_t, because
it provides some guarantees that unsigned long doesn't.

I don't really see what guarantees you are talking about. 'size_t's
range can easily be narrower than that of, for example, 'unsigned long'.
There's no guarantees against that.
On the other hand, even size_t might not be sufficient. It can
represent the size of any single object, but it may not be able to
represent a count of distinct objects. I'd be surprised to see a
system that can allocate more than SIZE_MAX objects, but the standard
doesn't forbid it.

Surprised? Why? This is a typical situation for segmented-memory
platforms. Many of those not-so-old 16 bit platforms possessed this
property. Most (if not all) C compilers on DOS or Win16 platform were
like that. 'size_t' had 0-65535 range, but there was absolutely no
problem in allocating more than 65535 objects.
 
C

Christian Bau

Keith Thompson said:
And what if the system has a 32-bit "unsigned long" and a 64-bit
size_t? This is possible in C99 (though a number of people have
argued that it shouldn't be). On a 64-bit system, a large linked list
could conceivably have more than 2**32 nodes. (More realistically, a
data structure that allows access faster than O(N) could have more
than 2**32 nodes; you wouldn't want to traverse a list that long.)

Even in generic library-level code, I'd probably use size_t, because
it provides some guarantees that unsigned long doesn't.

On the other hand, even size_t might not be sufficient. It can
represent the size of any single object, but it may not be able to
represent a count of distinct objects. I'd be surprised to see a
system that can allocate more than SIZE_MAX objects, but the standard
doesn't forbid it.

Consider old 8086 systems, where the size of on object was restricted to
64KB, so size_t = 16 bit, but the total size of all objects was
restricted to 1 MB. Should be quite possible to allocate more than 65535
very small objects in that situation. (80286 systems might have had
limits of 65535 bytes per object and 16 MB total, but I am not sure
about that).
 
K

Keith Thompson

Christian Bau said:
Consider old 8086 systems, where the size of on object was restricted to
64KB, so size_t = 16 bit, but the total size of all objects was
restricted to 1 MB. Should be quite possible to allocate more than 65535
very small objects in that situation. (80286 systems might have had
limits of 65535 bytes per object and 16 MB total, but I am not sure
about that).

Ok, so color me surprised (and yes, I should have thought of that).

How about using a nest of #ifs to create a typedef "count_t" that's an
alias for the first of uintmax_t, unsigned long long, or unsigned long
that exists? (The second test caters to C90 implementations with
extensions, the third to unenhanced C90 implementations.) This should
get you the biggest available unsigned integer type. It might be
bigger than you need, but unless you're going to create huge arrays of
them, or unless it imposes a serious performance hit, that shouldn't
be too much of a problem.

Whatever count type you use, you should probably define it in a single
location, so even if users can't easily customize it, future
maintainers can.
 
C

Chris Croughton

How about using a nest of #ifs to create a typedef "count_t" that's an
alias for the first of uintmax_t, unsigned long long, or unsigned long
that exists? (The second test caters to C90 implementations with
extensions, the third to unenhanced C90 implementations.) This should
get you the biggest available unsigned integer type. It might be
bigger than you need, but unless you're going to create huge arrays of
them, or unless it imposes a serious performance hit, that shouldn't
be too much of a problem.

That's basically what I do, except that I do it with a header file, and
that file is created by a program which includes limits.h for that
system (also includes float.h and spits out some of the limits as
#defines, because C89 says "Of the values in the <float.h> header,
FLT_RADIX shall be a constant expression suitable for use in #if
preprocessing directives; all other values need not be constant
expressions" which makes them not suitable to do such type
determination). (The header file could of course be edited by hand if
it is not possible to run that program on the target or with the target
header files to generate the header.)
Whatever count type you use, you should probably define it in a single
location, so even if users can't easily customize it, future
maintainers can.

That should be true for a lot of implementation determined things (like
byte order where that matters, sizes of system specific things like
filesnames, etc.).

Chris C
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,161
Messages
2,570,892
Members
47,429
Latest member
JacelynKit

Latest Threads

Top