What is size_t?

candy_init

I sometimes come across statements that involve the use of
size_t, but I don't know exactly what size_t means. What
I know about it is that it is used to hide platform details. I tried
to find its meaning in the header files but didn't get a good answer.
Can somebody please tell me what size_t means and what its
possible values are?

Thanks
 
Eric Sosman

candy_init said:
I sometimes come across statements that involve the use of
size_t, but I don't know exactly what size_t means. What
I know about it is that it is used to hide platform details. I tried
to find its meaning in the header files but didn't get a good answer.
Can somebody please tell me what size_t means and what its
possible values are?

`size_t' is a type suitable for representing the amount
of memory a data object requires, expressed in units of `char'.
It is an integer type (C cannot keep track of fractions of a
`char'), and it is unsigned (negative sizes make no sense).
It is the type of the result of the `sizeof' operator. It is
the type you pass to malloc() and friends to say how much
memory you want. It is the type returned by strlen() to say
how many "significant" characters are in a string.
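
A minimal sketch of the three uses just listed (sizeof, malloc() and strlen()). It is not part of the original post, and the helper names copy_string and int_array_bytes are hypothetical, chosen only for illustration:

```c
#include <assert.h>
#include <stddef.h>   /* size_t */
#include <stdlib.h>   /* malloc, free */
#include <string.h>   /* strlen, memcpy */

/* Hypothetical helper: returns a malloc'ed copy of s, or NULL if the
   allocation fails. The caller is responsible for free()ing the result. */
char *copy_string(const char *s)
{
    size_t len, bytes;
    char *copy;

    assert(s != NULL);
    len = strlen(s);          /* strlen() returns a size_t */
    bytes = len + 1;          /* room for the terminating '\0' */
    copy = malloc(bytes);     /* malloc() takes a size_t */

    if (copy != NULL)
        memcpy(copy, s, bytes);
    return copy;
}

/* Hypothetical helper: bytes needed for an array of n ints.
   The sizeof operator yields a size_t. */
size_t int_array_bytes(size_t n)
{
    return n * sizeof(int);
}
```

Every quantity here that counts bytes or characters is a size_t, so no conversions between integer types are needed anywhere in the chain.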

Each implementation chooses a "real" type like `unsigned
int' or `unsigned long' (or perhaps something else) to be its
`size_t', depending on what makes the most sense. You don't
usually need to worry about what `size_t' looks like "under the
covers;" all you care about is that it is the "right" type for
representing object sizes.

The implementation "publishes" its own choice of `size_t'
in several of the Standard headers: <stdio.h>, <stdlib.h>,
and some others. If you examine one of these headers (most
implementations have some way of doing this), you are likely
to find something like

#ifndef __SIZE_T
#define __SIZE_T
typedef unsigned int size_t;
#endif

... meaning that on this particular implementation `size_t' is
an `unsigned int'. Other implementations make other choices.
(The preprocessor stuff -- which needn't be in exactly the form
shown here -- ensures that your program will contain only one
`typedef' for `size_t' even if it includes several of the headers
that declare it.)

General guidance: If you want to express the size of something
or the number of characters in something, you should probably use
a `size_t' value to do so. Some people also hold that an array
index is a sort of "proxy" for a size, so `size_t' should be used
for array indices as well; I see merit in the argument but confess
that I usually disregard it.
 
Mike Wahler

Eric Sosman said:
General guidance: If you want to express the size of something
or the number of characters in something, you should probably use
a `size_t' value to do so. Some people also hold that an array
index is a sort of "proxy" for a size, so `size_t' should be used
for array indices as well; I see merit in the argument but confess
that I usually disregard it.

I'm one of those who recommend using 'size_t' for an array index,
because it will be able to represent any possible index value for
any possible size array on a given implementation.
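
A small sketch of size_t used as an array index, under the assumption of a C99 compiler (for the loop-scoped declarations). The function names sum_ints and find_last are hypothetical. The second function shows the one idiom that needs care with an unsigned index, counting down to zero:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sum of the first n elements of a. */
long sum_ints(const int *a, size_t n)
{
    long total = 0;

    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Hypothetical helper: index of the last element equal to value,
   or n if there is none. Writing the loop as "i >= 0" would never
   terminate for an unsigned i, so the test-then-decrement form
   "i-- > 0" is used instead. */
size_t find_last(const int *a, size_t n, int value)
{
    assert(a != NULL || n == 0);
    for (size_t i = n; i-- > 0; )
        if (a[i] == value)
            return i;
    return n;
}
```

Returning n for "not found" plays the role that -1 would play with a signed index.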

-Mike
 
Malcolm

Mike Wahler said:
I'm one of those who recommend using 'size_t' for an array index,
because it will be able to represent any possible index value for
any possible size array on a given implementation.

You're right, but it uglifies code.

The other problem is that, assuming garbage values are random, I know that
50% of garbage integers passed to my routine will be negative. So an
"assert( N >= 0)" will have a very high chance of trapping garbage, if N is
declared as an int. Declare "N" as a size_t, and you cannot legitimately do
this test, only "sanity check". Sanity checks are pretty dangerous - who is
to say that in a few years' time images of a million by a million pixels
won't be in routine use?
 
Mike Wahler

Malcolm said:
You're right, but it uglifies code.

How so?

Do you feel that:

size_t i = 0;

is somehow 'uglier' than e.g.:

int i = 0;

I don't. I think the first form very clearly expresses
the intended usage of 'i'.

Malcolm said:
The other problem is that, assuming garbage values are random,

Eh? What garbage values? Prevent garbage by always initializing
your objects. Prevent overflow, underflow, division by zero, etc. by
thinking carefully when writing computational expressions.

Malcolm said:
I know that
50% of garbage integers passed to my routine will be negative.

No, you can't know that.

Malcolm said:
So an
"assert( N >= 0)" will have a very high chance of trapping garbage, if N is
declared as an int.

Doing that is far outside of any methodical or coherent way
to trap invalid data.

Malcolm said:
Declare "N" as a size_t, and you cannot legitimately do
this test,

The test itself is what's not legitimate.

Malcolm said:
only "sanity check". Sanity checks are pretty dangerous -

They can be, and they can also help, but only as a 'rough'
test, not conclusive.

Malcolm said:
who is
to say that in a few years' time images of a million by a million pixels
won't be in routine use?

What's that got to do with anything?

-Mike
 
Bjørn Augestad

Mike said:
How so?

Do you feel that:

size_t i = 0;

is somehow 'uglier' than e.g.:

int i = 0;

I don't. I think the first form very clearly expresses
the intended usage of 'i'.

I'm also one of those who use size_t wherever appropriate, not just
because it is correct, but also because it reduces the number of
warnings from lint-like programs.

size_t i = 0; is not ugly, the ugly part IMHO is the missing #include
directive needed to get a definition of the size_t type. size_t should
be defined by the language just like int and long, not by some header file.

<POSIX rant>
It gets even uglier if you use POSIX functions like read() and write()
and have to mix size_t with its signed cousin, ssize_t. It's pretty
stupid to have a function that accepts a size_t argument whose value
cannot be greater than SSIZE_MAX. Kinda makes me miss K&R C and just using
ints.
</POSIX>

Bjørn
 
Greg Comeau

Malcolm said:
You're right, but it uglifies code.

If beauty is not in the eye of the beholder, then the argument
being made seems to be not that it uglifies code, but that the
code is ugly either way.

Malcolm said:
The other problem is that, assuming garbage values are random,
I know that 50% of garbage integers passed to my routine will
be negative.

Please don't ever use that as a general guide.

Malcolm said:
So an
"assert( N >= 0)" will have a very high chance of trapping garbage,

This tends to be a low level, and perhaps misplaced, trapping...

Malcolm said:
if N is declared as an int

... And orchestrated too then in that case.

Malcolm said:
Declare "N" as a size_t, and you cannot legitimately do
this test,

Some would argue that's the point.

Malcolm said:
only "sanity check". Sanity checks are pretty dangerous -

Let's assume this is true...

Malcolm said:
who is
to say that in a few years' time images of a million by a million pixels
won't be in routine use?

... If so, your way is neither here nor there about it,
which makes this a red herring argument.

Notwithstanding that, it seems you are prescribing an
insane sanity check then.
 
Malcolm

Greg Comeau said:
Notwithstanding that, it seems you are prescribing an
insane sanity check then.

Consider this. I'm writing a function to create an image

IMAGE *create_image(size_t width, size_t height)
(The suggested form)
or
IMAGE *create_image(int width, int height)
(The pre-ANSI form).

Now my caller has allocated a list of image parameters, with malloc(),
thinks he has initialised them to inputs from a file, but in fact due to a
bug in his routine only the first set of parameters are initialised, the
others are set to whatever malloc() happened to return. Happens all the
time.

So in my second function, I write

assert(width >= 0);
assert(height >= 0);
(I might want to allow zero-dimension images).

The caller is calling the function many times with garbage arguments. We
have to be very unlucky for the assert() not to trigger and tell him what he
has done.

In the first function, width and height are size_t. So the test won't work.
No probs, because the function will still be called with huge garbage
values.

So I can write
assert(width <= 8000);

because an 8000 * 8000 image isn't going to fit in memory. Values of that
size must be corrupt, this is "sanity checking".

However now we have several problems. 8000 is a reasonable value for my
particular machine, but do I know, for instance, that the routine won't be
used on some high-end machine that processes massive images?
The second thing is that I now have to document the behaviour. My caller is
an intelligent man who knows that he can expect bad things to happen if he
tries to create an image with negative dimensions. He might also guess that
there is a limit on image size, but he cannot be expected to know that it is
8000. So I've got to put in a little note saying "dimensions must be 8000 or
less".
Or I could just omit the sanity check and let the allocation routine run out
of memory, in which case the caller will waste time wondering whether the
values are wrong or the machine is low on memory.

So these are not huge issues, but we've got something that is slightly less
friendly and easy to use than we had before. The really important point is
that the cumulative effect of such little annoyances is significant in terms
of code quality and reliability.
 
S.Tobias

Malcolm said:
IMAGE *create_image(size_t width, size_t height)
(The suggested form)
or
IMAGE *create_image(int width, int height)
(The pre-ANSI form).
[snip]

So in my second function, I write
assert(width >= 0);
assert(height >= 0);
(I might want to allow zero_dimension images).

I don't see how that would be *not* equivalent to writing:

assert(width < SIZE_MAX / 2);

in the first function. In this case it happens that whatever value
you get, it is "correct". You have to control the image size anyway,
in this case it would be something similar to:

assert(width < MAX_WIDTH);

whereas in the second case it must be:

assert(width >= 0 && width < MAX_WIDTH);

I just can't see the advantage of signed arguments; the amount of work is
the same as in the unsigned case, and _additionally_ you have to
take care of the negative values (i.e. fight the problems that you
have created yourself).
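
The comparison above can be sketched as two predicates. This is an illustration only: MAX_WIDTH is given the value 8000 purely as an assumption (borrowed from the figure mentioned earlier in the thread), and the function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WIDTH 8000   /* assumed limit, for illustration only */

/* Unsigned parameter: one comparison covers everything. */
bool width_ok_unsigned(size_t width)
{
    return width < MAX_WIDTH;
}

/* Signed parameter: the lower bound needs its own comparison. */
bool width_ok_signed(int width)
{
    return width >= 0 && width < MAX_WIDTH;
}

/* Hypothetical wrapper showing the check used as an assertion. */
void require_width(size_t width)
{
    assert(width_ok_unsigned(width));
}
```

A negative int that gets converted to size_t on the way in becomes a very large value, so the single unsigned comparison rejects it as well: width_ok_unsigned((size_t)-1) is false.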
 
Malcolm

S.Tobias said:
I don't see how that would be *not* equivalent to writing:

assert(width < SIZE_MAX / 2);

in the first function.

I just can't see advantage of signed arguments; the amount of work is
the same as in the unsigned case, and _additionally_ you have to
take care for the negative values (ie. fight the problems that you
have created yourself).

The mistake you're making is to forget that the calling programmer is a
human.
What you are saying is that it is possible to trap exactly the same set of
inputs by using some expression to test the size_t argument.

But the int argument is self-documenting. Everyone knows that trying to
create an image of negative dimensions is illegal. It is also probably true
that horrible things will happen if width * height overflows the size of a
size_t, but that check is harder to put in. But it is not inherently illegal
to create a huge image.

Specifically, with your assert you are beginning to dig a hole for yourself.
Why, the calling programmer might ask, is width constrained to be less than
some expression?

You are not "creating problems for yourself" by declaring create_image to
take an integer, and thus opening the possibility of being passed a negative
argument. The problem is the calling programmer's and he is passing garbage
to your function. If you can recognise it as garbage, you've done him a
favour.

I'll give you another poser. How would you write the following set of
functions?

/*
Create an image set to black
*/
IMAGE *create_image(mystery_t width, mystery_t height);
/*
set a pixel (COLOUR is a type defined elsewhere that holds a colour value)
Out-of-bounds values to be rejected.
*/
void set_pixel(IMAGE *image, mystery_t x, mystery_t y, COLOUR col);
/*
draw a circle, parts outside the image to be clipped.
*/
void circle(IMAGE *image, mystery_t x, mystery_t y, mystery_t r, COLOUR col);

What would you use for mystery_t, in each case?
 
Michael Mair

Malcolm said:
The mistake you're making is to forget that the calling programmer is a
human.
What you are saying is that it is possible to trap exactly the same set of
inputs by using some expression to test the size_t argument.

But the int argument is self-documenting. Everyone knows that trying to
create an image of negative dimensions is illegal. It is also probably true
that horrible things will happen if width * height overflows the size of a
size_t, but that check is harder to put in. But it is not inherently illegal
to create a huge image.

Specifically, with your assert you are beginning to dig a hole for yourself.
Why, the calling programmer might ask, is width constrained to be less than
some expression?

You are not "creating problems for yourself" by declaring create_image to
take an integer, and thus opening the possibility of being passed a negative
argument. The problem is the calling programmer's and he is passing garbage
to your function. If you can recognise it as garbage, you've done him a
favour.

I'll give you another poser. How would you write the following set of
functions?

/*
Create an image set to black
*/
IMAGE *create_image(mystery_t width, mystery_t height);

No problem here with size_t or int.

Malcolm said:
/*
set a pixel (COLOUR is a type defined elsewhere that holds a colour value)
Out-of-bounds values to be rejected.
*/
void set_pixel(IMAGE *image, mystery_t x, mystery_t y, COLOUR col);

If I use size_t, I can replace the checks for >=0 and <width with
a single check for <width, and analogously for height.

Malcolm said:
/*
draw a circle, parts outside the image to be clipped.
*/
void circle(IMAGE *image, mystery_t x, mystery_t y, mystery_t r, COLOUR col);

Ditto.

Malcolm said:
What would you use for mystery_t, in each case?

size_t, on all counts.

The only critical part is create_image; here we have to put a comment
at the check against SIZE_MAX/2.
Checks against too large an image size are as easy as with int, and we have
more ways of doing them, e.g.
not only INT_MAX/width<height or SIZE_MAX/width<height but also
(width*height)/height!=width
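
A sketch of the division-based check for size_t, assuming a C99 implementation that provides SIZE_MAX in <stdint.h>. The function names are hypothetical. Note that the expressions above divide by width or height, so a zero dimension has to be handled before dividing:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>   /* SIZE_MAX (C99) */

/* True if width * height cannot be represented in a size_t.
   The width != 0 test guards the division. */
bool area_overflows(size_t width, size_t height)
{
    return width != 0 && height > SIZE_MAX / width;
}

/* Hypothetical helper: stores width * height in *area and returns
   true, or returns false (leaving *area untouched) on overflow. */
bool image_area(size_t width, size_t height, size_t *area)
{
    assert(area != NULL);
    if (area_overflows(width, height))
        return false;
    *area = width * height;
    return true;
}
```

A real create_image would also have to multiply by the size of a pixel, which needs the same kind of check again.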


Cheers
Michael
 
Malcolm

Michael Mair said:
No problem here with size_t or int.

If I use size_t, I can replace the checks for >=0 and <width with
a single check for <width, and analogously for height.

size_t, on all counts.

The question was, of course, designed so that there is a problem with that
answer.

Michael Mair said:
The only critical part is create_image; here we have to put a comment
at the check against SIZE_MAX/2.
Checks against too large an image size are as easy as with int, and we have
more ways of doing them, e.g.
not only INT_MAX/width<height or SIZE_MAX/width<height but also
(width*height)/height!=width

Actually you probably need more than one byte per pixel. However with a bit
of care you can come up with a better "sanity check" than testing against
8000. This supposes of course that you are allocating the image in one
block, which is what I would do today, but not in the old 286 memory model
days. The point was never that use of size_t, by itself, will instantly
create a totally unworkable and unmanageable disaster, but that it
introduces little niggly difficulties that have a cumulative effect of
making code harder to maintain.
 
Randy Howard

Bjørn Augestad said:
I'm also one of those who use size_t wherever appropriate, not just
because it is correct, but also because it reduces the number of
warnings from lint-like programs.

I like to use it as well in the appropriate places.

Bjørn Augestad said:
size_t i = 0; is not ugly, the ugly part IMHO is the missing #include
directive needed to get a definition of the size_t type.

Well, having to worry about casting size_t values to unsigned long (or
something else appropriate) for printf(), when C99's %zu is not available,
is what bothers me most about it aesthetically.

Bjørn Augestad said:
size_t should be defined by the language just like int and long,
not by some header file.

Good point.
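
To illustrate the printf() annoyance mentioned above, here is a sketch of both approaches, formatting into a buffer so the result can be inspected. The function names are hypothetical. The C90-style version assumes the value fits in an unsigned long, which is not guaranteed on implementations where size_t is wider than unsigned long:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* C99: the z length modifier matches size_t directly.
   Returns what snprintf() returns. */
int format_size_c99(char *buf, size_t buflen, size_t n)
{
    assert(buf != NULL);
    return snprintf(buf, buflen, "%zu", n);
}

/* C90 style: cast to unsigned long and print with %lu.
   Assumption: n fits in an unsigned long, and buf has room for
   every digit of an unsigned long plus the terminating '\0'. */
int format_size_c90(char *buf, size_t n)
{
    assert(buf != NULL);
    return sprintf(buf, "%lu", (unsigned long)n);
}
```

The cast is the part that has to be remembered at every call site, which is presumably the aesthetic complaint.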
 
Lawrence Kirby

Malcolm said:
IMAGE *create_image(size_t width, size_t height)
(The suggested form)
or
IMAGE *create_image(int width, int height)
(The pre-ANSI form).
[snip]

So in my second function, I write
assert(width >= 0);
assert(height >= 0);
(I might want to allow zero_dimension images).

S.Tobias said:
I don't see how that would be *not* equivalent to writing:

assert(width < SIZE_MAX / 2);

Make that assert(width <= SIZE_MAX/2); and I'd probably agree with you,
subject to a couple of notes:

1. size_t doesn't have to have the same size as int

2. even if it does have the same size it doesn't have to have double
(roughly) the range, although typically that is the case.

S.Tobias said:
in the first function. In this case it happens that whatever value you
get, it is "correct". You have to control the image size anyway, in
this case it would be something similar to:

assert(width < MAX_WIDTH);

Even if you don't do this directly the chances are that something else
will report a failure for an oversized image, e.g. memory allocation.

Lawrence
 
Malcolm

Lawrence Kirby said:
Even if you don't do this directly the chances are that something else
will report a failure for an oversized image, e.g. memory allocation.

What you want to happen is for the function to return an out-of-memory
condition if you try to allocate an enormous image (which request may well
be legitimate, if you design posters or something). You want an assertion
failure on invalid parameters if you try to pass it garbage.
My point was that by using ints as parameters, you have a free garbage
detector, because negative values have to be garbage. Using an unsigned
type, you never know whether the request is legitimate or not.

However a naive programmer might try to malloc(width * height *
sizeof(pixel)), not realising that if width * height overflows then he may
ask for the wrong amount of memory, and maybe the function will appear to
succeed when in fact it has failed.

So a more sophisticated "sanity check" is actually a good idea, for this
particular function. The general observation however remains valid; "int"
allows a self-documenting check for garbage, whilst size_t doesn't.
 
Mike Wahler

Malcolm said:
What you want to happen is for the function to return an out-of-memory
condition if you try to allocate an enormous image (which request may well
be legitimate, if you design posters or something). You want it to assert
fail on invalid parameters if you try to pass it garbage.
My point was that by using ints as parameters, you have a free garbage
detector, because negative values have to be garbage. Using an unsigned
type, you never know whether the request is legitimate or not.

Using a signed integer, overflow will give undefined behavior.
Using an unsigned integer, overflow gives well-defined behavior,
but an incorrect value. Which is easier to detect?

Malcolm said:
However a naive programmer might try to malloc(width * height *
sizeof(pixel)), not realising that if width * height overflows then he may
ask for the wrong amount of memory, and maybe the function will appear to
succeed when in fact it has failed.

It will have succeeded in performing what was requested of it.
If the request was wrong, it's the coder's fault. Choosing
signed over unsigned can't prevent it.

Malcolm said:
So a more sophisticated "sanity check" is actually a good idea, for this
particular function. The general observation however remains valid; "int"
allows a self-documenting check for garbage,

It allows the possibility of undefined behavior.

Malcolm said:
whilst size_t doesn't.

It always has well-defined behavior. (And can represent the
size of any object. No other type provides this guarantee).
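
A sketch of the difference in detecting overflow, assuming C99 headers. The function names are hypothetical. For an unsigned type the wrapped result can be inspected after the fact; for int the test has to be made before the operation, because the overflowing operation itself is undefined:

```c
#include <assert.h>
#include <limits.h>   /* INT_MAX, INT_MIN */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>   /* SIZE_MAX (C99) */

/* Unsigned: a + b wraps modulo SIZE_MAX + 1, which is well defined,
   so a wrapped sum is simply smaller than either operand. */
bool size_add_wraps(size_t a, size_t b)
{
    return a + b < a;
}

/* Signed: check the operands first, without ever computing a + b. */
bool int_add_would_overflow(int a, int b)
{
    return (b > 0 && a > INT_MAX - b) ||
           (b < 0 && a < INT_MIN - b);
}

/* Hypothetical wrapper: addition with the wrap check as an assertion. */
size_t checked_size_add(size_t a, size_t b)
{
    assert(!size_add_wraps(a, b));
    return a + b;
}
```

Neither form prevents a wrong request from being made; the sketch only shows where the check can legally be placed.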

-Mike
 
Malcolm

Mike Wahler said:
Using a signed integer, overflow will give undefined behavior.
Using an unsigned integer, overflow gives well-defined behavior,
but an incorrect value. Which is easier to detect?

In this particular example, we probably want to call malloc() with width *
height to create the pixels for our image. So any values of width * height
that overflow SIZE_MAX are potential problems.
And because of the way that ANSI have defined the behaviour of signed and
unsigned types, it is actually easier to do this using unsigned rather than
signed arithmetic, so you have a point, in this particular case.
However if we were to use a different allocation scheme internally, then the
point would no longer hold.

Also, if create_image() is called with huge parameters, there are two
possibilities. Either they have been entered by a human who genuinely wants
a huge image for some reason, or they are corrupt values (e.g. random memory).
As humans we know that if the function is called with a demand for an
image 1000000 by 1000000 pixels then it is impossible that such round values
could have arisen by chance, and it must be someone wanting a giant image.
Such a person doesn't want an assertion failure, or to be told that his
parameters are invalid, because an image of a million pixels square is
clearly a logical possibility. He wants to be told "sorry, the computer does
not have enough memory to fulfil your request".
However there is no way a computer can distinguish such a call from a
request for 1352678 by 2511044 pixels, which is typical garbage.

Mike Wahler said:
It will have succeeded in performing what was requested of it.
If the request was wrong, it's the coder's fault. Choosing
signed over unsigned can't prevent it.

It's a bug in the function. A huge value could overflow to a small value, so
the call to malloc() succeeds, and you get UB when you try to access the
pixels.

Mike Wahler said:
It allows the possibility of undefined behavior.
whilst size_t doesn't.

If we call malloc(width * height) with huge values, then if width and height
are ints then this is UB. In this case UB is actually good, because it means
the computer is allowed to perform correct behaviour (terminating the
program with an error message). Defined wrong behaviour is far more
dangerous than UB.

Mike Wahler said:
It always has well-defined behavior. (And can represent the
size of any object. No other type provides this guarantee).

This is the problem. In my opinion ANSI have dug C into a hole with size_t.
In a narrow technical sense they are right - malloc() can legitimately be
called with a request for more memory than will fit in INT_MAX, so let's
have a special type. But then that means that strings can be longer than
INT_MAX as well, so strlen() has to return a size_t. Then if strings can be
longer than INT_MAX, then an index into a character array must be size_t as
well. And in fact it applies to all objects, so if we represent the number
of employees in a payroll function by an int that is wrong as well; strictly
it must be size_t. So without really realising they were doing it, ANSI made
a fundamental change to the language. And this is when more modern languages
like Java have done away with unsigned types altogether, because of the
problems they cause.
 
