size_t problems


Martin Wells

I cannot find any requirement in the standard regarding the minimum size
of size_t. It is simply an "unsigned integer type".


Usually I'm a stickler for portability, but in this case I think I
jumped the gun with my assumptions about size_t. If the Standard
doesn't make any guarantees/restrictions then I suppose we're left
with some_size_t_variable != (size_t)-1.

Martin
 

Ben Pfaff

Martin Wells said:
Usually I'm a stickler for portability, but in this case I think I
jumped the gun with my assumptions about size_t. If the Standard
doesn't make any guarantees/restrictions then I suppose we're left
with some_size_t_variable != (size_t)-1.

I'd use SIZE_MAX, to make the code unarguably correct and easier
to read at the same time. If you don't have <stdint.h>, you can
always just "#define SIZE_MAX ((size_t) -1)" yourself.
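
A minimal sketch of that suggestion, assuming a C99 library that provides <stdint.h>. The function name count_down_from is invented for illustration; the fallback #define only takes effect when SIZE_MAX is not already defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>   /* C99: defines SIZE_MAX; drop this line on older libraries */

#ifndef SIZE_MAX
#define SIZE_MAX ((size_t) -1)   /* fallback suggested in the post above */
#endif

/* Hypothetical helper: visits start, start-1, ..., 0 and returns how
   many iterations ran. Decrementing a size_t that holds 0 wraps to
   SIZE_MAX, and that wrapped value is what ends the loop. */
size_t count_down_from(size_t start)
{
    size_t j, visits = 0;

    assert(start != SIZE_MAX);   /* SIZE_MAX itself is the sentinel */
    for (j = start; j != SIZE_MAX; j--)
        ++visits;
    return visits;
}
```

Because both sides of the comparison have type size_t, the test does not depend on how size_t relates to int.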
 

A N Other

In all cases, caches or not, i/o of 20MB is faster than i/o of 80MB.

It *could* be that disk caches go to 80MB and then it would be the same.
Not with main memory though. Main memory caches go into the MB range at
most, and L1 caches are even smaller.

Besides there are other effects to take into consideration.
Normally, machines load data from memory into a cache line
of 32 or 64 bytes, i.e. you read 32 bytes at a time.

Using memory sparingly, for instance by making
your structure fit into 32 bytes, lets it load into the L1 cache in a
single read operation. Bigger structures take MUCH more
time since 2 or more reads are necessary...

You have no clue what you are talking about... there are many other
considerations than the size in bytes of a struct when you're thinking
at the level of hitting or missing cache.

The general point I want to make is that we have to use the correct type
for the data and situation we are working with.

It would be a bad situation if we had just
"int with 64 bits"
and we used int for characters, shorts, etc etc, wasting
an enormous amount of space for nothing in each structure we use.

This sort of absolute statement is complete balderdash. Once again you
completely fail to understand the complex interaction of various
tradeoffs in a modern system.

True, everything would be much simpler (this is the reason why
Malcolm advocates this I think), but the loss of flexibility would
be enormous.

Besides, in small machines, C would not even run: it wouldn't have
enough RAM to load the program!

A C program runs quite happily in my toaster. What do you mean by a
"small machine"?
 

jacob navia

A said:
You have no clue what you are talking about...

Arrogance anyone?

Can't you put your arguments without trying to demean the other
person?

Or you *need* this kind of useless polemic?

there are many other
considerations than the size in bytes of a struct when you're thinking
at the level of hitting or missing cache.

When we are speaking of cache, the size *is* a fundamental
parameter. If my structure takes up 1MB it will not fit in the
cache, sorry. And yes, there are many OTHER considerations,
like the way of accessing structures in an array, locality
and many others, but size is surely an important one!

This sort of absolute statement is complete balderdash. Once again you
completely fail to understand the complex interaction of various
tradeoffs in a modern system.

With that empty sentence you just say:

"I do not agree with you."

No arguments are proposed. You mention "complex interactions"
"various trade offs" without naming a single one, or why they
are relevant to this discussion.

You disagree that using a single size of 64 bits would be wasteful?

Nice. Please explain to me why.

A C program runs quite happily in my toaster. What do you mean by a
"small machine"?

Can't you read?
If we use 64 bits everywhere programs would take such an amount of
memory that they would not run in small machines.
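
To put rough numbers on that claim, here is a small sketch comparing a record whose fields are all widened to 64 bits with the same record using narrower types. The struct and field names and the helper records_per_line are invented for illustration, and the exact sizes depend on the implementation's padding rules, so treat any figures as typical rather than guaranteed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record with every field widened to 64 bits. */
struct wide_record {
    int64_t id;
    int64_t count;
    int64_t kind;
    int64_t flag;
};

/* The same record with each field sized for the data it holds. */
struct narrow_record {
    int64_t id;
    int32_t count;
    int16_t kind;
    int8_t  flag;
};

size_t wide_size(void)   { return sizeof(struct wide_record); }
size_t narrow_size(void) { return sizeof(struct narrow_record); }

/* How many whole records fit in one cache line of line_size bytes. */
size_t records_per_line(size_t record_size, size_t line_size)
{
    assert(record_size != 0);
    return line_size / record_size;
}
```

On common implementations with 8-bit bytes the wide record is about twice the size of the narrow one, so about half as many fit in a cache line of 32 or 64 bytes.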

Many people here have seen that I am attacked for each word I say.

And they start mobbing around, like a group of dogs that smells blood.

You are a typical example: Just polemic, you haven't advanced a
single argument, but you feel entitled to

"You have no clue what you are talking about"

This way, you advance in the hierarchy, and become another member
of the "inner group", those that are stronger than the isolated
frenchie guy anyway...
 

Jean-Marc Bourguet

Peter J. Holzer said:
I also consider such an implementation very strange and probably wouldn't
care about in production code (although I might add an assertion to
document (and check) the assumption).

If memory serves, the "huge mode" of 8086 compilers had this characteristic
(int was 16 bits, huge mode allowed for objects of more than 65536 bytes).

Yours,
 

Malcolm McLean

jacob navia said:
A N Other wrote:
When we are speaking of cache, the size *is* a fundamental
parameter. If my structure takes up 1MB it will not fit in the
cache, sorry. And yes, there are many OTHER considerations,
like the way of accessing structures in an array, locality
and many others, but size is surely an important one!

I am prepared to accept that cache usage would be improved with smaller
integers. However size_t has to be 64 bits anyway, and size_t is such a
common type, if code is written properly, that the savings from having 64
bit ints would be quite small.

However the problem we face isn't usually that code executes too slowly. It
is that units of code don't talk to each other properly. That's why most
projects go over budget, or don't get completed at all.
 

Army1987

I also consider such an implementation very strange and probably
wouldn't care about in production code (although I might add an
assertion to document (and check) the assumption).



I cannot find any requirement in the standard regarding the minimum size
of size_t. It is simply an "unsigned integer type".

I find no plausible scenario where this would make sense for a hosted
implementation, but it could happen for some embedded devices. Consider
a CPU with a word size (integer registers and address bus) of 14 bits.
That's not enough for an int, so an int must use a double word (28
bits). But it's enough for a size_t, since no object can be larger than
16383 bytes. So CHAR_BIT could be 14, sizeof size_t == 1, sizeof short
== sizeof int == 2, and sizeof long == 3.

SIZE_MAX is guaranteed to be at least 0xFFFF.

I seemed to remember that the standard forbade size_t from having a
lower integer conversion rank than int, but I probably just dreamt
it, since I can't find it anywhere anymore.
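
Since the standard does not appear to give that guarantee, the assumption can be checked instead, along the lines of the assertion mentioned in the quoted text. This is only a sketch, assuming C99's <stdint.h> and <limits.h>; the function name size_t_is_wide_enough is invented for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Refuse to compile where size_t cannot represent every unsigned int
   value. On such an implementation a size_t operand is promoted
   before "j != -1" is evaluated, so the uncast test never fails. */
#if SIZE_MAX < UINT_MAX
#error "this code assumes size_t is at least as wide as unsigned int"
#endif

/* Run-time form of the same check, for code that prefers assert(). */
int size_t_is_wide_enough(void)
{
    assert(SIZE_MAX >= 0xFFFF);   /* the minimum SIZE_MAX cited above */
    return SIZE_MAX >= UINT_MAX;
}
```

The #if form documents the assumption and stops the build on the unusual implementations discussed in this thread, at no run-time cost.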
 

Army1987

The third option is to restrict the object size to half the
theoretically possible value, i.e. 32767 bytes with 16 bit size_t,
2147483647 with 32 bit size_t and 9223372036854775807 bytes with 64 bit
size_t.

In a defect report, it was suggested to either require size_t to
be large enough to be able to contain the size of any supported
object, or to specify what happens if sizeof is applied on an
object larger than SIZE_MAX bytes, or to explicitly state that the
behavior is undefined in such a case. The committee's answer was
more-or-less "any program using any object larger than 64KB is not
strictly conforming anyway, so we don't give a damn".
 

Keith Thompson

Army1987 said:
I seemed to remember that the standard forbade size_t from having a
lower integer conversion rank than int, but I probably just dreamt
it, since I can't find it anywhere anymore.

N1124 recommends (but doesn't require) that size_t and ptrdiff_t
shouldn't have an integer conversion rank greater than that of signed
long. (C99 didn't have this wording; it was added by one of the TCs.)
Perhaps that's what you were thinking of?
 

Flash Gordon

Malcolm McLean wrote, On 06/09/07 21:03:
I am prepared to accept that cache usage would be improved with smaller
integers.

Good.

However size_t has to be 64 bits anyway, and size_t is such a
common type, if code is written properly, that the savings from having
64 bit ints would be quite small.

Maybe in your applications, but as has been pointed out that does not
apply to a vast number of applications.

However the problem we face isn't usually that code executes too slowly.

You might not, but I have to deal with hundreds of users on a single
server and vast amounts of data. Just like lots of other people.

It is that units of code don't talk to each other properly.

It may cause you problems, but I find it very easy both with my own SW and
3rd party SW I link to or otherwise communicate with.

That's why
most projects go over budget, or don't get completed at all.

It has *never* stopped any project I have been involved with, and has
always had a far smaller impact than changing or poorly specified
requirements.

Now, seeing as conservatively your proposal will increase my company's
storage requirements by 10% (it is probably a lot more) and that could
easily increase our storage costs (we are replacing kit currently) by
5%, that will be another 3500UKP. Then if it ups the cost of RAM
required by servers by 50UKP, and we have to go to the next processor up
costing another 100UKP (we are already getting high end processors), for
our 10 servers that is another 5000UKP. Will you give us that 5000UKP to
cover the increased costs you want to force on us? We are only a small
company, our customers will find their costs increased by rather more.
That is assuming you are right that it will have only a small impact on
performance; if everyone else is right you will have to give us a lot
more money!
 

Martin Wells

Army:
SIZE_MAX is guaranteed to be at least 0xFFFF.

So that means size_t must have at least 16 value representation bits.
But then you could have a machine with 32-bit unsigned int's, and
something like:

typedef short unsigned size_t;

, which still makes our "!= -1" code buggy.

Martin
 

A N Other

Arrogance anyone?

Can't you put your arguments without trying to demean the other
person?

Or you *need* this kind of useless polemic?

Nonsense! It was a plain statement of fact in response to a provocative
and stupid assertion.

With that empty sentence you just say:

"I do not agree with you."

No arguments are proposed. You mention "complex interactions"
"various trade offs" without naming a single one, or why they
are relevant to this discussion.

You disagree that using a single size of 64 bits would be wasteful?

Nice. Please explain to me why.

Maybe I don't care about the extra memory needed to store 64-bit types -
that's one of the tradeoffs I can make.

Your argument that using 64-bit types, even on a machine with 64-bit
word size, will be slower because there will be more cache misses is
utterly ridiculous. Memory access patterns, not just (or even primarily)
the size of the type, are involved in determining cache performance.
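
As an illustration of what "memory access patterns" means here, the sketch below sums the same array in two orders. The array dimensions and all function names are invented for illustration, and no timings are claimed; which order is faster, and by how much, depends on the machine.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 256
#define COLS 256

static int grid[ROWS][COLS];   /* hypothetical data set */

/* Fills the grid with a simple repeating pattern. */
void fill_grid(void)
{
    size_t r, c;

    for (r = 0; r < ROWS; r++)
        for (c = 0; c < COLS; c++)
            grid[r][c] = (int)((r + c) % 7);
}

/* Walks memory in address order, so neighbouring elements are
   read one after another. */
long sum_row_major(void)
{
    size_t r, c;
    long sum = 0;

    for (r = 0; r < ROWS; r++)
        for (c = 0; c < COLS; c++)
            sum += grid[r][c];
    return sum;
}

/* Visits the same elements, but each step jumps a whole row
   (COLS * sizeof(int) bytes) ahead. */
long sum_column_major(void)
{
    size_t r, c;
    long sum = 0;

    for (c = 0; c < COLS; c++)
        for (r = 0; r < ROWS; r++)
            sum += grid[r][c];
    return sum;
}

/* Both orders visit every element once, so the totals must agree. */
void check_orders_agree(void)
{
    fill_grid();
    assert(sum_row_major() == sum_column_major());
}
```

The element type is identical in both functions; only the order of the accesses differs.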
Many people here have seen that I am attacked for each word I say.

And they start mobbing around, like a group of dogs that smells blood.

You are a typical example: Just polemic, you haven't advanced a
single argument, but you feel entitled to

"You have no clue what you are talking about"

This way, you advance in the hierarchy, and become another member
of the "inner group", those that are stronger than the isolated
frenchie guy anyway...

You sound like you're paranoid and deluded. Looking back through the
posts on this group, I'd say if anyone is baying for blood then it is
you, with your constant unprovoked attacks on Mr. Heathfield - and
typically in those threads as in this one, you embarrass yourself with
your displays of ignorance about C, computers, and basic politeness.
 

CBFalconer

Army1987 said:
In a defect report, it was suggested to either require size_t to
be large enough to be able to contain the size of any supported
object, or to specify what happens if sizeof is applied on an
object larger than SIZE_MAX bytes, or to explicitly state that
the behavior is undefined in such a case. The committee's answer
was more-or-less "any program using any object larger than 64KB
is not strictly conforming anyway, so we don't give a damn".

Where did you drag this bit of foolishness from? Pure nonsense,
obviously so since the purpose of size_t is to handle ANY existing
object size.
 

Keith Thompson

CBFalconer said:
Where did you drag this bit of foolishness from? Pure nonsense,
obviously so since the purpose of size_t is to handle ANY existing
object size.

It may be the intent that size_t can hold the size of any object, but
it doesn't quite say so; what it says is that size_t is "the unsigned
integer type of the result of the sizeof operator" (C99 7.17p2).

One issue that was discussed at length a while ago (and I don't
propose to reopen it) is that calloc() could conceivably allow the
creation of an object bigger than SIZE_MAX bytes (though an
implementation can avoid that issue by rejecting any such request).

Another possible issue:

char really_big[SIZE_MAX][2];
sizeof really_big;

Certainly an implementation may reject the declaration of
'really_big', but must it do so? If the declaration is accepted I
believe that evaluating 'sizeof really_big' invokes undefined
behavior. What about a program that declares 'really_big' but never
applies 'sizeof' to it?
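
The calloc() case can at least be guarded against in user code. The following is a sketch of one way to reject such a request before it reaches the allocator; product_fits_size_t, checked_calloc and reject_really_big are invented names, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if nmemb * size can be represented as a size_t,
   0 if the product would exceed SIZE_MAX. */
int product_fits_size_t(size_t nmemb, size_t size)
{
    return size == 0 || nmemb <= SIZE_MAX / size;
}

/* Hypothetical wrapper: refuses requests whose total size cannot
   be expressed as a size_t instead of handing them to calloc(). */
void *checked_calloc(size_t nmemb, size_t size)
{
    if (!product_fits_size_t(nmemb, size))
        return NULL;
    return calloc(nmemb, size);
}

/* Usage: the dimensions of the really_big example above are refused. */
void reject_really_big(void)
{
    void *p = checked_calloc(SIZE_MAX, 2);

    assert(p == NULL);
}
```

The division avoids computing the product at all, so the check itself cannot wrap around.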
 

pete

Martin said:
Army:


So that means size_t must have at least 16 value representation bits.
But then you could have a machine with 32-bit unsigned int's, and
something like:

typedef short unsigned size_t;

, which still makes our "!= -1" code buggy.

You're right. It should have a cast.
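
The difference the cast makes can be shown on an ordinary machine by using a 16-bit type as a stand-in for a narrow size_t. This is a sketch under that assumption: narrow_size_t and both function names are invented, and the first function only fails to stop where int is wider than 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a 16-bit size_t on a machine with a wider int
   (an assumption for illustration; the real size_t is untouched). */
typedef uint16_t narrow_size_t;

/* Returns 1 if the uncast test "j != -1" ends the loop once j has
   wrapped past 0, and 0 if the loop would keep running. */
int uncast_test_stops(void)
{
    narrow_size_t j = 0;

    j--;                 /* wraps to 65535 */
    return !(j != -1);   /* j is promoted to int: 65535 != -1 */
}

/* Returns 1 if the cast test ends the loop once j has wrapped past 0. */
int cast_test_stops(void)
{
    narrow_size_t j = 0;

    j--;                                /* wraps to 65535 */
    assert(j == UINT16_MAX);
    return !(j != (narrow_size_t) -1);  /* 65535 != 65535 is false */
}
```

With the cast, -1 is converted to the narrow type first, so both operands hold the same wrapped value and the loop terminates.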
 

Peter J. Holzer

SIZE_MAX is guaranteed to be at least 0xFFFF.

Right. I was thinking of

| 5.2.4.1 Translation limits
| The implementation shall be able to translate and execute at least one
| program that contains at least one instance of every one of the following
| limits:13)
[...]
| — 65535 bytes in an object (in a hosted environment only)

But 7.18.3 contains no similar qualification, so a size_t must be at
least 16 bits, even if no object can actually be that large on that
implementation.

hp
 

Peter J. Holzer

If memory serves, the "huge mode" of 8086 compilers had this characteristic
(int was 16 bits, huge mode allowed for objects of more than 65536 bytes).

That's the other way around: size_t is larger than int. Martin's code
works well in this case. It fails if (for example) int is 32 bits, but
size_t is only 16 bits.

hp
 

CBFalconer

Peter J. Holzer said:
That's the other way around: size_t is larger than int. Martin's
code works well in this case. It fails if (for example) int is
32 bits, but size_t is only 16 bits.

Why all this fuss constraining a for loop to something unnatural
and confusing? Try the simple straightforward code:

    j = 7;
    do {
        /* whatever */
        ++k;
    } while (j--);
 

Malcolm McLean

Flash Gordon said:
Malcolm McLean wrote, On 06/09/07 21:03:
our 10 servers that is another 5000UKP. Will you give us that 5000UKP to
cover the increased costs you want to force on us? We are only a small
company, our customers will find their costs increased by rather more.
That is assuming you are right that it will have only a small impact on
performance; if everyone else is right you will have to give us a lot more
money!

Ten thousand pounds, or twenty thousand US dollars, is by business standards
quite a small amount of money. If you are at all typical your costs are not
in the hardware, which in any case doubles in speed and capacity every
eighteen months or so, but in the software.
 

Tor Rustad

CBFalconer wrote:

[...]
Why all this fuss constraining a for loop to something unnatural
and confusing? Try the simple straightforward code:

    j = 7;
    do {
        /* whatever */
        ++k;
    } while (j--);

The original loop I wrote was:

    for (j=7; j>=0; j--,k++)
    {
        if ( (bitmap >> j) & 1 )
        {
            has_field[k] = TRUE;
        }
        else
        {
            has_field[k] = FALSE;
        }
    }

so your replacement reads:

    j = 7;
    do {
        if ( (bitmap >> j) & 1 )
        {
            has_field[k] = TRUE;
        }
        else
        {
            has_field[k] = FALSE;
        }
        ++k;
    } while (j--);


To me, the first option is more readable by far. for(...) loops are
excellent for looping over an index; doing the same with while(...) and
do while(...) loops gives me a headache.

When I know in advance how many iterations there will be, I "always" use
the for(...) loop.
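
For reference, here is a self-contained sketch of the for-loop version. The thread does not show the declarations, so the types of bitmap, has_field, j and k, the TRUE/FALSE definitions and the function name unpack_byte are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TRUE  1
#define FALSE 0

/* Unpacks the 8 bits of bitmap, most significant bit first, into
   has_field[k], has_field[k+1], ... and returns the next free k.
   j is a signed int here, so "j >= 0" becomes false once j is -1. */
size_t unpack_byte(unsigned bitmap, int has_field[], size_t k)
{
    int j;

    assert(has_field != NULL);
    for (j = 7; j >= 0; j--, k++)
        has_field[k] = ((bitmap >> j) & 1) ? TRUE : FALSE;
    return k;
}
```

If j were declared as an unsigned type instead, "j >= 0" would always be true, which is the situation that started the discussion of (size_t)-1 and SIZE_MAX earlier in the thread.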
 
