size_t problems

Chris Torek

Chris:
Furthermore, the ease of portability varies with the goal of
the code. Clearly, something like "calculate mortgage payments"
or "concatenate files specified by argv[] elements" is going to be
easier than "write an operating system" or "do bitmapped graphics":
the latter two *require* at least *some* non-portable code.

Let's say, for instance, that on a particular platform an
"unsigned int" contains padding bits (or whatever they call those bits
that don't take part in indicating what the number is). This could
possibly throw a big spanner in the works for playing around with
bitmaps.

However, it's still not impossible to achieve what you want if you
play around with arrays of "unsigned char". Indeed, the code might be
ugly, but it's definitely possible. And probably fun to write too.

Indeed; and this would be completely portable, up to -- but not
including -- the step where you draw the results on your fancy
bitmapped color display. :)
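
For instance, something along these lines (a quick sketch only, with
invented names; unsigned char is guaranteed to have no padding bits):

#include <limits.h>
#include <stddef.h>

/* One bit per pixel in an array of unsigned char: CHAR_BIT bits
   per element and -- unlike unsigned int -- no padding bits. */
static void bit_set(unsigned char *map, size_t bit)
{
    map[bit / CHAR_BIT] |= (unsigned char)(1u << (bit % CHAR_BIT));
}

static int bit_test(const unsigned char *map, size_t bit)
{
    return (map[bit / CHAR_BIT] >> (bit % CHAR_BIT)) & 1;
}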

Often we (by "we" I mean "I and other people I work with") make
deliberate trades of "giving up some degree of portability for some
degree of performance". Continuing with the above example, we
might work with "unsigned int"s to handle 16 or 32 or 64 bits at
a time (however many are in "unsigned int") instead of restricting
ourselves to "unsigned char". I think it is important to be aware
that one is making such a trade-off, though.
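
A sketch of that trade (illustrative only: it deliberately assumes the
target's unsigned int has no padding bits, which is exactly the
portability being given up):

#include <stddef.h>

/* Word-at-a-time OR of two bit arrays: handles an unsigned int's
   worth of bits per iteration instead of one char's worth. */
static void bits_or(unsigned int *dst, const unsigned int *src,
                    size_t nwords)
{
    size_t i;
    for (i = 0; i < nwords; i++)
        dst[i] |= src[i];    /* 16, 32 or 64 bits at a go */
}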
 
Chris Dollin

Kelsey said:
Any technical book can contain errors. Any can contain mistakes.

In the last two weeks, I discovered a ... bugette ... in an
appendix to a software development book, viz., in the description
of converting a regular expression into a state machine, it
could generate multiple redundant states under a plausible reading
of the algorithm.

The bug has been sitting there for thirteen years. So far as I
know, no-one has ever informed the author, which makes me suspect
that either no-one ever used that algorithm, or if they did, they
also saw the fix.

It would have been nice to have noticed it fourteen years ago,
though.
 
Richard Bos

Malcolm McLean said:
Yes.
It's the standards problem. As long as every nut will fit every bolt,
everything is simple. Engineer one says "give me a nut", engineer two
supplies it.

Once you allow for more than one standard, there is always trouble.
Engineer one says "give me a quarter inch nut". Engineer two "I can't, I've
only got a nut-making machine that does centimetres". Engineer one, "oh
never mind, I just need to fiddle with this design to make the holes
centimetres rather than quarter inch. Hardly matters. Back to you tomorrow".

And your solution to this problem is to use oil tanker-sized nuts on
office chairs, or office chair-sized nuts on oil tankers. Frankly, I
think you're nuts.

Richard
 
jacob navia

Richard said:
And your solution to this problem is to use oil tanker-sized nuts on
office chairs, or office chair-sized nuts on oil tankers. Frankly, I
think you're nuts.

Richard

This is exactly the problem, Malcolm. Your "one size fits all" is
impracticable in the real world: it produces bloat in all data
structures that do not need 64 bits but could use 16 or even 8.

Note that the ages of all humans on the planet fit into an unsigned
char. There is no need to use 64 bits when 8 will do.
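
For instance (a throwaway illustration, field names made up):

#include <stdio.h>

struct person8  { unsigned char age; };      /* 1 byte of payload  */
struct person64 { unsigned long long age; }; /* 8 bytes, same data */

int main(void)
{
    printf("%u vs %u bytes per record\n",
           (unsigned)sizeof(struct person8),
           (unsigned)sizeof(struct person64));
    /* Over 20 million records: roughly 20MB versus 160MB. */
    return 0;
}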
 
Martin Wells

jacob:
This is exactly the problem, Malcolm. Your "one size fits all" is
impracticable in the real world: it produces bloat in all data
structures that do not need 64 bits but could use 16 or even 8.


Oh God I can see where this is going...

Note that the ages of all humans on the planet fit into an unsigned
char. There is no need to use 64 bits when 8 will do.


For a compiler writer, you seem to know very little about efficiency.

I myself NEVER use anything smaller than an "int" or "unsigned".
Never. Unless memory consumption is a BIG deal.

The C Standard says that "int" and "unsigned" will be the "natural"
integer types, the quickest ones.

If the Standard Library functions didn't use "char" for strings then
I'd probably use arrays of int or unsigned. That's of course assuming
that I'm not low on memory.

The beauty of the C Standard when it comes to integer types is that
not only are they portable, but they turn out as efficient as possible
on the next platform too.

Only if you need a really big number or a really small negative
number, should you resort to anything bigger than "int" or "unsigned".

Only if memory consumption is a big deal should you resort to anything
smaller.

Of course, there's a few exceptions, like using char to play around
with bytes, or short to play around with small chunks of bytes.

Martin
 
jacob navia

Martin said:
jacob:

Oh God I can see where this is going...

For a compiler writer, you seem to know very little about efficiency.

Arrogance anyone?

I myself NEVER use anything smaller than an "int" or "unsigned".
Never. Unless memory consumption is a BIG deal.

Yes. "Unless memory consumption is a BIG deal".

If I am doing statistical analysis of age correlated data for
20 million people, I would rather have it as 20MB than as 80MB.

Note that in current CPUs, memory access is one of the main limiting
factors in a program. Memory I/O can slow everything down
because the speed of RAM is snail-paced compared to the CPU.

Right now the ratio of CPU clock speed to main-memory speed is at
least around 10 to 1, i.e. doubling the amount of memory used can
halve the speed of the program in some circumstances!

The C Standard says that "int" and "unsigned" will be the "natural"
integer types, the quickest ones.

This is no longer the case on 64-bit Windows: the natural size is 64
bits, but int is 32.

If the Standard Library functions didn't use "char" for strings then
I'd probably use arrays of int or unsigned. That's of course assuming
that I'm not low on memory.

I repeat: "That's of course assuming I'm not low on memory".

Using 64 or 32 bits for an 8-bit character would slow down
operations by a factor of 4 to 8!

The beauty of the C Standard when it comes to integer types is that
not only are they portable, but they turn out as efficient as possible
on the next platform too.

That is why I use the types adapted to the data they hold!

I was pointing out that a "one size fits all" attitude would
waste memory!

Only if you need a really big number or a really small negative
number, should you resort to anything bigger than "int" or "unsigned".

That is what I was saying, but you are not strong at reading
something BEFORE you start mumbling something like:
 
Richard

Martin Wells said:
jacob:

Oh God I can see where this is going...

For a compiler writer, you seem to know very little about efficiency.

I myself NEVER use anything smaller than an "int" or "unsigned".
Never. Unless memory consumption is a BIG deal.

The C Standard says that "int" and "unsigned" will be the "natural"
integer types, the quickest ones.

Efficiency isn't all about "quickest". Most people don't give a fig
whether it takes 11 minutes or 10.5 minutes.

What is endemic, though, is this complete disregard for optimal data
sizes. It's called bloatware. It's part of the reason machines are about
1000 times faster than in 1990 yet take the same time to bring up a word
processor.

If the Standard Library functions didn't use "char" for strings then
I'd probably use arrays of int or unsigned. That's of course assuming
that I'm not low on memory.

The beauty of the C Standard when it comes to integer types is that
not only are they portable, but they turn out as efficient as possible
on the next platform too.

Only if you need a really big number or a really small negative
number, should you resort to anything bigger than "int" or "unsigned".

Only if memory consumption is a big deal should you resort to anything
smaller.

C programmers should nearly always consider memory use, IMO. It's one
of the things which gives C the edge.

I would be horrified if some idiot saved a million records to memory
using a size_t to represent the length in bytes of the record's string
UID or somesuch.

Common sense.
 
Richard

Richard said:
I would be horrified if some idiot saved a million records to memory
using a size_t to represent the length in bytes of the records string
UID or somesuch.

Common sense.

Hmm. Sorry about the double post. (Oh, and I probably didn't mean a
million there - but you get the point.)
 
Martin Wells

jacob:
If I am doing statistical analysis of age correlated data for
20 million people, I would rather have it as 20MB than as 80MB.


Brilliant. In such a case, you might opt for an unsigned char. Or, if
you were REALLY clever, you'd write your own code to manipulate the
bits so that a human age only takes 7 bits. As far as I know, the
oldest recorded and confirmed human was 128 or so, but then I suppose
one exception out of 6 billion isn't too bad.
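
Something like this, say (a rough sketch only; the store function
would be analogous):

#include <limits.h>
#include <stddef.h>

/* Fetch the i-th 7-bit age from a packed array of unsigned char.
   A field may straddle a byte boundary, so read up to two bytes. */
static unsigned age_get(const unsigned char *buf, size_t i)
{
    size_t bit = i * 7;
    size_t byte = bit / CHAR_BIT;
    unsigned off = (unsigned)(bit % CHAR_BIT);
    unsigned v = buf[byte] >> off;

    if (off + 7 > CHAR_BIT)            /* field spans two bytes */
        v |= (unsigned)buf[byte + 1] << (CHAR_BIT - off);
    return v & 0x7f;
}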

Note that in current CPUs, memory access is one of the main limiting
factors in a program. Memory I/O can slow everything down
because the speed of RAM is snail paced compared to the CPU.


If a machine is 32-bit, shouldn't it access a 32-bit number quicker
than an 8-bit number?

Right now the ratio of CPU clock speed to main-memory speed is at
least around 10 to 1, i.e. doubling the amount of memory used can
halve the speed of the program in some circumstances!


See the point I made just above.

This is no longer the case on 64-bit Windows: the natural size is 64
bits, but int is 32.


This is a shortcoming of Microsoft. One of many.

I repeat: "That's of course assuming I'm not low on memory".

Using 64 or 32 bits for an 8-bit character would slow down
operations by a factor of 4 to 8!


Again see my argument above. It should be faster.

That is why I use the types adapted to the data they hold!

I was pointing out that a "one size fits all" attitude would
waste memory!


Yes, you'd waste 3 bytes for every byte, but it should be faster. If
wasting 3 bytes per byte is too much for you, then use an unsigned
char, or manipulate individual bits directly with your own code.

Martin
 
Richard

Martin Wells said:
jacob:

Brilliant. In such a case, you might opt for an unsigned char. Or, if
you were REALLY clever, you'd write your own code to manipulate the
bits so that a human age only takes 7 bits. As far as I know, the
oldest recorded and confirmed human was 128 or so, but then I suppose
one exception out of 6 billion isn't too bad.

If a machine is 32-bit, shouldn't it access a 32-bit number quicker
than an 8-bit number?

No. Why? Or maybe. It depends .... OT.

What is NOT off topic, though, is the strategic use of types to
represent data. And programming in C to TRY to reduce memory
consumption can never be a bad thing UNLESS the result is undefined
behaviour. Given a DEFINED limitation of, say, "name is never longer
than 32 chars", I personally would never store that length in 32 or
64 bits when 16 or 8 bits would do and save potentially
megabytes of memory.
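
That is, given such a defined limit, something like this (field names
invented for the example):

#define NAME_MAX 32            /* the defined limitation */

struct record {
    unsigned char name_len;    /* 0..32 fits easily in 8 bits */
    char name[NAME_MAX];
};
/* A size_t name_len would cost 8 bytes instead of 1 on a typical
   64-bit box -- megabytes wasted across millions of records. */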
 
Chris Dollin

Martin said:
If a machine is 32-bit, shouldn't it access a 32-bit number quicker
than an 8-bit number?

That depends on the machine.

No-one really cares, anyway, not about /a/ 32-bit number. In the case
of Jacob's 20 million people, the question should be whether it's
faster to access 20 million ints or twenty million bytes -- at which
point, such atopical things as caches (presence and size of) and
disc speed (data for the loading of) may matter more than how long
it takes for the machine to mask out the upper 24 bits of a value.
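
If one actually wanted to know, a crude measurement along these lines
would be the starting point (caches, optimisers and the allocator will
all confound it, so treat it as a sketch, not an answer):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 20000000L

int main(void)
{
    unsigned char *b = calloc(N, 1);
    unsigned int *w = calloc(N, sizeof *w);
    unsigned long sum = 0;
    clock_t t;
    long i;

    if (b == NULL || w == NULL)
        return EXIT_FAILURE;

    t = clock();
    for (i = 0; i < N; i++) sum += b[i];
    printf("bytes: %.3f s\n", (double)(clock() - t) / CLOCKS_PER_SEC);

    t = clock();
    for (i = 0; i < N; i++) sum += w[i];
    printf("ints:  %.3f s (checksum %lu)\n",
           (double)(clock() - t) / CLOCKS_PER_SEC, sum);

    free(b);
    free(w);
    return 0;
}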
 
jacob navia

Chris said:
That depends on the machine.

No-one really cares, anyway, not about /a/ 32-bit number. In the case
of Jacob's 20 million people, the question should be whether it's
faster to access 20 million ints or twenty million bytes -- at which
point, such atopical things as caches (presence and size of) and
disc speed (data for the loading of) may matter more than how long
it takes for the machine to mask out the upper 24 bits of a value.

In all cases, caches or not, i/o of 20MB is faster than i/o of 80MB.

It *could* be that disk caches go to 80MB and then it would be the same.
Not with main memory though. Main memory caches go into the MB range at
most, and L1 caches are even smaller.

Besides, there are other effects to take into consideration.
Normally, machines load data from memory into a cache line
of 32 or 64 bytes, i.e. you read 32 (or 64) bytes at a time.

Using memory sparingly, for instance making
your structure fit into 32 bytes, will make it load into the L1 cache
in a single read operation. Bigger structures take MUCH more
time, since 2 or more reads are necessary...
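
For instance (an invented record; the sizes assume a typical 32- or
64-bit ABI):

struct sample {
    unsigned int   id;        /*  4 bytes */
    unsigned char  age;       /*  1       */
    unsigned char  flags;     /*  1       */
    unsigned short postcode;  /*  2       */
    double         income;    /*  8       */
    float          scores[4]; /* 16       */
};                            /* 32 bytes: one cache line, one read */

/* Poor man's compile-time check that it still fits: */
typedef char sample_fits_line[(sizeof(struct sample) <= 32) ? 1 : -1];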

The general point I want to make is that we have to use the correct type
for the data and situation we are working with.

It would be a bad situation if we had just
"int with 64 bits"
and used int for characters, shorts, etc. etc., wasting
an enormous amount of space for nothing in every structure we used.

True, everything would be much simpler (this is the reason why
Malcolm advocates it, I think), but the loss of flexibility would
be enormous.

Besides, on small machines, C would not even run: there wouldn't be
enough RAM to load the program!
 
Richard

jacob navia said:
In all cases, caches or not, i/o of 20MB is faster than i/o of 80MB.

Never let the real world pollute the nirvana of c.l.c.

If it's not compilable on a DeepThought Nippon 0.2sxxyy Hiroshima
vibrator circuit then it's crap C. Never forget that.
 
Ed Jensen

Joe Wright said:
Indeed. I didn't mean to exclude anyone or to include myself imperiously
into "we". I believe comp.lang.c is properly about how to use the C
language, not how to change it.

I guess I consider comp.lang.c an appropriate place to discuss the
future of C. With any luck, the discussions help us better understand
what we want and need, what we want and don't really need, etc. With
even more luck, someone in a position to do something about it might
look at our discussions and help push C in that direction.
 
pete

Ed said:
I guess I consider comp.lang.c an appropriate place to discuss the
future of C. With any luck, the discussions help us better understand
what we want and need, what we want and don't really need, etc. With
even more luck, someone in a position to do something about it might
look at our discussions and help push C in that direction.

comp.std.c is for discussing the future of C.

People in positions to do something about it
do participate in that newsgroup regularly.
 
Joe Wright

Ed said:
I guess I consider comp.lang.c an appropriate place to discuss the
future of C. With any luck, the discussions help us better understand
what we want and need, what we want and don't really need, etc. With
even more luck, someone in a position to do something about it might
look at our discussions and help push C in that direction.

You might consider comp.lang.c an appropriate place to discuss ballroom
dancing. You might even find someone here to discuss it with you. But
you would be polluting the environment.

I think comp.std.c is the appropriate place to discuss the future of
Standard C.

I think comp.lang.c is the appropriate place to discuss the C language
as described in the current Standard (C99) and previous ones. It is the
place for learning and working C programmers to share knowledge in order
to apply C successfully to their programming projects.

This is a place to learn C from people who know it, not to complain that
it doesn't have your favorite feature from another language.
 
