Storing an integer: why "int"?


JKop

When you want to store an integer in C++, you use an integral type, e.g.:

int main()
{
    unsigned char amount_legs_dog = 4;
}

In writing portable C++ code, there should be only two factors that
influence which integral type you choose:

A) Signedness. Do you want only positive values? Or do you want both
positive and negative values?

B) The minimum range for that type as specified by the C++ Standard.


The minimum ranges for "short" and "int" are identical. The following
statement is always true on all implementations:

sizeof(short) <= sizeof(int)

As this is so, why would one ever use the type "int" at all? It seems to have
no merit whatsoever. I will always use "short" in its place.
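
For what it's worth, here is a minimal sketch for inspecting what a given implementation actually provides (the sizes and ranges printed are implementation-specific):

#include <climits>
#include <iostream>

int main()
{
    std::cout << "short: " << sizeof(short) << " byte(s), "
              << SHRT_MIN << " to " << SHRT_MAX << '\n'
              << "int:   " << sizeof(int) << " byte(s), "
              << INT_MIN << " to " << INT_MAX << '\n';
}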

Any thoughts on this?

-JKop
 

John Harrison

When you want to store an integer in C++, you use an integral type, e.g.:

int main()
{
    unsigned char amount_legs_dog = 4;
}

In writing portable C++ code, there should be only two factors that
influence which integral type you choose:

A) Signedness. Do you want only positive values? Or do you want both
positive and negative values?

B) The minimum range for that type as specified by the C++ Standard.


The minimum ranges for "short" and "int" are identical. The following
statement is always true on all implementations:

sizeof(short) <= sizeof(int)

As this is so, why would one ever use the type "int" at all? It seems to have
no merit whatsoever. I will always use "short" in its place.

Any thoughts on this?

-JKop

The standard says something to the effect of 'int shall be the natural
size for the architecture used'. In other words, if you are on a 32-bit
machine you can reasonably expect an int to be 32 bits instead of the
16 guaranteed by the standard.

Also (although I'm no expert), if you are on an architecture where int and
short are different sizes, it's quite reasonable to expect int to be more
efficient than short, since it uses the natural size of the architecture.
In other words, use short if you want to save space, but otherwise use int;
it might be better, but it won't be worse (except at saving space).
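
If you want to test that claim on your own machine, here is a crude micro-benchmark sketch (time_sum is just an illustrative helper; results vary by compiler, optimization flags, and CPU, and a good optimizer may well treat both loops identically):

#include <chrono>
#include <iostream>
#include <vector>

// Sum a large buffer of the given element type and report the elapsed time.
template <typename T>
void time_sum(const char* name)
{
    std::vector<T> v(50000000, T(1));
    const auto start = std::chrono::steady_clock::now();
    long long sum = 0;
    for (T x : v)
        sum += x;
    const auto stop = std::chrono::steady_clock::now();
    std::cout << name << ": sum = " << sum << ", "
              << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()
              << " ms\n";
}

int main()
{
    time_sum<short>("short");
    time_sum<int>("int");
}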

john
 

JKop

John Harrison posted:
The standard says something to the effect of 'int shall be the natural
size for the architecture used'. In other words, if you are on a 32-bit
machine you can reasonably expect an int to be 32 bits instead of
the 16 guaranteed by the standard.

Also (although I'm no expert), if you are on an architecture where int
and short are different sizes, it's quite reasonable to expect int to be
more efficient than short, since it uses the natural size of the
architecture. In other words, use short if you want to save space, but
otherwise use int; it might be better, but it won't be worse (except at
saving space).

john

But again I want to stress that we're writing portable
code. In writing portable code, one's decision on which
integral type to use should be based solely upon:

a) Signedness

b) The minimum range

From this (again, writing portable code), "int" appears to
have no merit whatsoever, and it seems that one should
always use "short" in its place.


-JKop
 

Andre Kostur

JKop said:
But again I want to stress that we're writing portable
code. In writing portable code, one's decision on which
integral type to use should be based solely upon:

a) Signedness

b) The minimum range

From this (again, writing portable code), "int" appears to
have no merit whatsoever, and it seems that one should
always use "short" in its place.

By your own argument, there is no merit to using a short vs. an int.
Specifically:

a) Signedness - Both int and short are signed (and likewise unsigned
int and unsigned short are both unsigned)

b) The minimum range - short's minimum range is contained within int's. Thus
anything you can store in a short is going to fit in an int.

c) Word alignment - int is supposed to be the natural word length for the
platform; short has no such suggestion. Thus you have the possibility of a
more efficient (run-time) program.

So on the two points you mention, there is no benefit to using a short vs.
an int, and adding the third point tips the scales in favour of int.

However, I don't agree with the basic premise upon which your argument is
based. I believe that there are other concerns.
 

Julián Albo

JKop said:
But again I want to stress that we're writing portable
code. In writing portable code, one's decision on which
integral type to use should be based solely upon:
a) Signedness
b) The minimum range

Why solely? The code is no less portable if you take other things into
account.
 

Spacen Jasset

JKop said:
John Harrison posted:
But again I want to stress that we're writing portable
code. In writing portable code, one's decision on which
integral type to use should be based solely upon:

a) Signedness

b) The minimum range

From this (again, writing portable code), "int" appears to
have no merit whatsoever, and it seems that one should
always use "short" in its place.


-JKop

I see what you're thinking. But if you want *at least 16 bits* but fast, then
you should probably use an int. Only where you require exactly 16 bits should
you use a short -- and even that isn't guaranteed, which is why it's often
hidden behind a typedef, i.e. typedef unsigned short U16; to get an "exact"
16-bit sized value. Shorts can of course be bigger than 16 bits, hence the
need for the typedef, and the hope that machine-addressable sizes are
available to support a 16-bit type.
One use for an exact 16-bit size is representing UTF-16, a 16-bit Unicode
encoding. Using more than 16 bits to represent UTF-16 may cause difficulty
and require more work.
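
A sketch of that typedef approach with a compile-time guard (written with C++11's static_assert; older code achieved the same with tricks like a negative-size array typedef):

#include <climits>

typedef unsigned short U16;  // hoped-for exact 16-bit type on this platform

// Fail the build if the assumption doesn't hold on this implementation.
static_assert(sizeof(U16) * CHAR_BIT == 16, "U16 is not exactly 16 bits");

int main()
{
    U16 code_unit = 0x0041;  // e.g. one UTF-16 code unit, 'A'
    (void)code_unit;         // silence unused-variable warnings
}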

This whole business of differing word sizes is a thorny issue, and there
appears to be much sense in having fixed sizes, as in Java, considering the
trouble it causes and the care that must be taken.
 

Aguilar, James

JKop said:
But again I want to stress that we're writing portable
code. In writing portable code, one's decision on which
integral type to use should be based solely upon:

a) Signedness

b) The minimum range

From this (again, writing portable code), "int" appears to
have no merit whatsoever, and it seems that one should
always use "short" in its place.

Given that int is suggested to be the natural word length for any
architecture, isn't it true that int will be portably faster?

Also, why should those two characteristics be the deciding factors in what
type you use? Is there a rule somewhere that says you must use the minimum
amount of storage possible? I don't recall that being anywhere in the
standard either... what if C++ is eventually implemented on a machine
that runs more quickly when it has less free memory?

In any case, code that uses int as its integral type is no less portable
than code that doesn't, and it's likely to be far faster. There is a limit
to how far lofty ideals (portability, anarchy, communism, optimism, et al.)
can be applied to real life.
 

Jack Klein

When you want to store an integer in C++, you use an integral type, e.g.:

int main()
{
    unsigned char amount_legs_dog = 4;
}

In writing portable C++ code, there should be only two factors that
influence which integral type you choose:

A) Signedness. Do you want only positive values? Or do you want both
positive and negative values?

B) The minimum range for that type as specified by the C++ Standard.


The minimum ranges for "short" and "int" are identical. The following
statement is always true on all implementations:

sizeof(short) <= sizeof(int)

As this is so, why would one ever use the type "int" at all? It seems to have
no merit whatsoever. I will always use "short" in its place.

You will be making a big mistake if you do. In the first place, on
many current 32-bit processors, accessing shorts in memory is slower
and/or takes more code. That is true not only on common platforms like
x86, but also on popular embedded processors like ARM.

The real problem is the subtle "gotcha" hiding inside the "usual
arithmetic conversions", originally defined by the first C ANSI/ISO
standard, inherited by C++.

unsigned short us = 3;
signed short ss = -1;

/* other code that doesn't change the value of ss or us */

if (ss < us)
{
    /* do something */
}
else
{
    /* do something else */
}

Now which block, the if or the else, is executed? There are two
possible implementation-defined results, both of them perfectly legal
and correct C++.

If short and int share the same range and representation, the "usual
arithmetic conversions" state that ss and us must be converted to
unsigned int, because signed int cannot hold all possible values of
the unsigned short type. That causes us to be converted to
(unsigned int)3, and ss to be converted to (unsigned int)-1, the
latter resulting in UINT_MAX. The conditional will be false and the
else branch executed.

On the other hand, on a common desktop implementation where short has
a narrower range of values than int, both us and ss will be converted
to signed int; -1 is less than 3, and the if branch is executed.
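
Here is the same comparison as a complete program, so you can see which branch your implementation takes (a minimal sketch; most compilers will also warn about the signed/unsigned mismatch):

#include <iostream>

int main()
{
    unsigned short us = 3;
    signed short ss = -1;

    // The usual arithmetic conversions pick a common type for ss and us
    // before the comparison; which type depends on the implementation.
    if (ss < us)
        std::cout << "if branch: both converted to int, and -1 < 3\n";
    else
        std::cout << "else branch: ss converted to unsigned, wrapping to a huge value\n";
}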

Google for "stdint.h", a header added to C in the 1999 C standard
update. It will almost certainly become part of the next C++
standard update, preferably as <cstdint>. It provides a very flexible
way of using the platform's optimum integer type for specific
purposes.
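
For illustration, these are the kinds of names stdint.h provides, written here with <cstdint> as it eventually landed in C++11 (the widths picked for the "fast" types are implementation-specific):

#include <cstdint>
#include <iostream>

int main()
{
    std::int16_t exact = -1234;     // exactly 16 bits, where the platform has such a type
    std::int_least16_t least = -1;  // smallest type with at least 16 bits
    std::int_fast16_t fast = -1;    // "fastest" type with at least 16 bits

    std::cout << sizeof(exact) << ' ' << sizeof(least) << ' ' << sizeof(fast) << '\n';
    // On many implementations the fast type is wider than 16 bits,
    // precisely because the natural word size is faster to access.
}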
 

JKop

Andre Kostur posted:
By your own argument, there is no merit to using a short vs. an int.
Specifically:

a) Signedness - Both int and short are signed (and likewise unsigned
int and unsigned short are both unsigned)

b) The minimum range - short's minimum range is contained within int's. Thus
anything you can store in a short is going to fit in an int.

But an "int" may possibly use more memory.
c) Word alignment - int is supposed to be the natural word length for
the platform; short has no such suggestion. Thus you have the
possibility of a more efficient (run-time) program.

So on the two points you mention, there is no benefit to using a short
vs. an int, and adding the third point tips the scales in favour of
int.

Apart of course from the "int" possibly using more memory.

However, I don't agree with the basic premise upon which your argument
is based. I believe that there are other concerns.


The way I look at it is that there are many integral types provided in C++.
The only difference between them is signedness and range. As such, one's
decision on which to choose can only be based upon those two factors.


-JKop
 

JKop

Jack Klein posted:

You will be making a big mistake if you do. In the first place, on
many current 32-bit processors, accessing shorts in memory is slower
and/or takes more code. That is true not only on common platforms like
x86, but also on popular embedded processors like ARM.


The real problem is the subtle "gotcha" hiding inside the "usual
arithmetic conversions", originally defined by the first C ANSI/ISO
standard, inherited by C++.

unsigned short us = 3;
signed short ss = -1;

/* other code that doesn't change the value of ss or us */

if (ss < us)
{
    /* do something */
}
else
{
    /* do something else */
}

Now which block, the if or the else, is executed? There are two
possible implementation-defined results, both of them perfectly legal
and correct C++.

If short and int share the same range and representation, the "usual
arithmetic conversions" state that ss and us must be converted to
unsigned int, because signed int cannot hold all possible values of
the unsigned short type. That causes us to be converted to
(unsigned int)3, and ss to be converted to (unsigned int)-1, the
latter resulting in UINT_MAX. The conditional will be false and the
else branch executed.

On the other hand, on a common desktop implementation where short has
a narrower range of values than int, both us and ss will be converted
to signed int; -1 is less than 3, and the if branch is executed.

Google for "stdint.h", a header added to C in the 1999 C standard
update. It will almost certainly become part of the next C++
standard update, preferably as <cstdint>. It provides a very flexible
way of using the platform's optimum integer type for specific
purposes.


So then your argument would suggest never using "short",
and always using "int" in its place, as it's faster.

So it looks like, if you're writing a program for:

a) Speed: Then use "int"

b) Optimal memory usage: Then use "short"

But then we're left with: Which should we *typically* use?


BTW, what does it mean to say that a system is 32-bit?


-JKop
 

Gernot Frisch

So it looks like, if you're writing a program for:
a) Speed: Then use "int"

b) Optimal memory usage: Then use "short"

But then we're left with: Which should we *typically* use?


BTW, what does it mean to say that a system is 32-bit?

int is always at least as fast as short. Good compilers can optimize
your code immensely if you use int, since int matches the bus width of
the processor you are targeting.
The old DOS (until 6.x) was a 16-bit system. Windows 3.x was, too. So
an int was (probably - depending on your compiler) 16 bits (= short).
Nowadays it's 32 bits, but the 64-bit processor families are here, and
new OSes will probably have compilers that define int as 64 bits,
since the processor can handle them faster than 32 bits.
Memory efficiency is not simply "the smaller the better", since smaller
accesses might take longer than, e.g., 32-bit-aligned ones. I'm not
sure, and if I'm wrong, flame-grill me.

Anyway, use short if you have a very large set of them, and use long
where you need longer values. I don't use int at all, for exactly this
reason - I want to know the size of my variables, not the "minimum"
size of them.
Just my $.02
-Gernot
 

Karl Heinz Buchegger

JKop said:
Andre Kostur posted:


But an "int" may possibly use more memory.

But usually you don't save much memory by using short throughout
the program. The reason is: alignment.

If the compiler sees two consecutive shorts, it may insert some
padding bytes between them to satisfy alignment requirements.

struct UseShort
{
    short int A;
    short int B;
};

struct UseInt
{
    int A;
    int B;
};

On most systems, even if sizeof(short) != sizeof(int), it will
happen that sizeof( UseShort ) == sizeof( UseInt ), because the
compiler inserted some extra bytes after UseShort::A to bring
UseShort::B onto an address which satisfies the alignment.

Since int is the 'natural' data type of a specific architecture,
it is safe to assume that it also fulfills the alignment
requirements without any need for padding bytes.

So in theory you are right: short may use less memory. But
you pay this price with more wasted memory due to padding.

PS: Of course most compilers allow you to change the alignment
by means of some pragma or compiler option. Usually you pay
for this with increased run time. There are, e.g., CPUs where memory
access *has to be* on an even address, or else the CPU generates
an exception. An operating system function then kicks in, restarts
the load, but this time at a correctly aligned address, and uses
register manipulation to fetch the bytes you want.
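
It is easy to check what your own compiler does (a sketch; note that on many common ABIs the two consecutive shorts actually pack together, so UseShort comes out smaller than UseInt, and the padding shows up once members of different sizes are mixed):

#include <iostream>

struct UseShort { short A; short B; };
struct UseInt   { int A;   int B;   };
struct Mixed    { short A; int B;   };  // padding likely inserted after A

int main()
{
    std::cout << "UseShort: " << sizeof(UseShort) << '\n'
              << "UseInt:   " << sizeof(UseInt)   << '\n'
              << "Mixed:    " << sizeof(Mixed)
              << " (short + int = " << sizeof(short) + sizeof(int) << ")\n";
}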
 

jeffc

JKop said:
The minimum ranges for "short" and "int" are identical. The following
statement is always true on all implementations:

sizeof(short) <= sizeof(int)

As this is so, why would one ever use the type "int" at all? It seems to have
no merit whatsoever. I will always use "short" in its place.

Whoa, if the relation is "less than or equal to", then why are you acting
as if it's "less than"?
 

Peter van Merkerk

Karl said:
But usually you don't save much memory by using short throughout
the program. The reason is: alignment.

If the compiler sees two consecutive shorts, it may insert some
padding bytes between them to satisfy alignment requirements.

struct UseShort
{
    short int A;
    short int B;
};

struct UseInt
{
    int A;
    int B;
};

On most systems, even if sizeof(short) != sizeof(int), it will
happen that sizeof( UseShort ) == sizeof( UseInt ), because the
compiler inserted some extra bytes after UseShort::A to bring
UseShort::B onto an address which satisfies the alignment.

Since int is the 'natural' data type of a specific architecture,
it is safe to assume that it also fulfills the alignment
requirements without any need for padding bytes.

So in theory you are right: short may use less memory. But
you pay this price with more wasted memory due to padding.

PS: Of course most compilers allow you to change the alignment
by means of some pragma or compiler option. Usually you pay
for this with increased run time. There are, e.g., CPUs where memory
access *has to be* on an even address, or else the CPU generates
an exception. An operating system function then kicks in, restarts
the load, but this time at a correctly aligned address, and uses
register manipulation to fetch the bytes you want.

On x86 processors in 32-bit mode (e.g. Linux or MS-Windows on a PC),
instructions operating on 16-bit values need an operand-size prefix
byte. In other words, when working with 16-bit values the code becomes
larger and slower.
 

Andrew Koenig

In writing portable C++ code, there should be only two factors that
influence which integral type you choose:

A) Signedness. Do you want only positive values? Or do you want both
positive and negative values?

B) The minimum range for that type as specified by the C++ Standard.


The minimum ranges for "short" and "int" are identical. The following
statement is always true on all implementations:

sizeof(short) <= sizeof(int)

As this is so, why would one ever use the type "int" at all? It seems to have
no merit whatsoever. I will always use "short" in its place.

Any thoughts on this?

This analysis is correct as far as it goes, but it doesn't go very far.

What it leaves out is the question of *why* you are using integral types in
the first place.

In my experience, almost all uses of integral types fall into two
categories:

1) Counting
2) Computation

If you are using an integral type for counting, you should probably be using
an unsigned type. Beyond that, the correct type to use depends on what you
are counting.

If you are counting elements of a data structure from the standard library,
you should be using that data structure's size_type member. So, for
example, if you wish to represent an index in a vector<string>, you should
not use int, short, or the unsigned variants thereof. Instead, you should
use vector<string>::size_type. If you want to deal with the difference
between two vector<string> indices, which therefore might be negative, use
vector<string>::difference_type.

If you are counting elements of a built-in array, or another data structure
that will fit in memory, you should use size_t. If you need a number
commensurate with the size of memory that might be negative, use ptrdiff_t.

In short, if you are counting, you should usually not use the built-in types
directly.
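
A small sketch of those counting conventions in use (the data here is made up purely for illustration):

#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> words(3, "legs");

    // Index a standard container with its own size_type.
    for (std::vector<std::string>::size_type i = 0; i != words.size(); ++i)
        std::cout << i << ": " << words[i] << '\n';

    // Count elements of a built-in array with size_t.
    int legs[] = { 4, 4, 2, 8 };
    for (std::size_t i = 0; i != sizeof(legs) / sizeof(legs[0]); ++i)
        std::cout << legs[i] << '\n';
}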

Now, what about computation? Most of the time, you should be using long or
unsigned long unless you have a reason to do otherwise. After all, that's
the only way that you're assured of not being limited to 16 bits.

My experience is that integer computation is actually relatively rare. Most
of the time, integers are used for counting, and in that context, it is
better to use the library-defined synonyms for the integral types than it is
to use those types directly.
 

Andre Kostur

JKop said:
Andre Kostur posted:


But an "int" may possibly use more memory.

Possibly... but minimum memory usage wasn't in your list of criteria.
And as someone else (Karl) has pointed out, whatever memory savings you
thought you had may be consumed by the compiler anyway in order to
word-align your variables (which ints already are).
Apart of course from the "int" possibly using more memory.

Again, not in your list of criteria.
The way I look at it is that there are many integral types provided in
C++. The only difference between them is signedness and range. As
such, one's decision on which to choose can only be based upon those
two factors.

So by your own statement, you consider run-time efficiency and memory
footprint to be completely irrelevant considerations. Interesting, since
as far as I know, both of these criteria are _very_ important to
professional programmers (as to which is more important, that depends on
the various constraints the programmer may have to work within...)
 

Guest

Gernot Frisch said:
int is always at least as fast as short. Good compilers can optimize
your code immensely if you use int, since int matches the bus width of
the processor you are targeting.
The old DOS (until 6.x) was a 16-bit system. Windows 3.x was, too. So
an int was (probably - depending on your compiler) 16 bits (= short).
Nowadays it's 32 bits, but the 64-bit processor families are here, and
new OSes will probably have compilers that define int as 64 bits,
since the processor can handle them faster than 32 bits.

I would guess that most compilers will keep int at 32 bits in order to
break as little code as possible (code that incorrectly assumed the
size of int).
 

Julián Albo

I would guess that most compilers will keep int at 32 bits in order to
break as little code as possible (code that incorrectly assumed the
size of int).

Many people used to think the same thing in the 16-bit era...
 

JKop

Karl Heinz Buchegger posted:
But usually you don't save much memory by using short throughout
the program. The reason is: alignment.

If the compiler sees two consecutive shorts, it may insert some
padding bytes between them to satisfy alignment requirements.

struct UseShort
{
    short int A;
    short int B;
};

struct UseInt
{
    int A;
    int B;
};

On most systems, even if sizeof(short) != sizeof(int), it will
happen that sizeof( UseShort ) == sizeof( UseInt ), because the
compiler inserted some extra bytes after UseShort::A to bring
UseShort::B onto an address which satisfies the alignment.

Since int is the 'natural' data type of a specific architecture,
it is safe to assume that it also fulfills the alignment
requirements without any need for padding bytes.

So in theory you are right: short may use less memory. But
you pay this price with more wasted memory due to padding.

PS: Of course most compilers allow you to change the alignment
by means of some pragma or compiler option. Usually you pay
for this with increased run time. There are, e.g., CPUs where memory
access *has to be* on an even address, or else the CPU generates
an exception. An operating system function then kicks in, restarts
the load, but this time at a correctly aligned address, and uses
register manipulation to fetch the bytes you want.

Well, if sizeof(short) < sizeof(int), then

sizeof(short[49]) < sizeof(int[49])

as there'll be no padding between array elements.
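
That part is easy to verify, since sizeof(T[N]) is always N * sizeof(T) and array elements are contiguous with no padding between them:

#include <iostream>

int main()
{
    std::cout << sizeof(short[49]) << " vs " << sizeof(int[49]) << '\n';
    // Prints e.g. "98 vs 196" where short is 2 bytes and int is 4.
}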


-JKop
 

JKop

Andre Kostur posted:
Possibly... but minimum memory usage wasn't in your list of criteria.
And as someone else (Karl) has pointed out, whatever memory savings you
thought you had may be consumed by the compiler anyway in order to
word-align your variables (which ints already are).

Apart of course from the "int" possibly using more memory.

Again, not in your list of criteria.


So by your own statement, you consider run-time efficiency and memory
footprint to be completely irrelevant considerations. Interesting,
since as far as I know, both of these criteria are _very_ important to
professional programmers (as to which is more important, that depends
on the various constraints the programmer may have to work within...)

Memory was in fact in my criteria. The reason I pick the
smallest integral type with sufficient range is that
it's the one that uses the least memory and has sufficient
range.

-JKop
 
