Use of signed integer types

  • Thread starter Frederick Gotham

Frederick Gotham

I'd like to discuss the use of signed integer types where unsigned integer
types would suffice.

A common example would be:

#include <cassert>
#include <cstddef>

int CountOccurrences(unsigned char const val,
                     unsigned char const p[],
                     std::size_t const len)
{
    assert(len >= 2);

    unsigned char const * const p_over = p + len;

    int i = 0; /* USE OF SIGNED INTEGER TYPE */

    do
    {
        if(val == *p++) ++i;
    }
    while(p != p_over);

    return i;
}


Firstly, I tend to shy away from signed integer types wherever possible
primarily because overflow results in undefined behaviour.

Another aspect I'd like to discuss though is efficiency. If a machine uses
two's complement, then there should be no difference in adding positive
numbers and negative numbers.

However, for machines which use something other than two's complement, would
there be overhead in adding signed integers? For instance something like:

int i = 5;
int j = -4;

if( j >= 0 )
{
/* Add using a particular technique */
}
else
{
/* Add using a particular technique */
}


When adding unsigned integers, there would be no need for that.

Would I be right in thinking that there could be an efficiency issue with
using signed integer types on machines which don't use two's complement, or
would I be way off the mark?
 

Bo Persson

Frederick Gotham said:
I'd like to discuss the use of signed integer types where unsigned
integer types would suffice.

A common example would be:

#include <cassert>
#include <cstddef>

int CountOccurrences(unsigned char const val,
unsigned char const p[],
std::size_t const len)
{
assert(len >= 2);

unsigned char const * const p_over = p + len;


int i = 0; /* USE OF SIGNED INTEGER TYPE */

do
{
if(val == *p++) ++i;
}
while(p != p_over);

return i;
}


Firstly, I tend to shy away from signed integer types wherever
possible
primarily because overflow results in undefined behaviour.

On the other hand, unsigned overflow is well defined, but the result is
usually pretty useless in practice. How do you use the "well defined" result?
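
To make the contrast concrete, here is a minimal sketch of what "well defined" means for unsigned arithmetic, next to the test a signed increment needs if it is to stay portable. The function names are hypothetical and are not taken from any post in this thread.

```cpp
#include <cassert>
#include <limits>

// Unsigned arithmetic is defined to wrap modulo 2^N, so this cannot overflow.
unsigned wrapping_increment(unsigned x)
{
    return x + 1u;
}

// Signed overflow is undefined behaviour, so test before incrementing.
// Returns false and leaves x unchanged when ++x would overflow.
bool checked_increment(int &x)
{
    if (x == std::numeric_limits<int>::max())
        return false;
    ++x;
    return true;
}

void demo_overflow()
{
    unsigned const umax = std::numeric_limits<unsigned>::max();
    assert(wrapping_increment(umax) == 0u);  // wrapped around to zero

    int big = std::numeric_limits<int>::max();
    assert(!checked_increment(big));          // refused, value untouched
    assert(big == std::numeric_limits<int>::max());
}
```

Whether a count that silently wraps to zero is any more useful than undefined behaviour is exactly the question being asked here.
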
Another aspect I'd like to discuss though is efficiency. If a
machine uses
two's complement, then there should be no difference in adding
positive
numbers and negative numbers.

However, for machines which use something other than two's
complement, would
there be overhead in adding signed integers?

No, that's hardware anyway. Trust the chip designer and compiler
implementor to fix that.
Would I be right in thinking that there could be an efficiency issue
with
using signed integer types on machines which don't use two's
complement, or
would I be way off the mark?

Way off! :)


The real problem comes when you mix signed and unsigned values. Pick
one of them, and stay with it.
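
A small sketch of why mixing is a problem: when an int meets an unsigned in a comparison, the usual arithmetic conversions turn the int into an unsigned first. The first function below spells that conversion out explicitly; the second is one possible way to compare by mathematical value. Both names are hypothetical.

```cpp
#include <cassert>

// What `a < b` amounts to when a is int and b is unsigned: a is converted
// to unsigned before the comparison, so a negative a becomes a huge value.
bool less_after_conversion(int a, unsigned b)
{
    return static_cast<unsigned>(a) < b;
}

// A comparison that respects the mathematical values of both operands.
bool less_by_value(int a, unsigned b)
{
    return a < 0 || static_cast<unsigned>(a) < b;
}

void demo_mixed_compare()
{
    assert(!less_after_conversion(-1, 1u));  // -1 was converted to UINT_MAX
    assert(less_by_value(-1, 1u));
    assert(!less_by_value(2, 1u));
}
```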


Bo Persson
 

Jerry Coffin

I'd like to discuss the use of signed integer types where unsigned integer
types would suffice.

[ ... ]
However, for machines which use something other than two's complement, would
there be overhead in adding signed integers?

Generally speaking, no -- the overhead (when there is any) generally
happens when you compare them. For example, in ones complement, you
can represent zero in two different ways, so comparing for equality
requires a little more than just checking for all bits being
identical.

The penalty for this varies from minute to nonexistent. If the
hardware handles it automatically, there'll usually be no penalty. If
you have to handle it in software, you usually do it by adding 0 to a
number when appropriate. This is typically going to be a single cycle
wasted on relatively rare occasions.
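
The two-zeros issue can be sketched in software. The code below models a hypothetical 8-bit ones' complement value as a raw bit pattern and folds negative zero onto positive zero before comparing. That explicit fold is a simplification chosen for illustration; it is not the add-zero trick described above, and real hardware would do this differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit ones' complement value, stored as its raw bit pattern.
// Negation flips every bit, so zero has two patterns: 0x00 (+0) and 0xFF (-0).
typedef std::uint8_t Ones8;

// Fold negative zero onto positive zero; every other pattern is unchanged.
Ones8 normalize_zero(Ones8 bits)
{
    return bits == 0xFF ? Ones8(0x00) : bits;
}

// Value equality: compare bit patterns only after folding the two zeros.
bool ones_equal(Ones8 a, Ones8 b)
{
    return normalize_zero(a) == normalize_zero(b);
}

void demo_ones_complement()
{
    Ones8 const plus_zero = 0x00;
    Ones8 const minus_zero = 0xFF;
    assert(plus_zero != minus_zero);            // the bit patterns differ
    assert(ones_equal(plus_zero, minus_zero));  // the values are equal
    assert(!ones_equal(0x01, 0xFE));            // +1 versus -1
}
```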

[ ... ]
Would I be right in thinking that there could be an efficiency issue with
using signed integer types on machines which don't use two's complement, or
would I be way off the mark?

Mostly off the mark. At best it's in the realm of truly micro-
optimizations. Usually it's completely meaningless. At least IMO, the
reason to use signed or unsigned is because it reflects your intent
better, not because one might, on some rare occasion, be
microscopically faster than the other.
 

Frederick Gotham

Jerry Coffin posted:

Mostly off the mark. At best it's in the realm of truly micro-
optimizations. Usually it's completely meaningless. At least IMO, the
reason to use signed or unsigned is because it reflects your intent
better, not because one might, on some rare occasion, be
microscopically faster than the other.


Yes, but there are MANY professional programmers who use "int" consistently
for all purposes, and they also tend to use "i++" consistently as well.

Here's an example of a loop I might write:

for(unsigned i = 0; i != some_value; ++i)

instead of:

for(int i = 0; i < some_value; i++)
 

Richard Herring

Bo Persson said:
The real problem comes when you mix signed and unsigned values. Pick
one of them, and stay with it.

And make it signed ;-)

For example, it may look like a good idea to make MyContainer::size_type
unsigned, since it obviously can't be negative. But... sooner or later
you'll want to track the difference of two sizes, and that can be
negative. So now you have to define MyContainer::diff_type as well, and
suddenly there are a whole host of complications that wouldn't have
occurred if you'd stuck with plain old signed int.
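
As a sketch of the complication: subtracting two unsigned sizes directly cannot give a negative answer, so a signed difference type is needed. The helper name below is hypothetical, and the code assumes both sizes fit in std::ptrdiff_t.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Signed difference of two container sizes: positive when a is longer,
// negative when b is longer. Assumes both sizes fit in std::ptrdiff_t.
std::ptrdiff_t size_difference(std::vector<int> const &a,
                               std::vector<int> const &b)
{
    return static_cast<std::ptrdiff_t>(a.size())
         - static_cast<std::ptrdiff_t>(b.size());
}

void demo_size_difference()
{
    std::vector<int> const a(2), b(5);
    assert(a.size() - b.size() > 0);       // unsigned subtraction wrapped
    assert(size_difference(a, b) == -3);   // signed difference is negative
    assert(size_difference(b, a) == 3);
}
```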
 

Howard

Frederick Gotham said:
Yes, but there are MANY professional programmers who use "int" consistently
for all purposes, and they also tend to use "i++" consistently as well.

And there's nothing wrong with either of those. Consistency is a good
thing. It's only an issue if you use one of those where something else is
more appropriate.

Some programmers prefer to always use ++i, for example, mostly (I assume) so
that they don't accidentally use i++ where ++i would have been more
appropriate. But it doesn't mean that consistently using i++ is worse (or
better). As programmers, we're paid to _think_ about what we write, not
just do repetitive tasks.

Here's an example of a loop I might write:

for(unsigned i = 0; i != some_value; ++i)

instead of:

for(int i = 0; i < some_value; i++)

That's your choice to make. I doubt there's any performance difference at
all. You might try it out with a profiler, if you're curious.

-Howard
 

Jerry Coffin

[ ... ]
Yes, but there are MANY professional programmers who use "int" consistently
for all purposes, and they also tend to use "i++" consistently as well.

Here's an example of a loop I might write:

for(unsigned i = 0; i != some_value; ++i)

instead of:

for(int i = 0; i < some_value; i++)

Yup -- and that's exactly what they should do, IMO. There's a
difference between a specific intent that something BE unsigned, and
the more or less accidental situation that I simply need a variable,
and in this particular case its range _happens_ to be entirely
positive.

The difference here is a little bit like the difference between
checking for an error condition with an 'if' and using an 'assert'.
An assert should only be used when violating it would indicate a
problem with the fundamental logic of the code. An if should be used
any other time.

Likewise, an unsigned should be used _only_ when signedness would
indicate some sort of fundamental flaw in the logic, not just when a
variable varies within a range that happens to be positive.

In reality, I'd almost say that C and C++ would be better off without
any unsigned types. There are a few situations (e.g. modular
arithmetic) where a sign really doesn't make sense, and in that case
you really should use an unsigned variable. In nearly every other
situation, unsigned types can and will cause problems. Most things
that can be added can also be subtracted, and that leads to signed
numbers. Mixing signed and unsigned types will give a far worse
hangover than mixing different types of drinks...
 

Markus Schoder

Frederick said:
Would I be right in thinking that there could be an efficiency issue with
using signed integer types on machines which don't use two's complement,
or would I be way off the mark?

Rather the other way round. E.g. IBM's System/360 architecture does not use
two's complement but it also has no built-in support for unsigned integers
at all which means unsigned operations cannot be done with single machine
instructions in general.
 

Robbie Hatley

Frederick Gotham said:
I'd like to discuss the use of signed integer types where unsigned integer
types would suffice.

A common example would be:

#include <cassert>
#include <cstddef>

int CountOccurrences(unsigned char const val,
unsigned char const p[],
std::size_t const len)
{
assert(len >= 2);

unsigned char const * const p_over = p + len;


int i = 0; /* USE OF SIGNED INTEGER TYPE */

do
{
if(val == *p++) ++i;
}
while(p != p_over);

return i;
}


Ewww, yuck.


I'd do it THIS way, instead:


#include <cassert>
#include <cstddef>
#include <string>

int
CountOccurrences
(
unsigned char const & val,
std::basic_string<unsigned char> const & p,
std::size_t const & len
)
{
assert(len >= 2);
unsigned char const * const p_over = p + len;
int i = 0; /* USE OF SIGNED INTEGER TYPE */
std::basic_string<unsigned char>::iterator Iter;
for (Iter = p.begin(); Iter != p.end(); ++Iter)
{
if(val == (*Iter)) ++i;
}
return i;
}


std::basic_string is good.

:)

Firstly, I tend to shy away from signed integer types wherever
possible primarily because overflow results in undefined behaviour.

I've seen far more overflows and other problems from the use of unsigned
variables where signed was called for. Like my stupid predecessor who
wrote the legacy code I'm maintaining. He used unsigned int for time.
Then he subtracted one time from another. Sometimes he'd subtract, say:

unsigned long DeltaTime;
DeltaTime = 27:16:04UTC - 27:18:29UTC; // oops, user adjusted system clock
// Ooops, DeltaTime is now over 4 billion!!!

This was causing time accumulators in the system to jump up about
a million hours about once every two weeks. Turns out, some program
was automatically adjusting the system clock every two weeks. The
change was only about -2 minutes, but the expected time lapse of
DeltaTime was only a few seconds, so when a more recent time was
subtracted from an older time, we'd get about 4294967234 seconds,
when we were expecting maybe 27 seconds.

Which is why time_t is generally typedefed to signed long, not
unsigned long.
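
The failure can be reproduced with fixed-width types. This is a sketch with hypothetical timestamps, chosen so that the clock appears to have gone back 62 seconds and the unsigned result matches the 4294967234 figure above; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned difference: wraps modulo 2^32 when `later` is actually earlier.
std::uint32_t delta_unsigned(std::uint32_t later, std::uint32_t earlier)
{
    return later - earlier;
}

// Signed difference: widen to 64 bits first so the result can go negative.
std::int64_t delta_signed(std::uint32_t later, std::uint32_t earlier)
{
    return static_cast<std::int64_t>(later)
         - static_cast<std::int64_t>(earlier);
}

void demo_delta()
{
    // Hypothetical timestamps: the clock was set back 62 seconds.
    std::uint32_t const earlier = 1000;
    std::uint32_t const later = 938;
    assert(delta_unsigned(later, earlier) == 4294967234u);
    assert(delta_signed(later, earlier) == -62);
}
```

With the signed version, a negative delta can be detected and discarded instead of being added to an accumulator.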

That's why I tend to lean toward signed integer types by default.

Would I be right in thinking that there could be an efficiency issue with
using signed integer types on machines which don't use two's complement, or
would I be way off the mark?

Dunno. I've not played around with any unusual representations
of negative numbers.

--
Cheers,
Robbie Hatley
Tustin, CA, USA
lonewolfintj at pacbell dot net (put "[ciao]" in subject to bypass spam filter)
http://home.pacbell.net/earnur/
 

Ian Collins

Jerry said:
In reality, I'd almost say that C and C++ would be better off without
any unsigned types. There are a few situations (e.g. modular
arithmetic) where a sign really doesn't make sense, and in that case
you really should use an unsigned variable. In nearly every other
situation, unsigned types can and will cause problems. Most things
that can be added can also be subtracted, and that leads to signed
numbers. Mixing signed and unsigned types will give a far worse
hangover than mixing different types of drinks...

But that (signed only) would restrict your size types to half the
natural range of the processor.
 

Jerry Coffin

[ ... ]
I'd do it THIS way, instead:


#include <cassert>
#include <cstddef>
#include <string>

int
CountOccurrences
(
unsigned char const & val,
std::basic_string<unsigned char> const & p,
std::size_t const & len
)
{
assert(len >= 2);
unsigned char const * const p_over = p + len;
int i = 0; /* USE OF SIGNED INTEGER TYPE */
std::basic_string<unsigned char>::iterator Iter;
for (Iter = p.begin(); Iter != p.end(); ++Iter)
{
if(val == (*Iter)) ++i;
}
return i;
}

There are a couple of things about that I don't understand. If I
understand the goal here, I think I'd do it more like this:

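#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// save some typing...
typedef std::basic_string<unsigned char> ustring;

int CountOccurrences(
    unsigned char val,
    ustring const &p,
    std::size_t len)
{
    assert(len >= 2);

    // p is a const reference, so begin() and end() yield const_iterators
    ustring::const_iterator p_over =
        len < p.length() ?
        p.begin() + len :
        p.end();

    // std::count returns the iterator's difference_type, so narrow explicitly
    return static_cast<int>(std::count(p.begin(), p_over, val));
}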

I'm not sure this provides enough to justify its own existence
though. I'd usually just use std::count directly. In the typical
case, something like this:

occurrences = CountOccurrences(ch, s, s.length());

would become something like this:

occurrences = std::count(s.begin(), s.end(), ch);

I, for one, would find the latter easier to read -- it's well enough
known that I can recognize its intent at a glance.
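
For anyone who wants to try the std::count approach, here is a self-contained sketch. It assumes a plain std::string rather than std::basic_string<unsigned char>, and count_in_prefix is a hypothetical helper name, not something from the posts above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Count how often ch occurs in the first len characters of s.
// If len exceeds the string's length, the whole string is searched.
std::string::difference_type
count_in_prefix(std::string const &s, char ch, std::size_t len)
{
    std::size_t const n = std::min(len, s.size());
    return std::count(
        s.begin(),
        s.begin() + static_cast<std::string::difference_type>(n),
        ch);
}

void demo_count()
{
    std::string const s = "banana";
    assert(count_in_prefix(s, 'a', s.size()) == 3);
    assert(count_in_prefix(s, 'a', 3) == 1);    // only "ban" is searched
    assert(count_in_prefix(s, 'a', 100) == 3);  // len is clamped
}
```

Note that std::count itself hands back a signed difference_type, which fits the signed-by-default argument made elsewhere in this thread.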
 

Jerry Coffin

But that (signed only) would restrict your size types to half the
natural range of the processor.

True -- thus the "almost" in what I said above. There are times that
practicality dictates using unsigned types because they provide range
you need, even though they're not really the "correct" type for what
you're dealing with.

The situation should be recognized for what it is though: picking the
least of the available evils, not anything like an ideal.
 
