datatype for bit flags

G

Gr

I am not an expert C or C++ programmer but know enough to get by.

Currently studying some graphics algorithms as a part of CAD.

In the Cohen Sutherland Algorithm of line clipping, a code/flag is using
for denoting where an endpoint of the line lies as compared to the
clipping rectangle.

This code consists of 4 flags (1 each for left, right, below & above).

type code = 0;

if(point lies to the left of clipping window)
code = code & 1;
if(point lies to right)
code = code & 2;

if(point lies below)
code = code & 4;
if(point lies above)
code = code & 8;


Now in some books the type of code is declared
as
unsigned char code;
In others, it's
int code;

which is better?

unsigned char code saves space
(is the unsigned really necessary anyway?)

but I think int is probably faster.

Can someone explain what's a better way to do this if you were
writing a graphics library?
 
J

jacob navia

Le 15/09/11 09:07, Gr a écrit :
I am not an expert C or C++ programmer but know enough to get by.

Currently studying some graphics algorithms as a part of CAD.

In the Cohen Sutherland Algorithm of line clipping, a code/flag is using
for denoting where an endpoint of the line lies as compared to the
clipping rectangle.

This code consists of 4 flags (1 each for left, right, below & above).

type code = 0;

if(point lies to the left of clipping window)
code = code & 1;
if(point lies to right)
code = code & 2;

if(point lies below)
code = code & 4;
if(point lies above)
code = code & 8;


Now in some books the type of code is declared
as
unsigned char code;
In others, it's
int code;

which is better?

unsigned code;

would be better since you are doing bit operations with
the data.
unsigned char code saves space

It depends of the alignment requirements of the rest of the
data. If you want to make an array of 20 000 of those flags
then unsigned char would be better. For just a single flag
int would be better since the machine doesn't have to
clear a register and then load the character into it when
operating on it.

(is the unsigned really necessary anyway?)

It depends on the operations you are doing with the data.
By the way are you sure you want
code = code & 1
This sets to zero all bits but the last... If the last was zero
it remains zero.

Or maybe you intended
code = code | 1;

That *sets* the last bit to 1.
but I think int is probably faster.
No

Can someone explain what's a better way to do this if you were
writing a graphics library?

You would have to explain more what you want to do. Now, you are doing
nothing, i.e. if you have an int with all zeros, after your operations
the data remains the same: all zeroes.
 
G

Gr

Le 15/09/11 09:07, Gr a écrit :

unsigned code;

would be better since you are doing bit operations with
the data.

Are you saying that whenever I am doing bit operations, it's better to
use unsigned data types instead of signed ones.

I guess if you are left shifting or right shifting, then using unsigned
is better. How if you are just & (to check if a bit is set) & | (to set
a bit), does it make a diff if you use a signed datatype or an unsigned
one? I guess signed type would you give you one less bit to work with as
compared to an unsigned one, but other than that would it make a difference?
It depends of the alignment requirements of the rest of the
data. If you want to make an array of 20 000 of those flags
then unsigned char would be better. For just a single flag
int would be better since the machine doesn't have to
clear a register and then load the character into it when
operating on it.



It depends on the operations you are doing with the data.

Here is a link to what I am doing
http://en.wikipedia.org/wiki/Cohen–Sutherland_algorithm#Example_C.2FC.2B.2B_implementation

By the way are you sure you want
code = code & 1
This sets to zero all bits but the last... If the last was zero
it remains zero.

Or maybe you intended
code = code | 1;

That *sets* the last bit to 1.

Yes, true.

All the &'s in my psuedocode should have been |'s.
 
M

Malcolm McLean

unsigned char code;
In others, it's
int code;

which is better?

unsigned char code saves space
(is the unsigned really necessary anyway?)

but I think int is probably faster.

Can someone explain what's a better way to do this if you were
writing a graphics library?
It really doesn't matter, unless you are storing millions of these
codes, or the operation is right within the inner loop.

If storing millions, unsigned char will save space. If the operation
is in the inner loop, int might just save enough time to be
measurable, though it's doubtful.
 
J

jacob navia

Le 15/09/11 10:03, Gr a écrit :
Are you saying that whenever I am doing bit operations, it's better to
use unsigned data types instead of signed ones.

Yes, since the data is not an integer, just a collection of
flags. That way I am sure I can use CHAR_BIT*8 flags with
no problems if I do shifts, etc.


Obviously for ANDing the last 4 bits either declaration is OK.
I guess if you are left shifting or right shifting, then using unsigned
is better. How if you are just & (to check if a bit is set) & | (to set
a bit), does it make a diff if you use a signed datatype or an unsigned
one? I guess signed type would you give you one less bit to work with as
compared to an unsigned one, but other than that would it make a
difference?

Not at all.

[snip]
Yes, true.

All the &'s in my psuedocode should have been |'s.

OK, now it is clearer what you want to do.
 
B

BartC

Gr said:
In the Cohen Sutherland Algorithm of line clipping, a code/flag is using
for denoting where an endpoint of the line lies as compared to the
clipping rectangle.

This code consists of 4 flags (1 each for left, right, below & above).

type code = 0;

if(point lies to the left of clipping window)
code = code & 1;
if(point lies to right)
code = code & 2;

if(point lies below)
code = code & 4;
if(point lies above)
code = code & 8;

I've just looked at my code for this, and I'm using an 'or' operation ('|'
in C), to build the 'outcode'. In fact looking at your code, your 'code'
variable is only ever going to be zero.

As for what type to use for 'code', it probably doesn't really matter. Get
the whole project working first, then worry about whether that tiny aspect
is a bottleneck or not.
 
G

Gr

I've just looked at my code for this, and I'm using an 'or' operation
('|' in C), to build the 'outcode'. In fact looking at your code, your
'code' variable is only ever going to be zero.

Yes - I typed that out by mistake - I am using |, not &.
I indicated that in a followup to jacob navia.
As for what type to use for 'code', it probably doesn't really matter.
Get the whole project working first, then worry about whether that tiny
aspect is a bottleneck or not.

It's not a project. I am just a looking at the implementation in 2
different text books - one using an unsigned char & the other using an
int & hence wondered about the choices & if one is a better choice than
the other.
 
P

Phil Carmody

Gr said:
I am not an expert C or C++ programmer but know enough to get by.

Currently studying some graphics algorithms as a part of CAD.

In the Cohen Sutherland Algorithm of line clipping, a code/flag is
using for denoting where an endpoint of the line lies as compared to
the clipping rectangle.

This code consists of 4 flags (1 each for left, right, below & above).

type code = 0;

if(point lies to the left of clipping window)
code = code & 1;
if(point lies to right)
code = code & 2;

if(point lies below)
code = code & 4;
if(point lies above)
code = code & 8;


Now in some books the type of code is declared
as
unsigned char code;
In others, it's
int code;

which is better?

It depends.
unsigned char code saves space
(is the unsigned really necessary anyway?)

but I think int is probably faster.

Unthink that thought.

There is no "think code is faster", there is only
"fact - code is not sufficiently fast" and
"measure which alternative code is faster".
Can someone explain what's a better way to do this if you were
writing a graphics library?

If the code-size is similar and the speed is similar,
why are you fussing about such trivialities? I'm sure
there are actual important architectural issues that
need to be adressed with much higher priority. You're
wasting too much time deciding what colour to paint
the bike shed.

Phil
 
M

Malcolm McLean

Unthink that thought.

There is no "think code is faster", there is only
"fact - code is not sufficiently fast" and
"measure which alternative code is faster".
int should be "the natural integer type" for the platform. Which means
that usually it will be the fastest type. You should use int for an
integer unless there is a clear reason for using another type. Though
each instance will only save a few cycles, if you systematically use
int the speed increment may be worth having.
 
J

Jorgen Grahn

I partly disagree. You also need some vague idea about what's
efficient and what's not, so you don't spend 20% more here and 18%
more there. Distributed fat is hard to profile.
int should be "the natural integer type" for the platform. Which means
that usually it will be the fastest type.

If you write a compiler today for an architecture where 16-bit ints
would be a bit faster than 32-bit, you'd still use 32-bit because
otherwise too much code would break.

And even on 32-bit architectures that generalization is too ...
generalized. What if you need an array of a million numbers between 0
and 7 -- will int be as fast as unsigned char? Most likely not.
You should use int for an
integer unless there is a clear reason for using another type. Though
each instance will only save a few cycles, if you systematically use
int the speed increment may be worth having.

In this case the problem didn't call for signedness, so I'd use
'unsigned'.

If the compiler had enough information to do further optimizations,
I'd stop worrying after that and leave it to the compiler.

/Jorgen
 
T

Tony

Malcolm said:
You should use int for an
integer unless there is a clear reason for using another type. Though
each instance will only save a few cycles, if you systematically use
int the speed increment may be worth having.

Did you mean that just in the context of the OP/this thread, or in
general?
 
K

Keith Thompson

pete said:
I only use data types which are smaller than int,
when there is a special reason.

It's possible that small types may be aligned on int boundaries,
meaning that there may be unusable bytes in between them.

No, it's not. In an array of char, or an array of short, the elements
must be contiguous; unlike structs arrays have no padding. This is necessary
to make array indexing and pointer arithmetic work.
It's also possible that there may be masking operations
invloved with simple arithmetic operations on small types.
Small types tend to get converted to type int
in various operations.

That's true.
 
T

Tony

pete said:
I only use data types which are smaller than int,
when there is a special reason.

It's possible that small types may be aligned on int boundaries,
meaning that there may be unusable bytes in between them.

It's also possible that there may be masking operations
invloved with simple arithmetic operations on small types.
Small types tend to get converted to type int
in various operations.

I didn't cut out enough of your post. Here's what I wanted to know:
 
K

Keith Thompson

pete said:
It's possible that small types may exist outside of an array.

Of course.

In a struct, there can be arbitrary padding between any two members, or
after the last member. But in practice, I'd be very surprised to see
any padding between c0 and c1 in
struct {
char c0;
char c1;
};

As for individually declared objects, you're not likely to have
very many of them, so any padding between them should have only
minimal effect (and if it's there, it probably saves you more in code
space than it costs in data space).
 
M

Malcolm McLean

And even on 32-bit architectures that generalization is too ...
generalized. What if you need an array of a million numbers between 0
and 7 -- will int be as fast as unsigned char? Most likely not.
int should read and write on a single "operation". Accessing the
unsigned chars may create unaligned reads, or may require masking and
bitshifting, thus several "operations".

However ints will exhaust the cache faster. It can become difficult to
get an intuitive feel for speed. For instance are we stepping through
the array in sequence in the inner loop, or accessing it randomly?
 
J

Jorgen Grahn

int should read and write on a single "operation". Accessing the
unsigned chars may create unaligned reads, or may require masking and
bitshifting, thus several "operations".

In recent years, I have come to see a linear series of instructions as
cheap, compared to memory access. As a general rule. People sometimes
point out that precalculated tables aren't as useful today as they
were maybe 20 years ago -- it's often faster to recalculate than to hit
the table.
However ints will exhaust the cache faster. It can become difficult to
get an intuitive feel for speed. For instance are we stepping through
the array in sequence in the inner loop, or accessing it randomly?

True. I take back my "most likely not" above; the problem as stated by
me still has too little information.

/Jorgen
 
H

hanukas

I am not an expert C or C++ programmer but know enough to get by.

Currently studying some graphics algorithms as a part of CAD.

In the Cohen Sutherland Algorithm of line clipping, a code/flag is using
for denoting where an endpoint of the line lies as compared to the
clipping rectangle.

This code consists of 4 flags (1 each for left, right, below & above).

type code = 0;

if(point lies to the left of clipping window)
    code = code & 1;
if(point lies to right)
    code = code & 2;

if(point lies below)
   code = code & 4;
if(point lies above)
   code = code & 8;

Now in some books the type of code is declared
as
unsigned char code;
In others, it's
int code;

which is better?

unsigned char code saves space
(is the unsigned really necessary anyway?)

but I think int is probably faster.

Can someone explain what's a better way to do this if you were
writing a graphics library?

The outcodes are temporary variables, space savings are less critical
here than using natural register width; use unsigned int.

If your library does multiple passes over your data and "caches" the
outcodes, you should instead use design which encourages sharing of
endpoints; this allows efficient single pass and generally requires
less bandwidth.

It boils down to this..

if (codeA & codeB) -> trivial reject
else if (!(codeA | codeB)) -> trivial accept (= no clipping required)
else -> clip
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,995
Messages
2,570,230
Members
46,818
Latest member
Brigette36

Latest Threads

Top