Are these the best set of bitset macros in the world or what!?

S

Seebs


They weren't macros, but at one point long ago I wrote:

int
bitset(int in, int flag) { return !!(in & 1 << flag); }
void
setbit(int *in, int flag) { *in |= 1 << flag; }
void
resetbit(int *in, int flag) { *in &= ~(1 << flag); }

On reflection, I think these are actually probably worse. I was about 17
when I wrote them.

I guess I see two basic problems with trying to define "better". One is
that our hero views anything he does as intrinsically better than stuff
which is portable or standard-friendly. The other is that "better" is
very hard to define without some discussion of what we're trying to do.

In fact, the above functions illustrate this already: Note that one
of the two bit-setting functions merely returns the new value, and the
other sets the new value without returning it. I think that, were I
doing this today, I'd never have written the second; it would be cleaner
to write:
bits = bitset(bits, 5);
than to sometimes write:
newval = bitset(bits, 5);
and other times
setbit(bits, 5);

These days, if I were going to do these, I'd probably do them as static
inline functions so I didn't have to deal with macro junk such as
parentheses.

Anyway, for better bit macros:

#define BIT_N(u, n) (((u) * 0 + 1) << (n))
#define BIT_SET(u, n) ((u) | BIT_N((u), (n)))
#define BIT_CLEAR(u, n) ((u) & ~BIT_N((u), (n)))

People who've had their morning coffee are welcome to tell me whether
they think I've successfully made a macro which will safely do 1 << 63
in a 64-bit type, without creating unwanted promotions in smaller
types. It just occurred to me that it might be possible to do that,
which solves one of the common problems.

-s
 
K

Keith Thompson

Brian said:
pete said:
Brian said:
Keith Thompson wrote:
Eric Sosman wrote:
[...]
- What is the type of `1'?

I would assume it would be "coerced" to the type of n, yes? [...]

The type of 1 is int.

No matter what the type of n is?

Correct.

In ((a) << (n)),
the types of (a) and (n) are unrelated.

I find that non-regular and surprising and a good reason for hiding it
behind some macros.

Only because you fail to understand the underlying principles.

In almost all contexts in C, the type of an expression (which may
or may not be a subexpression) is determined entirely by the form
of the expression itself. It does not depend on the context in
which it appears. Implicit conversions can complicate this a bit,
but such conversions are applied to the result of the expression.

This can lead to somewhat surprising results in some cases, but
any other set of rules can do so as well.

There are languages in which context does affect the type of
an expression, but (a) such languages tend to have more complex
resolution rules than C has, and (b) I don't believe it's possible
to create such a language by applying a few macro definitions on
top of standard C.

Pause for a moment and consider the possibility that a group of
people who have been working with C for many years might actually
know something about it. Consider that you might be wrong about
something, or at least that you could learn something from someone
else.
 
S

Shao Miller

Brian said:
Keith said:
Brian said:
Eric Sosman wrote: [...]
- What is the type of `1'?
I would assume it would be "coerced" to the type of n, yes? [...]
The type of 1 is int.

No matter what the type of n is?

You might enjoy a draft with filename 'n1256.pdf'. Check out section
6.5.7. The '1' is promoted regardless of 'n'. Since plain '1' is an
'int', the result of the operator and its operands will be an 'int'.
Then 'n' is used to shift the promoted '1'. If 'n' is >= the width of
an 'int', the behaviour is undefined. The width of an integer type
includes the sign bit, whether or not there is one.

If we have 'int' (a signed integer type, per 6.2.5p4) with 15 value bits
and one sign bit, '1' might look like:

Pos: SEDCBA9876543210
Bit: 0000000000000001

Then '<<' is defined up to an 'n' (in your macro) of 15 (width - 1):

Pos: SEDCBA9876543210
Bit: 1000000000000000

This so happens to be the representation of a negative zero in a
sign-and-magnitude binary signed representation. It could actually be a
trap representation, causing trouble. So unless you can guarantee that
it's not a trap representation, you're only safe up to an 'n' of 14
(under these circumstances).

Regardless of that, if you shift any further, the result does not get
any extra bits added to it. That means that if you use it in a bitwise
operation with some other operand larger than an 'int', you mightn't get
what you expect. The program might even crash. Assuming it doesn't,
you could still have something like:

Pos: SEDCBA9876543210FEDCBA9876543210
Bit: ???????????????????????????????? (1 << 16)[see note]
Bit: 00000000000000010000000000000000 & (65536)
Bit: 000000000000000?0000000000000000 (result)

Hope that helps. Perhaps I've gotten it wrong.

[note] Undefined, then usual arithmetic conversions[6.3.1.8] applied for
'&' operator evaluation.
 
K

Keith Thompson

Shao Miller said:
Brian said:
Keith said:
Eric Sosman wrote:
[...]
- What is the type of `1'?
I would assume it would be "coerced" to the type of n, yes? [...]
The type of 1 is int.

No matter what the type of n is?

You might enjoy a draft with filename 'n1256.pdf'. Check out section
6.5.7. The '1' is promoted regardless of 'n'. Since plain '1' is an
'int', the result of the operator and its operands will be an 'int'.

Not quite.

The original expression was (n << 1).

"The integer promotions are performed on each of the operands."
This means that any operands narrower than int or unsigned int
are promoted to int or unsigned int. (By contrast, for most
arithmetic operators, "If both operands have arithmetic type, the
usual arithmetic conversions are performed on them." This promotes
both operands to a common type, which doesn't happen for the shift
operators.)

The result of (n << 1) is *not* necessarily int; it's the promoted
type of the left operand.
Then 'n' is used to shift the promoted '1'. If 'n' is >= the width of
an 'int', the behaviour is undefined. The width of an integer type
includes the sign bit, whether or not there is one.

If n is greater than or equal to the width *of the promoted left
operand*, the behavior is undefined.

[...]
 
S

Shao Miller

Keith said:
Shao Miller said:
Brian said:
Keith Thompson wrote:
Eric Sosman wrote:
[...]
- What is the type of `1'?
I would assume it would be "coerced" to the type of n, yes? [...]
The type of 1 is int.
No matter what the type of n is?
You might enjoy a draft with filename 'n1256.pdf'. Check out section
6.5.7. The '1' is promoted regardless of 'n'. Since plain '1' is an
'int', the result of the operator and its operands will be an 'int'.

Not quite.

The original expression was (n << 1).
... ... ...

That's my fault, sorry. I "snipped" too much context. I was looking at
'bit(n)' below, which is shown as '(1u<< (n))'. The other way around
from your response. Apologies.
Eric said:
Barry Schwarz wrote:
[...]
And what happens if you want to test the sign bit? You might avoid
syntax errors and undefined behavior with
#define bit(n) (1u<< (n))

While I don't see the need, some might even want 1lu or 1llu.
OK on 1u. But the compiler wouldn't really change to signed if an
unsigned was passed (guaranteed with use of the bitsetX defines
below), would it?
You're *still* not seeing the problem, are you? All, right,
let's go through it step by step:

- What is the type of `1'?
 
S

Shao Miller

Shao said:
That's my fault, sorry. I "snipped" too much context. I was looking at
'bit(n)' below, which is shown as '(1u<< (n))'. The other way around
from your response. Apologies.

Actually, I copied and pasted. It was you who "snipped" that context.
:) It was my fault for not re-injecting it since I meant to discuss it.
 
M

Marcin Grzegorczyk

James said:
This confused me for a while. But I now realize we're BOTH right!!
It all depends on how you number the bits.

Big-endian machines work "properly" according to the
above criterion with Bit 0 = MSB and so on.

Yes, but do you know any CPU architecture where bits are consistently
numbered that way? I know quite a few big-endian architectures, and on
all of them bit 0 is the LSb. (Motorola M68030 actually has 2 sets of
bit manipulation instructions, one of which numbers bits starting from
the LSb and the other from MSb!)

Moreover, the semantics of the bit shift operators in C also make it
natural to assume that bit 0 = LSb. (As in: which bit of 'foo' is set
by the expression "foo|=1<<n"?)

[snip]
My concern arose in an application where fastest possible
speed is my highest priority (though I *do* want "my cake
and to eat it too" for readability and portability as possible!)

In principle, the compiler could (one might say /should/) be able to
optimize constructs such as (given uint8_t *v)
(v[0] | (uint16_t)v[1] << 8 | (uint32_t)v[2] << 16
| (uint32_t)v[3] << 24)
into an equivalent of
*(uint32_t *)v
if appropriate. In practice, experience tells me that few, if any,
compilers can do that, so I sympathize with your concern. In a such
situation, I do use implementation-specific tricks (suitably #if'ed),
but I also do write a maximally portable implementation as a fallback.
While it's essentially duplicate effort, I consider it a good exercise
in thinking about portability issues :)
 
M

Marcin Grzegorczyk

Seebs wrote:
[snip]
Anyway, for better bit macros:

#define BIT_N(u, n) (((u) * 0 + 1) << (n))
#define BIT_SET(u, n) ((u) | BIT_N((u), (n)))
#define BIT_CLEAR(u, n) ((u) & ~BIT_N((u), (n)))

People who've had their morning coffee are welcome to tell me whether
they think I've successfully made a macro which will safely do 1 << 63
in a 64-bit type, without creating unwanted promotions in smaller
types.

Well I don't drink morning coffee (and even if I did, its effects would
have long worn out by now), but I think you have, except that the first
one should be

#define BIT_N(u, n) (((u) * 0 + 1u) << (n))

so the type is always unsigned as long as the type of 'u' is unsigned.
 
K

Keith Thompson

Shao Miller said:
Keith said:
Shao Miller said:
Brian wrote:
Keith Thompson wrote:
Eric Sosman wrote:
[...]
- What is the type of `1'?
I would assume it would be "coerced" to the type of n, yes? [...]
The type of 1 is int.
No matter what the type of n is?

You might enjoy a draft with filename 'n1256.pdf'. Check out section
6.5.7. The '1' is promoted regardless of 'n'. Since plain '1' is an
'int', the result of the operator and its operands will be an 'int'.

Not quite.

The original expression was (n << 1).
... ... ...

That's my fault, sorry. I "snipped" too much context. I was looking at
'bit(n)' below, which is shown as '(1u<< (n))'. The other way around
from your response. Apologies.

No, my fault; the original expression was (1 << n), not (n << 1).
Barry changed it to (1u << (n)).

Yes, in (1 << n), the type of the result is int, and it exhibits
UB if n >= the width of int. Sorry for the confusion.
Brian said:
Eric said:
On 9/15/2010 11:06 PM, Brian wrote:
Barry Schwarz wrote:
[...]
And what happens if you want to test the sign bit? You might avoid
syntax errors and undefined behavior with
#define bit(n) (1u<< (n))

While I don't see the need, some might even want 1lu or 1llu.
OK on 1u. But the compiler wouldn't really change to signed if an
unsigned was passed (guaranteed with use of the bitsetX defines
below), would it?
You're *still* not seeing the problem, are you? All, right,
let's go through it step by step:

- What is the type of `1'?

And if Brian had managed to answer that, we might have made some more
progress.
 
S

Seebs

Well I don't drink morning coffee (and even if I did, its effects would
have long worn out by now), but I think you have, except that the first
one should be
#define BIT_N(u, n) (((u) * 0 + 1u) << (n))
so the type is always unsigned as long as the type of 'u' is unsigned.

Not so!

6.3.1.8, paragraph 1, in the integer promotions section:

Otherwise, if the type of the operand with signed integer type
can represent all of the values of the type of the operand with
unsigned integer type, then the operand with unsigned integer
type is converted to the type of the operand with signed integer
type.

So if you are on a system where there is a signed type which can represent
every possible unsigned int, an object of that type put into:

((u) * 0 + 1u)

will still result in a signed object of that type, with the value 1.

-s
 
I

ImpalerCore

// This macro is key
//
#define bitn(n) (1 << n)

// Define some stuff to "document" code without documenting anything
("self-documenting code")!
//
#define bitset8  uint8
#define bitset16 uint16
#define bitset32 uint32
#define bitset64 uint64

// Gotta love "close to the metal"!
//
#define chkbit(x, n) (((x) &   bitn(n)) ne 0)
#define setbit(x, n)  ((x) |=  bitn(n))
#define clrbit(x, n)  ((x) &= ~bitn(n))
#define tglbit(x, n)  ((x) ^=  bitn(n))

// WINDOW_THICK_BORDER|WINDOW_MINMAXBUTTONS anyone?
//
#define chkmask(x, m) (((x) &   (m)) eq m)
#define setmask(x, m)  ((x) |=  (m))
#define clrmask(x, m)  ((x) &= (~m))
#define tglmask(x, m)  ((x) ^=  (m))

// some uses may require the following
#define bitn64(n) (1ui64 << n)

**********

Aside, I use n-1 rather than n in the bitn macros, knowing that there is
potential for error there. I prefer, in this case, great semantics rather
than submission to the language.

I also have a collection of bit manipulation macros, and I think
macros do have advantages over straight up code.

1. Readability - This one is somewhat contentious for some of the
simpler methods, but when you start creating more sophisticated bit
manipulation methods, one can get lost in the mechanics.

/*!
* \brief Define the integer type used to create a bitmask of all
ones.
*
* One of the basic constructs used in the \c bits macro library is to
* create a mask that is all ones of a given width starting from the
* least significant bit. This is implemented using the expression
* <tt>~((~0) << width)</tt>. This is used to isolate a region of
bits
* to be set or cleared within an integer. This concept is used to
* isolate a range of bits in an integer demonstrated with the set of
* \c BIT_FIELD macros.
*
* Ideally this should be configured via config.h through autoconf,
but
* for now its location is here. The \c C_BITS_ALL_ONES define can be
* \c ~0 for \c int sized masks, <tt>~UINT32_C(0)</tt> to enable 32-
bit
* support on 16-bit integer platforms, or <tt>~UINT64_C(0)</tt> to
enable
* 64-bit support.
*
* See \ref C_BITS_ONE for more information.
*/
#define C_BITS_ALL_ONES ( ~UINT32_C(0) )

/*!
* \brief Extract \c WIDTH bits from \c WORD at \c INDEX.
*
* The macro \c C_EXTRACT_BIT_FIELD returns a region within \c WORD
* that contains the bits at \c INDEX that is \c WIDTH long. The
* result is shifted so that the bit at \c INDEX is the least
* significant bit. The purpose of the macro is to provide another
* method to access bitfields packed into a word or integer without
* relying on pointer casting or defining a struct with bitfields.
*/
#define C_EXTRACT_BIT_FIELD(WORD, INDEX, WIDTH) ( (WORD >> (INDEX)) &
~((C_BITS_ALL_ONES) << (WIDTH)) )

An example would be processing ARINC 429 flight data in a generic
fashion, where data is encoded within the word at different locations
and widths.

/*
* The following example will use the ARINC 429 standard message
* structure used in avionics applications for transmitting data,
* which is shown in the structure definition below.
*
* \code
* struct arinc_429_message
* {
* uint32_t parity:1; // parity bit
* uint32_t ssm:2; // sign/status matrix
* uint32_t data:19; // data field
* uint32_t sdi:2; // source/destination identifier
* uint32_t label:8; // label field
* };
* \endcode
*
* The ARINC 429 standard is a point to point protocol consisting of a
* twisted pair that transmits 32-bit messages from avionics devices
on
* an aircraft. The data bus consists of one source with many
potential
* receivers. The following is a description of the meaning of the
* message fields.
*
* <ul>
* <li> \c label - This identifies the data type and the parameters
* associated with it for representing avionics
data.
* A single avionics device may have up to 256
different
* messages.
* <li> \c sdi - This field is used for multiple receivers to
identify
* which receiever is the recipient of the avionics
data.
* <li> \c data - The avionics data, represented in a number of
different
* formats. The common ones are binary coded
decimal,
* 2's complement binary, and bit fields.
* <li> \c ssm - This field contains hardware equipment
conditions,
* operational mode, or validity of data content.
* <li> \c parity - The most significant bit is the parity bit,
typically
* used to ensure the ARINC 429 message has odd
parity.
* Whether this bit is used depends on the avionic
system
* design.
* </ul>
*
* Let's look at a real ARINC 429 standard message from an Air Data
* Computer (ADC). The labels are represented in octal form.
*
* Total Air Temperature
* <ul>
* <li> Bit: 1 - 8 Name: Label (211)
* <li> Bit: 9 - 10 Name: SDI
* <li> Bit: 11 - 17 Name: Spare
* <li> Bit: 18 - 29 Name: Total Air Temperature Value \n
* <ul>
* <li> Element Size: 12
* <li> Type: 2's Complement
* <li> Units: Degrees C
* <li> Scale: .25
* <li> State: [-60, 100]
* </ul>
* <li> Bit: 30 - 31 Name: SSM
* <li> Bit: 32 Name: Parity
* </ul>
*/

To be able to translate this into human readable form, if I have a 32
bit arinc message, I can write something like this.

\code snippet
uint32_t arinc_429_message = 0x60540089;
uint8_t label;
int16_t data;

const int field_offset = 17;
const int field_size = 12;

label = C_EXTRACT_BIT_FIELD( arinc_429_message, 0, 8 );
data = C_EXTRACT_BIT_FIELD( arinc_429_message, field_offset,
field_size );
\endcode

In my opinion, this is much more readable and requires less effort to
understand what's going on than trying to do it raw (and I have worse
macros than this). It does require some trust that the macro
implementation does what it's supposed to do correctly, but if that
can be verified, the net gain is higher than having complicated bit
manipulation spread throughout the code.

2. Encapsulating the operation in one central location, the macro
definition, makes maintenance easier.

The biggest gain from using a macro to encapsulate the bit
manipulation is that mistakes are localized to a single definition.
If I make a logic error in writing some bit manipulation, and it's
spread out over many functions and files, the penalty for fixing that
error is much higher than fixing an error in a macro definition (I
experienced this directly). Inspecting raw bit manipulation code is
tedious, and if everyone involved understands the limitations of the
macro syntax and uses it correctly, less effort is needed to verify
code in contrast with doing everything by hand.

--

The main reason that I prefer a macro over a function is its ability
to apply the same functionality over a myriad of types without the
overhead of maintaining an arbitrary set of practically duplicate
functions needed to operate on each type. It does have drawbacks like
any macro does, and the issue with how to isolate bits in words of
varying bit widths is the worst (i.e. how to encode '1' in the
expression '1 << n').

/*!
* \brief Define the integer type used to create a bitmask of zeros
with
* the least-significant bit enabled.
*
* One of the basic constructs used in the \c bits macro library is to
* create a mask that is all zero except for a single bit at a given
* index. This is often implemented using the simple expression
* <tt>1 << index</tt>. This is used to isolate single bits to be set
* or cleared within an integer.
*
* There is a caveat to consider when using this expression. The
issue
* is that the value of \c 1 is evaluated to be of type integer. For
* environments where the integer type is 16-bit, this expression will
* fail when trying to set a bit with an index >= 16 in a 32-bit long
* integer. A similar problem arises for 32-bit environments when
* trying to use \c 1 to set a bit at index >= 32 in a 64-bit integer.
* If this is an issue, there will typically be a warning that says
* that the left shift count is larger than the width of the integer
* type. If this error is found, \c C_BITS_ONE will need to be
defined
* as a larger integer type.
*
* The method to increase the width used by the macro library is to
* specify the type explicitly. This can be specified using a
stdint.h
* constant like <tt>UINT64_C(1)</tt> to enable support for 64-bit
* integers. Keep in mind that increasing the width beyond the
default
* integer type for the processor may incur a performance penalty for
* all macros.
*
* Ideally this should be configured using Autoconf or a Makefile, but
* for now its location is here. The \c C_BITS_ONE define can be \c 1
* for \c int sized masks, <tt>UINT32_C(1)</tt> to enable 32-bit
support
* on 16-bit integer platforms, or <tt>UINT64_C(1)</tt> to enable 64-
bit
* support.
*/
#define C_BITS_ONE UINT32_C(1)

#define C_BIT(INDEX) (C_BITS_ONE << (INDEX))
#define C_SET_BIT(WORD, INDEX) (WORD |= (C_BITS_ONE << (INDEX)))
#define C_CLEAR_BIT(WORD, INDEX) (WORD &= ~(C_BITS_ONE << (INDEX)))

All of my bit macros use this C_BITS_ONE definition to isolate bits,
so in a sense, C_BITS_ONE hardwires the maximum bit width used. It's
not perfect, but I'm willing to deal with the limitations for what
advantages it does provide me. There may be better ways to do it too.

One drawback of this approach is that it may force a larger bit width
than is needed to perform the operation. For example, if I want to
support 64-bit flags on when the CPU natively supports 32-bit
integers, the performance of all bit operations that go through my
macros would suffer if I define C_BITS_ONE to be UINT64_C(1).
Fortunately, in my case the performance issue is not a big deal.

(In regard to your syntactic blurring with your own #defines, you're
going to be in a very small minority of people who advocate that.
Just keep that in mind if/when you program in a team environment.)

Best regards,
John D.
 
B

BruceS

Because he thinks he can do better.  As a result of which, he has been
forced to accept a few minor issues:

* He can never move to a different operating system or upgrade his
  compiler.
* He can never write code other people can use.
* He can never use code other people have written.

On the bright side, though, he's working in a language which he currently
feels is cleaner.  So he's got that going for him.

LOL. This usually means LQTM, but I mean it.
Am I the only one who thinks Brian is trolling the ng? This could be
Bill C. It amazes me how many people bite the hook.
 
J

James Dow Allen

Yes, but do you know any CPU architecture where bits are consistently
numbered that way?  I know quite a few big-endian architectures, and on
all of them bit 0 is the LSb.

Once upon a time the IBM 360 and 370 weren't considered exotic! :)
There 0 was always MSB. (Some say IBM deliberately did everything
opposite to the norm.) That machine of my youth made an impression
on me.

Anyway, if bytes are numbered left-to-right, so should be bits
intuitively.
A rippling bit
8000 -> ... -> 0800 -> 0400 -> 0200 -> 0100 -> 0080 -> 0040
looks correct in a way that
0100 -> ... -> 1000 -> 2000 -> 4000 -> 8000 -> 0001 -> 0002
does not. Perhaps you *could* argue that mine is just blind
prejudice.

James Dow Allen
 
S

Shao Miller

Eric said:
... ... ...
You're *still* not seeing the problem, are you? All, right,
let's go through it step by step:

- What is the type of `1'?

Keith said:
... ... ...
And if Brian had managed to answer that, we might have made some more
progress.

Heheheh. Well, it was probably a good idea of yours to quote (beyond a
mere reference) the common reference material that is available.

Here's the relevant portion of 'n1256.pdf', for anyone curious: Section
6.4.4.1, specifically, point 5.

"The type of an integer constant is the first of the corresponding
list in which its value can be represented."

(Followed by a nice list which leads to '1' being an 'int'.)
 
E

Eric Sosman

With no due respect, go **** yourself maggot bit twiddler.

Still giving advice on activities you know nothing about, I see.
Be patient: As your body matures towards puberty (hormone injections
will help), eventually your testicle may descend and you may become
able to learn about something that is today just a remote rumor. When
and if the day arrives, though, you'll want to be more careful about
details and less indulgent of undefined behavior, lest you attempt the
wrong hole. If your partner is a patient sort he may forgive you, but
if you insist on trying to hump his navel he'll eventually give up on
you. And then you'll be left to twiddle your lonely little bit by
yourself, wondering why it always comes up zero.
 
L

lawrence.jones

James Dow Allen said:
Once upon a time the IBM 360 and 370 weren't considered exotic! :)
There 0 was always MSB.

Actually, 1 was always MSB. That used to be a good heuristic: if the
bits started at 1 they were numbered big endian, if they start at 0 they
were numbered little endian.
 
N

Niklas Holsti

Marcin said:
Yes, but do you know any CPU architecture where bits are consistently
numbered that way? I know quite a few big-endian architectures, and on
all of them bit 0 is the LSb. (Motorola M68030 actually has 2 sets of
bit manipulation instructions, one of which numbers bits starting from
the LSb and the other from MSb!)

The MIL-STD-1750 architecture is big-endian (in a 32-bit integer, the
more significant 16-bit word comes first) and numbers bits starting from
the most significant bit, which is bit number 0.
 
B

Ben Bacarisse

Actually, 1 was always MSB. That used to be a good heuristic: if the
bits started at 1 they were numbered big endian, if they start at 0 they
were numbered little endian.

This sounded wrong and since I happen to have a copy of "IBM System/370
Principles of Operation" here I was able to check. The MSB is always
numbered 0 in that manual.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,954
Messages
2,570,116
Members
46,704
Latest member
BernadineF

Latest Threads

Top