Richard Heathfield's setbits()

A

Alex Vinokur

Hi,

Here is Richard Heathfield's function from http://users.powernet.co.uk/eton/kandr2/krx206.html

#include <stdio.h>
unsigned setbits(unsigned x, int p, int n, unsigned y)
{
return (x & ((~0 << (p + 1)) | (~(~0 << (p + 1 - n))))) | ((y & ~(~0
<< n)) << (p + 1 - n));
}

setbits(x,p,n,y) returns x with the n bits that begin at position p
set to the rightmost n bits of y, leaving the other bits unchanged.

It seems that setbits (x, 31, n, y) may produce undefined behavior.

According to the C++ standard:
Behavior of shift operators << and >> is undefined if the right
operand is negative, or
greater than or equal to the length in bits of the promoted left
operand.

So, result ((~0 << (p + 1)) may be undefined.

Regards,

Alex Vinokur
 
B

Ben Bacarisse

Alex Vinokur said:
Here is Richard Heathfield's function from http://users.powernet.co.uk/eton/kandr2/krx206.html

#include <stdio.h>
unsigned setbits(unsigned x, int p, int n, unsigned y)
{
return (x & ((~0 << (p + 1)) | (~(~0 << (p + 1 - n))))) | ((y & ~(~0
<< n)) << (p + 1 - n));
}

setbits(x,p,n,y) returns x with the n bits that begin at position p
set to the rightmost n bits of y, leaving the other bits unchanged.

It seems that setbits (x, 31, n, y) may produce undefined behavior.

Yes, it does.
According to the C++ standard:

K&R is about C so you should probably quote the C standard. The intent
is that C and C++ remain synchronised about this sort of thing so won't
matter much, but a solution to a K&R exercise must be assumed to be C.
As it happens, modern C (C99) has diverged from C++ in the case of left
shifting negative numbers but the above is, presumably, not C99.
Behavior of shift operators << and >> is undefined if the right
operand is negative, or
greater than or equal to the length in bits of the promoted left
operand.

So, result ((~0 << (p + 1)) may be undefined.

It can be undefined for other reasons, though none that matter in the
context of K&R2. ~0 is often signed and negative in which case it is
undefined in C99 but not in C90 or in C++ (at least up to and including
2003). It's safer to shift unsigned types so I'd suggest ~0u.

I think it's hard to do this without knowing the width of the type. I'd
probably write:

unsigned width = CHAR_BIT * sizeof x;
unsigned mask = ~0u >> width - n << p - n + 1;
return x & ~mask | y & mask;

This goes wrong if there are padding bits, but at least we can check for
that (UINT_MAX will be == ~0u if there are none).

There is a bit-twiddling version that is one operation shorter:

return x ^ (x ^ a) & mask;

but you'd need to justify the "eh?" this might prompt in the reader.
 
J

Jonathan Lee

Here is Richard Heathfield's function
...
setbits(x,p,n,y) returns x with the n bits that begin at position p
set to the rightmost n bits of y, leaving the other bits unchanged.

It seems that setbits (x, 31, n, y) may produce undefined behavior.

Assuming p, n are in [0, INT_BIT)

unsigned mask = (~((~0U) << n)) << p;
return (x & ~mask) + ((y << p) & mask);
or
return x ^ ((x ^ (y << p)) & mask)

--Jonathan
 
E

Eric Sosman

[...]
~0 is often signed and negative [...]

In C, s/often/always/.
[...]
This goes wrong if there are padding bits, but at least we can check for
that (UINT_MAX will be == ~0u if there are none).

Also if there are twenty. 6.5.3.3p4: "[...] If the promoted
type is an unsigned type, the expression ~E is equivalent to the
maximum value representable in that type minus E." As always, the
settings of any padding bits in the result of ~E (of any arithmetic
operation) are unspecified.
 
B

Ben Bacarisse

Eric Sosman said:
[...]
~0 is often signed and negative [...]

In C, s/often/always/.

I'll take your word for it! I was not sure about ~0 on a 1's complement
machine that supports negative zero. It's called "negative" but is it
less than zero for the purposes of a shift operation? I was not sure.
[...]
This goes wrong if there are padding bits, but at least we can check for
that (UINT_MAX will be == ~0u if there are none).

Also if there are twenty. 6.5.3.3p4: "[...] If the promoted
type is an unsigned type, the expression ~E is equivalent to the
maximum value representable in that type minus E." As always, the
settings of any padding bits in the result of ~E (of any arithmetic
operation) are unspecified.

Ah, yes, of course. Do you know a good way to determine the width of an
unsigned type? By "good" I probably mean "other than the obvious"
iterative one.
 
E

Eric Sosman

Eric Sosman said:
[...]
~0 is often signed and negative [...]

In C, s/often/always/.

I'll take your word for it! I was not sure about ~0 on a 1's complement
machine that supports negative zero. It's called "negative" but is it
less than zero for the purposes of a shift operation? I was not sure.

Ah! Okay, "negative zero" might not be "negative" (since it
compares equal to "positive zero"). Point taken.
[...]
This goes wrong if there are padding bits, but at least we can check for
that (UINT_MAX will be == ~0u if there are none).

Also if there are twenty. 6.5.3.3p4: "[...] If the promoted
type is an unsigned type, the expression ~E is equivalent to the
maximum value representable in that type minus E." As always, the
settings of any padding bits in the result of ~E (of any arithmetic
operation) are unspecified.

Ah, yes, of course. Do you know a good way to determine the width of an
unsigned type? By "good" I probably mean "other than the obvious"
iterative one.

Hallvard B. Furuseth came up with

"

/* Number of bits in inttype_MAX, or in any (1<<k)-1 where
* 0 <= k < 3.2E+10 */
#define IMAX_BITS(m) ((m) /((m)%0x3fffffffL+1) /0x3fffffffL %0x3fffffffL
*30 \
+ (m)%0x3fffffffL /((m)%31+1)/31%31*5 +
4-12/((m)%31+3))

Or if you are less paranoid about how large UINTMAX_MAX can get:

/* Number of bits in inttype_MAX, or in any (1<<k)-1 where 0 <= k < 2040 */
#define IMAX_BITS(m) ((m)/((m)%255+1) / 255%255*8 + 7-86/((m)%255+12))

.." (Sorry about the line-wrapping.) Dunno whether you'd deem this
good, but it's certainly a jaw-dropper.
 
B

Ben Bacarisse

Eric Sosman said:
On 5/2/2010 2:15 PM, Ben Bacarisse wrote:

Hallvard B. Furuseth came up with

"

/* Number of bits in inttype_MAX, or in any (1<<k)-1 where
* 0 <= k < 3.2E+10 */
#define IMAX_BITS(m) ((m) /((m)%0x3fffffffL+1) /0x3fffffffL
%0x3fffffffL *30 \
+ (m)%0x3fffffffL /((m)%31+1)/31%31*5 +
4-12/((m)%31+3))

Or if you are less paranoid about how large UINTMAX_MAX can get:

/* Number of bits in inttype_MAX, or in any (1<<k)-1 where 0 <= k < 2040 */
#define IMAX_BITS(m) ((m)/((m)%255+1) / 255%255*8 + 7-86/((m)%255+12))

." (Sorry about the line-wrapping.) Dunno whether you'd deem this
good, but it's certainly a jaw-dropper.

I remember seeing that now... And I was worried about suggesting

x ^ (x ^ y) & mask
for
x & ~mask | y & mask

!!
 
B

Ben Bacarisse

Ben Bacarisse said:
Alex Vinokur <[email protected]> writes:
^^^^^^^^^^^^^^^^^^^^^
I missed this bit of the spec. You need y << p - n + 1 in:
unsigned width = CHAR_BIT * sizeof x;
unsigned mask = ~0u >> width - n << p - n + 1;
return x & ~mask | y & mask;

return x & ~mask | (y << p - n + 1) & mask;

but you should also use:

unsigned mask = ~(~0u << n) << p - n + 1;

as this does not need the width of the type.
 
B

Ben Bacarisse

Jonathan Lee said:
Here is Richard Heathfield's function
...
setbits(x,p,n,y) returns x with the n bits that begin at position p
set to the rightmost n bits of y, leaving the other bits unchanged.

It seems that setbits (x, 31, n, y) may produce undefined behavior.

Assuming p, n are in [0, INT_BIT)

unsigned mask = (~((~0U) << n)) << p;

This is a better way to make the mask but I think you've altered what p
means. Alex (based on Richard's code) seems to take p to be the
position of the most significant bit of those changed.

It's simpler (and consistent with the problem wording) to interpret p as
the least significant bit of the changed bits (as you have done) but
some people might be confused by this change of meaning.

It easy to switch to the other interpretation of the problem: substitute
p - n + 1 for p.
 
A

Alex Vinokur

Alex Vinokur said:
Hi,

Here is Richard Heathfield's function from
http://users.powernet.co.uk/eton/kandr2/krx206.html

#include <stdio.h>
unsigned setbits(unsigned x, int p, int n, unsigned y)
{
return (x & ((~0 << (p + 1)) | (~(~0 << (p + 1 - n))))) | ((y & ~(~0
<< n)) << (p + 1 - n));
}

setbits(x,p,n,y) returns x with the n bits that begin at position p
set to the rightmost n bits of y, leaving the other bits unchanged.

It seems that setbits (x, 31, n, y) may produce undefined behavior.

According to the C++ standard:
Behavior of shift operators << and >> is undefined if the right
operand is negative, or
greater than or equal to the length in bits of the promoted left
operand.

So, result ((~0 << (p + 1)) may be undefined.

Replace ((~0 << (p + 1)) with (((~0 << (p)) << 1)
 
A

Alex Vinokur

Here is templated setBits() based on fixed Richard Heathfield's
function in http://users.powernet.co.uk/eton/kandr2/krx206.html

setbits(x,p,n,y) returns x with the n bits that begin at position p
set to the rightmost n bits of y, leaving the other bits unchanged.

template<typename T>
T setBits(T x, std::size_t p, std::size_t n, T y)
{
BOOST_STATIC_ASSERT(std::numeric_limits<T>::is_integer);
BOOST_STATIC_ASSERT(!std::numeric_limits<T>::is_signed);
assert(n > 0);
assert(n < (sizeof(T) * CHAR_BIT));
assert(p < (sizeof(T) * CHAR_BIT));
assert(p >= (n - 1));

const T maxT = ~static_cast<T> (0);
return (x & ((static_cast<T>(maxT << p) << 1) | ~static_cast<T>(maxT
<< (p + 1 - n))) | static_cast<T>((y & ~static_cast<T>(maxT << n)) <<
(p + 1 - n)));

// ----------------------------------
// const T maxT = ~static_cast<T> (0);
// const T tmp1 = static_cast<T>(maxT << p);
// const T tmp2 = tmp1 << 1;
// const T tmp3 = ~static_cast<T>(maxT << (p + 1 - n));
// const T tmp4 = ~static_cast<T>(maxT << n);
// const T tmp5 = static_cast<T>((y & tmp4) << (p + 1 - n));

// return ( x & (tmp2 | tmp3)) | tmp5;
// ----------------------------------
}

Alex
 
P

Phil Carmody

Ben Bacarisse said:
Yes, it does.


K&R is about C so you should probably quote the C standard. The intent
is that C and C++ remain synchronised about this sort of thing so won't
matter much, but a solution to a K&R exercise must be assumed to be C.
As it happens, modern C (C99) has diverged from C++ in the case of left
shifting negative numbers but the above is, presumably, not C99.


It can be undefined for other reasons, though none that matter in the
context of K&R2. ~0 is often signed and negative in which case it is
undefined in C99 but not in C90 or in C++ (at least up to and including
2003). It's safer to shift unsigned types so I'd suggest ~0u.

I think it's hard to do this without knowing the width of the type. I'd
probably write:

Modulo caveats, so would I.
unsigned width = CHAR_BIT * sizeof x;
unsigned mask = ~0u >> width - n << p - n + 1;

Alas that can't portably inject a zero-length bitstring.
return x & ~mask | y & mask;

This goes wrong if there are padding bits, but at least we can check for
that (UINT_MAX will be == ~0u if there are none).

There is a bit-twiddling version that is one operation shorter:

return x ^ (x ^ a) & mask;

One C operation shorter, yes. One instruction deeper (3 rather than 2)
on architectures sufficiently rich to parallelise the '&'s in the former.
but you'd need to justify the "eh?" this might prompt in the reader.

Anyone who can't read that expression from left to right as
"change in x the bits that are different between x and a within the mask"
on the second reading shouldn't be reading code, but should be clicking
on buttons in some GUI instead. It's a perfectly standard idiom amongst
those who have used C as a high level assembler in every-tick-counts
embedded work. Or at least should be.

Phil
 
P

Phil Carmody

Ben Bacarisse said:
context of K&R2. ~0 is often signed and negative in which case it is
undefined in C99 but not in C90 or in C++ (at least up to and including
2003). It's safer to shift unsigned types so I'd suggest ~0u.

~0u may end up being converted to being signed on sufficiently
bizarre architectures which permit the usual arithmetic
conversions (6.3.1.8) to fall through to, and be caught by:

Otherwise, if the type of the operand with signed
integer type can represent all of the values of
the type of the operand with unsigned integer
type, then the operand with unsigned integer type
is converted to the type of the operand with
signed integer type.

But that probably only affects people in 33-bit la-la-land.

Phil
 
B

Ben Bacarisse

Phil Carmody said:
Alas that can't portably inject a zero-length bitstring.

True. I don't know if that matters or not (the specification is a
little loose) but I have already noted that JL's construction of the
mask is superior and, since it handles 0 naturally, it wins all round:

unsigned width = ~(~0u << n) << p - n + 1;
return x & ~mask | (y << p - n + 1) & mask;

If one goes on to interpret p as the least significant bit of those
injected then there is no need for p - n + 1; and the solution is
simpler still:

unsigned mask = ~(~0u << n) << p;
return x & ~mask | (y << p) & mask;

which is what he posted (modulo parentheses).

[I'm posting just to summarise what I think is the neatest solution.]

<snip>
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,997
Messages
2,570,241
Members
46,831
Latest member
RusselWill

Latest Threads

Top