bitwise & operator and counting set bits

S

sathyashrayan

Group,
Following function will check weather a bit
is set in the given variouble x.

int bit_count(long x)
{
int n = 0;
/*
** The loop will execute once for each bit of x
set,
*/
if (x)
do
n++;
while (0 != (x = x&(x-1)));
return (n);
}

But the same action in the following function is
confusing me a lot.

int bitcount(long i)
{
i = ((i & 0xAAAAAAAA) >> 1) + (i &
0x55555555);
i = ((i & 0xCCCCCCCC) >> 2) + (i &
0x33333333);
i = ((i & 0xF0F0F0F0) >> 4) + (i &
0x0F0F0F0F);
i = ((i & 0xFF00FF00) >> 8) + (i &
0x00FF00FF);
i = ((i & 0xFFFF0000) >> 16) + (i &
0x0000FFFF);
return (int)i;
}

All those hex number are magic numbers? Is the
above an implementation defined?
Please clear my doubt. I am posting all the two
codes which I have taken from C snippet
archives.

-------------------code-1------------------

#include <stdio.h>
#include <stdlib.h>
#define plural_text(n) &"s"[(1 == (n))]
int bitcount(long i)
{
i = ((i & 0xAAAAAAAA) >> 1) + (i &
0x55555555);
i = ((i & 0xCCCCCCCC) >> 2) + (i &
0x33333333);
i = ((i & 0xF0F0F0F0) >> 4) + (i &
0x0F0F0F0F);
i = ((i & 0xFF00FF00) >> 8) + (i &
0x00FF00FF);
i = ((i & 0xFFFF0000) >> 16) + (i &
0x0000FFFF);
return (int)i;
}
int main(int argc, char *argv[])
{
long n;
while(--argc)
{
int i;
n = atol(*++argv);
i = bitcount(n);
printf("%ld contains %d bit%s
set\n",n, i, plural_text(i));
}
return 0;
}

----------------code-2----------------------------
-
#include <stdio.h>
#include <stdlib.h>

#define plural_text(n) &"s"[(1 == (n))]

int bit_count(long x)
{
int n = 0;
/*
** The loop will execute once for each bit of x
set, this is in average
** twice as fast as the shift/test method.
*/
if (x)
do
n++;
while (0 != (x = x&(x-1)));
return (n);
}
int main(int argc, char *argv[])
{
long n;
while(--argc)
{
int i;
n = atol(*++argv);
i = bit_count(n);
printf("%ld contains %d bit%s
set\n",n, i, plural_text(i));
}
return 0;
}
 
W

Walter Roberson

But the same action in the following function is
confusing me a lot.
int bitcount(long i)
{
i = ((i & 0xAAAAAAAA) >> 1) + (i &
0x55555555);
i = ((i & 0xCCCCCCCC) >> 2) + (i &
0x33333333);
All those hex number are magic numbers?

Yes, they assume 32 bit longs, which is implementation dependant.
 
C

Chris Torek

Group,
Following function will check weather a bit
is set in the given variouble x.
int bit_count(long x)
[snippage]

Actually, it (non-portably -- x should have type "unsigned long")
counts the *number* of bits set.

I think it is time to re-post this Moldy Oldie :) The stuff at
the top is not Standard C, and the timing code as well, but the
bit-counting functions are at least commented. Note that I wrote
this shortly after the 1989 C Standard came out, before it was
widely available -- we were not even using GCC yet on most machines.


#ifndef lint
static char rcsid[] = "$Id: bct.c,v 1.5 90/10/13 08:44:12 chris Exp $";
#endif

/*
* bct - bitcount timing
*
* Runs a bunch of different functions-to-count-bits-in-a-longword
* (many assume 32 bits per long) with a timer around each loop, and
* tries to calcuate the time used.
*/
#include <stdio.h>
#include <sys/types.h>

#ifdef FD_SETSIZE
# define USE_GETRUSAGE
# include <sys/time.h>
# include <sys/resource.h>
#else
# include <sys/times.h>
#endif

#define SIZEOF(a) (sizeof(a)/sizeof(a[0]))


/*
* This function is used to calibrate the timing mechanism.
* This way we can subtract the loop and call overheads.
*/
int
calibrate(n)
register unsigned long n;
{
return 0;
}


/*
* This function counts the number of bits in a long.
* It is limited to 63 bit longs, but a minor mod can cope with 511 bits.
*
* It is so magic, an explanation is required:
* Consider a 3 bit number as being
* 4a+2b+c
* if we shift it right 1 bit, we have
* 2a+b
* subtracting this from the original gives
* 2a+b+c
* if we shift the original 2 bits right we get
* a
* and so with another subtraction we have
* a+b+c
* which is the number of bits in the original number.
* Suitable masking allows the sums of the octal digits in a
* 32 bit number to appear in each octal digit. This isn't much help
* unless we can get all of them summed together.
* This can be done by modulo arithmetic (sum the digits in a number by molulo
* the base of the number minus one) the old "casting out nines" trick they
* taught in school before calculators were invented.
* Now, using mod 7 wont help us, because our number will very likely have
* more than 7 bits set. So add the octal digits together to get base64
* digits, and use modulo 63.
* (Those of you with 64 bit machines need to add 3 octal digits together to
* get base512 digits, and use mod 511.)
*
* This is HACKMEM 169, as used in X11 sources.
*/
int
t0_hackmemmod(n)
register unsigned long n;
{
register unsigned long tmp;

tmp = n - ((n >> 1) & 033333333333) - ((n >> 2) & 011111111111);
return ((tmp + (tmp >> 3)) & 030707070707) % 63;
}


/*
* This is the same as the above, but does not use the % operator.
* Most modern machines have clockless division, and so the modulo is as
* expensive as, say, an addition.
*/
int
t1_hackmemloop(n)
register unsigned long n;
{
register unsigned long tmp;

tmp = n - ((n >> 1) & 033333333333) - ((n >> 2) & 011111111111);
tmp = (tmp + (tmp >> 3)) & 030707070707;
while (tmp > 63)
tmp = (tmp & 63) + (tmp >> 6);
return tmp;
}

/*
* Above, without using while loop. It takes at most 5 iterations of the
* loop, so we just do all 5 in-line. The final result is never 63
* (this is assumed above as well).
*/
int
t2_hackmemunrolled(n)
register unsigned long n;
{
register unsigned long tmp;

tmp = n - ((n >> 1) & 033333333333) - ((n >> 2) & 011111111111);
tmp = (tmp + (tmp >> 3)) & 030707070707;
tmp = (tmp & 63) + (tmp >> 6);
tmp = (tmp & 63) + (tmp >> 6);
tmp = (tmp & 63) + (tmp >> 6);
tmp = (tmp & 63) + (tmp >> 6);
tmp = (tmp & 63) + (tmp >> 6);
return (tmp);
}


/*
* This function counts the bits in a long.
*
* It removes the lsb and counting the number of times round the loop.
* The expression (n & -n) yields the lsb of a number,
* but it only works on 2's compliment machines.
*/
int
t3_rmlsbsub(n)
register unsigned long n;
{
register int count;

for (count = 0; n; n -= (n & -n))
count++;
return count;
}

int
t4_rmlsbmask(n)
register unsigned long n;
{
register int count;

for (count = 0; n; count++)
n &= n - 1; /* take away lsb */
return (count);
}

/*
* This function counts the bits in a long.
*
* It works by shifting the number down and testing the bottom bit.
*/
int
t5_testlsb(n)
register unsigned long n;
{
register int count;

for (count = 0; n; n >>= 1)
if (n & 1)
count++;
return count;
}

/*
* This function counts the bits in a long.
*
* It works by shifting the number left and testing the top bit.
* On many machines shift is expensive, so it uses a cheap addition instead.
*/
int
t6_testmsb(n)
register unsigned long n;
{
register int count;

for (count = 0; n; n += n)
if (n & ~(~(unsigned long)0 >> 1))
count++;
return count;
}

int
t7_testsignandshift(n)
register unsigned long n;
{
register int count;

for (count = 0; n; n <<= 1)
if ((long)n < 0)
count++;
return (count);
}


/*
* This function counts the bits in a long.
*
* It works by masking each bit.
* This is the second most intuitively obvious method,
* and is independent of the number of bits in the long.
*/
int
t8_testeachbit(n)
register unsigned long n;
{
register int count;
register unsigned long mask;

count = 0;
for (mask = 1; mask; mask += mask)
if (n & mask)
count++;
return count;
}


/*
* This function counts the bits in a long.
*
* It works by masking each bit.
* This is the most intuitively obvious method,
* but how do you a priori know how many bits in the long?
* (except for ''sizeof(long) * CHAR_BITS'' expression)
*/
int
t9_testeachbit1shl(n)
register unsigned long n;
{
register int count;
register int bit;

count = 0;
for (bit = 0; bit < 32; ++bit)
if (n & ((unsigned long)1 << bit))
count++;
return count;
}

static char nbits[256] = {
0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8,
};

static int inbits[256] = {
0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8,
};

int
tA_tableshift(n)
register unsigned long n;
{

return (nbits[n & 0xff] + nbits[(n >> 8) & 0xff] +
nbits[(n >> 16) & 0xff] + nbits[n >> 24]);
}

int
tB_tableuchar(n)
unsigned long n;
{
register unsigned char *p = (unsigned char *)&n;

return (nbits[p[0]] + nbits[p[1]] + nbits[p[2]] + nbits[p[3]]);
}

int
tC_tableshiftcast(n)
register unsigned long n;
{

return nbits[(unsigned char) n] +
nbits[(unsigned char) (n >> 8)] +
nbits[(unsigned char) (n >> 16)] +
nbits[(unsigned char) (n >> 24)];
}

int
tD_itableshift(n)
register unsigned long n;
{

return (inbits[n & 0xff] + inbits[(n >> 8) & 0xff] +
inbits[(n >> 16) & 0xff] + inbits[n >> 24]);
}

int
tE_itableuchar(n)
unsigned long n;
{
register unsigned char *p = (unsigned char *)&n;

return (inbits[p[0]] + inbits[p[1]] + inbits[p[2]] + inbits[p[3]]);
}

int
tF_itableshiftcast(n)
register unsigned long n;
{

return inbits[(unsigned char) n] +
inbits[(unsigned char) (n >> 8)] +
inbits[(unsigned char) (n >> 16)] +
inbits[(unsigned char) (n >> 24)];
}

/*
* Explanation:
* First we add 32 1-bit fields to get 16 2-bit fields.
* Each 2-bit field is one of 00, 01, or 10 (binary).
* We then add all the two-bit fields to get 8 4-bit fields.
* These are all one of 0000, 0001, 0010, 0011, or 0100.
*
* Now we can do something different, becuase for the first
* time the value in each k-bit field (k now being 4) is small
* enough that adding two k-bit fields results in a value that
* still fits in the k-bit field. The result is four 4-bit
* fields containing one of {0000,0001,...,0111,1000} and four
* more 4-bit fields containing junk (sums that are uninteresting).
* Pictorially:
* n = 0aaa0bbb0ccc0ddd0eee0fff0ggg0hhh
* n>>4 = 00000aaa0bbb0ccc0ddd0eee0fff0ggg
* sum = 0aaaWWWWiiiiXXXXjjjjYYYYkkkkZZZZ
* where W, X, Y, and Z are the interesting sums (each at most 1000,
* or 8 decimal). Masking with 0x0f0f0f0f extracts these.
*
* Now we can change tactics yet again, because now we have:
* n = 0000WWWW0000XXXX0000YYYY0000ZZZZ
* n>>8 = 000000000000WWWW0000XXXX0000YYYY
* so sum = 0000WWWW000ppppp000qqqqq000rrrrr
* where p and r are the interesting sums (and each is at most
* 10000, or 16 decimal). The sum `q' is junk, like i, j, and
* k above; but it is not necessarry to discard it this time.
* One more fold, this time by sixteen bits, gives
* n = 0000WWWW000ppppp000qqqqq000rrrrr
* n>>16 = 00000000000000000000WWWW000ppppp
* so sum = 0000WWWW000ppppp000sssss00tttttt
* where s is at most 11000 and t is it most 100000 (32 decimal).
*
* Now we have t = r+p = (Z+Y)+(X+W) = ((h+g)+(f+e))+((d+c)+(b+a)),
* or in other words, t is the number of bits set in the original
* 32-bit longword. So all we have to do is return the low byte
* (or low 6 bits, but `low byte' is typically just as easy if not
* easier).
*
* This technique is also applicable to 64 and 128 bit words, but
* 256 bit or larger word sizes require at least one more masking
* step.
*/
int
tG_sumbits(n)
register unsigned long n;
{

n = (n & 0x55555555) + ((n >> 1) & 0x55555555);
n = (n & 0x33333333) + ((n >> 2) & 0x33333333);
n = (n + (n >> 4)) & 0x0f0f0f0f;
n += n >> 8;
n += n >> 16;
return (n & 0xff);
}

/*
* build a long random number.
* The standard rand() returns at least a 15 bit number.
* We use the top 9 of 15 (since the lower N bits of the usual rand()
* repeat with a period of 2^N).
*/
unsigned long
bigrand()
{
#define randbits() ((unsigned long)((rand() >> 6) & 0777))
register unsigned long r;

r = randbits() << 27;
r |= randbits() << 18;
r |= randbits() << 9;
r |= randbits();
return (r);
}

/*
* Run the test many times.
* You will need a running time in the 10s of seconds
* for an accurate result.
*
* To give a fair comparison, the random number generator
* is seeded the same for each measurement.
*
* Return value is seconds per iteration.
*/
#ifndef REPEAT
#if defined(mips) || defined(sparc)
#define REPEAT (1L<<20)
#else
#define REPEAT (1L<<18)
#endif
#endif

double
measure(func)
register int (*func)();
{
#ifdef USE_GETRUSAGE
struct rusage ru0, ru1;
register long j;

srand(1);
(void) getrusage(RUSAGE_SELF, &ru0);
for (j = 0; j < REPEAT; ++j)
func(bigrand());
(void) getrusage(RUSAGE_SELF, &ru1);
ru1.ru_utime.tv_sec -= ru0.ru_utime.tv_sec;
if ((ru1.ru_utime.tv_usec -= ru0.ru_utime.tv_usec) < 0) {
ru1.ru_utime.tv_usec += 1000000;
ru1.ru_utime.tv_sec--;
}
return ((ru1.ru_utime.tv_sec + (ru1.ru_utime.tv_usec / 1000000.0)) /
(double)REPEAT);
#else
register long j;
struct tms start;
struct tms finish;

srand(1);
times(&start);
for (j = 0; j < REPEAT; ++j)
func(bigrand());
times(&finish);
return ((finish.tms_utime - start.tms_utime) / (double)REPEAT);
#endif
}

struct table {
char *name;
int (*func)();
double measurement;
int trial;
};

struct table table[] =
{
{ "hackmemmod", t0_hackmemmod },
{ "hackmemloop", t1_hackmemloop },
{ "hackmemunrolled", t2_hackmemunrolled },
{ "rmlsbsub", t3_rmlsbsub },
{ "rmlsbmask", t4_rmlsbmask },
{ "testlsb", t5_testlsb },
{ "testmsb", t6_testmsb },
{ "testsignandshift", t7_testsignandshift },
{ "testeachbit", t8_testeachbit },
{ "testeachbit1shl", t9_testeachbit1shl },
{ "tableshift", tA_tableshift },
{ "tableuchar", tB_tableuchar },
{ "tableshiftcast", tC_tableshiftcast },
{ "itableshift", tD_itableshift },
{ "itableuchar", tE_itableuchar },
{ "itableshiftcast", tF_itableshiftcast },
{ "sumbits", tG_sumbits },
};

main(argc, argv)
int argc;
char **argv;
{
double calibration, v, best;
int j, k, m, verbose;

verbose = argc > 1;
/*
* run a few tests to make sure they all agree
*/
srand(getpid());
for (j = 0; j < 10; ++j) {
unsigned long n;
int bad;

n = bigrand();
for (k = 0; k < SIZEOF(table); ++k)
table[k].trial = table[k].func(n);
bad = 0;
for (k = 0; k < SIZEOF(table); ++k)
for (m = 0; m < SIZEOF(table); ++m)
if (table[k].trial != table[m].trial)
bad = 1;
if (bad) {
printf("wrong: %08lX", n);
for (k = 0; k < SIZEOF(table); ++k)
printf(" %3d", table[k].trial);
printf("\n");
}
}

/*
* calibrate the timing mechanism
*/
calibration = measure(calibrate);
if (verbose)
printf("calibration: %g\n", calibration);

/*
* time them all, keeping track of the best (smallest)
*/
for (j = 0; j < SIZEOF(table); ++j) {
v = measure(table[j].func) - calibration;
if (verbose)
printf("%s: %g\n", table[j].name, v);
table[j].measurement = v;
if (!j || v < best)
best = v;
}
(void) printf("%-24s %-14sratio\n", "function", "time");
for (j = 0; j < SIZEOF(table); ++j) {
(void) printf("%-24s %#10.8g %6.3f\n",
table[j].name,
table[j].measurement,
table[j].measurement / best);
}
exit(0);
}
 
R

ranveerkunal

lets understand the magic numbers first
AAAAAAAA = 1010 1010 1010 1010
55555555 = 0101 0101 0101 0101
alternative 1s and zeros
CCCCCCCC = 1100 1100 1100 1100
33333333 = 0011 0011 0011 0011
2 ones and 2 zeros
then 4 ones 4 zeros
then 8 ones 8 zeros
then 16 ones 16 zeros
so lets go with
dec 99 -> hex 63 -> bin 0110 0011


now we can see that number of high bits is the sum of all the bit
fields
0+1+1+0+0+0+1+1 = 4
this is same as adding consecutive bits and then addin them together
(0+1)+(1+0)+(0+0)+(1+1)=0101 0010 and this is wat you get after first
step.
0110 0011 & 1010 1010 1010 1010 = 0010 0010 > 1= 0001 0001
0110 0011 & 0101 0101 0101 0101 = 0100 0001
add 0101 0010 :) (2 bits can have maximum two high bits and their sum
is 2(10) that can be accomodated in 2 bits)

now take 4 bits together
((0+1)+(1+0))+((0+0)+(1+1))=(01+01)+(00+10)=0010 0010 (4 bits can have
maximum four high bits and their sum is 4(100) that can be accomodated
in 4 bits)
same as wat you get after step 2
now take 8 bits together
(((0+1)+(1+0))+((0+0)+(1+1)))=((01+01)+(00+10))=0010+0010=0000 0100
wat you get after step 3
this is the answer 0000 0100 = 4

hope that clears yr doubt
 
S

sathyashrayan

lets understand the magic numbers first
AAAAAAAA = 1010 1010 1010 1010
55555555 = 0101 0101 0101 0101
alternative 1s and zeros
CCCCCCCC = 1100 1100 1100 1100
33333333 = 0011 0011 0011 0011
2 ones and 2 zeros
then 4 ones 4 zeros
then 8 ones 8 zeros
then 16 ones 16 zeros
so lets go with
dec 99 -> hex 63 -> bin 0110 0011


now we can see that number of high bits is the sum of all the bit
fields
0+1+1+0+0+0+1+1 = 4
this is same as adding consecutive bits and then addin them together
(0+1)+(1+0)+(0+0)+(1+1)=0101 0010 and this is wat you get after first
step.
0110 0011 & 1010 1010 1010 1010 = 0010 0010 > 1= 0001 0001
0110 0011 & 0101 0101 0101 0101 = 0100 0001
add 0101 0010 :) (2 bits can have maximum two high bits and their sum
is 2(10) that can be accomodated in 2 bits)

now take 4 bits together
((0+1)+(1+0))+((0+0)+(1+1))=(01+01)+(00+10)=0010 0010 (4 bits can have
maximum four high bits and their sum is 4(100) that can be accomodated
in 4 bits)
same as wat you get after step 2
now take 8 bits together
(((0+1)+(1+0))+((0+0)+(1+1)))=((01+01)+(00+10))=0010+0010=0000 0100
wat you get after step 3
this is the answer 0000 0100 = 4

hope that clears yr doubt

Thanks a lot. It does help


--
"combination is the heart of chess"

A.Alekhine

Mail to:
sathyashrayan AT gmail DOT com
 
R

Rufus V. Smith

sathyashrayan said:
Group,
Following function will check weather a bit
is set in the given variouble x.

int bit_count(long x)
{
int n = 0;
/*
** The loop will execute once for each bit of x
set,
*/
if (x)
do
n++;
while (0 != (x = x&(x-1)));
return (n);
}

But the same action in the following function is
confusing me a lot.

int bitcount(long i)
{
i = ((i & 0xAAAAAAAA) >> 1) + (i &
0x55555555);
i = ((i & 0xCCCCCCCC) >> 2) + (i &
0x33333333);
i = ((i & 0xF0F0F0F0) >> 4) + (i &
0x0F0F0F0F);
i = ((i & 0xFF00FF00) >> 8) + (i &
0x00FF00FF);
i = ((i & 0xFFFF0000) >> 16) + (i &
0x0000FFFF);
return (int)i;
}

All those hex number are magic numbers? Is the
above an implementation defined?
Please clear my doubt. I am posting all the two
codes which I have taken from C snippet
archives.

They are not magic numbers. They are bitmasks, progressively masking
off consecutive bits, 1, 2, 4, 8, 16.

It may have been clearer had it been written:

i = ((i >> 1) & 0x55555555)
+ (i & 0x55555555);

i = ((i >> 2) & 0x33333333)
+ (i & 0x33333333);

i = ((i >> 4) & 0x0F0F0F0F)
+ (i & 0x0F0F0F0F);

i = ((i >> 8) & 0x00FF00FF)
+ (i & 0x00FF00FF);

i = ((i >> 16)& 0x0000FFFF) >> 16)
+ (i & 0x0000FFFF);
return (int)i;

This way you can see that the first expression is 16 parallel adds of single
bits into 2-bit result fields. (The bit counts of 2 bit fields)

The next is 8 parallel adds of 2 bits into a 4 bit result field.
(the sums of adjacent prior bit counts)

etc...

The final result being the sum of sums (total bit count for a 32 bit number)

Rufus
 
R

Rufus V. Smith

They are not magic numbers. They are bitmasks, progressively masking
off consecutive bits, 1, 2, 4, 8, 16.

It may have been clearer had it been written:
i = ((i >> 16)& 0x0000FFFF) >> 16)
+ (i & 0x0000FFFF);
return (int)i;

Oops, forgot to get rid of that second shift during cut and paste.
Should be:
i = ((i >> 16)& 0x0000FFFF)
+ (i & 0x0000FFFF);
return (int)i;

Sorry about that.

Rufus
 
R

Rufus V. Smith

Chris Torek said:
/*
* Explanation:
* First we add 32 1-bit fields to get 16 2-bit fields.
* Each 2-bit field is one of 00, 01, or 10 (binary).
* We then add all the two-bit fields to get 8 4-bit fields.
* These are all one of 0000, 0001, 0010, 0011, or 0100.
*
* Now we can do something different, becuase for the first
* time the value in each k-bit field (k now being 4) is small
* enough that adding two k-bit fields results in a value that
* still fits in the k-bit field. The result is four 4-bit
* fields containing one of {0000,0001,...,0111,1000} and four
* more 4-bit fields containing junk (sums that are uninteresting).
* Pictorially:
* n = 0aaa0bbb0ccc0ddd0eee0fff0ggg0hhh
* n>>4 = 00000aaa0bbb0ccc0ddd0eee0fff0ggg
* sum = 0aaaWWWWiiiiXXXXjjjjYYYYkkkkZZZZ
* where W, X, Y, and Z are the interesting sums (each at most 1000,
* or 8 decimal). Masking with 0x0f0f0f0f extracts these.
*
* Now we can change tactics yet again, because now we have:
* n = 0000WWWW0000XXXX0000YYYY0000ZZZZ
* n>>8 = 000000000000WWWW0000XXXX0000YYYY
* so sum = 0000WWWW000ppppp000qqqqq000rrrrr
* where p and r are the interesting sums (and each is at most
* 10000, or 16 decimal). The sum `q' is junk, like i, j, and
* k above; but it is not necessarry to discard it this time.
* One more fold, this time by sixteen bits, gives
* n = 0000WWWW000ppppp000qqqqq000rrrrr
* n>>16 = 00000000000000000000WWWW000ppppp
* so sum = 0000WWWW000ppppp000sssss00tttttt
* where s is at most 11000 and t is it most 100000 (32 decimal).
*
* Now we have t = r+p = (Z+Y)+(X+W) = ((h+g)+(f+e))+((d+c)+(b+a)),
* or in other words, t is the number of bits set in the original
* 32-bit longword. So all we have to do is return the low byte
* (or low 6 bits, but `low byte' is typically just as easy if not
* easier).
*
* This technique is also applicable to 64 and 128 bit words, but
* 256 bit or larger word sizes require at least one more masking
* step.
*/
int
tG_sumbits(n)
register unsigned long n;
{

n = (n & 0x55555555) + ((n >> 1) & 0x55555555);
n = (n & 0x33333333) + ((n >> 2) & 0x33333333);
n = (n + (n >> 4)) & 0x0f0f0f0f;
n += n >> 8;
n += n >> 16;
return (n & 0xff);
}

Nice optimization job on that parallel adder, Chris!

I'm sure I'll find other gems as I read that posting!

For myself, and on behalf of others, thanks for
that post!

Rufus
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,997
Messages
2,570,240
Members
46,830
Latest member
HeleneMull

Latest Threads

Top