ffs16

Joe keane · Apr 22, 2012

case A

int ffs16a(int x)
{
int res;

if ((x & 0xFF) == 0)
res = kk_ffstab[x & 0xFF];
else
res = kk_ffstab[x >> 8 & 0xFF] + 8;
return res;
}

case B

int ffs16b(int x)
{
int resl;
int resh;
int res;

fla = (x & 0xFF) != 0;
resl = kk_ffstab[x & 0xFF];
resh = kk_ffstab[x >> 8 & 0xFF] + 8;
res = (fla - 1) & resl | (0 - fla) & resh;
return res;
}

Keith Thompson · Apr 22, 2012

case A

int ffs16a(int x)
{ [7 lines deleted]
}

case B

int ffs16b(int x)
{ [9 lines deleted]
}

Yes, and did you have a question?

Eric Sosman · Apr 23, 2012

case A

int ffs16a(int x)
{
int res;

if ((x& 0xFF) == 0)
res = kk_ffstab[x& 0xFF];
else
res = kk_ffstab[x>> 8& 0xFF] + 8;
return res;
}

case B

int ffs16b(int x)
{
int resl;
int resh;
int res;

fla = (x& 0xFF) != 0;
resl = kk_ffstab[x& 0xFF];
resh = kk_ffstab[x>> 8& 0xFF] + 8;
res = (fla - 1)& resl | (0 - fla)& resh;
return res;
}

As a practical matter, no, even though a strict reading
of the C Standard would permit it. Nonetheless, unlikely not
to malfunction on all but a minority of non-conforming C99 or
C11 (not necessarily C90) implementations, usually.

Eric Sosman · Apr 24, 2012

So you've been looking into Find First Set bit for 16-bit (effectively
unsigned) inputs?

Can you think of any possible kk_ffstab[] content that would
make his computations produce "First Set Bit," for any supportable
definition of "First?"

Philip Lantz · Apr 24, 2012

Eric said:
case A

int ffs16a(int x)
{
int res;

if ((x& 0xFF) == 0)
res = kk_ffstab[x& 0xFF];
else
res = kk_ffstab[x>> 8& 0xFF] + 8;
return res;
}

case B

int ffs16b(int x)
{
int resl;
int resh;
int res;

fla = (x& 0xFF) != 0;
resl = kk_ffstab[x& 0xFF];
resh = kk_ffstab[x>> 8& 0xFF] + 8;
res = (fla - 1)& resl | (0 - fla)& resh;
return res;
}

Click to expand...

As a practical matter, no, even though a strict reading
of the C Standard would permit it. Nonetheless, unlikely not
to malfunction on all but a minority of non-conforming C99 or
C11 (not necessarily C90) implementations, usually.

Eric,

I hope you will enlighten us as to what strict reading could allow this
to work as written, and in what way the version of C standard applied
affects it.

I'm guessing it has something to do with negative zeroes. If not, I'm
completely lost.

The first function will not work on any implementation, of course,
without the obvious correction to the test.

Thanks!

Philip

Joe keane · Apr 24, 2012

[read '(x & 0xFF) != 0' for '(x & 0xFF) == 0' and '(x & 0xFF) == 0' for
'(x & 0xFF) != 0']

Should it be fast?
AFAP

Small?

I wouldn't consider using tables much larger than a kilobyte, but that's
because i think it won't be fast. Code size is no issue.

There are two problems, FHS and FLS. I thought we needed both, but
maybe one is sufficient. I got far in coding 'FFS' before i realized
that it"s the upside-down of what i had done a long time back.

Portable?

A long time back, i considered using floating-point tricks, but i
discarded it. I don't remember why:

a) it actually tested slower
b) the gain in performance was not sufficient to justify the more
non-obvious code
c) it's best to keep it portable[1]

Anyway it's a general question, is it ever better to say

cond = COND(x);
val0 = VAL0(x);
val1 = VAL1(x);
res = (cond - 1) & val0 | (0 - cond) & val1;

? [and i think you said 'no']

Maybe it is the same because the compiler will do it anyway?

[1] we assume that every machine is IEEE 754, and if the code is proof
against endianness, it will work on any machine i care about

Eric Sosman · Apr 25, 2012

Eric said:
Eric said:

case A

int ffs16a(int x)
{
int res;

if ((x& 0xFF) == 0)
res = kk_ffstab[x& 0xFF];
else
res = kk_ffstab[x>> 8& 0xFF] + 8;
return res;
}

case B

int ffs16b(int x)
{
int resl;
int resh;
int res;

fla = (x& 0xFF) != 0;
resl = kk_ffstab[x& 0xFF];
resh = kk_ffstab[x>> 8& 0xFF] + 8;
res = (fla - 1)& resl | (0 - fla)& resh;
return res;
}

Click to expand...

As a practical matter, no, even though a strict reading
of the C Standard would permit it. Nonetheless, unlikely not
to malfunction on all but a minority of non-conforming C99 or
C11 (not necessarily C90) implementations, usually.

Click to expand...

Eric,

I hope you will enlighten us as to what strict reading could allow this
to work as written, and in what way the version of C standard applied
affects it.

I'm guessing it has something to do with negative zeroes. If not, I'm
completely lost.

The first function will not work on any implementation, of course,
without the obvious correction to the test.

That's what I was alluding to: The two code fragments cannot
possibly be equivalent unless kk_ffstab[] holds 256 identical
values.

Then there's the potential non-portability that `fla-1' or
`0-fla' might be represented by some other bit pattern than all-1,
like 111...110 for ones' complement or 100...001 for signed
magnitude. Nowadays these are mostly theoretical concerns: strict
portability demands that such representations be considered, but
ignoring them diminishes portability only a trifle.

But mostly I was annoyed at the O.P. for (1) asking no question,
(2) failing to describe his purpose, (3) being too clever by half,
and (4) failing at (3).

Eric Sosman · Apr 25, 2012

Eric said:
Eric said:

So you've been looking into Find First Set bit for 16-bit (effectively
unsigned) inputs?

Click to expand...

Can you think of any possible kk_ffstab[] content that would
make his computations produce "First Set Bit," for any supportable
definition of "First?"

Click to expand...

Sure:

Count the bits from 16 down to 1, with 0 for no set bit?

return (x & 0xff00)? table[x >> 8] + 8 : table[x];

Not my favorite definition, but still workable.

"Case A" is a botch: It returns the same value for x==256 as
for x==32768, for example, or for x==1 and x==44033.

Philip Lantz · Apr 25, 2012

Eric said:
Eric said:

On 4/22/2012 5:36 PM, Joe keane wrote:
case A

int ffs16a(int x)
{
int res;

if ((x& 0xFF) == 0)
res = kk_ffstab[x& 0xFF];
else
res = kk_ffstab[x>> 8& 0xFF] + 8;
return res;
}

case B

int ffs16b(int x)
{
int resl;
int resh;
int res;

fla = (x& 0xFF) != 0;
resl = kk_ffstab[x& 0xFF];
resh = kk_ffstab[x>> 8& 0xFF] + 8;
res = (fla - 1)& resl | (0 - fla)& resh;
return res;
}

As a practical matter, no, even though a strict reading
of the C Standard would permit it. Nonetheless, unlikely not
to malfunction on all but a minority of non-conforming C99 or
C11 (not necessarily C90) implementations, usually.

Click to expand...

Eric,

I hope you will enlighten us as to what strict reading could allow this
to work as written, and in what way the version of C standard applied
affects it.

I'm guessing it has something to do with negative zeroes. If not, I'm
completely lost.

The first function will not work on any implementation, of course,
without the obvious correction to the test.

Click to expand...

That's what I was alluding to: The two code fragments cannot
possibly be equivalent unless kk_ffstab[] holds 256 identical
values.

Then there's the potential non-portability that `fla-1' or
`0-fla' might be represented by some other bit pattern than all-1,
like 111...110 for ones' complement or 100...001 for signed
magnitude. Nowadays these are mostly theoretical concerns: strict
portability demands that such representations be considered, but
ignoring them diminishes portability only a trifle.

But mostly I was annoyed at the O.P. for (1) asking no question,
(2) failing to describe his purpose, (3) being too clever by half,
and (4) failing at (3).

Oh, I agree. The only reason I found it worth commenting on at all was
your response, which I first dismissed as nonsense, but when I realized
who wrote it, I took it as a puzzle to figure out what you meant.

I'm still unfortunately not clear on what a strict reading of the C
standard permits with respect to this code, and how the differences in
the various C standards affects it, but if your original response was
really just meant as a non-response to the original non-question, rather
than an attempt to share some subtle insight, I'm happy to withdraw the
question.

Philip

Joe keane · Apr 25, 2012

This is sort of the same as the scan for first/last set bit: You might
very well have hw opcodes to do this efficiently, and the compiler might
be capable of generating it if you use the proper idiom?

That's the problem. I don't see how to write it so the compiler will
say 'oh he's trying to do a FFS' and spit out appropriate code.

James Van Buskirk · Apr 25, 2012

That's the problem. I don't see how to write it so the compiler will
say 'oh he's trying to do a FFS' and spit out appropriate code.

Click to expand...

In Fortran, the idiom is LEADZ/TRAILZ; in gcc it's CNTLZ/CNTTZ.

Joe keane · Apr 25, 2012

When did you last time the uint16->float->exponent trick?

just now

time in us

'server'
239 testfx
270 testfy

'PC'
211 testfx
186 testfy

-- fhsx.c
static const char fhstab[256] =
{
99, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
[...]
};

extern int kk_fhsx(int mask)
{
int ret;

ret = (mask & 0xFF00) != 0 ? fhstab[mask >> 8 & 0xFF] + 8 :
fhstab[mask & 0xFF];
return ret;
}

-- fhsy.c
union u
{
float f;
unsigned int u;
};

extern int kk_fhsy(int mask)
{
union u t;
int bex;
int ret;

t.f = (float) mask;
bex = t.u >> 23;
ret = bex - 127;
return ret;
}

-- testfx.c
#include <stdio.h>

static void testfx(int *sump);
extern int kk_fhsx(int mask);

[main]

static void testfx(int *sump)
{
int m;

for (m = 0x1; m <= 0xFFFF; m++)
{
int mask;
int ret;

mask = 31 * m & 0xFFFF;
ret = kk_fhsx(mask);
/* *sump += ret; */
}
}

-- testfy.c
#include <stdio.h>

static void testfy(int *sump);
extern int kk_fhsy(int mask);

[main]

static void testfy(int *sump)
{
int m;

for (m = 0x1; m <= 0xFFFF; m++)
{
int mask;
int ret;

mask = 31 * m & 0xFFFF;
ret = kk_fhsy(mask);
/* *sump += ret; */
}
}

James Van Buskirk · Apr 25, 2012

In Fortran, the idiom is LEADZ/TRAILZ; in gcc it's CNTLZ/CNTTZ.

I just found the gcc syntax on some random web page and didn't test.
gcc does, however, have __builtin_clz which seems to generate a
bsr instruction on x86.

Joe keane · Apr 26, 2012

Problem three: How common will zero be?

It's an error. The calling code needs to check for this case, and do
something besides call this macro/function.

KK_FFS16(mask, pos);
assert((mask & (1 << pos)) != 0);
/* code can now assume this */

Joe keane · Apr 26, 2012

Problem two: The tests have a really funky way to randomize the tests
(multiply by 31), which means that the table lookup version will have
very close to 100% branch prediction hits for the pattern of switching
between the top and low byte.

The big problem is that the distribution is wrong. I think we want
values of the *output* to be about equally probable, so the branch is
taken 50%. And of course we don't want it to be too predictable.

Of course there's the case where it runs smoothly with some stupid toy
test program, but goes haywire when it's inside a 'real' program. Miss
rate higher than 10% scares me; mispredicted branches are -bad-.

That explains my interest in 'branchless' code. I don't care if it
tests worse for a simple case, if it can avoid some bad behavior.

Factorial case 13 and 16	1	Sep 29, 2022
i implemented a simple count sort please find problem with it	1	Aug 10, 2021
Fatal error: Uncaught Error: Cannot use object of type WP_Error as array in	0	Dec 23, 2021
Javascript programming in TheThingsNetwork	1	May 12, 2022
Change Calculator	3	Jan 25, 2023
How to try a range of hex values in C# code ?	0	Nov 19, 2022
Qsort() messing with my entire Code	0	Apr 25, 2022
Adding adressing of IPv6 to program	1	Feb 16, 2023

ffs16

Joe keane

Keith Thompson

Eric Sosman

Eric Sosman

Philip Lantz

Joe keane

Eric Sosman

Eric Sosman

Philip Lantz

Joe keane

James Van Buskirk

Joe keane

James Van Buskirk

Joe keane

Joe keane

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads