ffs16

J

Joe keane

case A

int ffs16a(int x)
{
int res;

if ((x & 0xFF) == 0)
res = kk_ffstab[x & 0xFF];
else
res = kk_ffstab[x >> 8 & 0xFF] + 8;
return res;
}

case B

int ffs16b(int x)
{
int resl;
int resh;
int res;

fla = (x & 0xFF) != 0;
resl = kk_ffstab[x & 0xFF];
resh = kk_ffstab[x >> 8 & 0xFF] + 8;
res = (fla - 1) & resl | (0 - fla) & resh;
return res;
}
 
E

Eric Sosman

case A

int ffs16a(int x)
{
int res;

if ((x& 0xFF) == 0)
res = kk_ffstab[x& 0xFF];
else
res = kk_ffstab[x>> 8& 0xFF] + 8;
return res;
}

case B

int ffs16b(int x)
{
int resl;
int resh;
int res;

fla = (x& 0xFF) != 0;
resl = kk_ffstab[x& 0xFF];
resh = kk_ffstab[x>> 8& 0xFF] + 8;
res = (fla - 1)& resl | (0 - fla)& resh;
return res;
}

As a practical matter, no, even though a strict reading
of the C Standard would permit it. Nonetheless, unlikely not
to malfunction on all but a minority of non-conforming C99 or
C11 (not necessarily C90) implementations, usually.
 
E

Eric Sosman

So you've been looking into Find First Set bit for 16-bit (effectively
unsigned) inputs?

Can you think of any possible kk_ffstab[] content that would
make his computations produce "First Set Bit," for any supportable
definition of "First?"
 
P

Philip Lantz

Eric said:
case A

int ffs16a(int x)
{
int res;

if ((x& 0xFF) == 0)
res = kk_ffstab[x& 0xFF];
else
res = kk_ffstab[x>> 8& 0xFF] + 8;
return res;
}

case B

int ffs16b(int x)
{
int resl;
int resh;
int res;

fla = (x& 0xFF) != 0;
resl = kk_ffstab[x& 0xFF];
resh = kk_ffstab[x>> 8& 0xFF] + 8;
res = (fla - 1)& resl | (0 - fla)& resh;
return res;
}

As a practical matter, no, even though a strict reading
of the C Standard would permit it. Nonetheless, unlikely not
to malfunction on all but a minority of non-conforming C99 or
C11 (not necessarily C90) implementations, usually.


Eric,

I hope you will enlighten us as to what strict reading could allow this
to work as written, and in what way the version of C standard applied
affects it.

I'm guessing it has something to do with negative zeroes. If not, I'm
completely lost.

The first function will not work on any implementation, of course,
without the obvious correction to the test.

Thanks!

Philip
 
J

Joe keane

[read '(x & 0xFF) != 0' for '(x & 0xFF) == 0' and '(x & 0xFF) == 0' for
'(x & 0xFF) != 0']

Should it be fast?
AFAP

Small?

I wouldn't consider using tables much larger than a kilobyte, but that's
because i think it won't be fast. Code size is no issue.

There are two problems, FHS and FLS. I thought we needed both, but
maybe one is sufficient. I got far in coding 'FFS' before i realized
that it"s the upside-down of what i had done a long time back.
Portable?

A long time back, i considered using floating-point tricks, but i
discarded it. I don't remember why:

a) it actually tested slower
b) the gain in performance was not sufficient to justify the more
non-obvious code
c) it's best to keep it portable[1]

Anyway it's a general question, is it ever better to say

cond = COND(x);
val0 = VAL0(x);
val1 = VAL1(x);
res = (cond - 1) & val0 | (0 - cond) & val1;

? [and i think you said 'no']

Maybe it is the same because the compiler will do it anyway?

[1] we assume that every machine is IEEE 754, and if the code is proof
against endianness, it will work on any machine i care about
 
E

Eric Sosman

Eric said:
case A

int ffs16a(int x)
{
int res;

if ((x& 0xFF) == 0)
res = kk_ffstab[x& 0xFF];
else
res = kk_ffstab[x>> 8& 0xFF] + 8;
return res;
}

case B

int ffs16b(int x)
{
int resl;
int resh;
int res;

fla = (x& 0xFF) != 0;
resl = kk_ffstab[x& 0xFF];
resh = kk_ffstab[x>> 8& 0xFF] + 8;
res = (fla - 1)& resl | (0 - fla)& resh;
return res;
}

As a practical matter, no, even though a strict reading
of the C Standard would permit it. Nonetheless, unlikely not
to malfunction on all but a minority of non-conforming C99 or
C11 (not necessarily C90) implementations, usually.


Eric,

I hope you will enlighten us as to what strict reading could allow this
to work as written, and in what way the version of C standard applied
affects it.

I'm guessing it has something to do with negative zeroes. If not, I'm
completely lost.

The first function will not work on any implementation, of course,
without the obvious correction to the test.

That's what I was alluding to: The two code fragments cannot
possibly be equivalent unless kk_ffstab[] holds 256 identical
values.

Then there's the potential non-portability that `fla-1' or
`0-fla' might be represented by some other bit pattern than all-1,
like 111...110 for ones' complement or 100...001 for signed
magnitude. Nowadays these are mostly theoretical concerns: strict
portability demands that such representations be considered, but
ignoring them diminishes portability only a trifle.

But mostly I was annoyed at the O.P. for (1) asking no question,
(2) failing to describe his purpose, (3) being too clever by half,
and (4) failing at (3).
 
E

Eric Sosman

Eric said:
So you've been looking into Find First Set bit for 16-bit (effectively
unsigned) inputs?

Can you think of any possible kk_ffstab[] content that would
make his computations produce "First Set Bit," for any supportable
definition of "First?"

Sure:

Count the bits from 16 down to 1, with 0 for no set bit?

return (x & 0xff00)? table[x >> 8] + 8 : table[x];

Not my favorite definition, but still workable. :)

"Case A" is a botch: It returns the same value for x==256 as
for x==32768, for example, or for x==1 and x==44033.
 
P

Philip Lantz

Eric said:
Eric said:
On 4/22/2012 5:36 PM, Joe keane wrote:
case A

int ffs16a(int x)
{
int res;

if ((x& 0xFF) == 0)
res = kk_ffstab[x& 0xFF];
else
res = kk_ffstab[x>> 8& 0xFF] + 8;
return res;
}

case B

int ffs16b(int x)
{
int resl;
int resh;
int res;

fla = (x& 0xFF) != 0;
resl = kk_ffstab[x& 0xFF];
resh = kk_ffstab[x>> 8& 0xFF] + 8;
res = (fla - 1)& resl | (0 - fla)& resh;
return res;
}

As a practical matter, no, even though a strict reading
of the C Standard would permit it. Nonetheless, unlikely not
to malfunction on all but a minority of non-conforming C99 or
C11 (not necessarily C90) implementations, usually.


Eric,

I hope you will enlighten us as to what strict reading could allow this
to work as written, and in what way the version of C standard applied
affects it.

I'm guessing it has something to do with negative zeroes. If not, I'm
completely lost.

The first function will not work on any implementation, of course,
without the obvious correction to the test.

That's what I was alluding to: The two code fragments cannot
possibly be equivalent unless kk_ffstab[] holds 256 identical
values.

Then there's the potential non-portability that `fla-1' or
`0-fla' might be represented by some other bit pattern than all-1,
like 111...110 for ones' complement or 100...001 for signed
magnitude. Nowadays these are mostly theoretical concerns: strict
portability demands that such representations be considered, but
ignoring them diminishes portability only a trifle.

But mostly I was annoyed at the O.P. for (1) asking no question,
(2) failing to describe his purpose, (3) being too clever by half,
and (4) failing at (3).

Oh, I agree. The only reason I found it worth commenting on at all was
your response, which I first dismissed as nonsense, but when I realized
who wrote it, I took it as a puzzle to figure out what you meant.

I'm still unfortunately not clear on what a strict reading of the C
standard permits with respect to this code, and how the differences in
the various C standards affects it, but if your original response was
really just meant as a non-response to the original non-question, rather
than an attempt to share some subtle insight, I'm happy to withdraw the
question.

Philip
 
J

Joe keane

This is sort of the same as the scan for first/last set bit: You might
very well have hw opcodes to do this efficiently, and the compiler might
be capable of generating it if you use the proper idiom?

That's the problem. I don't see how to write it so the compiler will
say 'oh he's trying to do a FFS' and spit out appropriate code.
 
J

Joe keane

When did you last time the uint16->float->exponent trick?

just now

time in us

'server'
239 testfx
270 testfy

'PC'
211 testfx
186 testfy

-- fhsx.c
static const char fhstab[256] =
{
99, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
[...]
};


extern int kk_fhsx(int mask)
{
int ret;

ret = (mask & 0xFF00) != 0 ? fhstab[mask >> 8 & 0xFF] + 8 :
fhstab[mask & 0xFF];
return ret;
}

-- fhsy.c
union u
{
float f;
unsigned int u;
};


extern int kk_fhsy(int mask)
{
union u t;
int bex;
int ret;

t.f = (float) mask;
bex = t.u >> 23;
ret = bex - 127;
return ret;
}

-- testfx.c
#include <stdio.h>

static void testfx(int *sump);
extern int kk_fhsx(int mask);


[main]


static void testfx(int *sump)
{
int m;

for (m = 0x1; m <= 0xFFFF; m++)
{
int mask;
int ret;

mask = 31 * m & 0xFFFF;
ret = kk_fhsx(mask);
/* *sump += ret; */
}
}

-- testfy.c
#include <stdio.h>

static void testfy(int *sump);
extern int kk_fhsy(int mask);


[main]


static void testfy(int *sump)
{
int m;

for (m = 0x1; m <= 0xFFFF; m++)
{
int mask;
int ret;

mask = 31 * m & 0xFFFF;
ret = kk_fhsy(mask);
/* *sump += ret; */
}
}
 
J

James Van Buskirk

In Fortran, the idiom is LEADZ/TRAILZ; in gcc it's CNTLZ/CNTTZ.

I just found the gcc syntax on some random web page and didn't test.
gcc does, however, have __builtin_clz which seems to generate a
bsr instruction on x86.
 
J

Joe keane

Problem three: How common will zero be?

It's an error. The calling code needs to check for this case, and do
something besides call this macro/function.

KK_FFS16(mask, pos);
assert((mask & (1 << pos)) != 0);
/* code can now assume this */
 
J

Joe keane

Problem two: The tests have a really funky way to randomize the tests
(multiply by 31), which means that the table lookup version will have
very close to 100% branch prediction hits for the pattern of switching
between the top and low byte.

The big problem is that the distribution is wrong. I think we want
values of the *output* to be about equally probable, so the branch is
taken 50%. And of course we don't want it to be too predictable.

Of course there's the case where it runs smoothly with some stupid toy
test program, but goes haywire when it's inside a 'real' program. Miss
rate higher than 10% scares me; mispredicted branches are -bad-.

That explains my interest in 'branchless' code. I don't care if it
tests worse for a simple case, if it can avoid some bad behavior.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,951
Messages
2,570,113
Members
46,700
Latest member
jody1922

Latest Threads

Top