Efficient division/remainder in C

Chuck F. · Jan 18, 2006

pete said:
Jordan said:

I'm surprised no one was curious about the fast C-code multiply.
It typically takes shorts in, long out -- a common case.
If you've not seen the trick before, consider it a puzzle
to reverse engineer it from the fragment:
c = arr[x+y] - arr[x-y]; /* c = x * y */
It gives competitive performance even on some machines with
blazingly fast multiplies.

OK, i'll bite - what goes in the array?

.... snip ...

int main(void)
{
unsigned x, y, arr[256];

for (x = 0; 255 > x; ++x) {
arr[x] = x * x / 4;
}

And how do you conclude that x*x is divisible by 4? (or even x)?

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>

pete · Jan 18, 2006

Chuck said:
Jordan said:

James Dow Allen wrote:

I'm surprised no one was curious about the fast C-code multiply.
It typically takes shorts in, long out -- a common case.
If you've not seen the trick before, consider it a puzzle
to reverse engineer it from the fragment:
c = arr[x+y] - arr[x-y]; /* c = x * y */
It gives competitive performance even on some machines with
blazingly fast multiplies.

OK, i'll bite - what goes in the array?

... snip ...

int main(void)
{
unsigned x, y, arr[256];

for (x = 0; 255 > x; ++x) {
arr[x] = x * x / 4;
}

And how do you conclude that x*x is divisible by 4? (or even x)?

I simply choose not to care,
and the problem goes away by itself,
just like everything else in C programming:

8 * 3 ==

8 + 3 == 11
8 - 3 == 5

arr[11] == 121 / 4 == 30
arr[ 5] == 25 / 4 == 6

30 - 6 == 24

8 * 3 == 24
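
The worked example above can be turned into a small sketch. This is only an illustration under stated assumptions, not the code from the snipped post: the names QS_LIMIT, qs, qs_fill and qs_mul are hypothetical, the operands are unsigned with x >= y, and x + y must stay inside the table.

```c
#include <assert.h>

#define QS_LIMIT 256u   /* table covers indices 0..255 (assumed size) */

unsigned qs[QS_LIMIT];  /* qs[n] holds n*n/4, rounded down */

/* Fill every table entry, including the last one at index 255. */
void qs_fill(void)
{
    unsigned n;
    for (n = 0; n < QS_LIMIT; ++n)
        qs[n] = n * n / 4;
}

/* Multiply with two lookups and one subtraction.
   Requires x >= y and x + y < QS_LIMIT. */
unsigned qs_mul(unsigned x, unsigned y)
{
    assert(x >= y && x + y < QS_LIMIT);
    return qs[x + y] - qs[x - y];
}
```

After qs_fill(), qs_mul(8, 3) reads qs[11] == 30 and qs[5] == 6 and returns 24, matching the arithmetic above.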

Chuck F. · Jan 18, 2006

pete said:
Chuck said:

pete said:

Jordan Abel wrote:

I'm surprised no one was curious about the fast C-code
multiply. It typically takes shorts in, long out -- a
common case. If you've not seen the trick before,
consider it a puzzle to reverse engineer it from the
fragment: c = arr[x+y] - arr[x-y]; /* c = x * y */ It
gives competitive performance even on some machines with
blazingly fast multiplies.

OK, i'll bite - what goes in the array?
... snip ...

int main(void)
{
unsigned x, y, arr[256];

for (x = 0; 255 > x; ++x) {
arr[x] = x * x / 4;
}

And how do you conclude that x*x is divisible by 4? (or even x)?

I simply choose not to care, and the problem goes away by
itself, just like everything else in C programming:

8 * 3 ==

8 + 3 == 11
8 - 3 == 5

arr[11] == 121 / 4 == 30
arr[ 5] == 25 / 4 == 6

30 - 6 == 24

8 * 3 == 24

Are you trying to force me to write modular diophantine equations
and prove this? I can conceive it may compensate, and may look
later.

James Dow Allen · Jan 19, 2006

x*x need not be divisible by 4, but it is always of residue class 0 or
1!
The residues thus aren't big enough to cause trouble!!

I simply choose not to care,
and the problem goes away by itself,
just like everything else in C programming:

I'm also a fan of C, but didn't realize it was *that* good!

James

Netocrat · Jan 19, 2006

x*x need not be divisible by 4, but it is always of residue class 0 or
1!
The residues thus aren't big enough to cause trouble!!

More precisely: if the only possible remainders are 0 and 0.25, this
implies that the remainder must be identical for the two expressions (it's
no doubt possible to prove this more directly), so that removing the
remainders in the expressions through integer conversion has the same
result as cancelling them in the subtraction.

(those expressions being (a+b)**2 / 4, from which (a-b)**2 / 4 is
subtracted, where ** is exponentiation)
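
One direct way to see that the remainders match: a+b and a-b differ by 2b, so they are both even or both odd, and their squares therefore leave the same remainder when divided by 4. A brute-force sketch of that check follows; the function name check_remainders is hypothetical, and only small non-negative values with b <= a are covered.

```c
#include <assert.h>

/* For every pair 0 <= b <= a < limit, confirm that a+b and a-b share
   parity and that their squares leave the same remainder modulo 4.
   Returns the number of pairs examined. */
unsigned long check_remainders(unsigned limit)
{
    unsigned a, b;
    unsigned long pairs = 0;

    for (a = 0; a < limit; ++a) {
        for (b = 0; b <= a; ++b) {
            unsigned s = a + b;   /* sum index */
            unsigned d = a - b;   /* difference index */
            assert(s % 2 == d % 2);
            assert((s * s) % 4 == (d * d) % 4);
            ++pairs;
        }
    }
    return pairs;
}
```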

In any case, it was an interesting technique that you shared. Doing some
quick calculations, assuming a short int of 16 bits and int of 32 bits,
and allowing for signed multiplication, the complete table would consume
approx 524,292 bytes (halved if you add a conditional to reuse a positive
index when the index is negative). So you can exchange half a meg memory
for faster multiplication of short ints on platforms where the technique
is actually fast enough to warrant it.
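
A rough sketch of the halved-table variant, for illustration only. It assumes <stdint.h> fixed-width types are available, and the names SQ_MAX, sq_table, sq_init and sq_mul are hypothetical. Indexing by absolute value needs 65537 four-byte entries, roughly 262 KB.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SQ_MAX 65536L   /* largest |x+y| or |x-y| for 16-bit signed inputs */

int32_t *sq_table;      /* sq_table[n] holds n*n/4, rounded down */

/* Allocate and fill the table; returns 0 on success, -1 on failure. */
int sq_init(void)
{
    int64_t n;

    sq_table = malloc((SQ_MAX + 1) * sizeof *sq_table);
    if (sq_table == NULL)
        return -1;
    for (n = 0; n <= SQ_MAX; ++n)
        sq_table[n] = (int32_t)(n * n / 4);   /* at most 2^30 */
    return 0;
}

/* Signed 16x16 -> 32 multiply; a negative index reuses the positive one. */
int32_t sq_mul(int16_t x, int16_t y)
{
    int32_t s = (int32_t)x + y;
    int32_t d = (int32_t)x - y;

    assert(sq_table != NULL);
    if (s < 0) s = -s;   /* (x+y)^2 equals (-(x+y))^2 */
    if (d < 0) d = -d;
    return sq_table[s] - sq_table[d];
}
```

The two conditionals are the price of the smaller table; whether that trade is worthwhile depends on the platform.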

I'm also a fan of C, but didn't realize it was *that* good!

Perhaps pete's symbiosis with C is as poetic as his postings.

Richard Bos · Jan 19, 2006

Chuck F. said:
pete said:

Chuck said:

pete wrote:
Jordan Abel wrote:

I'm surprised no one was curious about the fast C-code
multiply. It typically takes shorts in, long out -- a
common case. If you've not seen the trick before,
consider it a puzzle to reverse engineer it from the
fragment: c = arr[x+y] - arr[x-y]; /* c = x * y */ It
gives competitive performance even on some machines with
blazingly fast multiplies.

OK, i'll bite - what goes in the array?

... snip ...

int main(void)
{
unsigned x, y, arr[256];

for (x = 0; 255 > x; ++x) {
arr[x] = x * x / 4;
}

And how do you conclude that x*x is divisible by 4? (or even x)?

I simply choose not to care, and the problem goes away by
itself, just like everything else in C programming:

8 * 3 ==

8 + 3 == 11
8 - 3 == 5

arr[11] == 121 / 4 == 30
arr[ 5] == 25 / 4 == 6

30 - 6 == 24

8 * 3 == 24

Are you trying to force me to write modular diophantine equations
and prove this? I can conceive it may compensate, and may look
later.

If x==2a, y==2b, then
x+y==2a+2b, ((x+y)**2)/4 == (4aa+8ab+4bb)/4 == aa+2ab+bb;
x-y==2a-2b, ((x-y)**2)/4 == (4aa-8ab+4bb)/4 == aa-2ab+bb;
((x+y)**2)/4 - ((x-y)**2)/4 == aa+2ab+bb - (aa-2ab+bb) == 4ab == xy.

If x==2a+1, y==2b+1, then
x+y==2a+2b+2, ((x+y)**2)/4 == (4aa+8ab+4bb+8a+8b+4)/4 ==
aa+2ab+bb+2a+2b+1;
x-y==2a-2b, ((x-y)**2)/4 == (4aa-8ab+4bb)/4 == aa-2ab+bb;
((x+y)**2)/4 - ((x-y)**2)/4 == aa+2ab+bb+2a+2b+1 - (aa-2ab+bb) ==
4ab+2a+2b+1 == (2a+1)*(2b+1) == xy.

If x==2a+1, y==2b, then
x+y==2a+2b+1, ((x+y)**2)/4 == (4aa+8ab+4bb+4a+4b+1)/4 ==
aa+2ab+bb+a+b;
x-y==2a-2b+1, ((x-y)**2)/4 == (4aa-8ab+4bb+4a-4b+1)/4 ==
aa-2ab+bb+a-b;
((x+y)**2)/4 - ((x-y)**2)/4 == aa+2ab+bb+a+b - (aa-2ab+bb+a-b) ==
4ab+2b == (2a+1)*2b == xy.

If x==2a, y==2b+1, then
x+y==2a+2b+1, ((x+y)**2)/4 == (4aa+8ab+4bb+4a+4b+1)/4 ==
aa+2ab+bb+a+b;
x-y==2a-2b-1, ((x-y)**2)/4 == (4aa-8ab+4bb-4a+4b+1)/4 ==
aa-2ab+bb-a+b;
((x+y)**2)/4 - ((x-y)**2)/4 == aa+2ab+bb+a+b - (aa-2ab+bb-a+b) ==
4ab+2a == 2a*(2b+1) == xy.

Note: integer divisions throughout, rounding down, as in C.
Note on the note: the rounding only matters if exactly one of x and y
is odd, not if both are.

Richard

pete · Jan 19, 2006

Chuck said:
diophantine

you me friend good

pete · Jan 19, 2006

Netocrat said:
Perhaps pete's symbiosis with C is as poetic as his postings.

I was just kidding around.

James Dow Allen · Jan 20, 2006

Netocrat said:
More precisely: if the only possible remainders are 0 and 0.25, ...

A residue of 3 would also not cause trouble, as long as you remember
to make the remainder -0.25 instead of 0.75. My blood pressure goes
up a little (when I forget my meds) whenever I divide a negative
number because of the sadness that >>2 and /4 no longer
work the same on 2's-complement machines. Fortunately there is
no hypertension risk here, since x*x is never negative.
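
A small sketch of the difference, under the assumption of a C99 or later compiler, where / rounds the quotient toward zero. The result of >> on a negative value is implementation-defined, so the flooring version below avoids shifting. The function names are hypothetical.

```c
#include <assert.h>

/* Truncating division: C99 and later round the quotient toward zero. */
int trunc_div4(int n)
{
    return n / 4;
}

/* Flooring division by 4, written without shifting a negative value. */
int floor_div4(int n)
{
    int q = n / 4;
    if (n % 4 < 0)   /* a negative remainder means truncation rounded up */
        --q;
    return q;
}

/* For non-negative n (such as any x*x) the two roundings coincide. */
void check_nonnegative_agree(int limit)
{
    int n;
    for (n = 0; n < limit; ++n)
        assert(trunc_div4(n) == floor_div4(n));
}
```

For n == -5 the two functions give -1 and -2 respectively, which is the mismatch being lamented; the table entries never hit it.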

In any case, it was an interesting technique that you shared. Doing some
quick calculations, assuming a short int of 16 bits and int of 32 bits,
and allowing for signed multiplication, the complete table would consume
approx 524,292 bytes...

That's assuming your multipliers use the full 16-bit range; often
you'll get away with much less. The technique may be obsolescent
now since hard-wired multipliers have become so frisky.

I *did* use the technique on The World's Fastest Jpeg Compressor (tm),
but only on Sun's early Sparc with its bizarrely slow multiplication.

James

Michael Wojcik · Jan 20, 2006

That's assuming your multipliers use the full 16-bit range; often
you'll get away with much less. The technique may be obsolescent
now since hard-wired multipliers have become so frisky.

I suspect that's true for general-purpose CPUs, with their fast integer
multiplication and cache sensitivities (which weigh against table-based
approaches in general). But it might still be useful for some embedded
applications with simple processors, particularly if as you say the
domain is restricted.

Thanks for posing this puzzle, by the way. I spent 10 minutes or so
working out values for the array on paper (using simultaneous
equations; I didn't bother digging out my number theory textbooks,
so I took the simple route). Interestingly, I came up with a
different way to compute the series: {(i/2)**2, (i/2)**2 + i/2, ...}
for i from 1 to N. That is, the first, third, and so on elements
are the squares, and the second, fourth, etc are the squares plus
their roots.
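
The alternative series can be checked against the i*i/4 table with a short sketch. It assumes the array is indexed from 0, so that the first element is entry 0, and uses the hypothetical names alt_entry and check_series.

```c
#include <assert.h>

/* Entry i of the alternative series: the square of i/2,
   plus i/2 itself when i is odd. */
unsigned alt_entry(unsigned i)
{
    unsigned h = i / 2;   /* integer division, rounding down */
    return (i % 2) ? h * h + h : h * h;
}

/* Confirm the series matches i*i/4 (rounded down) for all i below limit. */
void check_series(unsigned limit)
{
    unsigned i;
    for (i = 0; i < limit; ++i)
        assert(alt_entry(i) == i * i / 4);
}
```

For instance, alt_entry(11) is 5*5 + 5 == 30 and alt_entry(5) is 2*2 + 2 == 6, the same two values used in the 8 * 3 example earlier in the thread.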

Anyway, it was an amusing exercise.

--
Michael Wojcik (e-mail address removed)

Some seem to live on credit as naturally as they breathe, and I remember
the surprise of one of these: "What! You don't owe anybody anything! Good
Lord! man, lend me half a sovereign." -- Arthur Ransome
