avoiding division

nisheethg · Jan 21, 2007

Hi
I got to implement (A*x + B*y)/(x+y).
A and B can take value from 0 to 255. x and y are non-negative
integer i.e. 0 and above.
(x+y) can be max. of 8.
For example if (x+y) is 7. and x is 2
(A*2 + B*5)/7

Is there any way to avoid division by scaling using left right
shift or somethin...??

thanks,

HT-Lab · Jan 22, 2007

Hi
I got to implement (A*x + B*y)/(x+y).
A and B can take value from 0 to 255. x and y are non-negative
integer i.e. 0 and above.
(x+y) can be max. of 8.
For example if (x+y) is 7. and x is 2
(A*2 + B*5)/7

Is there any way to avoid division by scaling using left right
shift or somethin...??

thanks,

Google for restoring and non-restoring dividers, they only require shift and
addition/subtraction. If you want a faster converging method then google for
Newton-Raphson and Goldschmidt, they do require multiplication.

Hans
www.ht-lab.com

Ben Jones · Jan 22, 2007

Hi
I got to implement (A*x + B*y)/(x+y).
A and B can take value from 0 to 255. x and y are non-negative
integer i.e. 0 and above.
(x+y) can be max. of 8.
For example if (x+y) is 7. and x is 2
(A*2 + B*5)/7

So x, y, and x+y are extremely small integers (0-8), about 3/4 bits, right?

Why not simply do a reciprocal lookup for (x+y), which is a ROM with 3-input
address, then multiply?

Or perhaps even better, re-arrange your formula to give:

A * (x/(x+y)) + B * (y/(x+y))

Given x and y you can have a lookup table for (x/(x+y)) and (y/(x+y)), just
6-8 address bits. If you want a multi-cycle resource shared implementation
you only need one multiplier, one adder and one lookup ROM. You'll have to
decide (i.e. work out) what precision you need for the lookup though.

Good luck,

-Ben-

Wolfgang Grafen · Jan 23, 2007

Ben said:
So x, y, and x+y are extremely small integers (0-8), about 3/4 bits, right?

Why not simply do a reciprocal lookup for (x+y), which is a ROM with 3-input
address, then multiply?

Or perhaps even better, re-arrange your formula to give:

A * (x/(x+y)) + B * (y/(x+y))

Given x and y you can have a lookup table for (x/(x+y)) and (y/(x+y)), just
6-8 address bits. If you want a multi-cycle resource shared implementation
you only need one multiplier, one adder and one lookup ROM. You'll have to
decide (i.e. work out) what precision you need for the lookup though.

While I couldn't follow Ben's proposal completely I have a possible
solution based on his LUT idea reduced to 4 address bits

If
1 <= x+y <= 8 and
0 <= x <= 8 and
0 <= y <= 8

there are a number of unique combinations for (a + b) with
a=min(x,y) and
b=max(x,y)

a+b: 8 7 6 5 4 3 2 1
-------------------------------------------------
1: 0+8 0+7 0+6 0+5 0+4 0+3 0+2 0+1
2: 1+7 1+6 1+5 1+4 1+3 1+2 1+1
3: 2+6 2+5 2+4 2+3 2+2
4: 3+5 3+4 3+3
5: 4+4

The number of combinations for the LUT can be further reduced:

(.) for the first row which which always results into (0, b)
(*) for the last row when the sum (a+b) is even we get
always a==b with the result of (0.5, 0.5)
(-) (1+3) can be mapped to (2+6) with the same result
(-) (1+2) can be mapped to (2+4) with the same result

thus we obtain the reduced table with 10 entries

a+b: 8 7 6 5 4 3 2 1
-------------------------------------------------
1: . . . . . . . .
2: 1+7 1+6 1+5 1+4 - - *
3: 2+6 2+5 2+4 2+3 *
4: 3+5 3+4 *
5: *

which can be mapped into a 4 bit address range.

Now I assume we have access to an 4 bit ROM based LUT with 16 possible
entries. By trial and error we find that leftshifting a for 2 postitions
and extending the bitpositions of a[1] = a[0] with '1' and a exor
operation of a and b will give 10 unique address positions

so for the LUT
a is [1, 2, 3] and
b is [3, 4, 5, 6, 7]

a_shifted and bitwise ored with 3 yields to:
a_shift_filled is [0x7, 0xb, 0xf]

A little trial and error and then we map and exor a_shift_filled and b
for the addresses and get

a+b => a_shift_filled xor b = address_for_LUT
---------------------------------------------
1+7 => 0x7^7 = 0
1+6 => 0x7^6 = 1
1+5 => 0x7^5 = 2
1+4 => 0x7^4 = 3
2+6 => 0xb^6 = 13
2+5 => 0xb^5 = 14
2+4 => 0xb^4 = 15
2+3 => 0xb^3 = 8
3+5 => 0xf^5 = 10
3+4 => 0xf^4 = 11

the address mapping for the LUT. The results c_a and c_b have to be
stored into the LUT, for (1+7) at address 0 and (2+5) at address 14 etc.

In the following the algorithm implemented in a working and verified
Python program:

------ BEGIN: division_example.py ----------------

#!/usr/bin/env python

na = "NA"

LUT = [ # address_for_LUT
(1./8., 7./8.), # 0
(1./7., 6./7.), # 1
(1./6., 5./6.), # 2
(1./5., 4./5.), # 3
( na , na ), # 4
( na , na ), # 5
( na , na ), # 6
( na , na ), # 7
(2./5., 3./5.), # 8
( na , na ), # 9
(3./8., 5./8.), # 10
(3./7., 4./7.), # 11
( na , na ), # 12
(2./8., 6./8.), # 13 also 1/4
(2./7., 5./7.), # 14
(2./6., 4./6.), # 15 also 1/3
]

def calculate(x, y):
a = min(x,y)
b = max(x,y)

if x>y:
is_xy_swapped = True

else:
is_xy_swapped = False

x_plus_y = x + y

if x_plus_y == 0:
c_a = ZeroDivisionError
c_b = ZeroDivisionError
is_calculated=True

elif a == 0:
c_a = 0
c_b = 1
is_calculated=True

elif (x_plus_y % 2) == 0: # is_even(x_plus_y)
if a == b:
c_a = c_b = 0.5
is_calculated = True

else:
is_calculated = False

else:
is_calculated = False

if not is_calculated:
if x_plus_y == 3: # (1/3)
a_equivalent = 2
b_equivalent = 4

elif x_plus_y == 4: # (1/4)
a_equivalent = 2
b_equivalent = 6

else:
a_equivalent = a
b_equivalent = b

address_for_LUT = ((a_equivalent<<2)+3)^b_equivalent
c_a, c_b = LUT[address_for_LUT]

if is_xy_swapped:
c_x = c_b
c_y = c_a

else:
c_x = c_a
c_y = c_b

return c_x, c_y

------ BEGIN: division_example.py ----------------

regards

wolfgang

HT-Lab · Jan 23, 2007

Wolfgang Grafen said:
While I couldn't follow Ben's proposal completely I have a possible
solution based on his LUT idea reduced to 4 address bits

...snip..

wow... I hope this was not a homework assignment

Hans
www.ht-lab.com

Runtime Error with __gcd? (floating point exception)	1	Nov 28, 2024
Weird Behavior with Rays in C and OpenGL	4	Feb 13, 2024
Problem with codewars.	5	Dec 4, 2023
Programming math challenge gives wrong answer	2	Aug 6, 2023
Graph of quadratic function with CanvasRenderingContext2D	2	May 9, 2024
Docplex package in python	0	Nov 8, 2022
Win screen error	0	Jul 28, 2024
How to use ufixed when it involves multiplication a number of times?(VHDL question)	0	Aug 22, 2016

avoiding division

nisheethg

HT-Lab

Ben Jones

Wolfgang Grafen

HT-Lab

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads