Ben said:
So x, y, and x+y are extremely small integers (0-8), about 3/4 bits, right?
Why not simply do a reciprocal lookup for (x+y), which is a ROM with 3-input
address, then multiply?
Or perhaps even better, re-arrange your formula to give:
A * (x/(x+y)) + B * (y/(x+y))
Given x and y you can have a lookup table for (x/(x+y)) and (y/(x+y)), just
6-8 address bits. If you want a multi-cycle resource shared implementation
you only need one multiplier, one adder and one lookup ROM. You'll have to
decide (i.e. work out) what precision you need for the lookup though.
While I couldn't follow Ben's proposal completely I have a possible
solution based on his LUT idea reduced to 4 address bits
If
1 <= x+y <= 8 and
0 <= x <= 8 and
0 <= y <= 8
there are a number of unique combinations for (a + b) with
a=min(x,y) and
b=max(x,y)
a+b: 8 7 6 5 4 3 2 1
-------------------------------------------------
1: 0+8 0+7 0+6 0+5 0+4 0+3 0+2 0+1
2: 1+7 1+6 1+5 1+4 1+3 1+2 1+1
3: 2+6 2+5 2+4 2+3 2+2
4: 3+5 3+4 3+3
5: 4+4
The number of combinations for the LUT can be further reduced:
(.) for the first row which which always results into (0, b)
(*) for the last row when the sum (a+b) is even we get
always a==b with the result of (0.5, 0.5)
(-) (1+3) can be mapped to (2+6) with the same result
(-) (1+2) can be mapped to (2+4) with the same result
thus we obtain the reduced table with 10 entries
a+b: 8 7 6 5 4 3 2 1
-------------------------------------------------
1: . . . . . . . .
2: 1+7 1+6 1+5 1+4 - - *
3: 2+6 2+5 2+4 2+3 *
4: 3+5 3+4 *
5: *
which can be mapped into a 4 bit address range.
Now I assume we have access to an 4 bit ROM based LUT with 16 possible
entries. By trial and error we find that leftshifting a for 2 postitions
and extending the bitpositions of a[1] = a[0] with '1' and a exor
operation of a and b will give 10 unique address positions
so for the LUT
a is [1, 2, 3] and
b is [3, 4, 5, 6, 7]
a_shifted and bitwise ored with 3 yields to:
a_shift_filled is [0x7, 0xb, 0xf]
A little trial and error and then we map and exor a_shift_filled and b
for the addresses and get
a+b => a_shift_filled xor b = address_for_LUT
---------------------------------------------
1+7 => 0x7^7 = 0
1+6 => 0x7^6 = 1
1+5 => 0x7^5 = 2
1+4 => 0x7^4 = 3
2+6 => 0xb^6 = 13
2+5 => 0xb^5 = 14
2+4 => 0xb^4 = 15
2+3 => 0xb^3 = 8
3+5 => 0xf^5 = 10
3+4 => 0xf^4 = 11
the address mapping for the LUT. The results c_a and c_b have to be
stored into the LUT, for (1+7) at address 0 and (2+5) at address 14 etc.
In the following the algorithm implemented in a working and verified
Python program:
------ BEGIN: division_example.py ----------------
#!/usr/bin/env python
na = "NA"
LUT = [ # address_for_LUT
(1./8., 7./8.), # 0
(1./7., 6./7.), # 1
(1./6., 5./6.), # 2
(1./5., 4./5.), # 3
( na , na ), # 4
( na , na ), # 5
( na , na ), # 6
( na , na ), # 7
(2./5., 3./5.), # 8
( na , na ), # 9
(3./8., 5./8.), # 10
(3./7., 4./7.), # 11
( na , na ), # 12
(2./8., 6./8.), # 13 also 1/4
(2./7., 5./7.), # 14
(2./6., 4./6.), # 15 also 1/3
]
def calculate(x, y):
a = min(x,y)
b = max(x,y)
if x>y:
is_xy_swapped = True
else:
is_xy_swapped = False
x_plus_y = x + y
if x_plus_y == 0:
c_a = ZeroDivisionError
c_b = ZeroDivisionError
is_calculated=True
elif a == 0:
c_a = 0
c_b = 1
is_calculated=True
elif (x_plus_y % 2) == 0: # is_even(x_plus_y)
if a == b:
c_a = c_b = 0.5
is_calculated = True
else:
is_calculated = False
else:
is_calculated = False
if not is_calculated:
if x_plus_y == 3: # (1/3)
a_equivalent = 2
b_equivalent = 4
elif x_plus_y == 4: # (1/4)
a_equivalent = 2
b_equivalent = 6
else:
a_equivalent = a
b_equivalent = b
address_for_LUT = ((a_equivalent<<2)+3)^b_equivalent
c_a, c_b = LUT[address_for_LUT]
if is_xy_swapped:
c_x = c_b
c_y = c_a
else:
c_x = c_a
c_y = c_b
return c_x, c_y
------ BEGIN: division_example.py ----------------
regards
wolfgang