changes for synthesizable code

V

vizziee

Hi.

I am working on parallel implementation of merge-sort (infact, the last
step of merge-sort where I need to only merge two sorted arrays of
equal lengths to form a bigger sorted array) in VHDL. I adapted an
existing program for my implementation and it works well during
simulation. However, it doesn't get synthesized. What changes should I
make in the design to make it synthesizable, without changing the
architecture. Or even if any change in the architecture is required,
only a few latencies are acceptable. Following is my code and the
errors I received while synthesis in Quartus II 5.0 (I think the
'Warning' may be ignored, my issue is more towards the 'Error' part).


Perhaps to reduce the logic element consumption, I can play in some
part of the code. But that should not cost higher latencies.

*****************************************************************************************

1 library ieee;
2 use ieee.std_logic_1164.all;
3
4 entity msort_c is
5 generic (
6 n: integer := 3;
7 w: integer := 4
8 );
9
10 port (
11 clock: in std_logic;
12
13 reset: in std_logic;
14
15 valid: in std_logic;
16
17 data_array1: in bit_vector (((n*w)-1) downto 0); -- This is the
first sorted data
18 -- array input port of the sorting unit
19
20 data_array2: in bit_vector (((n*w)-1) downto 0); -- This is the
second sorted data
21 -- array input port of the sorting unit
22
23 sorted_array: out bit_vector ((((2*n)*w)-1) downto 0)
24 );
25 end entity msort_c;
26
27
28 architecture msort_c_arch of msort_c is
29 begin
30 process (reset, clock, valid)
31 variable x1, x2: natural range 0 to n;
32 variable pick2: boolean;
33 begin
34 if (reset = '1') then
35 sorted_array <= (others => '0');
36 elsif (clock = '1' and clock'event) then
37 if (valid = '1') then
38
39 x1 := 0;
40 x2 := 0;
41
42 for i in 0 to ((2*n)-1) loop
43
44 -- Decide which list to pick from...
45 pick2 := FALSE;
46 if (x1 = n) then
47 pick2 := TRUE;
48 elsif (x2 /= n) then
49 pick2 := data_array1((((x1+1)*w)-1) downto (x1*w))
data_array2((((x2+1)*w)-1) downto (x2*w));
50 end if;
51
52 -- Pick
53 if pick2 then
54 sorted_array((((i+1)*w)-1) downto (i*w)) <=
data_array2((((x2+1)*w)-1) downto (x2*w));
55 x2 := x2 + 1;
56 else
57 sorted_array((((i+1)*w)-1) downto (i*w)) <=
data_array1((((x1+1)*w)-1) downto (x1*w));
58 x1 := x1 + 1;
59 end if; --end of 'pick' if loop
60 end loop; --end of i-choice 'for' loop
61 else
62 sorted_array <= (others => '0');
63 end if;--end of 'if-valid' loop
64 end if;--end of 'if-reset' loop
65 end process;
66
67 end architecture msort_c_arch;



***************************************************************************************************************************************************************************************************************************************************************************
*****************************************************************************************

Error: VHDL error at msort_c.vhd(49): left bound of range must be a
constant
Warning: VHDL Process Statement warning at msort_c.vhd(30): signal or
variable "sorted_array" may not be assigned a new value in every
possible path through the Process Statement. Signal or variable
"sorted_array" holds its previous value in every path with no new value
assignment, which may create a combinational loop in the current
design.
Error: Can't elaborate top-level user hierarchy
Error: Quartus II Analysis & Synthesis was unsuccessful. 2 errors, 1
warning
Error: Processing ended: Thu Jul 21 09:01:04 2005
Error: Elapsed time: 00:00:01
Error: Quartus II Full Compilation was unsuccessful. 2 errors, 1 warning
 
R

Ralf Hildebrandt

However, it doesn't get synthesized. ....
30 process (reset, clock, valid)

You don't need valid in the sensitivity list - but does not solve the
synthesis problem.

49 pick2 := data_array1((((x1+1)*w)-1) downto (x1*w)) ... ....
Error: VHDL error at msort_c.vhd(49): left bound of range must be a
constant

You know, that the upper and lower bound of data_array1 are connected
but for the synthesis tool these are just two numbers making the upper
und lower bound independend from each other. You have to find another
way a way to describe it.
A for-loop describing just the same as you wrote may be the solution,
because then for every bit the synthesis tool gets an exact description,
what to do.

But this may be not the final solution. In line 49 you compare two
vectors, but the length of the vectors is variable. What would that be
in hardware? I guess /several/ comparators - one for every vector width.
(Ignore ressource sharing at the moment.) I don't expect, that a
synthesis tool is capable of computing how many of such comparators are
needed - because at the moment I do not. Do you know it? ;-)
=> Model hardware - don't program VHDL! Model such comparators in an
explicit way. (You may use for-generate or a suitable nice way, but you
should define in a clear and exact way, what hardware you want. If you
do so, the synthesis tool will do it's job.)


Ralf
 
V

vizziee

Thanks Ralf for your inputs.
But this may be not the final solution. In line 49 you compare two
vectors, but the length of the vectors is variable. What would that be
in hardware? I guess /several/ comparators - one for every vector width.
(Ignore ressource sharing at the moment.) I don't expect, that a
synthesis tool is capable of computing how many of such comparators are
needed - because at the moment I do not. Do you know it? ;-)
The number of comparators is fixed. Since we always compare two numbers
of size w bits (so number of comarators = w). It is only the position
of these numbers in the two arrays that changes as per the 'pick'
value. The position can not be known before hand because the arrays can
have randomly any value.
=> Model hardware - don't program VHDL! Model such comparators in an
explicit way. (You may use for-generate or a suitable nice way, but you
should define in a clear and exact way, what hardware you want. If you
do so, the synthesis tool will do it's job.)
You are very correct. But I know the number of comparators (same
reasoning again). What I don't know is which input to feed in these
caomparators. That is decided by the 'pick' value.

Is there a known hardware architecture for this last step of
merge-sort?

vizziee.
 
S

sunny

The number of comparators is fixed. Since we always compare two numbers
of size w bits (so number of comarators = w). It is only the position
of these numbers in the two arrays that changes as per the 'pick'
value. The position can not be known before hand because the arrays can
have randomly any value.
If you know the number of comparators, that really solves the problem
to some extent. Since your hardware is fixed in that case. Perhaps I
sub-optimal way to solve the problem could be this: The problem with
the code is the varying position of the numbers (which, as you say, is
decided by other variable). If you can fix the position in your code,
the problem gets solved. So after every comparison, you can shift your
array so that the next number to be compared occupies the first
position. So you will always compare the first elements of the two
arrays. However, I don't know how much of the hardware this approach
will consume. But certainly, it could be the solution.
Is there a known hardware architecture for this last step of
merge-sort?
No idea. Even I would like to know one.

Sunny.
 
R

Ralf Hildebrandt

The number of comparators is fixed. Since we always compare two numbers
of size w bits (so number of comarators = w). It is only the position
of these numbers in the two arrays that changes as per the 'pick'
value. The position can not be known before hand because the arrays can
have randomly any value.

Ok, then you gave yourself the answer: Model a w-bit wide comparator and
several multiplexers to select the appropriate data.

Synthesis complains about style like sig(x downto y) where x and y are
some not-fixed integers, because then the synthesis tool assumes, that x
and y are independend. You have to model such a selection descriping are
strong dependency between the two limits like sig(x+8 downto x).
Sometimes a suitable way of modelling can be found using a for-loop.

It may be a good idea to split the description of the comparator from
the one of the mux. This makes it often more clear what exactly must be
the desired hardware. Furthermore the mux should be described carefully
using every detail you know.

Some details: If I look at the following pieces of code
42 for i in 0 to ((2*n)-1) loop ....
54 sorted_array((((i+1)*w)-1) downto (i*w))<= ...

then the obviously the limits of the selected part of the array are
fixed, but I guess it will be much easier for the synthesis tool, if you
model this for

sorted_array(1*w-1 dowto 0*w)<=...
sorted_array(2*w-1 dowto 1*w)<=...
....

Furthermore you should seperate all these statements into several
processes. To avoid a lot of typing you may use a for-generate
statement, but I would do this later, if everything is clear.

If you separate all these partial vectors you have to model the
selectors in your code (x1, x2, pick1, pick2...) in a different way, but
I guess this approach leads to a more clean desription of the desired
hardware.

What I don't know is which input to feed in these
caomparators. That is decided by the 'pick' value.

Yes, I see.
My comments ignore the behavior of the algorithm. (I have to say, that I
did not even think about it.) I have just looked at the code and tried
to see the related hardware, that is described. At the moment I do not
know it, because the code is complex.

But I think if you can decribe the selectors of the muxes in a very
clean way (this means, that you have to know it for yourself and you
have to model it in a way, that can be understood by the synthesis tool)
you have the solution. And this:
Is there a known hardware architecture for this last step of
merge-sort?

is exactly the point. If I am not wrong we have figured out three main
parts of this architecture: the comparators, the muxes and the
mux-selectors (controlled by something we do not know at the moment).

I suggest to follow this top-down approach: Model every known component
in an explicit way and then the unknown parts will be more visible.

Ralf
 
V

vizziee

sunny said:
If you know the number of comparators, that really solves the problem
to some extent. Since your hardware is fixed in that case. Perhaps I
sub-optimal way to solve the problem could be this: The problem with
the code is the varying position of the numbers (which, as you say, is
decided by other variable). If you can fix the position in your code,
the problem gets solved. So after every comparison, you can shift your
array so that the next number to be compared occupies the first
position. So you will always compare the first elements of the two
arrays. However, I don't know how much of the hardware this approach
will consume. But certainly, it could be the solution.
Thanx sunny, I tried your idea and it worked both in simulation as well
as synthesis. Ofcourse LE usage has gone up exponentially. But probably
that's the price. Any idea of bringing it down here.
 
V

vizziee

Thanx a lot Ralf for your help.

Ralf said:
(e-mail address removed) wrote:


then the obviously the limits of the selected part of the array are
fixed, but I guess it will be much easier for the synthesis tool, if you
model this for

sorted_array(1*w-1 dowto 0*w)<=...
sorted_array(2*w-1 dowto 1*w)<=...
...
But won't this be a bit complicated compared to the simple shifting
that sunny suggested.
Furthermore you should seperate all these statements into several
processes. To avoid a lot of typing you may use a for-generate
statement, but I would do this later, if everything is clear.
But one process is enough. And why for-generate, a for-loop would also
do instead in a process.
If I am not wrong we have figured out three main
parts of this architecture: the comparators, the muxes and the
mux-selectors (controlled by something we do not know at the moment).
Yep...that's right. As for mux-selector logic, its decided by the
output of comparators because that's what decides the 'pick' value. But
this architecture is certianly not the optimum one. It consumes lot of
LEs. Any idea of bringing it down.

KVM.
 
M

Mike Treseler

Thanx sunny, I tried your idea and it worked both in simulation as well
as synthesis.

Consider posting the working version.
Ofcourse LE usage has gone up exponentially. But probably
that's the price. Any idea of bringing it down here.

Right now there is a full pass on every clock tick.
Using a process variable this could be slower.
Maybe 2, 4 or 8 clocks per pass.

-- Mike Treseler
 
M

Mike Treseler

But one process is enough. And why for-generate, a for-loop would also
do instead in a process.

I agree, but this is a matter of
style, not substance.

If we exclude tri-state drivers,
anything that can done by generating
processes can also be done
procedurally in a single process.
These are just alternate descriptions of the
same hardware.

A single process thread is easier to
read for the data_structures+algorithms
crowd. Multiple processes feels more like
a schematic netlist.

-- Mike Treseler
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,982
Messages
2,570,185
Members
46,736
Latest member
AdolphBig6

Latest Threads

Top