How to describe a pipeline structure in VHDL

Ingmar Seifert · Aug 12, 2003

Hello,

I want to implement an algorithm that is based on multiplication
followed by accumulation. The last step is to store the result of the
addition in registers.

I don't produce just one "product" on this pipeline. I produce different
"products", so I have a multifunctional pipeline.
The different calculations are, for example:

seq.  MUL         ADD         STORE result in
1     a*b         product+x   m00
2     product*b   product+y   m01

So my question is how to describe the structure of the pipeline and the
control FSM.
I have thought about an FSM that has a state for each "product" and
outputs control signals that are delayed by D-FFs, so that they reach each
pipeline stage at the right time and tell it which operation has to be done.

Each pipeline stage lasts one clock cycle.

I would be very happy about a short piece of code or some hints.
Thanks in advance for your help.

Ingmar Seifert

VhdlCohen · Aug 12, 2003

How about the following (unchecked for syntax):
architecture y of x is
  type opr_enum is (MULT, ADD, STORE);
  type Reg_array_typ is array (0 to 4) of std_logic_vector(31 downto 0);
  signal reg1_array : Reg_array_typ;
  signal reg2_array : Reg_array_typ;
  signal reg3array  : Reg_array_typ;
  -- clk, rst_n, mem and addr are assumed to be declared elsewhere.

begin  -- architecture y
  Comp_proc : process (clk, rst_n) is
  begin  -- process Comp_proc
    if rst_n = '0' then  -- asynchronous reset (active low)
      null;  -- reset values go here
    elsif clk'event and clk = '1' then  -- rising clock edge
      for i in reg1_array'range loop
        case i is  -- i is the stage number
          -- "*" and "+" on std_logic_vector need an arithmetic package;
          -- a 32x32-bit product is 64 bits wide and must be truncated
          -- (or reg1_array widened) to fit.
          when 0 => reg1_array(0) <= reg2_array(0) * reg3array(0);  -- mult
          when 1 => reg1_array(1) <= reg2_array(1) + reg3array(1);  -- add
          when 2 => mem(addr) <= reg1_array(1);                     -- store
          when others => null;
        end case;
      end loop;  -- i
    end if;
  end process Comp_proc;
end architecture y;
----------------------------------------------------------------------------
Ben Cohen Publisher, Trainer, Consultant (310) 721-4830
http://www.vhdlcohen.com/ (e-mail address removed)
Author of following textbooks:
* Using PSL/SUGAR with Verilog and VHDL
Guide to Property Specification Language for ABV, 2003 isbn 0-9705394-4-4
* Real Chip Design and Verification Using Verilog and VHDL, 2002 isbn
0-9705394-2-8
* Component Design by Example ", 2001 isbn 0-9705394-0-1
* VHDL Coding Styles and Methodologies, 2nd Edition, 1999 isbn 0-7923-8474-1
* VHDL Answers to Frequently Asked Questions, 2nd Edition, isbn 0-7923-8115
------------------------------------------------------------------------------

Ingmar Seifert · Aug 13, 2003

Why isn't reg1_array(0) (which is obviously the product) used as an
operand to the adder in the next stage? Is that just a mistake?
I think I understand what will be synthesized.

But my problem is that I have to calculate different things on this
pipeline and then store the results in different registers.
Later I want to use the multiplier and adder as separate units (not
correlated through the pipeline).
To illustrate the problem:

run1 a*b --> product+c --> store in m00
run2 a*product --> product+d --> store in m01
runx ... --> ... --> ...

a*product is done to exponentiate; that is the reason why I can't use a
MAC from a core generator.

How can I describe/control the behaviour of such a pipeline?

Regards
Ingmar Seifert

VhdlCohen · Aug 13, 2003

-- The regX_array arrays represent the pipeline registers, with a pipe
-- depth given by regX_array'range. Thus regX_array(0) is the first pipe
-- stage and regX_array(1) the second. In the loop described previously,
-- i represents the pipe depth. As you know, all registers can be read,
-- regardless of where they sit in the pipe. All registers can be written
-- too, but you need to ensure that each one is written by a single source
-- in each cycle
-- (i.e., regx <= xxx; -- at clock x
--        regx <= yyy; -- at same clock x is a NO NO, unless
-- you want the last assignment in the same process to win).
--
-- Thus, if you have different cases or runs:
-- Below is junk code that demonstrates the concept.
    elsif clk'event and clk = '1' then  -- rising clock edge
      case run_mode is
        when run1 =>
          for i in reg1_array'range loop
            case i is  -- i is the stage number
              when 0 =>
                reg1_array(0) <= reg2_array(0) * reg3array(0);  -- mult
                reg3array(0)  <= whatever;
              when 1 => reg1_array(1) <= reg2_array(0) + reg3array(0);  -- add
              when 2 => mem(addr) <= reg1_array(1);  -- store
              when others => null;
            end case;
          end loop;  -- i
        when run2 =>
          for i in reg1_array'range loop
            case i is  -- i is the stage number
              when 0 =>
                reg1_array(0) <= reg2_array(2) * reg3array(2);  -- mult
                reg3array(0)  <= whateverelse;
              when 1 => reg1_array(1) <= reg2_array(0) + reg3array(0);  -- add
              when 2 => mem(addr) <= reg1_array(1);  -- store
              when others => null;
            end case;
          end loop;  -- i
      end case;  -- run_mode
    end if;
--
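
The "last assignment in the same process wins" remark above can be illustrated with a minimal sketch. The entity name last_wins and the ports override, xxx, yyy and regx are hypothetical and chosen only for this illustration; they are not part of the code posted in this thread.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: one register, written from two places in the
-- same clocked process.
entity last_wins is
  port (
    clk      : in  std_logic;
    override : in  std_logic;
    xxx, yyy : in  std_logic_vector(7 downto 0);
    regx     : out std_logic_vector(7 downto 0));
end entity last_wins;

architecture rtl of last_wins is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      regx <= xxx;          -- default assignment
      if override = '1' then
        regx <= yyy;        -- later assignment in the same process wins
      end if;
    end if;
  end process;
end architecture rtl;
```

Because both assignments sit in one process, regx still has a single driver, and a synthesis tool would typically infer a multiplexer in front of the register. Assigning regx from two different processes would instead create two drivers, which is the situation to avoid.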


Thomas Stanka · Aug 13, 2003

Ingmar Seifert said:
> But my problem is that I have to calculate different things on this
> pipeline and then store the results in different registers.
> Later I want to use the multiplier and adder as separate units (not
> correlated through the pipeline).
> To illustrate the problem:
>
> run1 a*b --> product+c --> store in m00
> run2 a*product --> product+d --> store in m01
> runx ... --> ... --> ...

It seems to me you need a structure with a multiplier followed by an
adder, with control logic that lets you use different input registers
and store into different output registers.
I wonder if you need
y(t) <= a*b;
y(t+1) <= a*y(t); z <= y(t)+c;
or
y(t) <= a*b+c;
y(t+1) <= a*y(t)+c;

I don't want to produce the whole code for this, but it seems to me
you need an adder, a multiplier and a lot of registers plus
multiplexers, but not really a pipeline (except for timing purposes).

bye Thomas
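
A minimal sketch of the structure described above (one multiplier, one adder, operand multiplexers and selectable result registers) might look as follows. The entity name mul_add_unit, the 8-bit widths and the control ports use_prod, add_sel, dst_sel and store_en are assumptions made for this illustration, as is the use of ieee.numeric_std; none of them come from the posts in this thread.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: one shared multiplier and one shared adder with
-- operand multiplexers and two result registers (m00, m01).
entity mul_add_unit is
  port (
    clk      : in  std_logic;
    a, b     : in  unsigned(7 downto 0);
    c, d     : in  unsigned(7 downto 0);
    use_prod : in  std_logic;  -- '0': a*b, '1': a*product (exponentiate)
    add_sel  : in  std_logic;  -- '0': add c, '1': add d
    dst_sel  : in  std_logic;  -- '0': store in m00, '1': store in m01
    store_en : in  std_logic;
    m00, m01 : out unsigned(7 downto 0));
end entity mul_add_unit;

architecture rtl of mul_add_unit is
  signal product : unsigned(7 downto 0) := (others => '0');
  signal mul_op2 : unsigned(7 downto 0);
  signal add_op2 : unsigned(7 downto 0);
  signal sum     : unsigned(7 downto 0);
begin
  -- Operand multiplexers in front of the two arithmetic units.
  mul_op2 <= product when use_prod = '1' else b;
  add_op2 <= d when add_sel = '1' else c;

  -- Single adder, combinational; it always works on the registered product.
  sum <= product + add_op2;

  process (clk)
  begin
    if rising_edge(clk) then
      -- Single multiplier; the 16-bit result is truncated to 8 bits.
      product <= resize(a * mul_op2, product'length);
      -- Store the sum in the selected result register.
      if store_en = '1' then
        if dst_sel = '0' then
          m00 <= sum;
        else
          m01 <= sum;
        end if;
      end if;
    end if;
  end process;
end architecture rtl;
```

In this sketch the control inputs are expected "just in time": add_sel, dst_sel and store_en must be valid in the cycle in which the matching product is already in the product register, one cycle after the multiplier operands were selected.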

Ingmar Seifert · Aug 13, 2003

Thomas said:
> It seems to me you need a structure with a multiplier followed by an
> adder, with control logic that lets you use different input registers
> and store into different output registers.
> I wonder if you need
> y(t) <= a*b;
> y(t+1) <= a*y(t); z <= y(t)+c;
> or
> y(t) <= a*b+c;
> y(t+1) <= a*y(t)+c;

I need the first one.

> I don't want to produce the whole code for this, but it seems to me
> you need an adder, a multiplier and a lot of registers plus
> multiplexers, but not really a pipeline (except for timing purposes).

Yes indeed. It isn't a typical pipeline as we learnt it at university,
but the second stage gets the result of the first one.

Here is a short part of the control FSM code I have written so far.
It works correctly (in hardware too), but I'm not that happy with it. It
isn't easy to extend and doesn't look that good.

WHEN R1 =>
  state     <= R2;
  factor1   <= EXT(m00_row, 11);
  factor2   <= EXT(y, 3);
  summand1a <= EXT(product, 19);
  summand2a <= EXT(m20_img, 19);
  --
  m10_row   <= EXT(sum1, m10_row'LENGTH);

WHEN R2 =>
  state     <= R3;
  factor1   <= EXT(product, 11);
  factor2   <= EXT(y, 3);
  summand1a <= EXT(product, 19);
  summand2a <= EXT(m01_img, 19);
  --
  m20_img   <= EXT(sum1, m20_img'LENGTH);

I have described the input "registers" of the multiplier and the adder as
signals that get their values at the rising clock edge. I had to do this
because the synthesis tool synthesized more than one multiplier and adder.
These operand registers are used by two processes
(product <= factor1*factor2; and sum1 <= summand1a+summand2a;).

The problem is that in clock cycle 1 I'm in state R1 and have to set the
operands for the multiplier. They get their values in the next clock
cycle. In that cycle I'm in state R2 and have to control what is
done with the product that is now ready.
So the two operations belong together.
In the solution I posted above I set the addition registers in R2, even
though they belong to R1.

Now my question: is it a good idea to set the signals that belong
together (the mul operand choice, the add operand choice and the store
register choice) in state R1, and to delay the control signals for the
adder by 1 cycle and the signal for a (yet to be written) register bank
by 2 cycles?
Is there a common way to solve such a problem? Is it a good idea to do so,
or are there other solutions?

Thanks in advance for some hints on this topic
Ingmar Seifert
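
The delayed-control idea asked about above could be sketched roughly as follows: the FSM issues all three choices in the same state, and a small delay line aligns each one with the stage that consumes it. The entity name ctrl_delay, the port names and the 2-bit select widths are hypothetical and only serve this illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical control pipeline: the FSM issues one control word per
-- operation; each field is delayed until its stage needs it.
entity ctrl_delay is
  port (
    clk        : in  std_logic;
    -- control word issued by the FSM together with the multiply
    mul_sel_in : in  std_logic_vector(1 downto 0);
    add_sel_in : in  std_logic_vector(1 downto 0);
    dst_sel_in : in  std_logic_vector(1 downto 0);
    -- the same fields, aligned with the stage that consumes them
    mul_sel    : out std_logic_vector(1 downto 0);   -- stage 0, no delay
    add_sel    : out std_logic_vector(1 downto 0);   -- stage 1, 1 cycle
    dst_sel    : out std_logic_vector(1 downto 0));  -- stage 2, 2 cycles
end entity ctrl_delay;

architecture rtl of ctrl_delay is
  signal add_sel_d1 : std_logic_vector(1 downto 0) := (others => '0');
  signal dst_sel_d1 : std_logic_vector(1 downto 0) := (others => '0');
  signal dst_sel_d2 : std_logic_vector(1 downto 0) := (others => '0');
begin
  -- The multiplier operand choice is used immediately.
  mul_sel <= mul_sel_in;

  process (clk)
  begin
    if rising_edge(clk) then
      add_sel_d1 <= add_sel_in;  -- one register stage for the adder
      dst_sel_d1 <= dst_sel_in;  -- two register stages for the store
      dst_sel_d2 <= dst_sel_d1;
    end if;
  end process;

  add_sel <= add_sel_d1;
  dst_sel <= dst_sel_d2;
end architecture rtl;
```

With this arrangement each FSM state describes one complete operation (multiply, add, store), so adding a new "run" means adding one state rather than spreading its control signals over several states. The cost is a few extra flip-flops per control field.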

Thomas Stanka · Aug 14, 2003

Ingmar Seifert said:
> These operand registers are used by two processes
> (product <= factor1*factor2; and sum1 <= summand1a+summand2a;).
>
> The problem is that in clock cycle 1 I'm in state R1 and have to set the
> operands for the multiplier. They get their values in the next clock
> cycle. In that cycle I'm in state R2 and have to control what is
> done with the product that is now ready.
> So the two operations belong together.
> In the solution I posted above I set the addition registers in R2, even
> though they belong to R1.
>
> Now my question: is it a good idea to set the signals that belong
> together (the mul operand choice, the add operand choice and the store
> register choice) in state R1, and to delay the control signals for the
> adder by 1 cycle and the signal for a (yet to be written) register bank
> by 2 cycles?
> Is there a common way to solve such a problem? Is it a good idea to do so,
> or are there other solutions?

Well, I think that's a matter of style. I would prefer to set all
signals just in time, but maybe you find it easier to verify your
code with delayed signals.

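BTW, you don't need the pipeline anyway.

architecture.....
begin
  P <= m1 * m2;                 -- multiplier (combinational)
  S <= s1 + s2;                 -- adder (combinational)

  s1 <= P when normalmode else  -- adder input: product or external value
        Muxin;

  process (Clk)
  begin
    if Rising_Edge(Clk) then
      case state is
        -- states are named st0/st1 here so they don't clash with signal s1
        when st0 =>
          normalmode <= false;
          m1 <= a;
          m2 <= b;
        when st1 =>
          normalmode <= true;
          m1 <= a;
          m2 <= P;  -- value of last multiplication
          s2 <= c;
.....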

bye Thomas

satyans7 · Oct 8, 2009

VHDL code for a 4-bit pipelined adder and multiplier with testbench

I am new to VHDL...
Can anyone help me?

satyans7 · Oct 10, 2009

I want to simulate a 4-bit pipelined adder and a 4-bit pipelined multiplier...

Can anybody send me the code with a testbench?

Thanking you in advance.

swatig29 · Nov 4, 2009

Could someone please help me with my project?
I need to implement a linear equation solver (Ax = B) in VHDL using any linear equation solving method, such as Jacobi or PRA.
I am new to VHDL, so I really don't know how to approach this problem.
I would be really thankful if someone could help me out here.

swatig29 · Nov 4, 2009

Also, I need to implement a circuit in VHDL that adds three unsigned four-bit numbers using only four-bit adders.
Please help me with the code.
