Carlos
Hello,
This is a bit of a general question concerning coding styles.
Suppose that you want to implement the following equation in a fully pipelined manner:
((a + b) + (c + d)*e)*f
Ideally, to achieve reasonably good timing, I tend to perform almost every arithmetic operation on its own and store it in an intermediate result register.
Afterwards, I combine the intermediate results in later stages, for example as shown below (this is just pseudocode, so it may contain errors).
process (my_clk)
begin
  if rising_edge(my_clk) then
    if my_reset = '1' then
      ab1     <= (others => '0');
      ab2     <= (others => '0');
      cd1     <= (others => '0');
      cde2    <= (others => '0');
      abcde3  <= (others => '0');
      abcdef4 <= (others => '0');
      e1      <= (others => '0');
      f1      <= (others => '0');
      f2      <= (others => '0');
      f3      <= (others => '0');
    else
      -- Stage (1)
      ab1 <= a + b;
      cd1 <= c + d;
      e1  <= e;             -- Delay
      f1  <= f;             -- Delay
      -- Stage (2)
      cde2 <= cd1 * e1;
      f2   <= f1;           -- Delay
      ab2  <= ab1;          -- Delay
      -- Stage (3)
      abcde3 <= ab2 + cde2;
      f3     <= f2;         -- Delay
      -- Stage (4)
      abcdef4 <= abcde3 * f3;
    end if;
  end if;
end process;
For now, I append a number to each signal name to indicate its pipeline stage, so that I never add an intermediate result to an "older" one from a different stage.
This works, but for a large interpolation formula with, say, 20 arithmetic terms, it quickly becomes cumbersome, especially since every intermediate value needs a meaningful name.
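One way to avoid naming every delayed copy by hand (f1, f2, f3, e1, ab2) is a small generic delay line. This is a minimal sketch, assuming numeric_std `unsigned` operands; the entity name `pipe_delay` and its generics `WIDTH` and `DEPTH` are hypothetical. It deliberately has no reset, which on some FPGA families lets the tool map the chain to shift-register primitives; add `my_reset` back if your design needs it.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical helper: delays an unsigned vector by DEPTH clock cycles,
-- replacing hand-written chains such as f -> f1 -> f2 -> f3.
entity pipe_delay is
  generic (
    WIDTH : positive := 16;  -- assumed operand width
    DEPTH : positive := 3    -- number of register stages
  );
  port (
    clk  : in  std_logic;
    din  : in  unsigned(WIDTH - 1 downto 0);
    dout : out unsigned(WIDTH - 1 downto 0)
  );
end entity pipe_delay;

architecture rtl of pipe_delay is
  -- stages(i) holds din as it was i clock edges ago
  type stage_array is array (1 to DEPTH) of unsigned(WIDTH - 1 downto 0);
  signal stages : stage_array := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      stages(1) <= din;
      -- shift every stage one step; a null loop when DEPTH = 1
      for i in 2 to DEPTH loop
        stages(i) <= stages(i - 1);
      end loop;
    end if;
  end process;

  dout <= stages(DEPTH);
end architecture rtl;
```

For example, the f chain in the process above could become `f_dly : entity work.pipe_delay generic map (WIDTH => 16, DEPTH => 3) port map (clk => my_clk, din => f, dout => f3);`, assuming f is 16 bits wide. Only the stage-aligned output still needs a name, and changing the pipeline depth becomes a one-generic edit.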
My questions are the following:
#1 - Is there a better coding style that keeps the code clean and easy to modify? Maybe using variables instead of signals, etc.? (I'm interested in how you would implement this.)
#2 - Is there a way to write the whole equation at once and provide enough pipeline registers for the synthesis tool to use, duplicate, balance, or insert as needed in order to get a clean result?
(In this case latency is not an issue; we are more concerned with throughput.)
For example, could we write something like the code below and let the synthesis tool do its magic?
process (my_clk)
begin
  if rising_edge(my_clk) then
    my_eq1 <= ((a + b) + (c + d)*e)*f;
    my_eq2 <= my_eq1;
    my_eq3 <= my_eq2;
    my_eq4 <= my_eq3;
    -- ...
  end if;
end process;
In practice, only "my_eq4" is read; the others are just pipeline stages that we don't care about.
#3 - If #2 is doable, will this code be reasonably portable to other devices, or will I have to tweak the synthesis settings manually for each device?
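A sketch of idea #2 with the register count made a generic, assuming all six inputs are W-bit numeric_std `unsigned` values (the entity `eq_retimed` and the generics `W` and `STAGES` are hypothetical). The equation is written once, using process variables for full-precision intermediates; because each variable is assigned before it is read inside the clocked branch, it becomes plain logic rather than a register. Whether the tool actually moves the trailing registers back into the adders and multipliers depends on the synthesis tool, the target device, and whether retiming is enabled, so check the timing report rather than assuming it. The latency is STAGES clock cycles either way.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sketch: ((a + b) + (c + d)*e)*f written once, followed by
-- STAGES registers that a retiming-capable tool may redistribute.
entity eq_retimed is
  generic (
    W      : positive := 16;  -- assumed width of every input
    STAGES : positive := 4    -- total register stages after the equation
  );
  port (
    clk              : in  std_logic;
    a, b, c, d, e, f : in  unsigned(W - 1 downto 0);
    result           : out unsigned(3 * W + 1 downto 0)  -- full precision
  );
end entity eq_retimed;

architecture rtl of eq_retimed is
  subtype result_t is unsigned(3 * W + 1 downto 0);
  type result_pipe_t is array (1 to STAGES) of result_t;
  signal pipe : result_pipe_t := (others => (others => '0'));
begin
  process (clk)
    -- Written before being read below, so no registers are inferred
    variable ab  : unsigned(W downto 0);          -- W+1 bits
    variable cd  : unsigned(W downto 0);          -- W+1 bits
    variable cde : unsigned(2 * W downto 0);      -- (W+1) + W bits
  begin
    if rising_edge(clk) then
      ab  := resize(a, W + 1) + resize(b, W + 1);
      cd  := resize(c, W + 1) + resize(d, W + 1);
      cde := cd * e;
      -- (2W+2)-bit sum times W-bit f gives the 3W+2-bit result
      pipe(1) <= (resize(ab, 2 * W + 2) + resize(cde, 2 * W + 2)) * f;
      -- the extra stages give the tool registers to retime
      for i in 2 to STAGES loop
        pipe(i) <= pipe(i - 1);
      end loop;
    end if;
  end process;

  result <= pipe(STAGES);
end architecture rtl;
```

Making STAGES a generic lets the register count be changed per device without editing the equation; how well each vendor's tool retimes the chain will still differ.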
Notes:
- I am assuming that all vector widths are correctly set.
- I also understand that, depending on the platform and available resources, one architecture may be better suited than another.
Any cool coding tips would be appreciated. Thanks!