Style of coding complex logic (particularly state machines)

Mike Treseler

rickman said:
Personally I think most problems in using HDLs in this way come not
directly from the way signals or variables are used, but rather from
the use of an HDL to describe the solution in an abstract way.

The problem is not that it can't be done
but rather a lack of tradition and good examples
of what the present generation of synthesis
tools can do if I let them.

-- Mike Treseler
 
rickman

I agree with you completely. What I am trying to say is that variables
may not be synthesizable if you write the code with a "C
mentality."

I'm not sure I agree that variables are the problem at all. There are
many ways to write code that is not synthesizable. This can be done
with signals as well as variables. The difference between signals and
variables is just that the value of a variable is updated immediately
just like a 'C' variable. Signals are only updated at the end of the
process. So if you make an assignment to a variable and then use that
value in a calculation in the same process, the new value will be used.
If you do the same thing with a signal, the old value of the signal
will be used. I don't know of a way that this can be unsynthesizable.
Variables can not exist outside of a process, IIRC. So the variable
must be assigned to a signal in order for it to affect anything outside
the process. So in reality, it can only be used as an intermediate
value in an assignment to a signal.

Do you have an example of unsynthesizable code using a variable that
would be synthesizable with a signal?
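
The immediate-versus-deferred update described above can be mimicked in plain C. This is only a sketch under stated assumptions: signal_t, process_with_variable and process_with_signal are hypothetical names invented for illustration, and the model ignores delta cycles, multiple drivers and everything else a real simulator tracks.

```c
#include <assert.h>

/* Hypothetical model of a VHDL signal: reads see `cur`; an assignment
   only schedules `next`, which lands when the process suspends. */
typedef struct {
    int cur;
    int next;
} signal_t;

/* Process body using a variable: the assignment is visible at once,
   so the calculation on the following line uses the NEW value. */
int process_with_variable(int in)
{
    int v = 0;       /* v := 0;       */
    v = in + 1;      /* v := in + 1;  */
    return v * 2;    /* out <= v * 2; */
}

/* Same body using a signal: the calculation still reads the OLD value,
   because the update is deferred until the process suspends. */
int process_with_signal(signal_t *s, int in)
{
    assert(s != 0);
    s->next = in + 1;        /* s <= in + 1;  (scheduled only)   */
    int out = s->cur * 2;    /* out <= s * 2; (reads old value)  */
    s->cur = s->next;        /* process suspends: update lands   */
    return out;
}
```

With the signal initially holding 5 and an input of 3, the variable version computes from 4 while the signal version computes from the stale 5. Both describe legal hardware; they simply describe different hardware.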
 
Martin Thompson

(e-mail address removed) writes:

In synthesis, the problem is normally the abuse of sequential
statements, rather than the use of variables. I have seen people trying
to convert a C segment into a VHDL process (you can have variables, for
loops, while loops, if, case, and even break inside a process) and
expecting synthesis software to figure out everything.

Why not do this? Synthesis software is good at figuring all this
out. If it does what you need it to and meets timing, you're done.
Move on to the next problem.

Personally, I have seen people spend far too long doing very explicit
coding of detailed stuff, giving the synth tool very little to do,
which for a relatively low-performance (still in the 10s of MHz
though) design, was a waste of effort. The so-called "naive" approach
of writing code in a natural "softwary" way and letting the synth sort
it out would have left us more time to sort out the one nitty-gritty
bit of code which did have a performance problem.

Sure, if you are pushing the performance envelope, you're going to
have to put more work in. If you are doing a high-volume design then
you might get in a smaller part and save some money by putting the
effort in. But that's just an engineering-tradeoff like any other.
Softies do it all the time, optimising their hardcore interrupt
handlers, leave the rest to the tools. I assume civil engineers do
similar things with their bridges as well :)
My 2 cents.

My tuppence :)

Cheers,
Martin
 
rickman

Martin said:
(e-mail address removed) writes:



Why not do this? Synthesis software is good at figuring all this
out. If it does what you need it to and meets timing, you're done.
Move on to the next problem.

Personally, I have seen people spend far too long doing very explicit
coding of detailed stuff, giving the synth tool very little to do,
which for a relatively low-performance (still in the 10s of MHz
though) design, was a waste of effort. The so-called "naive" approach
of writing code in a natural "softwary" way and letting the synth sort
it out would have left us more time to sort out the one nitty-gritty
bit of code which did have a performance problem.

Sure, if you are pushing the performance envelope, you're going to
have to put more work in. If you are doing a high-volume design then
you might get in a smaller part and save some money by putting the
effort in. But that's just an engineering-tradeoff like any other.
Softies do it all the time, optimising their hardcore interrupt
handlers, leave the rest to the tools. I assume civil engineers do
similar things with their bridges as well :)

Is TRW still around? I thought they were bought by Northrop Grumman.
I guess some part of TRW was not part of that deal? I used to work in
Defense Systems in McLean or whatever they called it that week.

I guess I am too old school to feel good about using 'C' like code.
Sure if it works, do it. But I always think in terms of hardware and
like to know what I am building before I let the tool build it. I
guess I would not want to debug a design where I didn't know what the
tool was doing. Then I would be debugging software and not hardware.
Maybe that works for some people, but I like to know the hardware I am
building so I know exactly how to debug it. That also includes
avoiding certain types of bugs that are caused by poorly designed
hardware. If the tool generated the hardware then I can't say it
doesn't have race conditions and such.
 
mikegurche

Martin said:
(e-mail address removed) writes:



Why not do this? Synthesis software is good at figuring all this
out. If it does what you need it to and meets timing, you're done.
Move on to the next problem.

If the synthesis software is really this capable, there is no need for
hardware engineers. Everyone can do hardware design after taking C
programming 101 and we all will become unemployed :(

Let me give an example. Assume that we want to design a sorting
circuit that sorts a register of 1000 8-bit words with minimal hardware.
For simplicity, let us use the bubble sort algorithm:

n = 1000;
for (i = 0; i < n-1; i++) {
    for (j = 0; j < n-1-i; j++)
        if (a[j+1] < a[j]) { /* compare the two neighbors */
            tmp = a[j]; /* swap a[j] and a[j+1] */
            a[j] = a[j+1];
            a[j+1] = tmp;
        }
}

The hardware designer's approach is to develop a control FSM to mimic
the algorithm. It can be done with one 8-bit comparator in
0.5*1000*1000 clock cycles.

If we ignore the underlying hardware structure and just translate C
constructs to corresponding VHDL constructs directly (the C
programmer's approach), we can still derive correct VHDL code:

-- assumed to be declared in the architecture, with N = 1000:
--   type word_array is array (0 to N-1) of std_logic_vector(7 downto 0);
--   signal d, q : word_array;
process(clock)
  variable a   : word_array;
  variable tmp : std_logic_vector(7 downto 0);
begin
  if (clock'event and clock='1') then
    -- register
    q <= d;
    a := q;
    -- combinational sorting circuit based on
    -- one-to-one mapping of C constructs
    for i in 0 to N-2 loop
      for j in 0 to N-2-i loop
        if (a(j+1) < a(j)) then
          tmp    := a(j);
          a(j)   := a(j+1);
          a(j+1) := tmp;
        end if;
      end loop;
    end loop;
    -- result to register input
    d <= a;
  end if;
end process;

The resulting circuit can complete sorting in one clock cycle but
requires 0.5*1000*1000 8-bit comparators. We need an extremely large
target device to accommodate the synthesized circuit. It will be very
demanding for synthesis software to convert this code into a circuit
with only one comparator. I think my job is still safe, for now :)
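
For contrast, the control-FSM approach mentioned above can be modelled in C as one compare per clock tick. This is a sketch, not anyone's actual design: sort_fsm_t, sort_fsm_reset, sort_fsm_tick and sort_fsm_run are hypothetical names, and each call to sort_fsm_tick is assumed to stand for one clock edge.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical FSM state: the C loop indices become registers. */
typedef struct {
    size_t i;    /* outer pass counter */
    size_t j;    /* position of the single comparator */
    int done;    /* nonzero once every pass has finished */
} sort_fsm_t;

void sort_fsm_reset(sort_fsm_t *f, size_t n)
{
    assert(f != NULL);
    f->i = 0;
    f->j = 0;
    f->done = (n < 2);   /* nothing to compare for 0 or 1 elements */
}

/* One clock tick: exactly one compare and at most one swap. */
void sort_fsm_tick(sort_fsm_t *f, uint8_t *a, size_t n)
{
    if (f->done)
        return;
    if (a[f->j + 1] < a[f->j]) {
        uint8_t tmp = a[f->j];
        a[f->j] = a[f->j + 1];
        a[f->j + 1] = tmp;
    }
    if (f->j + 1 < n - 1 - f->i) {
        f->j++;              /* next neighbor pair in this pass */
    } else {
        f->j = 0;            /* start the next, shorter pass */
        f->i++;
        if (f->i >= n - 1)
            f->done = 1;
    }
}

/* Clock the FSM until it finishes; returns the number of ticks used. */
unsigned long sort_fsm_run(uint8_t *a, size_t n)
{
    sort_fsm_t f;
    unsigned long ticks = 0;
    sort_fsm_reset(&f, n);
    while (!f.done) {
        sort_fsm_tick(&f, a, n);
        ticks++;
    }
    return ticks;
}
```

The tick count is n(n-1)/2, which for n = 1000 is 499,500 and matches the rough 0.5*1000*1000 figure quoted above.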

Mike G.
 
mikegurche

I'm not sure I agree that variables are the problem at all. There are
many ways to write code that is not synthesizable. This can be done
with signals as well as variables. The difference between signals and
variables is just that the value of a variable is updated immediately
just like a 'C' variable. Signals are only updated at the end of the
process. So if you make an assignment to a variable and then use that
value in a calculation in the same process, the new value will be used.
If you do the same thing with a signal, the old value of the signal
will be used. I don't know of a way that this can be unsynthesizable.
Variables can not exist outside of a process, IIRC. So the variable
must be assigned to a signal in order for it to affect anything outside
the process. So in reality, it can only be used as an intermediate
value in an assignment to a signal.

I guess my wording is not very clear. Let me elaborate on the
statement. A variable in C is a symbolic memory location in a computer
and its function is close to a register in hardware. A statement like

sum = 0;
for (i = 0; i < 10000; i++)
    sum = sum + a;

implies that the addition is done 10000 times sequentially. With the
"C mentality" and with no knowledge of the underlying hardware
structure, the C code can be translated directly to VHDL variable and
sequential statements:

sum := 0;
for i in 0 to 9999 loop
  sum := sum + a;
end loop;

When synthesized, this will infer 9999 adders. The problem with this
code is that it can be simulated correctly but leads to excessive
logic. A few statements like this can make the circuit too complex to
be synthesized. To derive the right code, we have to think hardware and
then use a register for sum and derive an FSM to add a sequentially
in 9999 cycles.

Replacing the variable with a signal in the previous code cannot solve
the problem and actually renders the code incorrect. It at least
forces us to seek an alternative and think more about hardware.
Variables/sequential statements themselves do not lead to a good or bad
design, but provide a mechanism to describe the circuit in a very
abstract fashion. Careless use of these constructs may lead to a
description that is too far away from the underlying hardware structure.
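
The register-plus-FSM alternative described above might look like this when modelled in C, with one call to acc_tick standing for one clock cycle. The names acc_fsm_t, acc_reset and acc_tick are hypothetical. The model clears the register first and counts the initial 0 + a as a tick, so it takes 10,000 ticks rather than 9999; loading the register with a at reset would save that one cycle.

```c
#include <assert.h>

/* Hypothetical accumulator FSM: one adder, reused on every tick. */
typedef struct {
    long sum;            /* the single sum register */
    unsigned remaining;  /* iterations still to run */
} acc_fsm_t;

void acc_reset(acc_fsm_t *f, unsigned n)
{
    assert(f != 0);
    f->sum = 0;
    f->remaining = n;
}

/* One clock tick: one pass through the single adder. */
void acc_tick(acc_fsm_t *f, long a)
{
    if (f->remaining == 0)
        return;          /* finished: hold the result */
    f->sum += a;
    f->remaining--;
}
```

The unrolled loop spends area to finish in one cycle; this version spends cycles to keep a single adder.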

Mike G.
 
KJ

Let me give an example. Assume that we want to design a sorting
circuit that sorts a register of 1000 8-bit words with minimal hardware.
For simplicity, let us use the bubble sort algorithm:

n = 1000;
for (i = 0; i < n-1; i++) {
    for (j = 0; j < n-1-i; j++)
        if (a[j+1] < a[j]) { /* compare the two neighbors */
            tmp = a[j]; /* swap a[j] and a[j+1] */
            a[j] = a[j+1];
            a[j+1] = tmp;
        }
}

The hardware designer's approach is to develop a control FSM to mimic
the algorithm. It can be done with one 8-bit comparator in
0.5*1000*1000 clock cycles.

If we ignore the underlying hardware structure and just translate C
constructs to corresponding VHDL constructs directly (the C
programmer's approach), we can still derive correct VHDL code:

-- assumed to be declared in the architecture, with N = 1000:
--   type word_array is array (0 to N-1) of std_logic_vector(7 downto 0);
--   signal d, q : word_array;
process(clock)
  variable a   : word_array;
  variable tmp : std_logic_vector(7 downto 0);
begin
  if (clock'event and clock='1') then
    -- register
    q <= d;
    a := q;
    -- combinational sorting circuit based on
    -- one-to-one mapping of C constructs
    for i in 0 to N-2 loop
      for j in 0 to N-2-i loop
        if (a(j+1) < a(j)) then
          tmp    := a(j);
          a(j)   := a(j+1);
          a(j+1) := tmp;
        end if;
      end loop;
    end loop;
    -- result to register input
    d <= a;
  end if;
end process;

The resulting circuit can complete sorting in one clock cycle but
requires 0.5*1000*1000 8-bit comparators. We need an extremely large
target device to accommodate the synthesized circuit. It will be very
demanding for synthesis software to convert this code into a circuit
with only one comparator.
What is missing is some form of control to say how many clock cycles
it can take to implement the algorithm. Given that additional control
over the synthesis, the designer can make tradeoffs on how much logic
versus how much latency is acceptable and the surrounding circuitry can
be designed appropriately. Neither C nor VHDL inherently has this.

I don't think this is equivalent to what is generally meant when
synthesis tools do 'register retiming/balancing' either since even
after this task they still will meet the overall clock cycle latency of
the function. Recognizing that the overall clock cycle latency is
really an up front design tradeoff to be made and correctly
synthesizing code using that as an input parameter would be a leap
forward.
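
The logic-versus-latency tradeoff described above can be put in rough numbers. The sketch below is an idealized lower bound only: it assumes every compare of the quoted bubble sort must be performed, and it ignores data dependencies between compares, routing and control overhead. The function names are hypothetical.

```c
#include <assert.h>

/* Compare operations in the bubble sort quoted above: n(n-1)/2. */
unsigned long total_compares(unsigned long n)
{
    return (n < 2) ? 0 : n * (n - 1) / 2;
}

/* Fewest comparators that could fit all compares into cycle_budget
   clock cycles, if each comparator does one compare per cycle. */
unsigned long comparators_needed(unsigned long n, unsigned long cycle_budget)
{
    assert(cycle_budget > 0);
    unsigned long total = total_compares(n);
    return (total + cycle_budget - 1) / cycle_budget;  /* ceiling division */
}
```

For 1000 words, a budget of one cycle needs 499,500 comparators, a budget of 499,500 cycles needs one, and a budget of 1000 cycles needs at least 500. The cycle budget is the input parameter that neither language currently lets the designer state.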
I think my job is still safe, for now :)
Yep, mine too.

KJ
 
Andy Ray

I don't think this is equivalent to what is generally meant when
synthesis tools do 'register retiming/balancing' either since even
after this task they still will meet the overall clock cycle latency of
the function. Recognizing that the overall clock cycle latency is
really an up front design tradeoff to be made and correctly
synthesizing code using that as an input parameter would be a leap
forward.


Hi,

I think that's the intention of Mentor's Catapult C synthesis program,
though I haven't used it myself.

Cheers,

Andy
 
KJ

Andy said:
I think that's the intention of Mentor's Catapult C synthesis program,
though I haven't used it myself.
Cool, thanks for the tip. Sounds like it's worth investigating.

KJ
 
Martin Thompson

rickman said:
Martin Thompson wrote:
Is TRW still around? I thought they were bought by Northrop Grumman.
I guess some part of TRW was not part of that deal? I used to work in
Defense Systems in McLean or whatever they called it that week.

Northrop bought the whole shebang and then the Automotive bit spun off
as a separate entity.
I guess I am too old school to feel good about using 'C' like code.
Sure if it works, do it. But I always think in terms of hardware and
like to know what I am building before I let the tool build it. I
guess I would not want to debug a design where I didn't know what the
tool was doing. Then I would be debugging software and not hardware.
Maybe that works for some people, but I like to know the hardware I am
building so I know exactly how to debug it. That also includes
avoiding certain types of bugs that are caused by poorly designed
hardware. If the tool generated the hardware then I can't say it
doesn't have race conditions and such.

We're not yet at the stage of "chuck any old algorithm at the tools",
but I think synthesis tools get less cred for figuring things out than
they sometimes should.

Regarding tools - when I write software, I like to know what I'm
aiming for in terms of which bits of the processor are going to be
used as well. I wouldn't like to not know what my C-compiler is
likely to be doing either. However, plenty of people write software
without that understanding and in future plenty of people will write
hardware similarly. For a lot of applications, it won't matter IMHO.

It's all just a case of getting things done in the end :)

Cheers,
Martin
 
Martin Thompson

If the synthesis software is really this capable, there is no need for
hardware engineers. Everyone can do hardware design after taking C
programming 101 and we all will become unemployed :(

Well, eventually, it's going to happen. We'll be abstracted far
enough away from the hardware to still be productive.
Let me give an example. Assume that we want to design a sorting
circuit that sorts a register of 1000 8-bit words with minimal
hardware.

OK, but I notice you've said minimal hardware there - my point was
"gets the job done"... maybe I have an enormous FPGA.
For simplicity, let us use the bubble sort algorithm:

n = 1000;
for (i = 0; i < n-1; i++) {
    for (j = 0; j < n-1-i; j++)
        if (a[j+1] < a[j]) { /* compare the two neighbors */
            tmp = a[j]; /* swap a[j] and a[j+1] */
            a[j] = a[j+1];
            a[j+1] = tmp;
        }
}

The hardware designer's approach is to develop a control FSM to mimic
the algorithm. It can be done with one 8-bit comparator in
0.5*1000*1000 clock cycles.

Yes. Maybe in 1 process, and maybe in 2, or maybe three :)
If we ignore the underlying hardware structure and just translate C
constructs to corresponding VHDL constructs directly (the C
programmer's approach), we can still derive correct VHDL code:

-- assumed to be declared in the architecture, with N = 1000:
--   type word_array is array (0 to N-1) of std_logic_vector(7 downto 0);
--   signal d, q : word_array;
process(clock)
  variable a   : word_array;
  variable tmp : std_logic_vector(7 downto 0);
begin
  if (clock'event and clock='1') then
    -- register
    q <= d;
    a := q;
    -- combinational sorting circuit based on
    -- one-to-one mapping of C constructs
    for i in 0 to N-2 loop
      for j in 0 to N-2-i loop
        if (a(j+1) < a(j)) then
          tmp    := a(j);
          a(j)   := a(j+1);
          a(j+1) := tmp;
        end if;
      end loop;
    end loop;
    -- result to register input
    d <= a;
  end if;
end process;

I wasn't suggesting directly mapping C-code to VHDL like this, just that
the "describe everything to the synthesizer" approach can be a bit
heavy-handed and time consuming.
The resulting circuit can complete sorting in one clock cycle but
requires 0.5*1000*1000 8-bit comparators. We need an extremely large
target device to accommodate the synthesized circuit. It will be very
demanding for synthesis software to convert this code into a circuit
with only one comparator. I think my job is still safe, for now :)

I don't suggest that the synth *can* make those sort of
transformations. It doesn't have the constraints currently to know.

As an aside, I think that FpgaC will do it with one comparator and a
memory, (maybe fpga_toys will chip in here).

In the future, we *will* be moving away from low-level stuff like this
(personally, I don't think *C* is high enough level, but that's another
debate :)

Cheers,
Martin
 
Mike Treseler

KJ said:
What is missing is some form of control to say, how many clock cycles
it can take to implement the algorithm.

Hmmm
One way to do that is to
declare i and/or j as variables
rather than loop constants.
Then throttle back to fewer loops per tick...
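
One possible reading of this suggestion, modelled in C: i and j persist between clock ticks instead of being loop constants, and each tick runs only a bounded number of inner-loop iterations. The names throttled_sort_t, throttled_tick and throttled_sort and the per_tick parameter are hypothetical, and this is a guess at the intent rather than the poster's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* i and j are state kept across ticks, not loop constants. */
typedef struct {
    size_t i, j;
    int done;
} throttled_sort_t;

/* One clock tick: at most per_tick iterations of the inner loop body. */
static void throttled_tick(throttled_sort_t *s, uint8_t *a, size_t n,
                           unsigned per_tick)
{
    for (unsigned k = 0; k < per_tick && !s->done; k++) {
        if (a[s->j + 1] < a[s->j]) {
            uint8_t tmp = a[s->j];
            a[s->j] = a[s->j + 1];
            a[s->j + 1] = tmp;
        }
        if (s->j + 1 < n - 1 - s->i) {
            s->j++;
        } else {
            s->j = 0;
            if (++s->i >= n - 1)
                s->done = 1;
        }
    }
}

/* Sort with per_tick compares per clock; returns the ticks used. */
unsigned long throttled_sort(uint8_t *a, size_t n, unsigned per_tick)
{
    throttled_sort_t s = { 0, 0, n < 2 };
    unsigned long ticks = 0;
    assert(per_tick > 0);
    while (!s.done) {
        throttled_tick(&s, a, n, per_tick);
        ticks++;
    }
    return ticks;
}
```

In hardware the per_tick compares within one tick would be chained combinationally, so a larger per_tick buys fewer cycles at the cost of more comparators and a longer critical path.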

-- Mike Treseler
 
Andy

Take a look at Mentor's Catapult C. It translates untimed C models
into synthesizable VHDL or Verilog. You must specify the clock period.
It assumes loop iterations take one clock, but individual instructions
do not. You can tell it to unroll loops to perform their statements in
parallel, or partially or fully pipeline the model, with varying clock
cycles per pipe state (i.e. the pipeline need not accept new data every
clock). The nice thing about it is that it takes care of the control
logic to turn the untimed (i.e. one-clock) description into a usable
hardware design that could target a variety of space-performance
trade-offs.

They have a very good interface synthesis approach. They can access
the inputs and outputs in parallel, streaming in order, or pick them
out of an external (to their model, not necessarily to the FPGA) single
or dual port ram. Internal arrays can also be stored in registers or
single or dual port rams. They also assume the whole hardware model
operates with one clock input and one reset input (asynchronous or
synchronous, per your choice).

The bottom line is that you wouldn't use it to design a whole chip,
but you would use it to design a hardware implementation of a software
algorithm. Then the chip designer, using Verilog or VHDL, would
develop the rest of the chip around it.

They also provide the hooks for testing the hardware description from
the original C code used to test the C model (at the transaction level),
using SystemC concurrently with the VHDL or Verilog model in ModelSim.

In addition, they provide cycle-accurate VHDL and Verilog models of the
hardware design. I chastised them on the fact that if they worked on
their synthesizable VHDL, they could use the same code for synthesis
and cycle-based simulation (using variables and a single clocked
process, etc.).

I took a 3-day training class with it a couple of weeks ago, and I, a
die-hard VHDL designer, was impressed!

Andy


 
Martin Thompson

Andy said:
Take a look at Mentor's Catapult C.

Thanks - I shall take another look, it's been a while since I looked
at that. Sounds a lot like what AccelDSP does for Matlab scripts.

Cheers,
Martin
 
