Are HDLs Misguided?

A

Andy

The last time I had to cram too much logic into too little FPGA, I
optimized the architecture, not the implementation. Among other
changes, I took a half dozen separate UARTs, and replaced them with a
six channel time-division-multiplexed UART using distributed RAM (it
could have been a sixteen channel UART for virtually the same amount
of resources excluding IO). Each channel still had independent baud
rate, word length, parity, #stop bits, interrupts, etc. The design
placed and routed in no time, with plenty of resources to share (it
had been about 125% utilization).

The reason I bring this up is that often we have challenges that are
better addressed at the architectural level, rather than the
implementation level. In these cases, a description farther above the
gates and flops is preferable, because it is easier to re-work. You
can bet I did not instantiate any RAMs in that design!
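For illustration, a minimal sketch of the kind of inference involved (the entity, generic and signal names are invented here, not the actual design): a per-channel state array that synthesis tools typically map onto distributed RAM, with nothing instantiated.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity tdm_uart_state is
  generic (
    channels_g : positive := 6;    -- could just as well be 16
    width_g    : positive := 16
  );
  port (
    clk      : in  std_logic;
    chan_sel : in  natural range 0 to channels_g - 1;  -- channel serviced this slot
    we       : in  std_logic;
    wr_data  : in  unsigned(width_g - 1 downto 0);
    rd_data  : out unsigned(width_g - 1 downto 0)
  );
end entity;

architecture rtl of tdm_uart_state is
  -- One state word (e.g. a baud counter) per channel.  An array read and
  -- written one element at a time like this is what synthesis tools
  -- normally map onto distributed (LUT) RAM -- no instantiation needed.
  type state_array_t is array (0 to channels_g - 1) of unsigned(width_g - 1 downto 0);
  signal state_ram : state_array_t := (others => (others => '0'));
begin
  -- Asynchronous read: this is what pushes the array into LUT/distributed
  -- RAM rather than a block RAM.
  rd_data <= state_ram(chan_sel);

  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        state_ram(chan_sel) <= wr_data;  -- update the channel being serviced
      end if;
    end if;
  end process;
end architecture;

Deepening the array from 6 to 16 channels barely changes the surrounding logic, which is why the sixteen-channel version would have cost about the same.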

That is not to say that it is never necessary to deal with the gates
and flops (and carry chains, etc.). Sometimes we do, but it is
relatively rare. So we rarely need a language that is "closer to the
end product", and we often need a language that allows us to design at
a higher level.

I'm not really sure exactly what you want in your new flavor of HDL.
Do you have any specific ideas? How would you make it closer to the
end product?

Andy
 
J

Jan Decaluwe

Andy said:
The last time I had to cram too much logic into too little FPGA, I
optimized the architecture, not the implementation. Among other
changes, I took a half dozen separate UARTs, and replaced them with a
six channel time-division-multiplexed UART using distributed RAM (it
could have been a sixteen channel UART for virtually the same amount
of resources excluding IO). Each channel still had independent baud
rate, word length, parity, #stop bits, interrupts, etc. The design
placed and routed in no time, with plenty of resources to share (it
had been about 125% utilization).

The reason I bring this up is that often we have challenges that are
better addressed at the architectural level, rather than the
implementation level. In these cases, a description farther above the
gates and flops is preferable, because it is easier to re-work. You
can bet I did not instantiate any RAMs in that design!

That is not to say that it is never necessary to deal with the gates
and flops (and carry chains, etc.). Sometimes we do, but it is
relatively rare. So we rarely need a language that is "closer to the
end product", and we often need a language that allows us to design at
a higher level.

Finally a breath of fresh air :)

Thanks for sharing a great story that should remind all of us
where the true opportunities for optimization are to be found.

--
Jan Decaluwe - Resources bvba - http://www.jandecaluwe.com
Python as a HDL: http://www.myhdl.org
VHDL development, the modern way: http://www.sigasi.com
Analog design automation: http://www.mephisto-da.com
World-class digital design: http://www.easics.com
 
R

rickman

The last time I had to cram too much logic into too little FPGA, I
optimized the architecture, not the implementation. Among other
changes, I took a half dozen separate UARTs, and replaced them with a
six channel time-division-multiplexed UART using distributed RAM (it
could have been a sixteen channel UART for virtually the same amount
of resources excluding IO). Each channel still had independent baud
rate, word length, parity, #stop bits, interrupts, etc. The design
placed and routed in no time, with plenty of resources to share (it
had been about 125% utilization).

The reason I bring this up is that often we have challenges that are
better addressed at the architectural level, rather than the
implementation level. In these cases, a description farther above the
gates and flops is preferable, because it is easier to re-work. You
can bet I did not instantiate any RAMs in that design!

That is not to say that it is never necessary to deal with the gates
and flops (and carry chains, etc.). Sometimes we do, but it is
relatively rare. So we rarely need a language that is "closer to the
end product", and we often need a language that allows us to design at
a higher level.

I'm not really sure exactly what you want in your new flavor of HDL.
Do you have any specific ideas? How would you make it closer to the
end product?

Andy

You keep saying things like "a description farther above the gates and
flops is preferable". I have not said I want to place LUTs and FFs.
I would like to *know* what the logic will be for a given piece of
code without having to run it through the tool and wade through a
machine-drawn schematic.

I am all in favor of a language that allows higher level constructs.
But that does not require that visibility to and knowledge of the
lower levels be blocked.

The current languages are much like C compilers in that they take the
entire design, break it down to its lowest level, and then build it
back up in a way that they choose. I think that philosophy is wrong.
I don't think I need to work with total abstraction to be effective as
a designer. Ray Andraka has made a career of designing efficient
designs in an efficient way. He uses VHDL, but he uses a lot of
instantiation, which to him is very much like written schematics.
That is not what I want.

I wish I could tell you how I'd like to see this solved. If I could
do that, I'd likely start a company doing it. I guess it would be
something like instantiation but without all the hassle of
instantiation at the VHDL level. Heck, maybe what I am trying to
describe is C? Or maybe even Forth? I have thought before that Forth
would make an interesting HDL. It is very hierarchical and it most
likely could support very efficient simulation by compiling the
original program to native machine code, but with extremely fast
compiles too.

I was hoping to get some input from others on this rather than a lot
of reasons why the current HDLs are so good. I know their strengths,
but I can also see their weaknesses. That is what I am discussing
here, the weaknesses.

Rick
 
M

Mike Treseler

Problem: this might not have been a full bit-time since you started
sending the '1' stop bit! You never actually guarantee to wait for the
full stop bit to pass before accepting new data from the application in
the transmit data register!

Or am I missing something?

Note the source comment:
-- reads anytime, expects smart,handshaking reader

Click on waves or zoom for the sim details

-- Mike
 
C

Christopher Head

Note the source comment:
-- reads anytime, expects smart,handshaking reader

Click on waves or zoom for the sim details

-- Mike

Sorry, I still don't see it. Looking at the write of 0x83 (the second
write in the zoom example), I see the data going out over serialout_s,
and I watch the txstate_v. There's a low between time 9900 and 10us for
the start bit, then two bits of high time for bits 0 and 1, then a pile
of low time, then you see serialout_s go high just before time 10200.
That's bit 7 (MSb), which for data 0x83 is high. Looking down at
txstate_v, it changes from "send" to "stop" right at the start of that
bit, then changes from "stop" to "idle" one bit time later. But that
wasn't the stop bit, that was data bit 7! The stop bit has only just
started, right where txstate_v became "idle". If the application polls
the status register right then, because txstate_v=idle, it will see
TxReady asserted, indicating the transmitter is willing to accept
another data byte. If it immediately sends that next byte, the start
bit will happen only a cycle or two later. With only three clock cycles
per bit, the system probably can't actually turn around fast enough for
the start bit to trample on the stop bit, but it could with a faster
system clock or lower baud rate.

Or did the idea of a "smart, handshaking reader" mean something else? I
took it to mean that the application should poll the status register
and wait for TxReady before sending data.
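A minimal sketch of the sort of guard being described (invented names and structure, not the actual UART code): the state machine stays in its stop state for a full bit time, so TxReady, which follows the idle state, cannot appear until the stop bit has completed.

library ieee;
use ieee.std_logic_1164.all;

entity uart_tx_sketch is
  generic (tic_per_bit_g : positive := 3);
  port (
    clk        : in  std_logic;
    rst        : in  std_logic;
    wr_strobe  : in  std_logic;                    -- application writes a byte
    wr_data    : in  std_logic_vector(7 downto 0);
    tx_ready   : out std_logic;                    -- safe to write another byte
    serial_out : out std_logic
  );
end entity;

architecture rtl of uart_tx_sketch is
  type state_t is (idle, start, send, stop);
  signal state     : state_t := idle;
  signal shifter   : std_logic_vector(7 downto 0) := (others => '0');
  signal bit_index : natural range 0 to 7 := 0;
  signal tick      : natural range 0 to tic_per_bit_g - 1 := 0;
begin
  tx_ready   <= '1' when state = idle else '0';
  serial_out <= '0'        when state = start else
                shifter(0) when state = send  else
                '1';                               -- stop bit and idle line are both high

  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        state <= idle;
        tick  <= 0;
      elsif state /= idle and tick /= tic_per_bit_g - 1 then
        tick <= tick + 1;                          -- hold the current bit for a full bit time
      else
        tick <= 0;
        case state is
          when idle =>
            if wr_strobe = '1' then                -- application writes the transmit register
              shifter   <= wr_data;
              bit_index <= 0;
              state     <= start;
            end if;
          when start =>
            state <= send;
          when send =>
            shifter <= '0' & shifter(7 downto 1);  -- LSb first
            if bit_index = 7 then
              state <= stop;
            else
              bit_index <= bit_index + 1;
            end if;
          when stop =>
            -- Only after the stop bit has lasted tic_per_bit_g clocks does the
            -- machine return to idle, and only then does tx_ready go high.
            state <= idle;
        end case;
      end if;
    end if;
  end process;
end architecture;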

Chris
 
T

Tricky

You keep saying things like "a description farther above the gates and
flops is preferable".  I have not said I want to place LUTs and FFs.
I would like to *know* what the logic will be for a given piece of
code without having to run it through the tool and wade through a
machine-drawn schematic.

I am all in favor of a language that allows higher level constructs.
But that does not require that visibility to and knowledge of the
lower levels be blocked.

The current languages are much like C compilers in that they take the
entire design, break it down to its lowest level, and then build it
back up in a way that they choose.  I think that philosophy is wrong.
I don't think I need to work with total abstraction to be effective as
a designer.  Ray Andraka has made a career of designing efficient
designs in an efficient way.  He uses VHDL, but he uses a lot of
instantiation, which to him is very much like written schematics.
That is not what I want.

I wish I could tell you how I'd like to see this solved.  If I could
do that, I'd likely start a company doing it.  I guess it would be
something like instantiation but without all the hassle of
instantiation at the VHDL level.  Heck, maybe what I am trying to
describe is C?  Or maybe even Forth?  I have thought before that Forth
would make an interesting HDL.  It is very hierarchical and it most
likely could support very efficient simulation by compiling the
original program to native machine code, but with extremely fast
compiles too.

I was hoping to get some input from others on this rather than a lot
of reasons why the current HDLs are so good.  I know their strengths,
but I can also see their weaknesses.  That is what I am discussing
here, the weaknesses.

Rick

Personally, I still don't see why this is needed, or how it would be
done. HDLs offer the user the option to either abstract their designs
with behavioural code or create a schematic like design with
instantiations, or a mixture of both. The big problem with what you
are suggesting is that different vendors, and different parts across
the same vendor, have LUTs, FFs, multipliers, etc. that behave
differently, with different numbers of IOs. To cover all cases, the
only way to ensure the same code works across a range is with
behavioural code. The only way to know what will come out in advance
is either with experience or instantiation. To get what you are
suggesting would require all vendors to produce their FPGAs with the
same base parts, but that will never happen.

Vendors provide code templates for various things, so you should know
what you are getting with these. Recently, the only things I have had
to directly instantiate are dual-clock dual-port RAMs, because Altera
still cannot infer them from code. I'm pretty sure that one day they
will.
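For reference, a generic sketch of the kind of template meant here (not any particular vendor's exact wording; the names are invented): a single-clock, simple dual-port RAM written so that tools reliably infer a block RAM.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity template_ram is
  generic (
    addr_bits_g : positive := 10;
    data_bits_g : positive := 8
  );
  port (
    clk     : in  std_logic;
    we      : in  std_logic;
    wr_addr : in  unsigned(addr_bits_g - 1 downto 0);
    rd_addr : in  unsigned(addr_bits_g - 1 downto 0);
    wr_data : in  std_logic_vector(data_bits_g - 1 downto 0);
    rd_data : out std_logic_vector(data_bits_g - 1 downto 0)
  );
end entity;

architecture rtl of template_ram is
  type ram_t is array (0 to 2**addr_bits_g - 1)
    of std_logic_vector(data_bits_g - 1 downto 0);
  signal ram : ram_t;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(wr_addr)) <= wr_data;
      end if;
      rd_data <= ram(to_integer(rd_addr));  -- synchronous read enables block RAM mapping
    end if;
  end process;
end architecture;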
 
A

Alessandro Basili


Am I missing something, or is the transmitter slightly flawed in this
code? I seem to see the following: [snip]
4. From this moment, if the application queries the status register,
you will see that TxState_v is IDLE and hence report transmitter ready.

I believe that happens only 1 clock after TxState_v is IDLE, hence you have a 1-bit stop.
The application could thus immediately strobe another byte of data into
the transmit data register. Then tx_state will transition to TxState_v
= START and, on the next clock, set serial_out_v to '0'.

The application needs to read 0x04 as the status register, and if you
check the zoom file you find that read_data_v is 0x04 only one clock
after TxState_v is idle. Hence the 1-bit stop is guaranteed.

Al
 
A

Andy

To be able to "know" what hardware you are going to get requires
having a language and description that, from the same code, will
always produce the exact same logic, regardless of required
performance, space, and target architecture. Unfortunately it also
means having a language and description which will not produce
reasonably optimal hardware over all combinations of those same
constraints, given the same code. You can't have it both ways.

I think we may have a disconnect with regards to the meaning of
"higher level constructs". If you mean things like hierarchy, etc. I
agree that it should not get in the way of a deterministic hardware
description (and I don't think it does, though your methods of using
the language may not be optimal for this).

However, I take "higher level constructs" as being those that support
a behavioral description ("what does the circuit do?", not "what is
the circuit?"), with necessarily less direct mapping to implemented
hardware. It is the use of these constructs that provides the
tremendous increase in productivity over past methods (more primitive
languages and schematics). You would not consider creating a word
processor or spreadsheet in assembler, so why would you create a
complex FPGA design with what amounts to a netlist?

Andy
 
R

rickman

To be able to "know" what hardware you are going to get requires
having a language and description that, from the same code, will
always produce the exact same logic, regardless of required
performance, space, and target architecture. Unfortunately it also
means having a language and description which will not produce
reasonably optimal hardware over all combinations of those same
constraints, given the same code. You can't have it both ways.

I don't think your conclusions follow from the facts. An HDL can
unambiguously describe a counter with a carry out and allow that
counter and carry to be implemented in any manner the device can
support. If I am using a device that can't implement a counter, why
would I attempt to describe a counter? How could any HDL code that
counts possibly be portable in a system that can't implement a
counter?
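A minimal sketch of the kind of description meant here (entity and signal names invented): the carry out is stated behaviorally as a terminal-count flag, and the tool remains free to build it from a dedicated carry chain, plain LUTs, or whatever the device offers.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity down_counter is
  generic (width_g : positive := 8);
  port (
    clk        : in  std_logic;
    load       : in  std_logic;
    load_value : in  unsigned(width_g - 1 downto 0);
    carry_out  : out std_logic                      -- asserted while the count sits at zero
  );
end entity;

architecture rtl of down_counter is
  signal count : unsigned(width_g - 1 downto 0) := (others => '0');
begin
  -- "End of count" stated behaviorally; how the comparison and the
  -- decrement are built is left entirely to the tool.
  carry_out <= '1' when count = 0 else '0';

  process (clk)
  begin
    if rising_edge(clk) then
      if load = '1' then
        count <= load_value;
      elsif count /= 0 then
        count <= count - 1;       -- count down and hold at zero
      end if;
    end if;
  end process;
end architecture;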

I think we may have a disconnect with regards to the meaning of
"higher level constructs". If you mean things like hierarchy, etc. I
agree that it should not get in the way of a deterministic hardware
description (and I don't think it does, though your methods of using
the language may not be optimal for this).

Not only should hierarchical design not get in the way of
deterministic design, it is mandatory, otherwise you *would* need to
design every logic element.

However, I take "higher level constructs" as being those that support
a behavioral description ("what does the circuit do?", not "what is
the circuit?"), with necessarily less direct mapping to implemented
hardware. It is the use of these constructs that provides the
tremendous increase in productivity over past methods (more primitive
languages and schematics). You would not consider creating a word
processor or spreadsheet in assembler, so why would you create a
complex FPGA design with what amounts to a netlist?

I think you are drawing too fine a line between behavioral and
structural. I am not saying I want to design gates. I am saying I
want to clearly know how my code will be implemented. No, I don't
know exactly how this would be done, but like I've said before, when I
figure it out, I'll start a company and get rich from it.

The first step to developing something new is to realize that there is
an opportunity. Some folks are trying to convince me that my editor
is not adequate and I should consider another tool that "understands"
the language. Before I will evaluate that, I have to recognize that
I'm not happy with my editor. I think a lot of people are very
complacent about their tools. I see things a bit differently.

Rick
 
A

Andy

I don't think your conclusions follow from the facts.  An HDL can
unambiguously describe a counter with a carry out and allow that
counter and carry to be implemented in any manner the device can
support.  If I am using a device that can't implement a counter, why
would I attempt to describe a counter?  How could any HDL code that
counts possibly be portable in a system that can't implement a
counter?

I don't think you understand that knowing whether a carry output is
used or not is almost worthless if you don't know which way that carry
output should be created, which then gets us back to performance,
resources, and target architectures. You have a pre-conceived notion
that using a carry output will, in all cases of performance,
resources, and target architecture, be an optimal solution. This is
simply not true.

Using my method (count - 1 < 0), I have seen the same synthesis tool,
for the same target architecture, use the built-in carry-chain
output (shared), use a combination of sub-carry outputs and LUTs, and
just LUTs, depending on what it thought would be optimal (and when I
checked them, I found it had made very good choices!). I would prefer
the synthesis tools figure out for themselves that if I am testing the
output of a down counter with count=0 (which is more
readable/understandable than count-1<0), it could use a carry out if it was
optimal. I would really prefer that it figure out that, if count is
not used anywhere else, it could translate a behavior of counting from
1 to n (which is even more readable/understandable) into a down
counter from n-1 to zero (if doing so would be more optimal). But
these are synthesis tool issues, not language issues.
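For illustration, a minimal sketch of the count - 1 < 0 idiom (the entity name and generic are invented here): the terminal test is phrased as the sign of the decrement, which is what leaves a tool free to share the decrementer's borrow/carry if it judges that optimal.

library ieee;
use ieee.std_logic_1164.all;

entity div_by_n is
  generic (period_g : positive := 13);
  port (
    clk  : in  std_logic;
    tick : out std_logic                 -- one-clock pulse every period_g clocks
  );
end entity;

architecture rtl of div_by_n is
  signal count : integer range 0 to period_g - 1 := period_g - 1;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if count - 1 < 0 then              -- behaviorally the same as count = 0,
        count <= period_g - 1;           -- but phrased as the decrement's borrow
        tick  <= '1';
      else
        count <= count - 1;
        tick  <= '0';
      end if;
    end if;
  end process;
end architecture;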

Andy
 
C

Christopher Head


Am I missing something, or is the transmitter slightly flawed in
this code? I seem to see the following: [snip]
4. From this moment, if the application queries the status register,
you will see that TxState_v is IDLE and hence report transmitter
ready.

I believe that happens only 1 clock after TxState_v is IDLE, hence you
have a 1-bit stop.
The application could thus immediately strobe another byte of data
into the transmit data register. Then tx_state will transition to
TxState_v = START and, on the next clock, set serial_out_v to '0'.

The application needs to read 0x04 as the status register, and if you
check the zoom file you find that read_data_v is 0x04 only one clock
after TxState_v is idle. Hence the 1-bit stop is guaranteed.

Al

Each bit is actually three clocks wide. As I pointed out in my other
message, you can't actually stomp on the stop bit at these particular
choices of timing, but the UART code nowhere suggests that it's only
usable with values of (clocks/baud) <= 3. Why else would tic_per_bit_g
be a generic parameter instead of a constant?

Chris
 
A

Andy

I haven't seen the code, but this would be an excellent place to use a
range constraint on the generic tic_per_bit_g. Failing that, a
concurrent assertion that verifies useable values of the generic(s)
would work. It could actually work better than a range constraint if
there are interdependencies between the values of multiple generics.
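A sketch of both options (the entity name, port list, and the lower bound of 4 are illustrative; only tic_per_bit_g comes from the thread):

library ieee;
use ieee.std_logic_1164.all;

entity uart_tx_checked is
  generic (
    -- Option 1: a range constraint on the generic itself --
    -- elaboration fails immediately for an out-of-range value.
    tic_per_bit_g : integer range 4 to 65536 := 16
  );
  port (
    clk        : in  std_logic;
    serial_out : out std_logic
  );
end entity;

architecture rtl of uart_tx_checked is
begin
  -- Option 2: a concurrent assertion -- more flexible when the legal
  -- values of several generics depend on each other.
  assert tic_per_bit_g >= 4
    report "tic_per_bit_g must be at least 4 for a clean stop bit"
    severity failure;

  serial_out <= '1';  -- transmitter body omitted from this sketch
end architecture;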

Andy
 
M

Mike Treseler

Each bit is actually three clocks wide. As I pointed out in my other
message, you can't actually stomp on the stop bit at these particular
choices of timing, but the UART code nowhere suggests that it's only
usable with values of (clocks/baud) <= 3. Why else would tic_per_bit_g
be a generic parameter instead of a constant?

This example works fine for baud rates slower than two ticks per bit.
I left it a constant so I could play with it without having to explain
it. Chris, you have the source and a testbench, so feel free to make
any changes you like. My goal was only to demonstrate that variables
may be used for synthesis of luts and flops.
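A minimal illustration of that point (not the UART itself; the names are invented): a variable read before it is updated inside a clocked process becomes a register, and the expressions that update it become LUTs.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity var_demo is
  port (
    clk      : in  std_logic;
    enable   : in  std_logic;
    rollover : out std_logic
  );
end entity;

architecture rtl of var_demo is
begin
  process (clk)
    variable count_v : unsigned(3 downto 0) := (others => '0');
  begin
    if rising_edge(clk) then
      rollover <= '0';
      if enable = '1' then
        if count_v = 15 then         -- count_v is read before it is written:
          rollover <= '1';           -- it synthesizes to a 4-bit register
          count_v := (others => '0');
        else
          count_v := count_v + 1;    -- the increment and compare become LUTs
        end if;
      end if;
    end if;
  end process;
end architecture;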

-- Mike Treseler
 
C

Christopher Head

This example works fine for baud rates slower than two ticks per bit.
I left it a constant so I could play with it without having to
explain it. Chris, you have the source and a testbench, so feel free
to make any changes you like. My goal was only to demonstrate that
variables may be used for synthesis of luts and flops.

-- Mike Treseler

Oh, sure, it didn't cause any trouble. I read the code out of
curiosity, liked the style, and then noticed what I thought looked like
a possible problem and wondered if anyone had seen it. So, no big
deal :)

Chris
 
P

Paul Colin Gloster

Jan Decaluwe <[email protected]> sent on December 14th, 2010:

"[..]

I think Verilog will suit you better as a language, you really should
consider switching one of these days. However, there is no reason why
it would help you with the issues that you say you are seeing here."


He did try Verilog many months ago, but he has resumed with VHDL.
 
P

Paul Colin Gloster

Andy <[email protected]> sent on December 15th, 2010:

"[..]

[..]
[..] You have a pre-conceived notion
that using a carry output will, in all cases of performance,
resources, and target architecture, be an optimal solution. This is
simply not true.

Using my method (count - 1 < 0), I have seen the same synthesis tool,
for the same target architecture, use the built-in carry-chain
output (shared), use a combination of sub-carry outputs and LUTs, and
just LUTs, depending on what it thought would be optimal (and when I
checked them, I found it had made very good choices!) [..]
[..]
[..] But
these are synthesis tool issues, not language issues.

Andy"


Yes, these are good examples of why Rick is naive to expect to know in
advance the efficient output of a synthesizer. If one variable is
always the sum of two other particular variables and these "variables"
are always constant, then the corresponding hardware (if any) should
naturally be less than for three changing variables.
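A small illustration of that point (the names are invented): the same "+" in the source can cost a full adder, a smaller piece of logic, or nothing at all, depending on what the tool can prove about the operands.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity folding_demo is
  port (
    a, b   : in  unsigned(7 downto 0);
    sum_vv : out unsigned(7 downto 0);
    sum_vc : out unsigned(7 downto 0);
    sum_cc : out unsigned(7 downto 0)
  );
end entity;

architecture rtl of folding_demo is
  constant c1 : unsigned(7 downto 0) := to_unsigned(3, 8);
  constant c2 : unsigned(7 downto 0) := to_unsigned(5, 8);
begin
  sum_vv <= a + b;    -- two changing operands: a full adder
  sum_vc <= a + c1;   -- one constant operand: usually cheaper logic
  sum_cc <= c1 + c2;  -- both constant: folds away to the literal value 8
end architecture;

The point is not the exact gate count but that the cost of the "+" is decided by what the tool can prove, not by the way the line is written.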

Rick was proposing something which in reality would be uniform and
hence not generally efficient.
 
P

Paul Colin Gloster

Tricky <[email protected]> sent on December 15th, 2010:
"[..]
[..] The only way to know what will come out in advance
is either with experience or instantiation. [..]
[..]

[..]"

It is not possible to know.
 
P

Paul Colin Gloster

Rick Collins had sent on December 10th, 2010:

"[..]

What I mean is, if I want a down counter that uses the carry out to
give me an "end of count" flag, why can't I get that in a simple and
clear manner? It seems like every time I want to design a circuit I
have to experiment with the exact style to get the logic I want and it
often is a real PITA to make that happen.

[..]

I guess what I am trying to say is I would like to be able to specify
detailed logic rather than generically coding the function and letting
a tool try to figure out how to implement it. [..]
[..]"


On December 13th, 2010, Rick claimed:
"[..]

[..]
[..] I am not saying I want to use something similar to assembly
language [..]
[..]

[..]"


Rick Collins <[email protected]> sent on December 14th, 2010:

"[..]

[..]
[..] Like I said somewhere else, if you want a
particular solution, the vendors tell you to instantiate.
Instantiation is very undesirable since it is not portable across
vendors and often not portable across product lines within a vendor!"

As with many assembly languages for different models in a single
processor family.

"The language is flexible, that's for sure. [..]
[..]"

As flexible as Lisp or Confluence or Lava?
 
