structured VHDL

alb

Hi everyone,

I'm trying to shift my implementation paradigm towards a more functional
description and I found some good lectures/articles on structured VHDL
(a Google search for 'structured vhdl design method' will provide a
handful of links).

Unfortunately I have only found a small set of examples in Mike
Treseler's folder (http://myplace.frontier.com/~miketreseler/ but be
aware that the links there point to the wrong URL...) and nothing more.

Can anyone here point me to some other code for real applications using
the two process approach described in the above mentioned articles?

Any open discussion on the methodology itself?

Thanks a lot,

Al
 
KJ

Unfortunately I have only found a small bunch of examples in Mike Treseler's
Folder (http://myplace.frontier.com/~miketreseler/ but be aware that links
are referencing to the wrong url...) and not more.

I'm assuming that you've been able to figure out the correct links to Mike's code. It looks like just his root path changed.
Can anyone here point me to some other code for real applications using the
two process approach described in the above mentioned articles?

Mike would gag if the 'above mentioned articles' that you're talking about are Mike's stuff. Mike is all about one process, not two...literally. He uses variables extensively, shunning all use of signals except to tie together entities (i.e. he wouldn't use a signal within the architecture).
Any open discussion on the methodology itself?

If the methodology you're talking about is what Mike uses, then the best source I've found is Mike's code itself and his postings. You can search this group for Mike's postings (he used to be a prolific poster...he has disappeared for quite some time which is too bad).

If you really *are* interested in 'two process' (i.e. one combinatorial process with the description of the logic and one clocked process to provide the storage of state) then you can pick up a bunch of textbooks. None of them would I recommend in this area because there is nothing to recommend about using the two process approach.

Kevin Jennings
 
alb

I'm assuming that you've been able to figure out the correct links
to Mike's code. It looks like just his root path changed.

Sure. I didn't know Mike had been away from this group for so long, and I
believed reporting it would trigger some action... well, I hope some
other people will profit from this OT exchange.
Mike would gag if the 'above mentioned articles' that you're talking
about is Mike's stuff. Mike is all about one process, not
two...literally. He extensively uses variables shunning all use of
signals except to tie together entities (i.e. he wouldn't use a
signal within the architecture.

As a matter of fact here's a presentation where the 'two process
approach' is described and Mike Treseler's code is reported as an example:

http://ens.ewi.tudelft.nl/Education/courses/et4351/structured_vhdl_2010.pdf

I realize that Mike's code is all about one process, but I guess the
main focus of the 'above mentioned articles' is to move from the haze of
processes often found around to a more structured approach where
registers and combinatorial logic are well separated:

Quoting Ian Lang:

(http://www.designabstraction.co.uk/Articles/Advanced Synthesis Techniques.htm)
What we’ve just done is prove to ourselves that all synchronous
designs can be thought of as two simple elements:

The registers that hold its present state and The combinatorial logic
that determines its next state.

The 'procedural template coding style' (as Mike calls it) is what I'm
actually interested in because of its elegance, but I must admit that I
will need to read more code to be able to write the same way.
If the methodology you're talking about is what Mike uses, then the
best source I've found is Mike's code itself and his postings. You
can search this group for Mike's postings (he used to be a prolific
poster...he has disappeared for quite some time which is too bad).

Then I would be even more interested in reading more of his code, but I
guess that is not as 'open' as the one published on the net. If somebody
here happens to have GPL-like licensed code from Mike and you are
willing to share it I would greatly appreciate. I believe that reading
code is sometimes more important than writing it ;-).
If you really *are* interested in 'two process' (i.e. one
combinatorial process with the description of the logic and one
clocked process to provide the storage of state) then you can pick
up a bunch of textbooks. None of them would I recommend in this
area because there is nothing to recommend about using the two
process approach.

From your words I understand that Mike's approach is not described in
any textbook and it lives only in scattered presentations/articles/blogs
on the net.

Since this method doesn't seem so well spread, how do we know if there
is any particular instance, or a particular set of cases, where the
'procedural template coding style' falls short?

In the presentation referenced in this message there's a list of
problems reported:
- Keep the code synthesizable
- Synthesis tool might choose wrong gate-level structure
- Problems to understand the algorithm for less skilled engineers

They seem to me to be just speculation, but do they somehow hint at a lack
of 'acceptance' of this method?
 
KJ

On 10/07/2013 02:28, KJ wrote:
> On Tuesday, July 9, 2013 2:31:35 AM UTC-4, alb wrote:
>>> Can anyone here point me to some other code for real applications

As a matter of fact here's a presentation where the 'two process approach'
is described and Mike Treseler's code is reported as an example:

It's presented as an example called "Single process template program"
http://ens.ewi.tudelft.nl/Education/courses/et4351/structured_vhdl_2010.pdf

I realize that Mike's code is all about one process, but I guess the main focus
of the 'above mentioned articles' is to move from the haze of processes often
found around to a more structured approach where registers and combinatorial
logic are well separated:

Quoting Ian Lang:
(http://www.designabstraction.co.uk/Articles/Advanced Synthesis Techniques.htm)
What we’ve just done is prove to ourselves that all synchronous
designs can be thought of as two simple elements:
The registers that hold its present state and The combinatorial logic
that determines its next state

All I have to say here is:
- There is no 'haze of processes'
- Ian states the obvious...that there is combinatorial logic and registers
- Ian overplays this into stating that coding this way is somehow 'structured'
- The paper takes non-issues and presents 'solutions' and ignores actual issues as well as the new issues that get created by the proposed 'solution'

From your words I understand that Mike's approach is not described in any textbook
and it lives only in scattered presentations/articles/blogs on the net.

You might be right.
Since this
method doesn't seem so well spread, how do we know if there is any particular
instance, or a particular set of cases, where the 'procedural template coding style'
falls short?

Any tool (or method) can be misused. The skill of the designer is the most important element.

In the presentation referenced in this message there's a list of problems
reported:
- Keep the code synthesizable
- Synthesis tool might choose wrong gate-level structure
- Problems to understand the algorithm for less skilled engineers

The first two are laughably wrong: Mike is all about synthesizable code, and the supposed 'wrong gate-level structure' is claptrap. The last point again relates back to the skill of the designer. However, assuming equally skilled designers, the supposed 'structured' approach given in the article is even more likely to have 'Problems to understand the algorithm for less skilled engineers'.
They seems to me just speculations,

They are just speculating and drawing incorrect conclusions as well.
but do they hint somehow a lack of 'acceptance' of this method?

They hint that simply because they wanted to write a paper about their own method not somebody else. What would be the point of a paper about doing something one way that the authors acknowledge is done better by some other method?

The authors believe their method to be better. I won't begrudge them their beliefs nor try to convince them otherwise. It's a belief, not a fact.

Kevin Jennings
 
Andy

The ET4351 authors' proposed style uses two processes and three copies of the record data type (2 signals and one variable).

A single process style like Treseler's uses only one variable of the record.
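
A minimal sketch of the two styles contrasted above, assuming a hypothetical 8-bit enable counter. The entity, port, and record names are illustrative only and are not taken from the ET4351 slides or from Treseler's code:

```vhdl
-- Two-process style: two signal copies (r, rin) plus one variable copy (v).
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity two_proc_counter is
  port (clk, rst, en : in  std_logic;
        count        : out unsigned(7 downto 0));
end entity two_proc_counter;

architecture rtl of two_proc_counter is
  type reg_type is record
    cnt : unsigned(7 downto 0);
  end record;
  signal r, rin : reg_type;            -- current and next state
begin
  comb : process (rst, en, r)
    variable v : reg_type;             -- working copy of the state
  begin
    v := r;
    if en = '1' then
      v.cnt := r.cnt + 1;
    end if;
    if rst = '1' then                  -- synchronous reset
      v.cnt := (others => '0');
    end if;
    rin   <= v;
    count <= r.cnt;                    -- registered output
  end process comb;

  regs : process (clk)
  begin
    if rising_edge(clk) then
      r <= rin;
    end if;
  end process regs;
end architecture rtl;

-- Single-process style: one variable of the record, no state signals.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity one_proc_counter is
  port (clk, rst, en : in  std_logic;
        count        : out unsigned(7 downto 0));
end entity one_proc_counter;

architecture rtl of one_proc_counter is
  type reg_type is record
    cnt : unsigned(7 downto 0);
  end record;
begin
  main : process (clk)
    variable r : reg_type;
  begin
    if rising_edge(clk) then
      if rst = '1' then
        r.cnt := (others => '0');
      elsif en = '1' then
        r.cnt := r.cnt + 1;
      end if;
      count <= r.cnt;                  -- registered output
    end if;
  end process main;
end architecture rtl;
```

Both versions are intended to describe the same counter; the difference is only in how many copies of the record the source code carries around.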

The authors' style only allows all outputs to be combinatorial or registered together, rather than some outputs registered, some combinatorial.

Treseler's style allows outputs to be registered or to be combinatorial functions of process register(s).

The authors' style does allow pure combinatorial outputs (combinatorial function of inputs), which a single clocked process style does not allow. Treseler's style allows combinatorial functions of registers to be outputs. Pure in-to-out combinatorial functions should be avoided where possible. They tend to create long timing paths through multiple modules that are difficult to detect and create timing problems during P&R.

I do not share Treseler's single-process-per-entity preference. There are many cases where semi-independent functions in a single entity benefit from being in separate processes (isolated from access to each other's variables, except as allowed through signals), yet not needing the coding overhead of separate entities.

I do use a lot of functions and procedures, but not if the only use is for the template in Treseler's style (though this sometimes needs a 2nd variable copy).

Combinatorial processes waste simulation performance, since the entire process runs every time/delta cycle in which any input changes. Many simulators merge processes that share the same sensitivity list for better performance. Combinatorial processes rarely share the same sensitivity list, and therefore cannot take advantage of this optimization.

Andy
 
alb

All I have to say here is:
- There is no 'haze of processes'

my apologies, I intended 'maze'. In this respect I must admit that the
DUFF.vhd example
(http://www.designabstraction.co.uk/EXAMPLE/HTML/duff.htm) pretty much
reflects what I used to write and what I want *not* to repeat in my next
projects. But how do you learn to write code at a higher level of
abstraction? I guess reading lots of code written that way, but where
could I find it?
- Ian states the obvious...that there is combinatorial logic and registers

true. But the separation might be at the gate level or higher and this
is where Ian's style makes a difference (BTW Mike seems to have been
strongly inspired by Ian's article, at least judging from the comments
he puts in the uart.vhd example)
- Ian overplays this into stating that coding this way is somehow 'structured'

If you think about 'structured programming' as a paradigm where every
computable function can be expressed as a combination of only three
control structures, then Ian's proposal pretty much follows the same line.
Whether it is overplayed or not I do not know.
- The paper takes non-issues and presents 'solutions' and ignores actual issues as well as the new issues that get created by the proposed 'solution'

Could you be more specific on what kind of 'non-issues' and 'actual
issues' you are referring to? Since I consider the talk rather to the
point I may have missed/misunderstood an essential part of it.
Any tool (or method) can be misused. The skill of the designer is the most important element.

Sure, but I guess you would probably agree that a practice which has
been used by many will certainly be stressed to the point where pitfalls
and benefits are better understood.

IMHO a method used by one person only, no matter how skillful that
person is, is possibly more prone to show weaknesses in several corner cases.
The first two are laughably wrong, Mike is all about synthesizable
code and the supposed 'wrong gate-level structure' is claptrap. The
last point again relates back to the skill of the designer. However,
assuming equally skilled designers, the supposed 'structured'
approach given in the article is even more likely to have 'Problems
to understand the algorithm for less skilled engineers'.

I may have misunderstood completely the talk but I do not read the
'increasing the abstraction level' slide as a critique to Mike's style
w.r.t. their style.

On the contrary it seems to me they're trying to give example of higher
level of abstraction (including Mike's one process example) and warn
about what they believe are weak points of these models. Considering the
experience with the LEON vs the ERC32, both quite full fledged projects,
it seems their warnings are not that unfounded (even though this is only
my speculation).
They are just speculating and drawing incorrect conclusions as well.

Which one specifically?
The first bullet qualifies the two-process approach as uniform and I
believe rightly since all entities look the same and a designer only
needs to look at the combinatorial process to understand the algorithm.

Looking at their LEON vs ERC32 example, it seems the method claims less
resources than an ad-hoc one, therefore improving development time.

I must say that I do not know how simulation and synthesis performance
improves with the proposed approach, but it would be nice to see some
numbers on this. I doubt though that this lack of facts qualifies the
conclusion as incorrect.

I personally believe the reading is improved by the fact that you do not
need to trace several concurrent processes at the same time and your
flow of reading the code is more or less sequential.

I certainly believe that readable code is easier to maintain, but I
doubt that this approach increases the re-usability of the code;
at least I consider this last point no more than the author's personal
opinion.
They hint that simply because they wanted to write a paper about
their own method not somebody else. What would be the point of a
paper about doing something one way that the authors acknowledge is
done better by some other method?

The authors believe there method to be better. I won't begrudge them
their beliefs nor try to convince them otherwise. It's a belief, not
a fact.

Here I need to urge you to go through the presentation again since I
believe, with all due respect, you missed the point. IMO they are *not*
proposing their two-processes approach vs Mike's one process approach.
They actually use Mike's example to show how increasing the level of
abstraction is a good thing.

The whole point I got, which still might be wrong, is that their
proposed style is way better than what they call 'traditional ad-hoc
design style' (pag. 5).

Their beliefs are well supported by the example they show (which still
may have some peculiarities and hide larger flaws of their approach) and
it seems to me that Mike's style also supports their conclusions.
 
alb

The ET4351 authors' proposed style uses two processes and three
copies of the record data type (2 signals and one variable).

A single process style like Treseler's uses only one variable of the
record..

Does that imply any difference in terms of simulation/synthesis
performances? I've always believed signals are much more intensive to
process than variables.
The authors' style only allows all outputs to be combinatorial or
registered together, rather than some outputs registered, some
combinatorial.

Uhm, what prevents us from writing this:

comb : process (sysif, r)
  variable v : reg_type;
begin
  v := r; v.irq := '0';
  for i in r.pend'range loop
    v.pend(i) := r.pend(i) or (sysif.irq(i) and r.mask(i));
    v.irq := v.irq or r.pend(i);
  end loop;
  rin <= v;
  irqo.irq <= r.irq;    -- registered
  irqo.mask <= v.mask;  -- not registered
  irqo.pend <= v.pend;  -- not registered
end process;

Isn't the above example a case where some elements of the record are
registered and some are not? Maybe I missed your point.

[]
The authors' style does allow pure combinatorial outputs
(combinatorial function of inputs), which a single clocked process
style does not allow. Treseler's style allows combinatorial functions
of registers to be outputs. Pure in-to-out combinatorial functions
should be avoided where possible. They tend to create long timing
paths through multiple modules that are difficult to detect and
create timing problems during P&R.

That is definitely something one must keep in mind. I share your opinion
about in-to-out combinatorial logic, but it is not always possible to get
rid of it.
I do not share Treseler's single-process-per-entity preference. There
are many cases where semi-independent functions in a single entity
benefit from being in separate processes (isolated from access to
each other's variables, except as allowed through signals), yet not
needing the coding overhead of separate entities.

Agreed, but you could still use the template to code the
semi-independent functions and connect them through signals in one
entity. I believe that coding patterns greatly simplify code
readability since you can nearly skip all the repeated patterns and
concentrate only on the differences (in software practice I think it is
called 'programming by difference').
I do use a lot of functions and procedures, but not if the only use
is for the template in Treseler's style (though this sometimes needs
a 2nd variable copy).

Do you know of any text/article/code which is openly accessible and uses
lots of functions and procedures? I would appreciate any pointer.
Combinatorial processes waste simulation performance, since the
entire process runs every time/delta cycle in which any input
changes. Many simulators merge processes that share the same
sensitivity list for better performance. Combinatorial processes
rarely share the same sensitivity list, and therefore cannot take
advantage of this optimization.

But if you have one combinatorial (and one sequential) process per
entity then the number of potentially different sensitivity lists is just
the number of entities (which still can be large, you may say). In this
respect I consider the /all/ keyword introduced in VHDL-2008 as a bad thing.
 
Andy

Alb,

Especially when signals are used to trigger combinatorial processes, there is much more overhead than variables, and even more so if the signal is assigned with a delay. I would expect decent simulators to significantly optimize the overhead on a signal for which there are no delays or sensitive processes. But there is still the overhead of separate execution (computing the value to be assigned) and update (upon process suspension) for signals even within a single process. Variables lack this duality and its associated overhead. In my experience, using integers where possible for numeric quantities (counters, etc.) yields a far more significant simulation performance improvement than variables vs signals.

WRT combinatorial vs registered outputs from the authors' style, yes, one could separately assign outputs from the corresponding elements of v or r. However, in separate examples they assigned outputs either all from v, or all from r, with no mixing. If I were to use that style and wanted a mixture of r and v, I would probably assign the outputs en masse from the more common, and re-assign only those that differ (to borrow your mention of 'programming by difference'). I would expect the overhead of duplicate/overriding assignments in simulation to be trivial.
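
A minimal sketch of that "assign en masse, then override" idea, assuming a hypothetical four-line interrupt controller in which the output record happens to have the same type as the state record. The package, entity, and port names are illustrative and not from the presentation:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package irq_sketch_pkg is
  subtype irq_vec is std_logic_vector(3 downto 0);
  type irq_reg_type is record
    irq  : std_logic;
    pend : irq_vec;
  end record;
end package irq_sketch_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.irq_sketch_pkg.all;

entity irq_ctrl_sketch is
  port (clk    : in  std_logic;
        irq_in : in  irq_vec;
        mask   : in  irq_vec;
        irqo   : out irq_reg_type);
end entity irq_ctrl_sketch;

architecture rtl of irq_ctrl_sketch is
  signal r, rin : irq_reg_type := (irq => '0', pend => (others => '0'));
begin
  comb : process (irq_in, mask, r)
    variable v : irq_reg_type;
  begin
    v     := r;
    v.irq := '0';
    for i in r.pend'range loop
      v.pend(i) := r.pend(i) or (irq_in(i) and mask(i));
      v.irq     := v.irq or r.pend(i);
    end loop;
    rin <= v;

    irqo      <= r;        -- default: every output taken from the registers
    irqo.pend <= v.pend;   -- override: this one element is combinatorial
  end process comb;

  regs : process (clk)
  begin
    if rising_edge(clk) then
      r <= rin;
    end if;
  end process regs;
end architecture rtl;
```

Within one process the later assignment to `irqo.pend` simply replaces the value scheduled by the whole-record assignment, so only the exceptions need to be spelled out.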

On the rare occasion that I need a combinatorial in-out path through an entity, I would prefer that to be the only situation where I use a combinatorial process (this tends to make such a practice stand out in the code, which it should.) Common combinatorial logic feeding multiple sequential processes can better be expressed by a function or procedure invoked in each process, IMHO. Synthesis is plenty smart enough to figure out if only one copy of the logic is really needed.

I think maybe you do not understand the effects of the new "all" keyword in a sensitivity list. It does NOT indicate that the process is sensitive to all signals visible by the process. "All" indicates that the process is sensitive to any signal read by the process. Therefore, it has zero simulation performance impact on a combinatorial RTL process.
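
A small sketch of the point about "all", assuming a VHDL-2008 capable tool and a hypothetical multiplexer entity (all names are illustrative):

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity mux_all_sketch is
  port (sel, a, b : in  std_logic;
        unused    : in  std_logic;   -- visible to the process, never read by it
        y         : out std_logic);
end entity mux_all_sketch;

architecture rtl of mux_all_sketch is
begin
  -- VHDL-2008: the implied sensitivity list contains only the signals the
  -- process reads, so this behaves like "process (sel, a, b)".
  -- Events on 'unused' do not wake the process.
  comb : process (all)
  begin
    if sel = '1' then
      y <= a;
    else
      y <= b;
    end if;
  end process comb;
end architecture rtl;
```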


By embedding the combinatorial process functionality in the clocked process, the level of control nesting is increased by one if-statement (or two if asynchronous reset is used). If one really wants to avoid that increase, then follow Treseler's model and include the combinatorial functionality in a procedure. However, the root of the limitation on levels of control is based on testability. Declaring and using a separate procedure does not improve testability unless the procedure is externally accessible (e.g. in a package) for separate testing. However, a procedure declared in a package no longer has implicit access to signals visible by the process in which it was originally declared. Such signals would have to be passed through the procedure call explicitly, preferably through records. In practice, the reset and clock controls are so ubiquitous in RTL that it is pointless to consider their impact on testability.

As for texts and examples on subprograms in RTL, I do not know of any extensive references. For synthesizable RTL, subprograms are prohibited from consuming time/delta cycles (e.g. no wait statements within the subprogram.) Functions cannot contain wait statements anyway. This effectively limits subprograms to describing combinatorial logic only. However, this allows replacing the second, combinatorial process in the authors' style with an equivalent procedure that can be called from the clocked process. Thus one can have their cake and eat it too: one clocked process, but with explicit separation of register and combinatorial logic descriptions which some seem to prefer.
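
A minimal sketch of that arrangement, assuming a hypothetical 8-bit enable counter. The procedure name `next_state` and all other identifiers are illustrative:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity proc_counter_sketch is
  port (clk, rst, en : in  std_logic;
        count        : out unsigned(7 downto 0));
end entity proc_counter_sketch;

architecture rtl of proc_counter_sketch is
  type reg_type is record
    cnt : unsigned(7 downto 0);
  end record;
begin
  main : process (clk)
    variable r : reg_type;

    -- Combinatorial next-state logic, kept apart from the register
    -- description. Declared inside the process, so it can read the
    -- port 'en' directly without it being passed in.
    procedure next_state (variable v : inout reg_type) is
    begin
      if en = '1' then
        v.cnt := v.cnt + 1;
      end if;
    end procedure next_state;
  begin
    if rising_edge(clk) then
      if rst = '1' then
        r.cnt := (others => '0');
      else
        next_state(r);               -- no wait statements, no time consumed
      end if;
      count <= r.cnt;                -- registered output
    end if;
  end process main;
end architecture rtl;
```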

I prefer an RTL style that emphasizes clock cycles of latency, and let the synthesis tool determine where registers must be inserted to accomplish said latency. If one uses retiming/pipelining optimizations during synthesis, the input-output latency is the only aspect retained by the implementation anyway. This style uses variables to describe function and latency by controlling the order of reading and writing a variable during the same clock cycle. Like Treseler, signals are only used for inter-process communication.
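
A small sketch of how read/write order on a variable sets latency, assuming a hypothetical increment stage (all names are illustrative). Reading the variable before it is written in a clock cycle yields last cycle's value; reading it after the write yields this cycle's value:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity latency_sketch is
  port (clk     : in  std_logic;
        d       : in  unsigned(7 downto 0);
        q_late  : out unsigned(7 downto 0);
        q_early : out unsigned(7 downto 0));
end entity latency_sketch;

architecture rtl of latency_sketch is
begin
  main : process (clk)
    variable stage : unsigned(7 downto 0) := (others => '0');
  begin
    if rising_edge(clk) then
      q_late  <= stage;    -- read before write: value from the previous
                           -- clock, so d -> q_late takes two clock edges
      stage   := d + 1;    -- write
      q_early <= stage;    -- read after write: value from this clock,
                           -- so d -> q_early takes one clock edge
    end if;
  end process main;
end architecture rtl;
```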

A few tips on RTL subprograms.

Concurrent procedure calls are allowed, but are easily confused with component instantiations (for a purely combinatorial component, which I rarely if ever use). I avoid concurrent procedure calls in RTL altogether. If used concurrently, procedures must use constant and/or signal class interfaces.

On the other hand, do not use signal class interfaces on procedures called in processes, use variable or constant classes instead. The delayed update semantics of signals in processes becomes even more confusing when procedures are involved.

When a subprogram is declared in a process, it has visibility to anything visible by the process. It also has visibility to any variables, types or subprograms declared before it in the process declarative region (before the begin statement).

When declared outside a process, a procedure cannot drive any signal not explicitly passed to it.
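
A minimal sketch pulling these tips together, assuming a hypothetical package `count_pkg`. The procedure lives in a package (so it could be tested on its own), uses constant and variable class interfaces rather than signal class, and receives everything it needs explicitly:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package count_pkg is
  procedure next_count (
    constant en  : in    std_logic;   -- constant class: value is passed in
    variable cnt : inout unsigned     -- variable class: updated immediately
  );
end package count_pkg;

package body count_pkg is
  procedure next_count (
    constant en  : in    std_logic;
    variable cnt : inout unsigned
  ) is
  begin
    if en = '1' then
      cnt := cnt + 1;
    end if;
  end procedure next_count;
end package body count_pkg;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.count_pkg.all;

entity pkg_counter_sketch is
  port (clk, rst, en : in  std_logic;
        count        : out unsigned(7 downto 0));
end entity pkg_counter_sketch;

architecture rtl of pkg_counter_sketch is
begin
  main : process (clk)
    variable cnt : unsigned(7 downto 0) := (others => '0');
  begin
    if rising_edge(clk) then
      if rst = '1' then
        cnt := (others => '0');
      else
        next_count(en, cnt);   -- 'en' must be passed; the package
                               -- procedure cannot see the port itself
      end if;
      count <= cnt;
    end if;
  end process main;
end architecture rtl;
```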

Hope this helps,

Andy
 
KJ

true. But the separation might be at the gate level or higher and this
is where Ian style makes a difference

What difference do you think it makes? The 'style' it is written in will make
no difference. Synthesis will take the description and turn it into logic that
is implemented in 'gates' (or connections to LUT) as well as flip flops (or
memory). The tools really do not need any help from the user in figuring out
which is which.
If you think about 'structured programming' as in a paradigm where every
computable function can be expressed as a combination of only three
control structures, then Ian proposal pretty much follows the same line.
Whether is overplayed or not I do not know.

That does not imply that the 'best' way to describe the code is in that form.
The soft metrics for 'best' here should be maintainability. On the assumption
that one can write code that meets the function and performance requirements
in many different ways, then one looks towards source code maintainability
which would be the ability for somebody (not necessarily the original designer)
to get up to speed on the design. Clumping all of the unrelated code physically
together because it describes the transfer function is rather pointless. Clumping
parts of code together that are inter-related makes sense.
Could you be more specific on what kind of 'non-issues' and 'actual
issues' you are referring to? Since I consider the talk rather to the
point I may have missed/misunderstood an essential part of it.

I'll just nitpick on a few points in no particular priority, just the order
that they are in the presentation.

- Other than the last bullet about auto-generated code, nothing on the slide
titled 'Traditional VHDL design methodology' is worth the time it took to
type. It's all wrong. Maybe some people have done what he says here but that
doesn't make it 'Traditional'. The statement 'Hard to read = difficult to maintain'
was written and made to seem important by coloring it in red by Capt. Obvious.
- The next slide 'Traditional ad-hoc design style' is similarly biased and worthless.
Taking the statement 'No unified signal naming convention' as one example. The upcoming
'unified' naming convention will not add anything useful (more later). The
statement 'Coding is done at low RTL level' is laughable. Again some may code this
way, but let's not elevate those people to be considered the traditionalists.
- Slide 'Unified signal naming example'. Only the convention for 'types' has any value
and that is because use of the type name can frequently be used throughout the code and
it can be of value sometimes to easily differentiate that xyz is a data type. Specific
objections to the others are:
* _in -- Yeah, I hadn't noticed that I only use the signal on the right hand side so
it must be an input. If you use a compiler that doesn't complain about assigning to
an input then maybe you need _in...or maybe you need to use a real tool.
* _out -- Ditto
* _s and _v -- Every signal and variable will have a logic meaning conveyed by the name
of the signal/variable. When that logic meaning is conveyed I'll use the appropriate
assignment operator and if I forget the compiler will cough. When I go to *use* that
thing which is the more important thing then I really don't care that it is a signal
or variable.
* _pkg -- It would be better to use it as a prefix so that all the packages will list
together if I needed to get at some signal in a package...but that's about it
* p_ and i_ -- Why? These can never be referenced elsewhere...the authors clearly thought
that they wouldn't have a complete unified example unless they had a scheme for naming
everything, whether it has utility or not. Doing extra for no benefit is not a benefit.
-- Conclusion on the naming convention...very little actual value since it doesn't help
prevent design errors, it doesn't help debug problems it just causes more typing for
no tangible benefit. If you doubt this, then state the tangible benefit in terms of
productivity or maintainability since that is the only metric that could possibly be
affected by this convention. Note: Stating 'I want to know that an object is a signal
or a variable simply by looking at the name' doesn't cut it. Reason: Unless you state
why knowing the object type helps your productivity, you're just wandering
pointlessly.
- Slide 'The abstracted view in VHDL : the two-process scheme' and successors describing the
method does not justify how productivity would increase with any of the schemes. Examples:
* The collection of signals and/or ports into records groups together things that are logically
completely unrelated other than by the fact that they are pieces of an entity/architecture.
As an example, consider an entity that receives input data, processes it, and then outputs
the data. Further the input data interface has some implementation protocol, the output
has a different protocol, both protocols imposed by external forces (i.e. you can't change
them, maybe they are external device I/O). The natural collection from my perspective is
input interface signals, output interface signals and processing algorithm signals. The
input and output interfaces likely have nothing at all to do with each other...so why should
they be collected into a record together as if they are? Think about it. The input
interface likely has some state machine, the output interface has another, the processing
algorithm possibly a third. Do you think those three state machines should be all lumped
together? No? Then what benefit is it to lump them into a record together? (Hint: None)
But maybe you think there is no cost to doing so...think again. Try to follow the logic for
a particular signal (because when you're debugging real designs that's what you do) and
see how utterly useless you've made the Modelsim Dataflow window by collecting every possible
thing into one process fed by one signal. Guess what? Almost every signal is a function of
only a small handful of actual signals. By lumping this small handful in with a boatload of
unrelated signals by putting them into some bigger record will not help you be more productive.
- Slide 'Benefits'...every single supposed benefit except for 'Sequential coding is well known and
understood' is wrong. How is the 'algorithm' easily extractable as stated? Take the example
I mentioned earlier. There are at least three 'algorithms' going on: input protocol, output
protocol and processing algorithm...and yet the proposed method will lump these together into
one supposed 'algorithm'. What is stated by the author as an 'algorithm' is not really an
algorithm, all it is is the combinatorial logic...
- The other supposed benefits are merely unsubstantiated beliefs...I'll leave it to you to show
in concrete terms how any of them are generally true. Be specific.
- Adding a port: While I do agree that the method does make it easier to add and subtract I/O,
I'll counter with if you'd put more thought into using a consistent interface protocol in
the first place (example: Avalon or Wishbone) you wouldn't find yourself adding and subtracting
ports in the first place because you'd get it right (or very nearly right) the first time. So
this becomes a benefit of small value over a superior method that mostly avoids the issue.
- Adding a register: There simply is no benefit here. Use a single clocked process and the few
concurrent statements where needed.
- Slide 'Tracing signals during debugging' is ridiculous. This method makes it much harder to
display the signals that are actually relevant to whatever debug you're trying to perform.
Remember that most signals are NOT a function of every other signal that is in these records
so by tracing the record you still have to hunt around and find the elements that you are
actually interested in seeing...no help there, I can see those signals just as easily from the
signal list window. No benefit, medium cost...no thanks.
- Slide 'Stepping through code during debugging'. This is generally the last thing one needs to
do unless you make heavy use of variables in which case you're stuck and forced to single step.
If you use signals you can usually trace through the code without any single stepping and fix
the problem. Remember: when a sim stops due to an assertion, the faulty behavior has already
occurred, no amount of single stepping helps you here because the 'bad' thing (whatever that
may be) has already happened. No benefit here.
- Slide 'Comparison MEC/LEON'...has the skill of the designers involved been controlled for? If not
then this is just a comparison of two different designs, so what? Maybe the LEON people were
simply better designers than the MEC shleps. Not enough information here to determine anything
but you're certainly entitled to infer whatever result you think you see here.
- Slide 'Increasing the abstraction level' is a misnomer. The method described does not increase
abstraction, it simply collects unrelated signals into a tidy (but more difficult to use) bucket.
The supposed 'benefits' and 'problems' are completely unsubstantiated and are simply opinions
presented as 'facts'.
- Slide 'Conclusions'...all opinion presented as unsubstantiated fact.

I may have misunderstood completely the talk but I do not read the
'increasing the abstraction level' slide as a critique to Mike's style
w.r.t. their style.

It sounds better to say 'higher abstraction level' than 'collecting unrelated signals'
doesn't it?

On the contrary it seems to me they're trying to give example of higher
level of abstraction (including Mike's one process example) and warn
about what they believe are weak points of these models. Considering the
experience with the LEON vs the ERC32, both quite full fledged projects,
it seems their warnings are not that unfounded (even though this is only
my speculation).

I don't know what they were trying to show with Mike's example.
Which one specifically?
The first bullet qualifies the two-process approach as uniform and I
believe rightly since all entities look the same and a designer only
needs to look at the combinatorial process to understand the algorithm.

Only if you're implementing embarrassingly simple entities I suppose. For anything
real, you're making things worse (refer to my simple example stated earlier of a
generic processing module).
Looking at their LEON vs ERC32 example, it seems the method claims less
resources than an ad-hoc one, therefore improving development time.

The most likely reason is more skilled designers rather than style...prove me wrong.
I personally believe the reading is improved by the fact that you do not
need to trace several concurrent processes at the same time and your
flow of reading the code is more or less sequential.

The reading improvement won't be because of the proposed method.
I certainly believe that a readable code is easier to maintain, but I
may doubt that this approach does increase the re-usability of the code,
at least I consider this last point not more than the author's personal
opinion.

A designer's goal is to implement a specific function and meet a specific level of
performance. A method that makes traceability of the function specification to the
source code easier is 'good'. The separation of that description into a collection of
all combinatorial logic needed to implement the function and a separate clocked process
does absolutely nothing to define traceability back to the specification. The specification
will most likely have no mention of combinatorial or storage, that is an implementation
decision. Therefore the proposed separation does not aid traceability and in fact makes
it harder to follow.
Here I need to urge you to go through the presentation again since I
believe, with all due respect, you missed the point. IMO they are *not*
proposing their two-processes approach vs Mike's one process approach.
They actually use Mike's example to show how increasing the level of
abstraction is a good thing.

Nope. Calling it a 'higher level abstraction' doesn't make it so. You've
simply collected together unrelated signals into various records. Records are
a good thing, collecting unrelated things into a record...not so good thing.
The whole point I got, which still might be wrong, is that their
proposed style is way better than what they call 'traditional ad-hoc
design style' (pag. 5).

Different, not better.
Their beliefs are well supported by the example they show (which still
may have some peculiarities and hide larger flaws of their approach) and
it seems to me that Mike's style also supports their conclusions.

And you of course are entitled to your opinion as well and I mean that with
absolutely no disrespect or sarcasm or anything else negative.

Kevin Jennings
 
A

alb

On 11/07/2013 19:43, Andy wrote:
[]
Especially when signals are used to trigger combinatorial processes,
there is much more overhead than variables, and even more so if the
signal is assigned with a delay. I would expect decent simulators to
significantly optimize the overhead on a signal for which there are
no delays or sensitive processes. But there is still the overhead of
separate execution (computing the value to be assigned) and update
(upon process suspension) for signals even within a single process.
Variables lack this duality and its associated overhead. In my
experience, using integers where possible for numeric quantities
(counters, etc.) yields a far more significant simulation performance
improvement than variables vs signals.

That is quite an interesting point, I did not know that the signal
overhead could be optimized so much, I just knew signals had overhead
w.r.t. variables and therefore assumed they required more CPU to simulate.

Why do integers show such a performance difference?

[]
On the rare occasion that I need a combinatorial in-out path through
an entity, I would prefer that to be the only situation where I use a
combinatorial process (this tends to make such a practice stand out
in the code, which it should.) Common combinatorial logic feeding
multiple sequential processes can better be expressed by a function
or procedure invoked in each process, IMHO. Synthesis is plenty smart
enough to figure out if only one copy of the logic is really needed.

I should start using subprograms far more than I do.
I think maybe you do not understand the effects of the new "all"
keyword in a sensitivity list. It does NOT indicate that the process
is sensitive to all signals visible by the process. "All" indicates
that the process is sensitive to any signal read by the process.
Therefore, it has zero simulation performance impact on a
combinatorial RTL process.
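
A minimal sketch of the point about "all", assuming a tool with VHDL-2008 support. The entity name mux2 and its ports are made up for illustration; the process below ends up sensitive to sel, a and b only, because those are the signals it reads:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 2-to-1 multiplexer, used only to illustrate 'process (all)'.
entity mux2 is
  port (
    sel : in  std_logic;
    a   : in  std_logic;
    b   : in  std_logic;
    y   : out std_logic
  );
end entity mux2;

architecture rtl of mux2 is
begin
  -- VHDL-2008: 'all' means "every signal read in this process",
  -- not "every signal visible from this process".
  comb : process (all)
  begin
    if sel = '1' then
      y <= a;
    else
      y <= b;
    end if;
  end process comb;
end architecture rtl;
```

Writing process (sel, a, b) instead should behave the same way; "all" just removes the chance of forgetting one.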

You are right, I did not know that. Nevertheless let me say that I value
the sensitivity list because I can spot immediately what the process
depends on without the need to go through it. On top of it when I write
a process and I believe it should depend on something, then it *must* go
in the sensitivity list, otherwise I take it as a hint that my mental
model is so fragile that I cannot retain it until the 'end process;'.
By embedding the combinatorial process functionality in the clocked
process, the level of control nesting is increased by one
if-statement (or two if asynchronous reset is used). If one really
wants to avoid that increase, then follow Treseler's model and
include the combinatorial functionality in a procedure. However, the
root of the limitation on levels of control is based on testability.
Declaring and using a separate procedure does not improve testability
unless the procedure is externally accessible (e.g. in a package) for
separate testing. However, a procedure declared in a package no
longer has implicit access to signals visible by the process in which
it was originally declared. Such signals would have to be passed
through the procedure call explicitly, preferably through records. In
practice, the reset and clock controls are so ubiquitous in RTL that
it is pointless to consider their impact on testability.

I think I lost you here. Why are you saying that using a procedure does
not improve testability unless the procedure is externally accessible?
I assume the result of the procedure is readily accessible to the
process and therefore can be tested (at least as a black box).

What makes the procedure more testable if it lies in a package?
As for texts and examples on subprograms in RTL, I do not know of any
extensive references. For synthesizable RTL, subprograms are
prohibited from consuming time/delta cycles (e.g. no wait statements
within the subprogram.) Functions cannot contain wait statements
anyway. This effectively limits subprograms to describing
combinatorial logic only. However, this allows replacing the second,
combinatorial process in the authors' style with an equivalent
procedure that can be called from the concurrent process. Thus one
can have their cake and eat it too: one clocked process, but with
explicit separation of register and combinatorial logic descriptions
which some seem to prefer.
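
A rough sketch of that idea, not taken from anyone's actual code: one clocked process, with the combinatorial part described in a procedure that has no wait statements. The entity accumulator, the procedure next_state and all port names are assumptions made up for this illustration:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 8-bit accumulator with synchronous reset.
entity accumulator is
  port (
    clk : in  std_logic;
    rst : in  std_logic;
    din : in  unsigned(7 downto 0);
    sum : out unsigned(7 downto 0)
  );
end entity accumulator;

architecture rtl of accumulator is
begin
  main : process (clk)
    variable acc_v : unsigned(7 downto 0) := (others => '0');

    -- Combinatorial logic only: no wait statements and no local state
    -- kept between calls. The register is inferred by the calling process.
    procedure next_state (
      constant d : in    unsigned(7 downto 0);
      variable a : inout unsigned(7 downto 0)
    ) is
    begin
      a := a + d;  -- wraps modulo 256 (numeric_std semantics)
    end procedure next_state;
  begin
    if rising_edge(clk) then
      if rst = '1' then
        acc_v := (others => '0');
      else
        next_state(din, acc_v);
      end if;
      sum <= acc_v;
    end if;
  end process main;
end architecture rtl;
```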

Uhm, in Treseler's uart example
(http://myplace.frontier.com/~miketreseler/uart.vhd) if you look at the
procedure called 'retime' it seems to me the logic described is
sequential and not combinatorial. Am I missing your point? Why are
subprograms limited to describing combinatorial logic?
I prefer an RTL style that emphasizes clock cycles of latency, and
let the synthesis tool determine where registers must be inserted to
accomplish said latency. If one uses retiming/pipelining
optimizations during synthesis, the input-output latency is the only
aspect retained by the implementation anyway. This style uses
variables to describe function and latency by controlling the order
of reading and writing a variable during the same clock cycle. Like
Treseler, signals are only used for inter-process communication.

Ian Lang calls it 'the rule of inference':
Within a block of clocked logic, if a variable (VHDL), signal (VHDL)
or reg (Verilog) is used before it is assigned, then it will
synthesise to a flip-flop. Conversely, if it is assigned before it is
used, then it will be optimised away by the synthesiser.

I did not know that even a non-blocking assignment had this feature. I
assume that 'a block of clocked logic' is an:

if rising_edge(clk) then
....
end if;

but then what about the asynchronous reset part? The signal *is*
assigned before it's used with a non-blocking assignment, should it be
optimized away???
A few tips on RTL subprograms.

Concurrent procedure calls are allowed, but are easily confused with
component instantiations (for a purely combinatorial component, which
I rarely if ever use). I avoid concurrent procedure calls in RTL
altogether. If used concurrently, procedures must use constant and/or
signal class interfaces.

On the other hand, do not use signal class interfaces on procedures
called in processes, use variable or constant classes instead. The
delayed update semantics of signals in processes becomes even more
confusing when procedures are involved.

Does the procedure need to wait for the signal to be updated to execute
or it will execute with the not yet updated value of the signal? I would
expect the procedure to execute immediately. What else could happen?
When a subprogram is declared in a process, it has visibility to
anything visible by the process. It also has visibility to any
variables, types or subprograms declared before it in the process
declarative region (before the begin statement).

When declared outside a process, a procedure cannot drive any signal
not explicitly passed to it.

This is why I'm following the code example from Janick Bergeron
(Writing Testbenches) to design a test harness. If I want to encapsulate
procedures in a package to allow reuse then I have no choice other than
to pass all the interface signals to them. Another option would be to make
the signals globally visible, but I fell once in the trap and will never
do it again! :)
Hope this helps,

A lot, thanks.

Al
 
A

alb

On 12/07/2013 04:53, KJ wrote:
[]
What difference do you think it makes? The 'style' it is written in
will make no difference. Synthesis will take the description and
turn it into logic that is implemented in 'gates' (or connections to
LUT) as well as flip flops (or memory). The tools really do not need
any help from the user in figuring out which is which.

Certainly the tool is smart enough to figure that out, but writing code
at a higher level does pay off. Higher level of abstraction has the
purpose to describe the problem in a clearer way (I'm thinking of
'literate programming') for other people to read and maintain.

The methods reported by the authors of ET4351, as well as Mike's and
Ian's examples, have all one intent IMO, reduce the number of concurrent
statements which need to interact. This intent is beneficial since it
fosters separation between blocks, reducing the amount of signals my head
has to keep track of when I try to write/read/debug some code.
That does not imply that the 'best' way to describe the code is in
that form.

Agreed, in the end 'best' is what fits the needs case by case and here's
where the designer's experience matters.
The soft metrics for 'best' here should be maintainability. On the
assumption that one can write code that meets the function and
performance requirements in many different ways, then one looks
towards source code maintainability which would be the ability for
somebody (not necessarily the original designer) to get up to speed
on the design. Clumping all of the unrelated code physically
together because it describes the transfer function is rather
pointless. Clumping parts of code together that are inter-related
makes sense.

I cannot agree more, both on the maintainability and the clumping. But
having the 'best' subdivision between functional blocks in a design is
another aspect that sits on top of the style you use. If your design
doesn't have any partition it can easily turn into a mess, if your
design is fragmented in micro blocks it does not make sense either.

Grouping together signals into records which do make sense is certainly
beneficial and still does not prevent the usage of the proposed approach.
I'll just nitpick on a few points in no particular priority, just the
order that they are in the presentation.

I'll nitpick along...
- Other than the last bullet about auto-generated code, nothing on
the slide titled 'Traditional VHDL design methodology' is worth the
time it took to type. It's all wrong. Maybe some people have done what
he says here but that doesn't make it 'Traditional'.

I am certainly not in the position to say that I've seen *a lot* of code
in my life, but certainly most of the time I've seen code it was pretty
close to the description they provided (otherwise I wouldn't have needed
to start this thread in the first place). Does this qualify it as
'traditional'? Maybe not, but it does not qualify it as wrong either.

[]
- The next slide 'Traditional ad-hoc design style' is similarly
biased and worthless. Taking the statement 'No unified signal naming
convention' as one example. The upcoming 'unified' naming convention
will not add anything useful (more later).

I find meaningless their proposed 'unified' naming convention, but I
must say that too often I've seen a stunning negligence in the choice of
names. IMO names should convey their purpose/function, rather than type
and or direction.

I remember we had once two sets of tx/rx cables going in and out a
series of avionics boxes and people were convinced that
labeling tx_out/rx_in on one end and tx_in/rx_out on the other would
have been sufficient...we spent a few hours before finding the right
combination (ouch!).
The statement 'Coding is done at low RTL level' is laughable. Again
some may code this way, but let's not elevate those people to be
considered the traditionalists.

is laughable because 'level' is already part of the RTL acronym :)

My first supervisor used to design logic with a schematic entry tool,
with gates and flops. I was asked to learn vhdl though, so I did learn
the syntax (or part of it - at least), but the mindset when designing
was still with gates and flops. Most of the people I know are still not
far from there... I am slowly departing from that paradigm.
- Slide 'Unified signal naming example'. Only the convention for
'types' has any value and that is because use of the type name can
frequently be used throughout the code and it can be of value
sometimes to easily differentiate that xyz is a data type.

Indeed I ignored this slide entirely and only
retained the suggestion to name a type as such (which is what I normally
do in C). Since these guys are
strongly linked to ESA, with a lot of bureaucratic non-sense written
down on piles of documents, I guess this is some sort of remnant of
their complex yet fruitful partnership.

[]
- Slide 'The abstracted view in VHDL : the two-process scheme' and successors describing the
method does not justify how productivity would increase with any of the schemes. Examples:
* The collection of signals and/or ports into records groups together things that are logically
completely unrelated other than by the fact that they are pieces of an entity/architecture.

Couldn't agree more.
As an example, consider an entity that receives input data, processes it, and then outputs
the data. Further the input data interface has some implementation protocol, the output
has a different protocol, both protocols imposed by external forces (i.e. you can't change
them, maybe they are external device I/O). The natural collection from my perspective is
input interface signals, output interface signals and processing algorithm signals. The
input and output interfaces likely have nothing at all to do with each other...so why should
they be collected into a record together as if they are?

Nobody prevents you from combining the input/output/processing signals in
three separate records; you would still benefit from the fact that adding
a signal to a record definition does not require propagating the change
through all the component/instantiation/entity definitions.
Think about it. The input
interface likely has some state machine, the output interface has another, the processing
algorithm possibly a third. Do you think those three state machines should be all lumped
together? No? Then what benefit is it to lump them into a record together? (Hint: None)

but you could still have three 'two-process' pairs, or three single processes,
exchanging the necessary information via signals. The interface
separation you have at the entity level (with three different records)
is kept throughout the architecture. I understand that what I'm saying
*is not* what they are saying, but I can equally ask why should the
three functions sit in the same entity? After all if they are only
passing data through a well defined interface, they could equally sit on
separate entities and keep a one record per interface.
- Slide 'Benefits'...every single supposed benefit except for 'Sequential coding is well known and
understood' is wrong. How is the 'algorithm' easily extractable as stated? Take the example
I mentioned earlier. There are at least three 'algorithms' going on: input protocol, output
protocol and processing algorithm...and yet the proposed method will lump these together into
one supposed 'algorithm'. What is stated by the author as an 'algorithm' is not really an
algorithm, all it is is the combinatorial logic...

That is certainly one of the reasons why I would either break your
example in smaller entities, or have three processes running
concurrently and passing data to each other.
- The other supposed benefits are merely unsubstantiated beliefs...I'll leave it to you to show
in concrete terms how any of them are generally true. Be specific.

'Uniform coding style simplify maintenance': the fact that you have a
pattern in each entity of your code (see Mike's template) lets you read
and maintain only the part where stuff happens, relying on the template
to do its job together with the synthesis tool.

Knowing 'a priori' that somebody else's code is following a logical
separation as the one proposed does shorten the time I need to spend to
understand how pieces are glued together.

In your example I assume that data received from the input are 'passed'
to the algorithm which 'passes' them eventually to the output. The three
functions are concurrent and they need some sort of 'handshake' to pass
the data around, maybe a couple of fifos to be monitored for
'full/empty' conditions... how many things should I know about this code
while I read it? Where should this description go?

With Mike's template something similar can be quite self explaining:

procedure update_regs is
begin
  receive_data;
  process_data;
  transmit_data; -- too bad receive/transmit have different word length!
end procedure update_regs;

A simple glance at this part already tells me a lot about what the code
is doing, without the need to read through often not up to date comments
scattered around.
- Adding a port: While I do agree that the method does make it easier to add and subtract I/O,
I'll counter with if you'd put more thought into using a consistent interface protocol in
the first place (example: Avalon or Wishbone) you wouldn't find yourself adding and subtracting
ports in the first place because you'd get it right (or very nearly right) the first time. So
this becomes a benefit of small value over a superior method that mostly avoids the issue.

I do wish more of my colleagues understood the benefits coming from a
standard interface, but unfortunately I've seen many wheels reinvented
from scratch (and only few of them spinning as they should!).

I would only add that adding/removing I/O from an entity takes a matter
of a couple of key strokes with emacs, so I never found this to be a
real problem.

[]
- Slide 'Stepping through code during debugging'. This is generally the last thing one needs to
do unless you make heavy use of variables in which case you're stuck and forced to single step.

AFAIK if variables are used in processes then you can still use 'add
wave' with the process label and * for everything in the process, but if
they are used in subprograms then it's needed to single step, since they
are popped off the stack when the procedure exits.

I try to design my logic such that it is observable and controllable (which
I normally lose the moment I 'transfer' the firmware on some piece of
silicon). I try to keep functional blocks as separate as possible and
with clear interfaces (often going through the entity boundaries), that
I end up tracing.

[]
- Slide 'Increasing the abstraction level' is a misnomer. The method described does not increase
abstraction, it simply collects unrelated signals into a tidy (but more difficult to use) bucket.

How would you increase the level of abstraction instead?
Any pointer to available code would be really useful.

[]
It sounds better to say 'higher abstraction level' than 'collecting unrelated signals'
doesn't it?

I must say that it does sound better! Jokes apart, I believe that style
does play a big role in coding and any attempt to propose some is worth
the effort. If, as you say, it does not help in increasing the level of
abstraction, it does help to have the 'bucket tidy' instead of messy.

[]
Only if you're implementing embarrassingly simple entities I suppose. For anything
real, you're making things worse (refer to my simple example stated earlier of a
generic processing module).

Mike's example is not so complex, but not so 'embarrassingly simple'
either I suppose.
The most likely reason is more skilled designers rather than style...prove me wrong.

I cannot certainly prove you wrong, I can only say that both are 32bit
RISC processors based on SPARC architecture. Temic (now Atmel) developed
the ERC32 and ESTEC (part of ESA) the first versions of the LEON. A
research center versus an established microcontroller manufacturer... I
guess both had quite skilled designers (but this is not more than a guess).

[]
A designer's goal is to implement a specific function and meet a specific level of
performance. A method that makes traceability of the function specification to the
source code easier is 'good'. The separation of that description into a collection of
all combinatorial logic needed to implement the function and a separate clocked process
does absolutely nothing to define traceability back to the specification. The specification
will most likely have no mention of combinatorial or storage, that is an implementation
decision. Therefore the proposed separation does not aid traceability and in fact makes
it harder to follow.

The separation is still left to the designer to make, sectioning the
code in structural elements (entity, function, procedure, package),
providing the traceability you referred to.

If your entity is cluttered with small processes, which share signals
all together, the traceability issue is not solved either.

[]
Nope. Calling it a 'higher level abstraction' doesn't make it so. You've
simply collected together unrelated signals into various records. Records are
a good thing, collecting unrelated things into a record...not so good thing.

Would you qualify Mike's template as a hardware description at higher
level of abstraction? If yes, then I do not see where's the *big*
difference w.r.t. Ian's example or the presentation examples.

If not, then I should ask you again to point me to some code example that
you consider written at a higher level of abstraction, because there is
where I aim to go.

[]
And you of course are entitled to your opinion as well and I mean that with
absolutely no disrespect or sarcasm or anything else negative.

I'm enjoying the ride.

Al
 
A

Andy

Why do integers show such a performance difference?

SLV/unsigned/signed, etc. are arrays of enumerated values (for each bit). Even if the enumerated value is represented as a byte, each bit is then a byte in memory, and the CPU's built-in arithmetic instructions cannot be used on the vector directly. The best bet for the simulator (rather than the boolean implementations in the reference package bodies) is to convert it to integer(s) where the integer(s) represent(s) the numerical value (assuming there are no meta-values in the vector), and then use the CPU's arithmetic instructions on those integers.

Using integers directly where possible (within the range constraints available on integers) avoids all that conversion and memory usage.

MANY years ago, I replaced a widely-used 5 bit unsigned address bus with an integer in a ~20K gate FPGA design, and that alone improved RTL simulation runtimes by ~2.5X IIRC.
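
To make that concrete, here is a small sketch (not the original design) of a 5-bit address counter kept as a constrained integer internally and converted to unsigned only at the port. The entity addr_counter and its ports are invented for illustration:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 5-bit address counter.
entity addr_counter is
  port (
    clk  : in  std_logic;
    rst  : in  std_logic;
    addr : out unsigned(4 downto 0)
  );
end entity addr_counter;

architecture rtl of addr_counter is
begin
  count : process (clk)
    -- A constrained integer instead of unsigned(4 downto 0): the simulator
    -- can use native arithmetic, and synthesis still sizes it to 5 bits.
    variable addr_v : natural range 0 to 31 := 0;
  begin
    if rising_edge(clk) then
      if rst = '1' then
        addr_v := 0;
      else
        -- Integers do not roll over by themselves, so wrap explicitly.
        addr_v := (addr_v + 1) mod 32;
      end if;
      addr <= to_unsigned(addr_v, addr'length);
    end if;
  end process count;
end architecture rtl;
```

The explicit mod is the price of the integer: assigning 32 to addr_v would be a range error in simulation rather than a silent wrap.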
Nevertheless let me say that I value the sensitivity list because I can spot
immediately what the process depends on without the need to go through it.

If you limit signals to only that information that is communicated between processes, and use local variables for everything else, then your benefit from the explicit sensitivity list is minimal. If only some of many processes share a group of signals, then enclose those few processes, and declarations for their exclusively shared signals, in a block statement, so that the signals are not only hidden from outside access, but also declared in close proximity to the processes that use them.
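
A minimal sketch of the block-statement suggestion, with made-up names (block_demo, handoff, stage): the signal shared by the two processes is declared inside the block, so nothing else in the architecture can see it.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical two-stage pipeline.
entity block_demo is
  port (
    clk  : in  std_logic;
    din  : in  std_logic;
    dout : out std_logic
  );
end entity block_demo;

architecture rtl of block_demo is
begin
  handoff : block
    -- Visible only to the processes inside this block.
    signal stage : std_logic;
  begin
    producer : process (clk)
    begin
      if rising_edge(clk) then
        stage <= din;
      end if;
    end process producer;

    consumer : process (clk)
    begin
      if rising_edge(clk) then
        dout <= stage;
      end if;
    end process consumer;
  end block handoff;
end architecture rtl;
```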
Why are you saying that using a procedure does not improve testability unless the procedure is externally accessible?

Testability of the architecture that uses subprograms is improved by separately testing each subprogram, without the process/architecture around it hindering access to it. If a subprogram is declared in an architecture or in a process, how would you call the subprogram directly from a testbench to test it? For it to be called by some other architecture (e.g. testbench) the subprogram must be declared in a package accessible to the testbench.

I'm not trying to say that all subprograms should be declared in packages. There are many reasons to use subprograms, and improved testability is only one of them. Localization of data (variables) and complexity is another. IMHO, since I'm not a big fan of unit level testing FPGA designs anyway, improved unit level testability is not as big a reason for using subprograms as is the localization.
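
As an illustration of the testability argument only, here is a sketch in which a function lives in a package and a testbench calls it directly, with no surrounding architecture in the way. The package parity_pkg, the function even_parity and the testbench parity_tb are hypothetical names:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package parity_pkg is
  -- Returns '1' when 'data' contains an odd number of '1' bits.
  function even_parity (data : std_logic_vector) return std_logic;
end package parity_pkg;

package body parity_pkg is
  function even_parity (data : std_logic_vector) return std_logic is
    variable p : std_logic := '0';
  begin
    for i in data'range loop
      p := p xor data(i);
    end loop;
    return p;
  end function even_parity;
end package body parity_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.parity_pkg.all;

entity parity_tb is
end entity parity_tb;

architecture sim of parity_tb is
begin
  -- The testbench can reach the function only because it is in a package.
  check : process
  begin
    assert even_parity("0000") = '0'
      report "even_parity failed for 0000" severity error;
    assert even_parity("1011") = '1'
      report "even_parity failed for 1011" severity error;
    wait;
  end process check;
end architecture sim;
```

Had even_parity been declared inside a process of the design, the only way to exercise it would be through that design's ports.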
Uhm, in Treseler's uart example
(http://myplace.frontier.com/~miketreseler/uart.vhd) if you look at the
procedure called 'retime' it seems to me the logic described is sequential and
not combinatorial. Am I missing your point?

In synthesis, a subprogram cannot span time, and it cannot remember local variables' values from one subprogram call to the next. It can control the order of its access/update of its interface variables, so it could be considered as inferring registers. However, in Treseler's example the previous values of the variables are assigned in a different (in time) call to the procedure. Thus it is also the surrounding process that infers the registers, and the procedure could be considered as simply defining the combinatorial logic (wires in this case) between them.

On the other hand, if the order of the variable assignment statements in that procedure were reversed, only one register would be inferred. So in that sense, the procedure is the one inferring at least two of the registers. Good point! I had not thought of that.

The important thing to remember about variables in clocked processes is that each reference to a variable, relative to its most recent update, determines whether a register is inferred for that reference. Thus a single variable declaration can represent, through multiple references and assignments to it, both a register and combinatorial values. Naturally, if multiple references to the variable return the same previously stored value, only one register is inferred for all of those references.
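
A small sketch of that rule, using invented names (var_inference, v, q1, q2). The same variable is read once before and once after its update within one clock cycle, so one reference sees a stored value and the other sees a wire:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity.
entity var_inference is
  port (
    clk : in  std_logic;
    d   : in  std_logic;
    q1  : out std_logic;
    q2  : out std_logic
  );
end entity var_inference;

architecture rtl of var_inference is
begin
  main : process (clk)
    variable v : std_logic := '0';
  begin
    if rising_edge(clk) then
      q2 <= v;  -- read BEFORE update: value stored in the previous cycle,
                -- so a register is inferred for v
      v  := d;  -- update
      q1 <= v;  -- read AFTER update: just d in this cycle (a wire)
    end if;
  end process main;
end architecture rtl;
```

Under these assumptions q1 follows d with one clock of latency (the signal's own register) and q2 with two (the register for v plus the signal's register).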

Signals can be thought of as inferring registers in the same way (if the value accessed was updated in a previous clock cycle). But since signal value updates are always postponed until the process suspends, a reference to a signal assigned on a clock edge is necessarily the value stored in a previous clock cycle, thus all references to said signal are to the registered value.

In that sense, variables and signals both infer registers for the same reason, but you have more control over that inference with variables. But don't worry; the synthesis tool will generate hardware that behaves the same way the RTL does (on a clock cycle basis), with or without registers as required.
Does the procedure need to wait for the signal to be updated to execute or it
will execute with the not yet updated value of the signal?

Since a procedure in synthesis cannot wait (span time/delta), the not-yet-updated value will be used. Similar confusion also arises when a statement following the procedure call that assigned a signal also sees the not-yet-updated value. Signals assigned within the procedure are not automatically updated when the procedure exits, only when the calling process suspends.
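
This can be checked in simulation with a sketch like the following. The testbench delayed_update_tb, the signal flag and the procedure set_flag are made-up names; the wait for 0 ns is only there to suspend the process for one delta cycle:

```vhdl
entity delayed_update_tb is
end entity delayed_update_tb;

architecture sim of delayed_update_tb is
  signal flag : boolean := false;
begin
  demo : process
    -- Declared in the process, so it can drive 'flag' without
    -- receiving it as a parameter.
    procedure set_flag is
    begin
      flag <= true;
    end procedure set_flag;
  begin
    set_flag;
    -- The procedure has returned, but the process has not suspended yet,
    -- so the old value is still visible here.
    assert flag = false
      report "flag was updated before the process suspended" severity error;

    wait for 0 ns;  -- suspend for one delta cycle

    assert flag = true
      report "flag was not updated after the process suspended" severity error;
    wait;
  end process demo;
end architecture sim;
```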
Another option would be to make the signals globally visible, but I fell once
in the trap and will never do it again!

Don't confuse accessibility/visibility of signals with driveability of signals from a procedure. The only way a procedure can DRIVE signals not passed to it explicitly is if the procedure is declared in a process, regardless of whether it can "see" the signals. If declared in a process, the procedure can drive any signal that can be driven from the process, whether the signal is passed to the procedure or not.

There are a couple of options to consider to make testbench procedures more easily reusable.

First, you can declare a procedure, with signal interfaces, in a package, and then within each process that needs to call it, you can overload it with a declaration that omits the signal interfaces, and that procedure simply calls the full version procedure with signals. Then you can more easily call the short-hand version procedure anywhere in that process, as many times as needed.
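
A sketch of this first option, for simulation only. The package bus_pkg, the procedure write_byte, the testbench bus_tb and the toy write protocol are all assumptions invented for the example, not anyone's real interface:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package bus_pkg is
  -- Full version: every signal it touches is passed explicitly.
  procedure write_byte (
    signal   clk   : in  std_logic;
    signal   data  : out std_logic_vector(7 downto 0);
    signal   we    : out std_logic;
    constant value : in  std_logic_vector(7 downto 0)
  );
end package bus_pkg;

package body bus_pkg is
  procedure write_byte (
    signal   clk   : in  std_logic;
    signal   data  : out std_logic_vector(7 downto 0);
    signal   we    : out std_logic;
    constant value : in  std_logic_vector(7 downto 0)
  ) is
  begin
    data <= value;
    we   <= '1';
    wait until rising_edge(clk);  -- wait statements: testbench code only
    we   <= '0';
  end procedure write_byte;
end package body bus_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.bus_pkg.all;

entity bus_tb is
end entity bus_tb;

architecture sim of bus_tb is
  signal clk  : std_logic := '0';
  signal data : std_logic_vector(7 downto 0);
  signal we   : std_logic := '0';
  signal done : boolean := false;
begin
  clock : process
  begin
    while not done loop
      clk <= '0'; wait for 5 ns;
      clk <= '1'; wait for 5 ns;
    end loop;
    wait;
  end process clock;

  stim : process
    -- Short-hand overload: same name, signal parameters omitted.
    procedure write_byte (constant value : in std_logic_vector(7 downto 0)) is
    begin
      write_byte(clk, data, we, value);  -- resolves to the package version
    end procedure write_byte;
  begin
    write_byte(x"12");
    write_byte(x"34");
    done <= true;
    wait;
  end process stim;
end architecture sim;
```

The two write_byte declarations have different parameter profiles, so the local one overloads the package one rather than hiding it.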

Second, you can declare records for the signal interface, either for separate in and out record ports, or for a combined inout record port, but the inout record type must either be resolved (implement a custom resolution function for it), or it must contain elements of resolved types (e.g. SL/SLV/Unsigned, etc.).
The record makes calling the procedure over and over much simpler.

Also, don't forget; in simulation (testbenches), procedures can have wait statements, and therefore can span time during their execution.

Andy
 
