Coding style, wait statement, sensitivity list and synthesis.

Rob Dekker

Jonathan Bromley said:
Sorry, I emphatically disagree with you here.

It is absolutely NOT the purpose of synthesisable HDL to guarantee
that your code will synthesise on any platform. FPGA users (at least,
the non-naive ones) already know this - they understand how to write
behavioural code that will synthesise, for example, to dual-port RAM
that may or may not exist in a certain target. The code is completely
valid even if the target can't implement it.

Mmm. I think this is different. The RAM extraction is optional in most tools,
and there is an acceptable replacement pattern : plain old RTL synthesis
using latches or dff, with address decoders and select latch/dff-enables.
The front-end can recognize the RAM, but if the technology mapper does
not have a RAM that fits there, they put the replacement pattern in there and proceed.
Simulation-synthesis behavioral consistency is still guaranteed.
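
A minimal VHDL sketch of the kind of RAM-style description discussed here, assuming a simple memory with one synchronous write port and one synchronous read port. The entity name ram_sketch, the generics and the port names are hypothetical and chosen only for illustration. Whether a given tool extracts a RAM from it depends on the tool and the target; if no RAM primitive fits, the same code can still be built from enabled registers, a write-address decoder and a read multiplexer, so simulation and synthesis stay consistent.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity ram_sketch is
  generic (
    ADDR_BITS : positive := 4;  -- hypothetical sizes, for illustration only
    DATA_BITS : positive := 8
  );
  port (
    clk   : in  std_logic;
    we    : in  std_logic;
    waddr : in  unsigned(ADDR_BITS - 1 downto 0);
    raddr : in  unsigned(ADDR_BITS - 1 downto 0);
    din   : in  std_logic_vector(DATA_BITS - 1 downto 0);
    dout  : out std_logic_vector(DATA_BITS - 1 downto 0)
  );
end entity ram_sketch;

architecture rtl of ram_sketch is
  type mem_t is array (0 to 2**ADDR_BITS - 1)
    of std_logic_vector(DATA_BITS - 1 downto 0);
  signal mem : mem_t;
begin
  -- Synchronous write and synchronous read on the same clock.
  -- A RAM-aware mapper may pick a memory block; otherwise this is just
  -- one enabled register per word plus a registered read multiplexer.
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        mem(to_integer(waddr)) <= din;
      end if;
      dout <= mem(to_integer(raddr));
    end if;
  end process;
end architecture rtl;
```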

The RAM thing IS similar to the support of tristate drivers from Z value assignments:
Some target technologies do not have (internal) tristates.
But if that is the case, the synthesis tool (after the front-end) will replace
the tristates with muxes, and proceed. As long as there are no bus-conflicts
or internal dangling busses (and there should not be any in a good design),
then again simulation-synthesis behavioral consistency is still guaranteed.

Either way, you can code-up Z values and RAM-like behavior and synthesis
WILL synthesize it to something that is behaviorally consistent with the RTL model.
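
As a rough illustration of the tristate case, the sketch below puts a 'Z'-based description next to a mux-style replacement. The entity and signal names are hypothetical. The two outputs agree only under the condition stated above: exactly one driver is enabled at a time, so there are no bus conflicts and the net is never left floating.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity tristate_sketch is
  port (
    sel_a, sel_b : in  std_logic;
    a, b         : in  std_logic;
    out_z        : out std_logic;  -- tristate description
    out_mux      : out std_logic   -- mux replacement
  );
end entity tristate_sketch;

architecture rtl of tristate_sketch is
  signal tri_net : std_logic;  -- resolved signal with two drivers
begin
  -- Tristate style: each source drives the net or releases it with 'Z'.
  tri_net <= a when sel_a = '1' else 'Z';
  tri_net <= b when sel_b = '1' else 'Z';
  out_z   <= tri_net;

  -- Replacement for targets without internal tristates: an AND-OR mux.
  -- Equivalent to tri_net when exactly one of sel_a, sel_b is '1'.
  out_mux <= (a and sel_a) or (b and sel_b);
end architecture rtl;
```

With both selects active and different data, simulation of tri_net gives 'X' while the mux gives a defined value, and with no select active tri_net floats at 'Z' while the mux gives '0'. Those are exactly the bus-conflict and dangling-bus cases excluded above.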

The dual-edge problem is different since there is no 'general' replacement
pattern that will synthesize to something that fits in any technology (standard primitives)
and is still behaviorally consistent with simulation.

Until now : Andy Jones came up with an elegant solution, and we will implement that.
See side-thread.

[...]
Absolutely. But I think you are looking at this from the point of
view of a writer of an HDL front-end. That is absolutely the wrong
place to ask technology-specific questions. It is the right place to
decide what hardware the HDL implies, leaving it up to a later
stage of the synthesis to decide whether the target technology
can implement that function. That later stage is perfectly capable
of issuing fatal errors on its own account. The front end should
only ask "is the code synthesisable in principle, given the standard
synthesis subset". That subset can, and should, include stuff that
is synthesisable to some hardware platforms but not to others.

I agree. "is the code synthesisable in principle, given the standard synthesis subset".
The standard set of rules is written down in 1076.6,
Now where do you see that there is anything technology-specific about these rules ?

Once again I ask: what is different about saying "register
initialisation is OK, but this device can't do it" and saying
"a megabit of embedded RAM is OK, but this device can't
fit it"? In both examples, the front end should correctly
infer the desired function and a mapper should correctly
determine that it can't be done.

I think you are comparing apples and oranges....

Are you implying that the front-end should synthesize the entire
language, preserve ALL info and pass it through to the back-end
(mapper and such), who will then decide what should be done with it ?

Now, the Verific RTL front-end is very flexible, and we are preserving
initial values, delay info, event expressions and messages for some of our
customers, so this is something that we can already do to a great extent.

But our customers' application decides what they want to allow and what they
want to block from getting to the back-end.

RTL synthesis back-end does not need all that info, and it would
be a terrible waste of cpu time and memory if we would compute it and
pass it along to the back-end.
So the front-end warns, ignores, or errors out on some of the constructs
that will not go through to the back-end.

We typically error out when the simulation-synthesis behavioral consistency is seriously
compromised, given the primitive set that the front-end needs to use.

Not while there are cost/performance/size/power/time-to-market
tradeoffs, no, we're not. Until you can provide us with a tool that
will, without designer intervention, translate a spec into an
optimised design on an automatically-chosen optimal target,
please support us properly in our efforts to explore those tradeoffs.

Say hi to Doris from me..
 
Rob Dekker

Thanks for the links Jerry !

We may have a good solution for DDR design modeling for synthesis from VHDL (See Andy's side-thread)

The Verilog DDR styles in these links below make me doubt that we can do the same for Verilog.
Verilog has no explicit edge expressions (other than in the sensitivity lists), so it will be much harder to do.
At least it will be much more non-standard...

Rob

Jerry Coffin said:
Rob Dekker wrote:

[ ... ]
OK. You have my interest. Tell me about Double Data Rate design !

Does it use both edges of the same clock ? or two separate clock lines that are fixed-phase-shifted ?

That depends. Just for one obvious example, DDR2 SDRAM defines both DQS
and DQS# signals, but makes use of the DQS# signal optional, so a
design might use both rising and falling edges of DQS, or it might use
a single edge each of DQS and DQS# (oh, I should add that the DDR2
standard calls them DQS and /DQS).

Micron has some applicable information at:

http://download.micron.com/pdf/technotes/ddr2/TN4711.pdf

How would you write a DDR design in VHDL for simulation ?

I doubt they're synthesizable, but most of the usual DRAM manufacturers
have simulation models available for free download. I haven't looked
through them to see exactly how they deal with the DQS and DQS# lines
though. Hynix has models for various sizes here:

http://www.hynix.com/eng/02_products/08_technical/02_06_03.jsp

I've usually gotten models from Micron in the past, but they don't have
VHDL models available for their current parts (at least any I've looked
at) -- though if you want Verilog, IBIS or HSpice models, you can get
them, such as here:

http://www.micron.com/products/dram/ddr2sdram/part.aspx?part=MT47H256M4BT-5E

This is for a 1 Gb part, but I believe the strobe logic should be
constant across sizes, widths, etc.
 
Jerry Coffin

Rob said:
Thanks for the links Jerry !

Surely.

We may have a good solution for DDR design modeling for synthesis from VHDL (See Andy's side-thread)

Yup -- I've been watching it closely.

The Verilog DDR styles in these links below make me doubt that we can do the same for Verilog.
Verilog has no explicit edge expressions (other than in the sensitivity lists), so it will be much harder to do.
At least it will be much more non-standard...

[warning: semi off-topic nonsense to follow]

Oh, it's no problem at all. You just use "always @(posedge clk or
negedge clk)" and then click your heels three times and say "There's no
logic like DDR. There's no logic like DDR. There's no logic like DDR!"
Next thing you know, you'll wake up in Kansas and ask your dear auntie
Em why your monitor's suddenly producing black and white output... :)
 
Jonathan Bromley

On Fri, 06 Jan 2006 00:36:37 GMT, "Rob Dekker" wrote:

[...]
RAM extraction is optional in most tools,
and there is an acceptable replacement pattern : plain old RTL synthesis
using latches or dff, with address decoders and select latch/dff-enables.
The front-end can recognize the RAM, but if the technology mapper does
not have a RAM that fits there, they put the replacement pattern in there and proceed.
Simulation-synthesis behavioral consistency is still guaranteed.

OK, that's fair. Obviously the RTL implementation won't be
practicable, but it will match simulation, and your responsibilities
are duly discharged.

The RAM thing IS similar to the support of tristate drivers from Z value assignments:
Some target technologies do not have (internal) tristates.
But if that is the case, the synthesis tool (after the front-end) will replace
the tristates with muxes, and proceed. As long as there are no bus-conflicts
or internal dangling busses (and there should not be any in a good design),
then again simulation-synthesis behavioral consistency is still guaranteed.

All agreed.

"is the code synthesisable in principle, given the standard synthesis subset".
The standard set of rules is written down in 1076.6,
Now where do you see that there is anything technology-specific about these rules ?

IIRC there is a coding style mandated in 1076.6 that synthesises to
asynchronously-loadable flops. Can't be done in many FPGAs.
It was precisely because that style was included that I felt it should
be legitimate to incorporate another: register initialisation.
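
For illustration, the two styles being compared here could look roughly like the sketch below. It is not the template text of 1076.6; the entity and signal names are hypothetical, and whether a tool honours the initial value on r_init depends on the tool and the target device.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity async_load_sketch is
  port (
    clk     : in  std_logic;
    load    : in  std_logic;
    d_async : in  std_logic;
    d       : in  std_logic;
    q_load  : out std_logic;
    q_init  : out std_logic
  );
end entity async_load_sketch;

architecture rtl of async_load_sketch is
  signal r_load : std_logic;
  signal r_init : std_logic := '1';  -- requested power-up value
begin
  -- Asynchronously loadable flop: while load is active, the flop follows
  -- d_async regardless of the clock. This needs a flop with both async
  -- set and async reset (or a true async load), which many FPGAs lack.
  process (clk, load, d_async)
  begin
    if load = '1' then
      r_load <= d_async;
    elsif rising_edge(clk) then
      r_load <= d;
    end if;
  end process;

  -- Plain flop whose only special feature is the initial value above.
  process (clk)
  begin
    if rising_edge(clk) then
      r_init <= d;
    end if;
  end process;

  q_load <= r_load;
  q_init <= r_init;
end architecture rtl;
```

Both processes describe well-defined hardware, and in both cases it is the target, not the HDL, that decides whether the function can be built.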
Are you implying that the front-end should synthesize the entire
language, [...]

No, of course not. But I AM saying that there are some coding
styles that synthesise to perfectly reasonable hardware but
nonetheless may be unimplementable on some platforms.
It's not too hard to find more examples, particularly when we
look at synthesis of I/O functions: how about a weak-keeper
(bus hold) using something equivalent to this...

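process (Pad, Drive, Enable)
  -- Map the current pad level to the matching weak level
  function Weaken(S: std_logic) return std_logic is
    constant S_X01 : std_logic := to_X01(S);
  begin
    -- A function cannot assign to a signal, so return the value instead
    case S_X01 is
      when '1'    => return 'H';
      when '0'    => return 'L';
      when others => return 'W';
    end case;
  end;
begin
  if Enable = '1' then
    Pad <= Drive;         -- drive the pad strongly
  else
    Pad <= Weaken(Pad);   -- hold the last level weakly
  end if;
end process;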

The point of all this, as always, is to get RTL code that both
implies my desired hardware and provides a functionally
correct simulation model. If I then try to port my inherently
non-portable design to a different device that doesn't support
it, I will find out soon enough and be able to do something
about it. I don't much like the idea of tools forcing me
to instantiate a technology primitive
(or, worse still, tag the I/O pad with synthesis attributes
or vendor constraints that won't affect functional simulation)
merely because some other device happens not to support
that function.

Now, the Verific RTL front-end is very flexible, and we are preserving
initial values, delay info, event expressions and messages for some of our
customers, so this is something that we can already do to a great extent.
But our customers' application decides what they want to allow and what they
want to block from getting to the back-end.

I didn't know about the configurability. Sounds like an even stronger
reason to process as many plausible synthesis styles as possible,
then let your OEM's application sieve out any it doesn't like.

RTL synthesis back-end does not need all that info, and it would
be a terrible waste of cpu time and memory if we would compute it and
pass it along to the back-end.

Sorry, I really think that's irrelevant. Synthesis tools (in my
experience) spend only a tiny fraction of their runtime and memory
on the front-end inference. There is *so* much more horsepower
needed to do all the back-end mapping and optimisation stuff that
even quite a large increase in front-end resource usage would be
at worst a minor inconvenience.

We typically error out when the simulation-synthesis behavioral consistency is seriously
compromised, given the primitive set that the front-end needs to use.

And the whole design community is very grateful for that!

Say hi to Doris from me..

done :)
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd. Church Hatch, 22 Market Place, Ringwood, Hampshire, BH24 1AW, UK
Tel: +44 (0)1425 471223 Email: (e-mail address removed)
Fax: +44 (0)1425 471573 Web: http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.
 
Andy

Rob,

I agree with your stipulations on explicit edge semantics in the RTL.

As to your questions,

I don't know if there is a 'standard' netlist implementation for this
function, or if this implementation matches it. It appears to be an
optimal implementation in terms of resources, and is probably optimal
WRT performance as well. The choice of XOR or XNOR encoding/decoding
may be relevant to some target architectures (see set/reset comments
below)

I can read verilog, but writing a good verilog description of this
implementation probably exceeds my capabilities. I would think it
would be very straightforward though. Maybe someone else can chime in
with one?

If you think of XOR as a two bit parity calculation, and parity is a
function whose output can be determined by any single input, then an
N-clock implementation would require N flops and N+1, N-bit-wide parity
generators. Even/odd parity can be chosen based on N and/or target
architecture preferences, and set/reset requirements (see set/reset
comments below).

The implementation for a DDR (both edges of the same clock) register is
no different.

A couple of extra comments:

Pipelines of double clock registers, where intermediate values are not
needed, can be optimized to two simple SDR pipelines, with the encoding
circuitry only needed on the input(s) to the pipeline, and decoding
circuitry only needed at the output(s). This could also affect the
implementation choices for state machines relative to one-hot vs binary
encoding.

For reset, the two flops only need to have the same value, not
necessarily both zero. For set, there are two options: either the two
flops need to have different values, or XNOR gates can be used for
encoding/decoding, thus both registers can be reset/preset to the same
value. Using XNOR gates may allow better results given some
architectures' limitations for flops in the same slice/clb/etc. having
same set/reset requirements (perhaps useless if they also require the
same clock/edge), but it could also get confusing with gate level
simulations and/or formal verification (like that wouldn't be a total
mess anyway!)

I have spoken with our local Synplicity AE about this, but have not
lodged a formal enhancement request. I did not realize until this
thread that 1076.6 included DDR semantics. I shall renew this with them
through official channels. They already infer DDR IOB primitives when
the target allows. We also use Synopsys on a limited basis for FPGAs,
but they're so bone-headed about not even inferring ram from rtl arrays
that I seriously doubt they would be receptive to this.

Thank you,

Andy Jones
 
Andy

Rob,

(I hope this is not a duplicate post... at any rate there are a few
other minor points here too.)

I agree with your stipulations regarding explicit edge semantics in the
inferring RTL description.

Among other things, the ability to directly infer double clock
registers would allow the design of double clock state machines using
enumerated types, so that the synthesis tool can make optimal
implementation decisions with regard to state mapping, etc.

Answers to your questions:

I don't know whether there is a "standard" netlist implementation of
DDR registers or not. The implementation I have shown appears to be
optimal with regard to both resource consumption and performance.
Depending on the target architecture, either XOR or XNOR
encoding/decoding may be preferable. See comments on set/reset below.

I don't know verilog well enough to write a description of this, but
one should be rather straightforward. Hopefully someone else can chime
in and provide one (OTOH, this is a vhdl group...)

Since XOR can be thought of as a two bit parity generator, and parity
is a function whose output is determinable by any single input (given
the other inputs), an N clock implementation could use N flops and N+1,
N bit parity generators. Even or odd parity could be chosen based on
architecture, N, and/or set/reset requirements. See set/reset comments
below.
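
A hedged sketch of how the N clock generalisation might look, using even parity and an active-high asynchronous reset. The entity name, the generic N and the port layout are hypothetical, and the sketch assumes that no two clocks ever have coincident rising edges, since each flop encodes its data against the current state of all the others.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity nclk_reg_sketch is
  generic (N : positive := 3);  -- number of clocks, illustrative default
  port (
    clk   : in  std_logic_vector(N - 1 downto 0);
    reset : in  std_logic;
    d     : in  std_logic_vector(N - 1 downto 0);  -- one data input per clock
    q     : out std_logic
  );
end entity nclk_reg_sketch;

architecture rtl of nclk_reg_sketch is
  signal r : std_logic_vector(N - 1 downto 0);  -- one flop per clock

  -- XOR reduction of a vector (even parity)
  function parity(v : std_logic_vector) return std_logic is
    variable p : std_logic := '0';
  begin
    for i in v'range loop
      p := p xor v(i);
    end loop;
    return p;
  end function parity;
begin
  gen_flops : for i in 0 to N - 1 generate
    process (clk(i), reset)
    begin
      if reset = '1' then
        r(i) <= '0';
      elsif rising_edge(clk(i)) then
        -- Encode: d(i) xor parity of the other N-1 flops. Written as
        -- parity(r) xor r(i), which cancels flop i out of the sum, so
        -- the logic is an N-input parity function.
        r(i) <= d(i) xor parity(r) xor r(i);
      end if;
    end process;
  end generate gen_flops;

  -- Decode: parity over all N flops returns the most recently stored d(i).
  q <= parity(r);
end architecture rtl;
```

That gives N encoders plus one decoder, which matches the count of N+1 parity generators mentioned above.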

For a DDR (both edges of the same clock) the implementation is the
same. However, I would add the following edge semantics for the
inferring code in the case of single input two clock registers:

process (clk1, clk2, reset)
begin
  if reset then
    q <= '0';
  elsif rising_edge(clk1) or rising_edge(clk2) then
    q <= data; -- data1 = data2 = data
  end if;
end process;

The above would probably be the most common form for a DDR state
machine: one that reacts to inputs, changes states, and creates outputs
on either clock.

Other comments:

If a pipeline of two-clock registers is desired, only the pipeline
input(s) need encoding, and only the pipeline output(s) need decoding.
Internal delay stages can be simplified to two ordinary single clock
pipeline stages. This optimization may also affect some implementation
style decisions for state machines (i.e. one-hot vs binary encoding).

If a reset is required, the two flops only need the same value
(assuming XOR encoding/decoding), not necessarily '0'.
If a set is required, there are a couple of options. Either set &
preset the two flops to different values (assuming XOR), or use XNOR
and set or preset both flops the same. Some architectures require
flops in the same block to have the same set/reset configuration
(although they usually also require the same clock if not clock edge),
so this may or may not be an architecture-specific implementation
decision. Also, as stated above, some architectures may prefer XNOR
over XOR mapping. Note that not always having the same XOR/XNOR
mapping, especially if based on set/reset requirements, may impact
formal verification, or gate level simulations, but then again, both of
these activities are likely to be impacted by this anyway.
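
To make the XNOR option concrete, here is a minimal sketch of a DDR register with an asynchronous set, assuming XNOR encoding/decoding so that both flops can share the same clear. The entity and signal names are hypothetical, and this is one possible reading of the scheme rather than a reference implementation.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity ddr_reg_set_sketch is
  port (
    clk  : in  std_logic;
    set  : in  std_logic;  -- asynchronous set of the logical register
    d    : in  std_logic;
    q    : out std_logic
  );
end entity ddr_reg_set_sketch;

architecture rtl of ddr_reg_set_sketch is
  signal q_rise, q_fall : std_logic;
begin
  -- Both flops are cleared to the same value; with XNOR decoding,
  -- equal flop values give q = '1', i.e. the logical register is set.
  process (clk, set)
  begin
    if set = '1' then
      q_rise <= '0';
    elsif rising_edge(clk) then
      q_rise <= d xnor q_fall;
    end if;
  end process;

  process (clk, set)
  begin
    if set = '1' then
      q_fall <= '0';
    elsif falling_edge(clk) then
      q_fall <= d xnor q_rise;
    end if;
  end process;

  -- Decode: (d xnor x) xnor x = d, so q follows d after either edge.
  q <= q_rise xnor q_fall;
end architecture rtl;
```

With XOR gates instead, the same set behaviour would require the two flops to be forced to different values.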

I have mentioned this to our local Synplicity AE, but I have not filed
a formal enhancement request. I did not realize until this thread that
1076.6-2004 specified dual clock behaviour. I shall submit a formal
enhancement request for this feature. We also use synopsys FPGA
synthesis tools on a limited basis, but they're so bone-headed about
not even inferring ram from RTL arrays, I seriously doubt they'd
consider something like this.

Thanks for your interest in this capability!

Andy Jones
 
