Mixed clocked/combinatorial coding styles (another thread)


whygee

Hi,

I have loosely followed the threads on this group and tried to extract some
wisdom from everybody. This is quite useful to me /now/ because I'm
implementing a "simple" SPI master interface for my Actel board.
So I tried to use the single-process idea as described by M. Treseler
in his interesting http://mysite.verizon.net/miketreseler/uart.vhd

However, my circuit is not a UART. Also, I don't much like "state machines"
(I try to avoid them unless really necessary) and I don't see them as
a one-size-fits-all problem solver. And (most importantly)
I need to integrate a clock divider...

I've been busy on this subject for the last 48 hours and I have been
through at least 3 implementations. First I thought about how
the gates would be connected and used to implement the function.
Then I thought "Nah, Mike would beat me if I do that", so I restarted
from scratch, in a "descriptive" and process-driven way.

* As I read the SPI description, for example
http://en.wikipedia.org/wiki/Serial_Peripheral_Interface_Bus
I see that input sampling happens on the opposite clock edge
from output shifting. This means that I can't use the same register
in the same process, inside the same "if (clk'event and clk='1')"
statement. Register reuse (as advocated in the UART document)
is not possible here ; it splits into two processes, as sketched below.
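
In code, the split looks roughly like this (a sketch with invented signal
names ; the edges may be swapped depending on the SPI mode) :

   -- shift MOSI out on one edge...
   shift_out : process(sck)
   begin
      if rising_edge(sck) then
         tx_reg <= tx_reg(14 downto 0) & '0';
      end if;
   end process;
   mosi <= tx_reg(15);

   -- ...and sample MISO on the opposite edge, in a separate register
   sample_in : process(sck)
   begin
      if falling_edge(sck) then
         rx_reg <= rx_reg(14 downto 0) & miso;
      end if;
   end process;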

* I had put everything in variables. Then it became obvious
that the ORDER of the assignments is important, while
it is less critical for signals. Well, in a process, you should
take care not to write the same signal twice, but it is a
double-edged sword : it can simplify some statements, yet the synthesizer
may not warn you when you assign a value twice (or more) to the
same signal. Or variable. A small example follows.
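
To illustrate (a sketch ; a, b, c, d, q, q2, s are assumed declared
elsewhere) :

   process(clk)
      variable v : std_logic;
   begin
      if rising_edge(clk) then
         v := a and b;   -- variable : takes effect immediately,
         q <= v or c;    -- so q uses the NEW value of v
         s <= a and b;   -- signal : update deferred to the end of the process,
         q2 <= s or c;   -- so q2 still sees the OLD value of s
         s <= d;         -- second assignment silently wins ; few tools warn
      end if;
   end process;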

* Finally, my 2nd attempt gave birth to an intricate, monolithic
piece of dense code. It was a sort of huge state machine in all but name,
and it would always run at CPU frequency, even though most signals/variables
were updated very infrequently : this was a waste of clocking,
energy and routing resources. And I don't think it would have helped
me prevent bugs (despite being seemingly easy to write).

* Oh, and P&R results gave disappointing numbers. It was bloated,
mostly because the FFs had to support set/reset/enable and this
did not map well : a single bit of storage turned into 4 logic elements
(each corresponding to a sort of 3-LUT in Actel technology).
Speed was OK, but that did not make me feel more comfortable.


Style depends a lot on the writer and his perception of the design.
For me it's not a problem to write 3 (or many more) versions of the same thing,
as I'm not "paid for it" (I could be described as a professional hobbyist,
not a full-time engineering mercenary). Experience varies, and different
tricks meet different levels of acceptance.

So I decided to split the thing into several parts :
the CPU interface, the programmable clock divider,
the receive shift register and the sender.
Each one can have its own clock, so
- power consumption is reduced
- enable and reset pins can be avoided whenever possible
(which is very important, as I realised that my toolchain
will not accept BOTH enable and set/reset despite the
ability of the logic array : this led to a big bloat in the
previous design).

Working with different clocks is tricky and this is also why
"one big process" can't be the 100% recommended solution
for every design.

Also, splitting up things allows one to test each part
separately (in my case, i check the logic that is generated
and measure the compactness). One can more easily see where
the synthesizer uses the best approach, so it can be guided
to the expected result. When everything is packed together,
it's less easy...

Also, the behavioural/variable/process approach is
not compelling (for me), at least for the final implementation.
I am not concerned by the simulation speed gained from the compactness
and the high level of code description. This does not keep
me from using the features of the language (genericity,
modularity, portability etc.)

http://www.designabstraction.co.uk/Articles/Advanced Synthesis Techniques.htm
makes some good points. However, I put the emphasis on the quality
of the result, and I don't care if the source code is bloated
as long as I get what I want in the end. And I have been used
to thinking in the "mousetrap" way for so long that I have
my own methods to ensure that everything works in the end.

I admit however that high-level behavioural description/coding
can be helpful to understand the difficulties and dark corners
(side effects and special cases) of some circuitry.
Also, I now use variables in a slightly different way :
- variables inside processes to hold temporary results for logic/boolean stuff
- signals where a FF is desired.
This makes unwanted FF inference less likely, and I get a warning when
the order of the variable assignments could cause trouble.
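
Something like this (a sketch ; the names are mine) :

   process(clk)
      variable ready, go : std_logic;    -- temporaries, no storage intended
   begin
      if rising_edge(clk) then
         ready := flag_a and not flag_b; -- pure logic, evaluated in order
         go    := ready or force_start;
         start_ff <= go;                 -- only this becomes a flip-flop
      end if;
   end process;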

Finally, I have gotten very nice results :
- the circuit uses half as many cells as before
- the code uses signals and variables and about 4 processes
- an interesting technique avoids glitches and cross-domain issues
- one saturating Gray counter is used (which could be thought of as a disguised FSM)

However, the design is not finished and I'm curious to see where
and why validation will find errors.
A few days will pass before I can draw the first conclusions.

But I really like VHDL because it is very expressive : one has the choice
to use ANY level of description, and (even better) it is possible
to MIX the levels together ! So it is always possible to reach the
desired result in one way or another when trouble or a limitation
appears somewhere. It is also possible to learn this language
progressively, starting with dumb RTL and mastering functions years later.

YG
 

rickman

Hi, ....snip...
* As I read the SPI description, like http://en.wikipedia.org/wiki/Serial_Peripheral_Interface_Bus
I see that input sampling happens on the opposite clock edge
from output shifting. This means that I can't use the same register
in the same process, inside the same "if (clk'event and clk='1')"
statement. Register reuse (as advocated in the UART document)
is not possible here.

Why not? You can use a single FF to delay the signal so that the data
is shifted from one edge of the clock to the other. Then the same
shift register can be used for both input and output.
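
In other words, something like this (a sketch, signal names invented) :

   -- one FF re-times MISO onto the other edge...
   retime : process(sck)
   begin
      if falling_edge(sck) then
         miso_half <= miso;
      end if;
   end process;

   -- ...so a single shift register serves both directions
   shift : process(sck)
   begin
      if rising_edge(sck) then
         shift_reg <= shift_reg(14 downto 0) & miso_half;
      end if;
   end process;
   mosi <= shift_reg(15);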

Rick
 

whygee

Hi !
Why not? You can use a single FF to delay the signal so that the data
is shifted from one edge of the clock to the other. Then the same
shift register can be used for both input and output.

I thought about that, of course.

However, I have used a different approach.
Maybe I'll change it again, as it takes more room
than just a FF, but it avoids off-by-ones
and other corner cases while remaining glitch-free and asynchronous.

I have my eye on the ENC28J60, which has, at least in the
first silicon revisions, particular clocking issues
requiring my SPI core to use either internal or external
clock sources. It will be cool to have a tiny 10Mbps Ethernet
transceiver hooked to my 16-bit core in a FPGA :)

Anyway, this "small" SPI example is an eye-opener,
and I learnt a few asynchronous design tricks yesterday
(about clock synchronisation, not FIFOs) :)

regards,
yg
 

whygee

rickman said:
Why not? You can use a single FF to delay the signal so that the data
is shifted from one edge of the clock to the other. Then the same
shift register can be used for both input and output.

Something just struck me :

I understand that the slave SPI device samples MOSI on the opposite clock
edge from the one that shifts the bit out. This ensures that setup & hold
are roughly symmetric and leaves some margin.

However,

my master SPI controller emits the clock itself (and resynchronises it),
so for MOSI the system can be considered "source clocked", even
if the slave provides some clock (it is looped back in my circuit).
So I could also sample the incoming MISO bit on the same clock edge as MOSI :
the time it takes for my clock signal to be output, transmitted,
received by the slave, to trigger the shift, and to come back
leaves plenty of time for sample & hold.

I am thinking more of the "fast" applications (20MHz), where
the propagation delays start to eat into the sampling margin.

This is not what I intend to implement (I sample on opposite
edges), but it is an interesting discussion anyway :
most sampled/latched digital logic I know of samples on one
edge (or the other, but not both).

cheers,
yg
 

whygee

whygee said:
I thought about that, of course.

However, I have used a different approach.
Maybe I'll change it again, as it takes more room
than just a FF, but it avoids off-by-ones
and other corner cases while remaining glitch-free and asynchronous.

Oh, now I remember exactly why I use the current, seemingly crazy method.

This is because the SPI clock is different from the CPU clock.
The shift register (in a shared/single configuration) would need 2 clock sources,
one for loading the register, another for shifting.
When only the CPU clock controls the shift, it is very easy, but
NO FF I know of has dual clock inputs. I have found a mostly equivalent
circuit with 3 FFs and 1 MUX2 but ... heeeeek ! That would require at least
64 Actel cells for a 16-bit register (plus the rest).

To reduce the cell count, one could switch from CPU clock to SPI clock and back.
I know how to do this and there would be 16FF, 16MUX (2xless).
However, the switching time creates excessive write latencies.

-oO0Oo-

The solution I have implemented is a bit unusual for a SPI master (AFAIK) but
gives satisfying characteristics :

- the receive register is a standard shift register. Nothing fancy,
I only added a single AND gate to "mask" the 8 MSB in 8-bit mode.
The data is easily read by the CPU once a "ready flag" is seen,
so there is no Clock Domain Crossing issue here.

- the emit system is more interesting :
- A "classic" 16-bit data register is written by the CPU in the CPU clock domain.
- a kind of FSM (5-bit Gray counter with saturation) operates in the SPI clock domain.
The clock is already filtered and resynchronised using techniques from
http://www.design-reuse.com/articles/5827/techniques-to-make-clock-switching-glitch-free.html
I add one cycle of delay to let everything settle before starting the counter.
- the Gray counter (quite compact) controls a 16-input MUX2 tree, whose output goes to MOSI.

Result : 32FF, 16 MUX (48 cells instead of 64) and any clock works. Speed is good too.
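
For reference, one cheap way to build such a saturating Gray counter (a
sketch, not necessarily how mine is wired : a plain binary counter whose
output is Gray-encoded ; bin and gray are std_logic_vector(4 downto 0),
ieee.numeric_std assumed) :

   count : process(spi_clk)
   begin
      if rising_edge(spi_clk) then
         if start = '1' then
            bin <= (others => '0');
         elsif bin /= "11111" then   -- saturate : stop at the last count
            bin <= std_logic_vector(unsigned(bin) + 1);
         end if;
      end if;
   end process;
   gray <= bin xor ('0' & bin(4 downto 1));   -- only one bit changes per step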

I have found another structure for the emitter (one-hot encoding the shift FSM
and AND-ORing the FSM outputs with the Data Out register), but it was marginally larger
(though less challenging than addressing a MUX tree with a bit-reversed Gray counter)
and could not prevent some bit-to-bit transition glitches.


If someone has a better idea, please tell me...
But when 2 different, unrelated clocks are used, there is no solution
using less than 1 FF per side. Then again, I'm pretty sure that
I am reinventing the wheel for the 100,001st time :)


have a nice week-end,
yg
 

Mike Treseler

whygee said:
This is because the SPI clock is different from the CPU clock.

Could SPI use the CPU clock also?
If someone has a better idea, please tell me ...
But when 2 different unrelated clocks are used, there is no solution
using less than 1 FF per side. Then again, I'm pretty sure that
I am reinventing the wheel for the 100001th time :)

Is there some reason that a handshake
won't work in both cases?

-- Mike Treseler
 

KJ

whygee said:
Oh, now I remember exactly why i use the current, seemingly crazy method.

This is because the SPI clock is different from the CPU clock.
The shift register (in a shared/single configuration) would need 2 clock sources,
one for loading the register, another for shifting.
When only the CPU clock controls the shift, it is very easy, but

Since you said you're implementing the SPI master side, that implies that
you're generating the SPI clock itself which *should* be derived from the
CPU clock...there should be no need then for more than a single clock domain
(more later).
If someone has a better idea, please tell me ...
But when 2 different unrelated clocks are used, there is no solution
using less than 1 FF per side. Then again, I'm pretty sure that
I am reinventing the wheel for the 100001th time :)

The CPU clock period and the desired SPI clock period are known constants.
Therefore one can create a counter that counts from 0 to (Spi_Clock_Period /
Cpu_Clock_Period) - 1. When the counter is 0, set your Spi_Sclk output
signal to 1; when the counter reaches one half of the max value (i.e.
Spi_Clock_Period / Cpu_Clock_Period / 2) then set Spi_Sclk back to 0.

The point where the counter = 0 can also then be used to define the 'rising
edge of Spi_Sclk' state. So any place where you'd like to use
"rising_edge(Spi_Sclk)" you would instead use "Counter = 0". The same can
be done for the falling edge of Spi_Sclk; that point would occur when
Counter = Spi_Clock_Period / Cpu_Clock_Period/2.

Every flop in the design then is synchronously clocked by the Cpu_Clock,
there are no other clock domains therefore no clock domain crossings. The
counter is used as a divider to signal internally for when things have
reached a particular state.
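
A minimal sketch of that (untested ; a divide ratio of 10 is just an
example) :

   constant DIV : natural := 10;   -- Spi_Clock_Period / Cpu_Clock_Period
   signal counter : natural range 0 to DIV-1 := 0;
   ...
   process(Cpu_Clock)
   begin
      if rising_edge(Cpu_Clock) then
         if counter = DIV-1 then
            counter <= 0;
         else
            counter <= counter + 1;
         end if;
         if counter = 0 then
            Spi_Sclk <= '1';         -- the "rising edge of Spi_Sclk" state
         elsif counter = DIV/2 then
            Spi_Sclk <= '0';         -- the "falling edge of Spi_Sclk" state
         end if;
      end if;
   end process;

Anywhere "rising_edge(Spi_Sclk)" is wanted, test "counter = 0" instead.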

KJ
 

KJ

whygee said:
However,

my master SPI controller emits the clock itself (and resynchronises it)

No need for the master to resynchronize something that it generates itself
(see my other post).
so for MOSI, the system can be considered as "source clocked", even
if the slave provides some clock (it is looped back in my circuit).

I don't think you understand SPI. The master always generates the clock, it
is up to the slave to synchronize to that clock. The master never has to
synchronize to the SPI clock since it generates it.
So I could also sample the incoming MISO bit on the same clock edge as MOSI :
the time it takes for my clock signal to be output, transmitted,
received by the slave, to trigger the shift, and to come back
leaves plenty of time for sample & hold.

See my other post for the details, but basically you're making this harder
than it needs to be. Since the master is generating the SPI clock, it knows when
it is about to switch the SPI clock from low to high or from high to low ;
there is no need for it to detect the actual SPI clock edge. It simply needs
to generate output data and sample input data at the points that correspond
to where it is going to switch the SPI clock.

KJ
 

whygee

Hi !

Mike said:
Could SPI use the CPU clock also?
Yes, this was in the first implementation.
But some slaves have clocking restrictions.
Also, my (current) clock divider is not wide
enough to output very low frequencies
(for example, < 1MHz for very-low-voltage SPI Flash memories).

Now I can select between the internal divided clock
and an external pin.
Is there some reason that a handshake
won't work in both cases?

Do you mean a handshake between a CPU-driven 16-bit register
and a SPI-clocked 16-bit shift register ? That would work,
why not, but is it better, safer, or smaller ? That would amount,
just as in my current implementation, to 32 FFs and 16 MUXes.

I could strip one row of 16 FFs by exploiting the fact
that there is a latch somewhere in the CPU datapath,
between the pipeline and the I/O section,
but I can't guarantee when this datapath latch is updated :
the CPU may want to poll the control register, or access
another peripheral, which would destroy the
emitted data before it is sent.

The VHDL code may be shorter,
but code density is not an issue for me because
it is not necessarily related to coding time.

regards,
YG
 

whygee

Hi !
Since you said you're implementing the SPI master side, that implies that
you're generating the SPI clock itself which *should* be derived from the
CPU clock...there should be no need then for more than a single clock domain
(more later).

As pointed out in my previous post, there is at least one peripheral
(ENC28J60 rev. B4) that has clocking restrictions
(also known as "errata"), and I happen to have some ready-to-use
modules equipped with this otherwise nice chip...

I don't know if my chip revision is B4, and the errata
suggest using a clock between 8 and 10MHz.
However, they also suggest using the ENC28J60-provided 12.5MHz
output : I'm ready to add an external clock input to the master
if I'm allowed to "legally" go beyond the 10MHz rating
(a 25% bandwidth increase is always a good thing, particularly
with real-time communications).

As another "unintended case", an external clock input opens
the possibility to bit-bang data with some PC or uC.
I know it sounds stupid :) but I had a project 10 years
ago that would stream bootstrap code
to a DSP through the PC's parallel printer port.
ADi's SHARC had a boot mode where a DMA channel
loaded code from outside, and I had found a trick
to single-cycle the transfer with external circuits.
That's very handy for early software development,
more than flashing the boot memory all the time...
Now, if I can stream external boot code WITHOUT the
hassles of external circuitry (which was a pain
to develop without the test devices I have now),
that is an even better thing.

For me and in the intended application, that's enough
to justify another clock domain.
If I had no ready-to-use ENC28J60 mini-module,
I would not have bothered.
The CPU clock period and the desired SPI clock period are known constants.
They are indicated in the datasheet of each individual product.
And there is no "SPI standard", contrary to I2C and others.
( http://en.wikipedia.org/wiki/Serial_Peripheral_Interface_Bus#Standards )
Some chips accept a falling CLK edge after CS goes low,
and some other chips don't (even chips by the same manufacturer vary).

So I have read the datasheets of the chips I want to interface,
and adapted the master interface to their various needs (and errata).
Therefore one can create a counter that counts from 0 to Spi_Clock_Period /
Cpu_Clock_Period - 1. When the counter is 0, set your Spi_Sclk output
signal to 1; when that counter reaches one half the max value (i.e.
"(Spi_Clock_Period / Cpu_Clock_Period/2") then set Spi_Sclk back to 0.
I have (more or less) that already, which is active when the internal
CPU clock is selected. This is used when booting the CPU soft core
from an external SPI EEPROM.

Note however that your version does not allow using the CPU clock at full speed :
what happens if you set your "max value" to "00000" ? And it does not guarantee
that the high and low levels have equal durations.

But I'm sure that in practice you will do much better
(and I still have a few range limitations in my clock divider,
I'll have to add an optional prescaler).

Here is the current (yet perfectible) version :

   clk, ... : in std_logic;   -- the CPU clock
   ....
   -- uses ieee.numeric_std for the unsigned arithmetic
   signal clkdiv,                                      -- the frequency register
          divcounter : std_logic_vector(4 downto 0);  -- the actual counter
   signal ... SPI_en, lCK, ... : std_logic;
begin
   ....

   -- free-running clock divider
   clock_process : process(SPI_en, clk)
      variable t : std_logic_vector(5 downto 0);  -- holds the carry bit without storing it
   begin
      -- no reset needed, synchro is done later
      if (clk'event and clk='1' and SPI_en='1') then
         t := std_logic_vector( unsigned('0' & divcounter) + 1 );  -- increment the counter
         if t(t'left)='1' then           -- on (expected) overflow, toggle lCK
            divcounter <= clkdiv;        -- reload the counter
            lCK <= not(lCK);
         else
            divcounter <= t(divcounter'range);  -- just update the counter
         end if;
      end if;
   end process;

This method divides by 2x(32-clkdiv).
Without the 2x factor, it would be impossible to work with clkdiv="00000",
and the high and low durations would be unequal when clkdiv is odd.
I count up, not down, but the difference is marginal.

The point where the counter = 0 can also then be used to define the 'rising
edge of Spi_Sclk' state. So any place where you'd like to use
"rising_edge(Spi_Sclk)" you would instead use "Counter = 0". The same can
be done for the falling edge of Spi_Sclk; that point would occur when
Counter = Spi_Clock_Period / Cpu_Clock_Period/2.

Every flop in the design then is synchronously clocked by the Cpu_Clock,
there are no other clock domains therefore no clock domain crossings. The
counter is used as a divider to signal internally for when things have
reached a particular state.

I understand that well, as this is how I started my first design iteration.
I soon reached some inherent limitations, however.

Particularly because of (slightly broken ?) tools that won't allow both a clock
enable AND a preset on the same FF (even though the Actel cells in ProASIC3
have this capability). Synplicity infers the right cell, which is later
broken into 2 or more cells by the Actel backend :-? Maybe I missed
a restriction on the use of one of the signals, requiring a specific kind
of net or something like that (I hope).

As the RTL code grows, the synthesizer infers more and more stuff,
often unforeseen, which leads to bloat : MUXes everywhere,
and duplicated logic cells needed to drive higher fanouts.
I guess this is because I focused more on the "expression"
of my need than on the actual result (but I was careful anyway).

I have split the design into 3 subparts (CPU interface,
clock divider/synchroniser, and emit/receive ; a total of 7 processes
in a single architecture) and it needs < 140 cells
instead of the 180 cells of the first iteration.
And I can use whatever clock I want or need.

I could upload the source code somewhere so others can
better understand my (fuzzy ?) descriptions.
I should finish the simulation first.

(I answer in the same post to stay on topic)
>
> No need for the master to resynchronize something that it generates itself
> (see my other post).

In fact, there IS a need to resynchronise the clock, even when
it is generated by the CPU, because of the divider.

Imagine (I'm picky here) that the CPU runs at 100MHz (my target)
and the slave at 100kHz (an imaginary old chip).
The data transfer is set up in the control register, then
the write to the data register triggers the transfer.
But this can happen at any time, whatever the value of the predivider's counter.
So the first toggle of the clock output may come well before
the slave's minimum clock-width requirement is met. That's a glitch.

In this case, the solution is easy : reset the counter
whenever a transfer is requested. That's what I did too,
the first time.

but there is an even simpler solution : add a "clear" input
to the FFs that are used to resynchronise the clocks, as in
http://i.cmpnet.com/eedesign/2003/jun/mahmud3.jpg
so the next clock cycle will be well-formed, whether the
source is internal or external. The added delay is not an issue.
A sketch of this follows.
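
Something in this spirit (a sketch of the classic glitch-free switch from
the figure above, with the "clear" added ; the names are mine) :

   en_a_ff : process(clk_a, clear)
   begin
      if clear = '1' then
         en_a <= '0';                      -- force the gate off : next cycle is clean
      elsif falling_edge(clk_a) then
         en_a <= sel and not en_b;         -- enable A only when B is fully off
      end if;
   end process;

   en_b_ff : process(clk_b, clear)
   begin
      if clear = '1' then
         en_b <= '0';
      elsif falling_edge(clk_b) then
         en_b <= (not sel) and not en_a;   -- enable B only when A is fully off
      end if;
   end process;

   clk_out <= (clk_a and en_a) or (clk_b and en_b);  -- no chopped pulses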
> I don't think you understand SPI. The master always generates the clock, it
> is up to the slave to synchronize to that clock. The master never has to
> synchronize to the SPI clock since it generates it.
I thought that too, until I read the errata of the chips I want to use.

A friend told me years ago : "Never read the datasheet before the errata".
Excellent advice, indeed.
> See my other post for the details, but basically you're making this harder
> than it needs to be.
Though sometimes there needs to be something
a bit more than what is "theoretically enough in practice".
> Since the master is generating the SPI clock, it knows when
> it is about to switch the SPI clock from low to high or from high to low ;
> there is no need for it to detect the actual SPI clock edge. It simply needs
> to generate output data and sample input data at the points that correspond
> to where it is going to switch the SPI clock.
This is what I did in the first design iteration.

However, I now avoid large single-clock processes
because they give less control over what the synthesiser does.
My code now uses 7 processes (one clockless, just because
it's easier to code than parallel statements), and fanout
and MUXes are OK.

Which goes back to the other thread : everybody has his own
idea of what is "good", "acceptable", "required"...
Style and taste are difficult to discuss, and no single rule
applies to EVERY case :)


Finally, I have the impression that you misunderstood the initial post about "SPI clocking".
The idea was that the SPI master "could" sample MISO with the same (internal) clock signal
and edge that shifts MOSI out. The issue this would "solve" arises when capacitance
and propagation delays on the PCB, along with a relatively high clock speed
(the 25AA1024 by Microchip goes up to 20MHz), delay the MISO signal
enough to miss the normal clock edge.

regards,
yg
 

rickman

Hi !


As pointed out in my previous post, there is at least one peripheral
(ENC28J60 rev. B4) that has clocking restrictions
(also known as "errata"), and I happen to have some ready-to-use
modules equipped with this otherwise nice chip...

Can you be specific on the restrictions and how that relates to the
CPU clock?
I don't know if my chip revision is B4, and the errata
suggest using a clock between 8 and 10MHz.
However, they also suggest using the ENC28J60-provided 12.5MHz
output : I'm ready to add an external clock input to the master
if I'm allowed to "legally" go beyond the 10MHz rating
(a 25% bandwidth increase is always a good thing, particularly
with real-time communications).

When you talk about B4, is that the ENC28J60 chip?? How can it
recommend a 10 MHz max clock and also recommend using a 12.5 MHz
clock? I have to say you are using "it" and "my chip" in unclear
ways. Try being specific and not using pronouns.

As another "unintended case", an external clock input opens
the possibility to bit-bang data with some PC or uC.
I know it sounds stupid :) but I had a project 10 years
ago that would stream bootstrap code
to a DSP through the PC's parallel printer port.
ADi's SHARC had a boot mode where a DMA channel
loaded code from outside, and I had found a trick
to single-cycle the transfer with external circuits.
That's very handy for early software development,
more than flashing the boot memory all the time...
Now, if I can stream external boot code WITHOUT the
hassles of external circuitry (which was a pain
to develop without the test devices I have now),
that is an even better thing.

For me and in the intended application, that's enough
to justify another clock domain.
If I had no ready-to-use ENC28J60 mini-module,
I would not have bothered.

I still have no idea why you think you need two clock domains. Are
you saying that you can't pick a CPU clock rate that will allow an SPI
clock rate of 8 to 10 MHz? What CPU are you using?

They are indicated in the datasheet of each individual product.
And there is no "SPI standard", contrary to I2C and others.
(http://en.wikipedia.org/wiki/Serial_Peripheral_Interface_Bus#Standards)
Some chips accept a falling CLK edge after CS goes low,
and some other chips don't (even chips by the same manufacturer vary).

Are you saying you can't find a suitable subset of SPI operation that
will work with both? Are you aware that you can operate the bus in
a different mode when addressing different peripherals? That can be
handled in your FSM.

So i have read the datasheets of the chips i want to interface,
and adapted the master interface to their various needs (and errata).


I have (more or less) that already, which is active when the internal
CPU clock is selected. This is used when booting the CPU soft core
from an external SPI EEPROM.

Note however that your version does not allow using the CPU clock at full speed :
what happens if you set your "max value" to "00000" ? And it does not guarantee
that the high and low levels have equal durations.

But I'm sure that in practice you will do much better
(and I still have a few range limitations in my clock divider,
I'll have to add an optional prescaler).

Here is the current (yet perfectible) version :

   clk, ... : in std_logic;   -- the CPU clock
   ...
   signal clkdiv,                                      -- the frequency register
          divcounter : std_logic_vector(4 downto 0);  -- the actual counter
   signal ... SPI_en, lCK, ... : std_logic;
begin
   ...

   -- free-running clock divider
   clock_process : process(SPI_en, clk)
      variable t : std_logic_vector(5 downto 0);  -- holds the carry bit without storing it
   begin
      -- no reset needed, synchro is done later
      if (clk'event and clk='1' and SPI_en='1') then
         t := std_logic_vector( unsigned('0' & divcounter) + 1 );  -- increment the counter
         if t(t'left)='1' then           -- on (expected) overflow, toggle lCK
            divcounter <= clkdiv;        -- reload the counter
            lCK <= not(lCK);
         else
            divcounter <= t(divcounter'range);  -- just update the counter
         end if;
      end if;
   end process;

This method divides by 2x(32-clkdiv).
Without the 2x factor, it would be impossible to work with clkdiv="00000",
and the high and low durations would be unequal when clkdiv is odd.
I count up, not down, but the difference is marginal.

If you want to divide by 1, you can use a mux to route the input clock
to the output instead of the divider. Restricting the divider
to even values is reasonable ; I seem to recall the 8253 working that
way in some modes. But that is just semantics.
I understand that well, as this is how I started my first design iteration.
I soon reached some inherent limitations, however.

Particularly because of (slightly broken ?) tools that won't allow both a clock
enable AND a preset on the same FF (even though the Actel cells in ProASIC3
have this capability). Synplicity infers the right cell, which is later
broken into 2 or more cells by the Actel backend :-? Maybe I missed
a restriction on the use of one of the signals, requiring a specific kind
of net or something like that (I hope).

There are reasons why the smaller logic companies are small.

As the RTL code grows, the synthesizer infers more and more stuff,
often unforeseen, which leads to bloat : MUXes everywhere,
and duplicated logic cells needed to drive higher fanouts.
I guess this is because I focused more on the "expression"
of my need than on the actual result (but I was careful anyway).

I have split the design into 3 subparts (CPU interface,
clock divider/synchroniser, and emit/receive ; a total of 7 processes
in a single architecture) and it needs < 140 cells
instead of the 180 cells of the first iteration.
And I can use whatever clock I want or need.

I could upload the source code somewhere so others can
better understand my (fuzzy ?) descriptions.
I should finish the simulation first.

I can't say that I am following what you are doing. But if you are
using multiple clock domains that are larger than 1 FF in each
direction, I think you are doing it wrong.

I try to keep the entire chip design in a single clock domain if
possible. I seldom find it necessary to make exceptions for anything
other than the periphery.

I don't understand your "clocking restrictions". How does that mean
you can't use the CPU clock for the SPI interface? You are designing
the master, you can use any main clock you choose. I don't understand
your restrictions, but unless you *have* to use some specific
frequency to sync a PLL or something unusual on SPI, you can use your
CPU clock as the timing reference eliminating synchronization issues.

In fact, there IS a need to resynchronise the clock, even when
it is generated by the CPU, because of the divider.

Imagine (I'm picky here) that the CPU runs at 100MHz (my target)
and the slave at 100kHz (an imaginary old chip).
The data transfer is set up in the control register, then
the write to the data register triggers the transfer.
But this can happen at any time, whatever the value of the predivider's counter.
So the first toggle of the clock output may come well before
the slave's minimum clock-width requirement is met. That's a glitch.

I don't know what you are saying here, but if it implies that you have
to do something to synchronize the SPI and CPU, you are doing it
wrong. Use the same clock for both and use clock enables instead of a
second clock domain.

In this case, the solution is easy : reset the counter
whenever a transfer is requested. That's what I did too,
the first time.

but there is an even simpler solution : add a "clear" input
to the FFs that are used to resynchronise the clocks, as in
http://i.cmpnet.com/eedesign/2003/jun/mahmud3.jpg
so the next clock cycle will be well-formed, whether the
source is internal or external. The added delay is not an issue.

Why are you resyncing clocks? Use 1 clock.

I thought that too, until I read the errata of the chips I want to use.

A friend told me years ago : "Never read the datasheet before the errata".
Excellent advice, indeed.

You haven't explained yourself still. If you are using a chip that
does not act as a slave on the SPI bus, then maybe you shouldn't use
that part???

Are you saying that the slave chip you are using can not work with an
async master and that the master SPI interface *has* to use a clock
from the slave chip??? I've never seen that before.

Though sometimes there needs to be something
a bit more than what is "theoretically enough in practice".

No, actually. When you are working with read data, this is very
viable. The round trip timing is enough for an FPGA to receive the
data with sufficient hold time. Most FPGAs have I/O delays which can
be added to give a negative hold time assuring that that will work.
This also gives the maximum setup time. It is in the slave that you
can't depend on this because of the race condition between the write
data and the clock.

This is what I did in the first design iteration.

However, I now avoid large single-clock processes
because they give less control over what the synthesiser does.
My code now uses 7 processes (one clockless, just because
it's easier to code than parallel statements), and fanout
and MUXes are OK.

I have no idea what you mean by "single-clock processes". I think
what KJ is referring to is that if you want to provide additional hold
time for read data, you can sample the input data one fast clock
earlier than the SPI clock edge that changes the data. Likewise the
output data can be changed a bit later than the actual SPI clock edge,
not that it would be needed.

Which goes back to the other thread : everybody has his own
idea of what is "good", "acceptable", "required"...
Style and taste are difficult to discuss, and no single rule
applies to EVERY case :)

Finally, I have the impression that you misunderstood the initial post about "SPI clocking".
The idea was that the SPI master "could" sample MISO with the same (internal) clock signal
and edge that shifts MOSI out. The issue this would "solve" arises when capacitance
and propagation delays on the PCB, along with a relatively high clock speed
(the 25AA1024 by Microchip goes up to 20MHz), delay the MISO signal
enough to miss the normal clock edge.

Even at 20 MHz, you would need to have (25 ns - master setup time) of
delay to cause a problem. That should still be a large amount of
delay and I would not expect any problem with normal boards. The
master has no downside to clocking the read data on either edge.

Rick
 

whygee

Hi !
Can you be specific on the restrictions and how that relates to the CPU clock?

Apparently, there is an internal clock (synchronisation) issue :
http://ww1.microchip.com/downloads/en/DeviceDoc/80257d.pdf

"
1. Module: MAC Interface
When the SPI clock from the host microcontroller
is run at frequencies of less than 8 MHz, reading or
writing to the MAC registers may be unreliable.

Work around 1
Run the SPI at frequencies of at least 8 MHz.

Work around 2
Generate an SPI clock of 25/2 (12.5 MHz), 25/3
(8.333 MHz), 25/4 (6.25 MHz), 25/5 (5 MHz), etc.
and synchronize with the 25 MHz clock entering
OSC1 on the ENC28J60. This could potentially be
accomplished by feeding the same 25 MHz clock
into the ENC28J60 and host controller. Alternatively,
the host controller could potentially be
clocked off of the CLKOUT output of the
ENC28J60.
"

What is interesting is the 2nd workaround :
it implies that it is possible, when the clocks
are synchronized, to go faster than the 10MHz limit.
And it is not unsafe, considering that the latest parts
(not mine) are capable of 20MHz SPI frequencies.

The ENC28J60 has a programmable external clock output,
CLKOUT, that I will program to generate 12.5MHz,
and this will feed the SPI master that I have designed.

The 12.5MHz output works only when the ENC28J60
is powered on and operating, so I need both internal
clocking from the CPU (for the startup sequence) and
the external clock (during transmission).

When you talk about B4, is that the ENC28J60 chip??
"ENC28J60 silicon rev. B4"
How can it
recommend a 10 MHz max clock and also recommend using a 12.5 MHz
clock?
Ask the manufacturer :)
But reading the errata sheet, it seems that the problem
arises when "bridging" two serial interfaces (the integrated MII
and the external SPI). The MII is clocked from the onboard 25MHz,
but the external interface must accommodate "any" other frequency.
I have to say you are using "it" and "my chip" in unclear
ways. Try being specific and not using pronouns.
Sorry... I "understand myself", even though I had carefully
proofread the post before hitting "send".
I still have no idea why you think you need two clock domains.
I still don't understand why several people think I "must not" use 2 clocks.
I understand that "normally" this goes against common wisdom,
but I see no limitation, legal reason or technical issue that
could prevent me from doing it (Usenet misunderstandings are not,
IMHO, "limitations" :p). Furthermore, the first simulation results
are very encouraging.
Are you saying that you can't pick a CPU clock rate that will allow an SPI
clock rate of 8 to 10 MHz?
No. I'm saying that if the manufacturer implies that it is possible
to go a bit faster, adding a clock input to a 208-pin FPGA is not
a serious issue at all. Furthermore, when done carefully,
some asynchronous designs are not that difficult.
I'm not speaking of FIFOs here.
What CPU are you using?
http://yasep.org (not up to date)
This is a soft core that I am developing, based on ideas dating back to 2002.
It now has a configurable 16-bit or 32-bit wide datapath with quite dumb
but unusual RISC instructions. I have been writing the VHDL code since
the start of the summer, when I got my Actel eval kit.
Between 2000 and 2002, I had some other intensive VHDL experience, with other tools.
Are you saying you can't find a suitable subset of SPI operation that
will work with both? Are you aware that you can operate the bus in
different mode when addressing different peripherals? That can be
handled in your FSM.
I will work in mode 0, though I have provision for mode 1.
I also manage 8-bit and 16-bit transfers easily, as well as clocks.
If you want to divide by 0, you can use a mux to route the input clock
to the output instead of the divider. Restricting the divider value
to even values is reasonable, I seem to recall the 8053 working that
way in some modes. But that is just semantics.

The div/2 from the internal CPU clock is fine in my case
because the CPU clock is quite high (64 or 100MHz).
I am considering adding a few programmable post- or prescalers because
it's indeed very high. Otherwise, a bypass MUX could have been used too.
There are reasons why the smaller logic companies are small.
In your reasoning, are small companies small because they are small,
and big companies big because they are big ?
And if I had mentioned another, larger company, what would you have answered ?

Now, the tool mismatch I mentioned is maybe a bug, maybe not.
But this did not prevent me from implementing my stuff,
once I understood why some tool added unexpected MUXes.

Furthermore, the more I use the A3Pxxx architecture, the more
I understand it and I don't find it stupid at all.
The "speed issue" seem to be because they use an older
silicon process than the "others". But the A3Pxxx are old,
there are newer versions (unfortunately under-represented,
overpriced and not well distributed). Once again,
commercial reasons, not technical.
I can't say that I am following what you are doing. But if you are
using multiple clock domains that are larger than 1 FF in each
direction, I think you are doing it wrong.
What do you mean by "clock domains that are larger than 1 FF in each direction" ?
What I do (at a clock domain boundary) is quite simple :
- the FF is clocked by the data source's clock (only)
- the FF is read (asynchronously) by the sink.
Because of the inherent higher-level handshakes (for example, the
receive register is read only when a status flag is asserted and
detected in software), there is no chance that metastability can
last long enough to cause a problem.

Maybe a drawing can help too here.
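
In lieu of a drawing, a rough sketch of the receive-side flag (simplified,
my names ; the data register itself is written only in the SPI domain) :

   flag : process(spi_clk)
   begin
      if rising_edge(spi_clk) then
         if last_bit = '1' then
            rx_ready <= '1';   -- set when a full word has been shifted in
         elsif cpu_ack = '1' then
            rx_ready <= '0';   -- cleared once the CPU has read the data
         end if;
      end if;
   end process;
   -- the CPU polls rx_ready : by the time software sees it set and performs
   -- the read, the received data has been stable for many CPU cycles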
I try to keep the entire chip design in a single clock domain if
possible. I seldom find it necessary to make exceptions for anything
other than the periphery.
The rest of the design is clocked from a single source,
if that makes you feel better :)
The first big challenge I faced (when I started to adapt the YASEP
architecture to the limits of a FPGA) was to make the CPU core
run at the same frequency as the external memory.
The initial idea was that YASEP was decoupled from memories
through the use of cache blocks, but CAM (or multiple address
comparison logic) is too expensive in most FPGAs.
Once again, I adapted my design to the external constraints.
I don't understand your "clocking restrictions". How does that mean
you can't use the CPU clock for the SPI interface?
I can use the internal clock, but not all the time.
You are designing the master, you can use any main clock you choose.
Sure.

> I don't understand your restrictions,
Let's say that now (considering the first results),
it's not a restriction, but added flexibility.
but unless you *have* to use some specific
frequency to sync a PLL or something unusual on SPI, you can use your
CPU clock as the timing reference eliminating synchronization issues.
And now, what if I "can" use something else ?

In fact, I see that it opens up new possibilities now.
Let's consider the CPU clock and the predivider :
with the CPU running 100MHz, the predivider can generate
50, 25, 16.66, 12.5... MHz. Now, the SPI memories can
reach 20MHz. 16.66MHz is fine, but my proto board has a 40MHz
oscillator (that drives the PLL generating the 100MHz).
All I have to do now is MUX the SPI clock input between
the 40MHz clock source and the CPU clock source.
All the deglitching is already working,
and I can run at top nominal speed.
I don't know what you are saying here, but if it implies that you have
to do something to synchronize the SPI and CPU, you are doing it
wrong. Use the same clock for both and use clock enables instead of a
second clock domain.
>

Why are you resyncing clocks? Use 1 clock.

Please reread the above paragraphs carefully :
the point is not an issue with clock domains (because there's obviously
only one in the initial example). I just wanted to show that
the predivider can create glitches too (from the slave's point of view).

In the usual single-clock design, the predivider is reset when
a new transaction starts. This prevents very short pulses from
being generated ("short" means 1 CPU clock cycle or more, and
less than 1/2*SPIclock).
You haven't explained yourself still. If you are using a chip that
does not act as a slave on the SPI bus, then maybe you shouldn't use
that part???
Reason 1) : I already have the Ethernet chips presoldered, and the errata don't
prevent them from working. The conditions are a bit narrower than
what the datasheet says, but I have coped with terrifyingly worse.
Reason 2) : What's the point of a FPGA if I can't use its reconfigurability
to adapt the design to the existing constraints ?
I'm not using a usual microcontroller here ; I can work with as many clocks as I want,
and implement all kinds of I/O protocols (tens of them, or none, if I desire).

So if the interface works, I can use the parts.
And if I can, I don't see why I shouldn't, particularly if I want to.
Are you saying that the slave chip you are using can not work with an
async master and that the master SPI interface *has* to use a clock
from the slave chip??? I've never seen that before.
Maybe you have not read enough errata ;-P

Seriously : sure, this is a stupid silicon bug.

I could have limited myself to 100/12=8.33MHz,
which according to the errata is fine.
With a "fixed" processor/chip, everybody would do that.

However, there are other possibilities and opportunities.
I am not using FPGAs to limit myself.
And I can spend some time tuning my design and adding features.
A few FFs here, a MUX there, and I have complete clocking freedom.
This, for example, turns a partly buggy Ethernet chip
into a well-performing interface.
No, actually. When you are working with read data, this is very
viable. The round trip timing is enough for an FPGA to receive the
data with sufficient hold time. Most FPGAs have I/O delays which can
be added to give a negative hold time assuring that that will work.
This also gives the maximum setup time. It is in the slave that you
can't depend on this because of the race condition between the write
data and the clock.
OK

> I think
what KJ is referring to is that if you want to provide additional hold
time for read data, you can sample the input data one fast clock
earlier than the SPI clock edge that changes the data. Likewise the
output data can be changed a bit later than the actual SPI clock edge,
not that it would be needed.

It sounds fine to me.
Even at 20 MHz, you would need to have (25 ns - master setup time) of
delay to cause a problem. That should still be a large amount of
delay and I would not expect any problem with normal boards. The
master has no downside to clocking the read data on either edge.

OK.

I was just concerned that one of the SPI slaves
(a sensor) might be located a bit further from the FPGA than usual
(10 or 20cm). I could play with buffers and skew, but line capacitance
could have been a problem for the other (faster) slaves.

Thanks for the insights,
YG
 

KJ

whygee said:
Hi !


As pointed in my previous post, there is at least one peripheral
(ENC28J60 revB4) that has clocking restrictions
(also know as "errata") and I happen to have some ready-to-use
modules equipped with this otherwise nice chip...

It's always fun when someone refers to mystery stuff like "clocking
restrictions (also known as "errata")" instead of simply stating what they
are talking about. There is setup time (Tsu), hold time (Th), clock to
output (Tco), max frequency (Fmax). That suffices for nearly all timing
analysis, although sometimes there are others as well, such as minimum
frequency (Fmin), refresh cycle time, latency time, yadda, yadda, yadda. I
did a quick search for the errata sheet and came up with...
http://ww1.microchip.com/downloads/en/DeviceDoc/80257d.pdf

In there is the following blurb which simply puts a minimum frequency
requirement of 8 MHz on your SPI controller design, nothing else. I'd go
with the work around #1 approach myself since it keeps the ultimate source
of the SPI clock at the master where it *should* be for a normal SPI system.

-- Start of relevant errata
1. Module: MAC Interface

When the SPI clock from the host microcontroller
is run at frequencies of less than 8 MHz, reading or
writing to the MAC registers may be unreliable.

Work around 1
Run the SPI at frequencies of at least 8 MHz.

Work around 2
Generate an SPI clock of 25/2 (12.5 MHz), 25/3
(8.333 MHz), 25/4 (6.25 MHz), 25/5 (5 MHz), etc.
and synchronize with the 25 MHz clock entering
OSC1 on the ENC28J60. This could potentially be
accomplished by feeding the same 25 MHz clock
into the ENC28J60 and host controller. Alternatively,
the host controller could potentially be
clocked off of the CLKOUT output of the
ENC28J60.
-- End of relevant errata

I don't know if my chip revision is B4, and the errata
suggest using a clock between 8 and 10MHz.
However, they also suggest using the ENC28J60-provided 12.5MHz
output :

Read it again. That suggestion was one possible work-around ; there is
nothing there to indicate that it is the preferred solution, just that it is
a solution.
I'm ready to add an external clock input to the master
if I'm allowed to "legally" go beyond the 10MHz rating
(a 25% bandwidth increase is always a good thing, particularly
with real-time communications).

You can run SPI at whatever clock frequency you choose. What matters is
whether you meet the timing requirements of each of the devices on your SPI
bus. In this case, you have a minimum clock frequency requirement of 8 MHz
when communicating with the ENC28J60. If you have other SPI devices on this
same bus, this clock frequency does not need to be used when communicating
with those devices...unless of course the ENC28J60 is expecting a
free-running SPI clock; they don't mention it that way, but I'd be suspicious
of it. Many times the SPI clock is stopped completely when no comms are
ongoing, and Figures 4-4 and 4-4 of the datasheet seem to imply that the
clock is expected to stop for this device as well.
As another "unintended case", an external clock input opens
the possibility to bit-bang data with some PC or uC.
I know it sounds stupid :)

Many times that's the most cost-effective approach, since the 'cost' is 4
general-purpose I/O pins that are usually available. In this case though,
maintaining the 8 MHz minimum clock rate by bit-banging would not be practical.
They are indicated in the datasheet of each individual product.
And there is no "SPI standard" contrary to I2C or others.
( http://en.wikipedia.org/wiki/Serial_Peripheral_Interface_Bus#Standards )

Yes, all the more freedom you have.
Some chips accept a falling CLK edge after CS goes low,
and some other chips don't (even chips by the same manufacturer vary).

So i have read the datasheets of the chips i want to interface,
and adapted the master interface to their various needs (and errata).

Sounds good.
I have (more or less) that already, which is active when the interal
CPU clock is selected. This is used when booting the CPU soft core
from an external SPI EEPROM.

Note however that your version does not allow using the CPU clock at full speed :
what happens if you set your "max value" to "00000" ?

That's correct, but I wouldn't set the max value to anything ; it would be a
computed constant like this :

constant Cpu_Clks_Per_Spi_Clk : positive range 2 to positive'high :=
   Spi_Clk_Period / Cpu_Clk_Period;

Synthesis (and sim) would fail immediately if the two clock periods were the
same, since that would result in 'Cpu_Clks_Per_Spi_Clk' coming out to be 1,
which is outside of the defined range. Running SPI at the CPU speed is
rarely needed since the CPU typically runs much faster than the external SPI
bus. If that's not your case, then you've got a wimpy CPU, but in that
situation you wouldn't have a clock divider, and the data handling would be
done differently. This type of information though is generally known at
design time and is not some selectable option, so if your CPU did run that
slow you wouldn't even bother to write code that puts in a divider, so
your whole point of "what happens if you set your "max value" to "00000"" is
moot.
And it does not garantee
that the high and low levels have equal durations.

That's not usually a requirement either. If it is a requirement for some
particular application, then one can simply write a function to compute the
constant so that it comes out to be an even number. In the case of the
ENC28J60 the only specification (Table 16-6) on the SPI clock itself is that
it be in the range of DC to 20 MHz, with the errata then amending that to
be 8 MHz min *while* writing to that device. It can still be at DC when not
accessing the device. In any case, there is no specific 'SPI clock high
time' or 'SPI clock low time' requirement for the device, so unless there is
some other errata there is no requirement for this device to have a 50% duty
cycle clock.
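
Such a function might look like this (a sketch ; the names and the
nanosecond units are mine) :

   -- round the computed divisor up to an even value so the high and low
   -- halves of Spi_Sclk come out equal
   function even_divisor(spi_period_ns, cpu_period_ns : positive) return positive is
      variable d : positive := spi_period_ns / cpu_period_ns;
   begin
      if d mod 2 /= 0 then
         d := d + 1;
      end if;
      return d;
   end function;

   constant Cpu_Clks_Per_Spi_Clk : positive := even_divisor(100, 10);  -- 100 ns / 10 ns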
I understand that well, as this is how I started my first design iteration.
I soon reached some inherent limitations, however.

I doubt those limitations were because of device requirements though...they
seem to be your own limitations. If not, then specify what those
limitations are. Just like with your previously mentioned "clocking
restrictions (also known as "errata")" comment, I doubt that these
limitations are due to anything in the device requirements.
As the RTL code grows, the synthesizer infers more and more stuff,
often unforeseen, which leads to bloat : MUXes everywhere,
and duplicated logic cells needed to drive higher fanouts.
I guess this is because I focused more on the "expression"
of my need than on the actual result (but I was careful anyway).

Don't write bloated code. Use the feedback you're seeing from running your
code through synthesis to sharpen your skills on how to write good
synthesizable code...there is no substitute for actual experience in gaining
knowledge.
In fact, there IS a need to resynchronise the clock, even when
it is generated by the CPU, because of the divider.

No, there isn't. Everything is clocked by the high-speed clock (the CPU
clock, I presume). The counter being at a specific count value is all one
needs to know in order to sample the data at the proper time. Since the
master generates the SPI clock from the counter there is no need for it to
then *use* the SPI clock in any fashion. You could 'choose' to do so, but
it is not a requirement, it would mainly depend on how you transfer the
receive data back to the CPU, but I suspect either method would work just
fine...but again, that doesn't make it a requirement.
Imagine (I'm picky here) that the CPU runs at 100MHz (my target)
and the slave at 100kHz (an imaginary old chip).
The data transfer is set up in the control register, then
the write to the data register triggers the transfer.
But this can happen at any time, whatever the value of the predivider's counter.
So the first toggle of the clock output may come well before
the slave's minimum clock-width requirement is met. That's a glitch.

So don't write such bad code for a design. There is no need for the clock
divider to be running when you're not transmitting. It should sit at 0
until the CPU write comes along, then it would step through a 1000+ CPU
clock cycle state machine, with the first few clocks used for setting up
timing of data relative to chip select and the start of SPI clock. Then
there are a few CPU clocks on the back end for shutting off the chip select
and then of course the 1000 CPU clocks needed in order to generate the 100
kHz SPI clock itself. Any time the counter is greater than 0, the SPI
controller must be telling the CPU interface to 'wait' while it completes
the transfer.

You should make sure your design works in the above scenario as a test case.
In this case, the solution is easy : reset the counter
whenever a transfer is requested. That's what i did too,
the first time.

but there is an even simpler solution : add a "clear" input
to the FFs that are used to resynchronise the clocks, as in
http://i.cmpnet.com/eedesign/2003/jun/mahmud3.jpg
so the next clock cycle will be well-formed, whether the
source is internal or external. The added delay is not an issue.

You haven't guarded against the CPU coming in and attempting to start a
second write while the first one is ongoing. You need a handshake on the
CPU side to insert wait states while the controller is spitting out the
bits. When you look at it from that perspective and design it correctly,
there will be no chance of any glitchy clocks or anything else. If you
don't have a 'wait' signal back to the CPU, then certainly you have an
interrupt that you can use to send back to the CPU to indicate that it
fouled up by writing too quickly...many possible solutions.
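
A minimal sketch of that handshake (my names) :

   busy_ff : process(Cpu_Clock)
   begin
      if rising_edge(Cpu_Clock) then
         if cpu_write = '1' and busy = '0' then
            busy <= '1';               -- a transfer starts
         elsif transfer_done = '1' then
            busy <= '0';
         end if;
      end if;
   end process;
   cpu_wait <= cpu_write and busy;     -- stall (or flag an overrun) on a second write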
Though sometimes there needs to be something
a bit more than the "theoretically practically enough".

I've done this; it's not a theoretical exercise on my part either. It's not
that hard.
This is what I did in the first design iteration.

However, I now avoid large single-clock processes
because they give less control over what the synthesiser does.

That makes no sense.
Finally, I have the impression that you misunderstood the initial post about "SPI clocking".
The idea was that the SPI master "could" sample MISO with the same (internal) clock signal
and edge that shifts MOSI out. The issue this would "solve" arises when capacitance
and propagation delays on the PCB, along with a relatively high clock speed
(the 25AA1024 by Microchip goes up to 20MHz), delay the MISO signal
enough to miss the normal clock edge.

Your proposed solution wouldn't solve anything. If you have a highly loaded
MISO this means you have a lot of loads (or the master and slave are
faaaaaaaar apart on separate boards). It also likely means you have a
highly loaded SPI clock since each slave device needs a clock. You likely
won't be able to find a driver capable of switching the SPI clock so that it
is monotonic at each of the loads (which is a requirement), which will force
you to split SPI clock into multiple drivers just to handle the electrical
load at switching...but now you've changed the topology so that trying to
feed back SPI clock to somehow compensate for delays will not be correct.
Far easier to simply sample MISO a tick or two later. For example, using
the 100 MHz/100kHz example you mentioned, the half way point would be 500,
but there is nothing to say that you can't sample it at 501, 502 or
whatever, it doesn't matter.
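A sketch of that late-sampling idea, with illustrative constants for the
100 MHz / 100 kHz case (the entity and all names are mine):

   library ieee;
   use ieee.std_logic_1164.all;
   use ieee.numeric_std.all;

   entity miso_sampler is
     port (
       cpu_clk    : in  std_logic;
       tick_count : in  unsigned(9 downto 0);  -- position within the SPI period
       miso       : in  std_logic;
       rx_data    : out std_logic_vector(7 downto 0));
   end entity;

   architecture rtl of miso_sampler is
     -- the half-way point would be tick 500; 501 or 502 works just as well,
     -- and sampling late absorbs board-level delay on MISO
     constant SAMPLE_TICK : natural := 502;
     signal rx_reg : std_logic_vector(7 downto 0) := (others => '0');
   begin
     process (cpu_clk)
     begin
       if rising_edge(cpu_clk) then
         if tick_count = SAMPLE_TICK then
           rx_reg <= rx_reg(6 downto 0) & miso;   -- shift in, MSB first
         end if;
       end if;
     end process;
     rx_data <= rx_reg;
   end architecture;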

Kevin Jennings
 

whygee

Hello,

It's a bit scary that, according to my news reader, we posted
to the same newsgroup at the same time with posts that are
roughly the same size, starting with the same extract of a PDF.
As if we had nothing more constructive to do.

What is more scary is some posters' constant desire for
explanations and justifications of *personal choices*.
As if a technical choice was only a technical matter.
We are *all* biased, whether we realize it or not.
Our experiences differ and mature.
And of course, because this is personal, nobody agrees.
Just like the other thread about coding styles:
there is so much freedom that everybody ends up
contented by something different, varying with the individual.
And it's fine for me because I can do more new and great
things every year.

Freedom is wonderful and today's technology is empowering.
We are free to test, experiment, discover, learn and "invent".
So I'm getting a bit tired of "you should do that"
and "that's how it should be done". As if there ware two
kinds of engineers, those who "implement" and those who "innovate"
(often by trial and error).
And I believe that each practical case has distinctive aspects,
that make us reconsider what we know and how we apply our knowledge.
One is free to abide by strict rules, or not. For the rest,
read the standards (and find creative ways to exploit them).
And with YASEP ( http://yasep.org ), I have decided to
go completely wild, unusual and fun (and constructive).

Personally, for the application that has become the focus
of the latest post (SPI master), I have chosen to not be limited
by "what is usually done". I have used a bit of imagination,
confronted the idea to the technical possibilities and
made it work within 48h. The simulations have shown nothing
nasty, and I've learnt some tricks about asynchronous designs
(I wonder why it's so scary for most people, I'm not doing
any FIFO-style magic).

Does somebody need more justifications ?
Shall I quote some nation's constitution ?
I hope not, thank you.

Maybe my error was to think that more Usenet posters would
be open-minded, or at least curious, instead of rehashing
old techniques that I already know work. Now, I realize
that according to some people, pushing the envelope is
not desirable. I did not expect that adding an external
clock input to an otherwise inoffensive circuit would
get the reactions that I have seen. It's just one
stupid pin... Let's all use our 14K8 modems, while
we're at it.

I don't consider my SPI code as finished but I've seen what
I wanted to see, and I'm now looking at the cache memory
system. And once again, it's another occasion to look at
what others have done, what is practically possible, and
how things can be bent, adapted, transformed, twisted,
in order to exploit what is available to perform the
desired functions, and a bit more when the opportunity appears.
24h ago, I thought that it was not even possible to do the
internal cache system.
It's always fun when someone refers to mystery stuff like "clocking
restrictions (also known as "errata")" instead of simply stating what they
are talking about.

I have "fun" when I imagine something and implement it.
It's more fun when the thing is unusual, like using a mechanism
to perform another useful function.
I have even more fun when it works as expected.
Oh, and it works. I guess I'm learning and getting better.

Concerning "stating what I was talking about" :
If anybody has to quote every single document about every matter,
then Usenet would become (more) unreadable.
We would need assistants to redact and analyse the posts.
It would be like being a lawyer... and Usenet would
be a courtroom (is it already ?)
So I tried to keep the post short (*sigh*) and
avoided (what I thought) "unecessary details".
There is setup time (Tsu), hold time (Th), clock to
output (Tco), max frequency (Fmax). That suffices for nearly all timing
analysis, although sometimes there are others as well, such as minimum
frequency (Fmin), refresh cycle time, latency time, yadda, yadda, yadda.

Sometimes it's so simple that we don't have to care, sometimes not.
I did a quick search for the errata sheet and came up with...
http://ww1.microchip.com/downloads/en/DeviceDoc/80257d.pdf
bingo.

In there is the following blurb which simply puts a minimum frequency
requirement of 8 MHz on your SPI controller design, nothing else.

You see "nothing else" where I see a future opportunity.
It's a matter of taste, experience, and willingness to push the envelope.
You're not forced to agree with my choices, just as I'm not forced
to follow your advice. I didn't break the entropy principle or
the rules of number theory. I just added a feature.
I'd go with the workaround #1 approach myself since it keeps the ultimate source
of the SPI clock at the master where it *should* be for a normal SPI system.

Ok, that's a legitimate point of view.
But does this choice force me to, for example, clock the CPU
from a different clock source, *just* so that the SPI master
interface works in the same clock domain as the CPU?
Let's see this as an exercise in thinking out of the bag.
-- Start of relevant errata
-- End of relevant errata


Read it again. That suggestion was one possible workaround; there is
nothing there to indicate that this is a preferred solution, just that it is
a solution.

This sentence hurts, too.
When facing a choice, where should one go?
It depends on your objectives, mindset, resources...
So you're basically telling me: don't look, this solution does not exist,
just because there is another, more reassuring (to you) one just before.
If I thought like that, I would be some random clerk at some
boring office, not an independent guy earning a living for himself
and his wife by "hacking" things. I would rehash proven things
and let people rule over me.
You can run SPI at whatever clock frequency you choose. What matters is
whether you meet the timing requirements of each of the devices on your SPI
bus.
I'm concerned about that too.
I expect some small buffers here and there.
Fortunately, this is much simpler than I2C :)
In this case, you have a minimum clock frequency requirement of 8 MHz
when communicating with the ENC28J60. If you have other SPI devices on this
same bus, this clock frequency does not need to be used when communicating
with those devices...
of course.
unless of course the ENC28J60 is expecting a free-running
SPI clock; they don't mention it that way, but I'd be suspicious of
it. Many times the SPI clock is stopped completely when no comms are ongoing,
and Figures 4-4 and 4-4 of the datasheet seem to imply that the clock is
expected to stop for this device as well.
obviously.


Many times that's the most cost-effective approach since the 'cost' is 4
general-purpose I/O pins that are usually available. In this case though,
maintaining an 8 MHz

Hmm, this seems to be unfinished, but let me try to complete your sentence:
"maintaining an 8 MHz clock on a parallel port is difficult" (or something like that).
Of course! That's the whole point of having an external clock input!
Though it would be more natural to have a SPI slave instead of a SPI master.

I'll cut into this subject: for the specific purpose of communicating
with a host, I intend to use another kind of parallel, synchronous protocol:
4 bits of data, 1 pulse strobe, 1 output enable, 1 reset, 1 "slave data ready".
With some crude software handshaking, it's really easy to implement and use,
and 4x faster than SPI.
Yes, all the more freedom you have.

So it would be lazy of me not to use it.
That's correct, but I wouldn't set the max value to anything; it would be a
computed constant like this:

   constant Spi_Clks_Per_Cpu_Clk : positive range 2 to positive'high :=
      Spi_Clk_Period / Cpu_Clk_Period;

As I need flexibility (the system I develop is also a development platform for me),
the clock divider is programmable. I need to ensure that any combination
of configuration bits won't bork something.
Synthesis (and sim) would fail immediately if the two clock periods were the
same since that would result in 'Spi_Clks_Per_Cpu_Clk' coming out to be 1
which is outside of the defined range. Running SPI at the CPU speed is
rarely needed since the CPU typically runs much faster than the external SPI
bus. If that's not your case, then you've got a wimpy CPU, but in that
situation you wouldn't have a clock divider, and the data handling would be
done differently. This type of information, though, is generally known at
design time and is not some selectable option; if your CPU did run that
slow, you wouldn't even bother to write code that would put in a divider, so
your whole point of "what happens if you set your "max value" to "00000"" is
moot.
Nice try.

In my case, the SPI divider is programmable, and I expect to be able
to slow down the CPU (from 100 MHz to maybe 20 or 10 MHz) when it is not in use
(to conserve power, which is mostly sunk by the main parallel memory interface,
a problem that I am currently addressing).
That's not usually a requirement either.
At the highest frequencies, I assume that it could become a problem.
So, since I have a decent frequency margin in the CPU, there's no point
in having non-50% duty cycles, when the inherent divide-by-2 (of the code
that I copy-pasted in the previous post) solves the problem before it
appears.
If it is a requirement for some
particular application, then one can simply write a function to compute the
constant so that it comes out to be an even number. In the case of the
ENC28J60 the only specification (Table 16-6) on the SPI clock itself is that
it be in the range of DC to 20 MHz, with the errata then amending that to
be 8 MHz min *while* writing to that device. It can still be at DC when not
accessing the device. In any case, there is no specific 'SPI clock high
time' or 'SPI clock low time' requirement for the device, so unless there is
some other errata there is no requirement for this device to have a 50% duty
cycle clock.
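A sketch of that computed-constant function, with illustrative clock
periods (the package, the function name and the values are mine):

   package spi_timing is
     -- illustrative values: 100 MHz CPU clock, 8 MHz SPI clock
     constant Cpu_Clk_Period : time := 10 ns;
     constant Spi_Clk_Period : time := 125 ns;
     function even_divide(spi_period, cpu_period : time) return positive;
   end package;

   package body spi_timing is
     function even_divide(spi_period, cpu_period : time) return positive is
       constant raw : positive := spi_period / cpu_period;
     begin
       if raw mod 2 = 0 then
         return raw;        -- already even: a plain toggle gives 50% duty
       else
         return raw + 1;    -- round up to the next even count
       end if;              -- (slightly slower SPI clock, never faster)
     end function;
   end package body;

   -- usage, following the constant quoted above:
   -- constant Spi_Clks_Per_Cpu_Clk : positive range 2 to positive'high :=
   --    even_divide(Spi_Clk_Period, Cpu_Clk_Period);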

As pointed out in another post about SPI edge sampling,
I was a bit worried that longer-than-expected wiring would cause trouble.
At 20 MHz (the latest chips are that fast), I'm not willing to play with the duty cycle.
The sampling clock, maybe, but that would remain experimental.
Fortunately, my scope is fast enough for this case so in practice,
I would find the solution if a problem arises.
I doubt those limitations were because of device requirements though...they
seem to be your own limitations. If not, then specify what those
limitations are. Just like with your previously mentioned "clocking
restrictions (also known as "errata")" comment, I doubt that these
limitations are due to anything in the device requirements.

I don't want to bloat this post (which probably nobody reads), so I'll
cut this useless issue too. Your doubts may be legitimate, but they
are not my concern (which is now: cache memory).
Maybe I'll write a full report later, when things have settled and
become clear, in another thread.
Don't write bloated code.
I tried to keep the thing as bare as possible, of course.
Use the feedback you're seeing from running your
code through synthesis to sharpen your skills on how to write good
synthesizable code...there is no substitute for actual experience in gaining
knowledge.

Sure. I did that for the previous project.
But in the end, it backfired, as I did not think to try simulation.
I've learnt the lesson.
I have also tried to
No there isn't.
Maybe I should have been more explicit.
I meant: resynchronise the output clock after the divider.
My fault.

So don't write such bad code for a design.
If I pointed to this case, then I also addressed it in the code.
There is no need for the clock
divider to be running when you're not transmitting.
Your initial code did not mention this.
It should sit at 0
until the CPU write comes along, then it would step through a 1000+ CPU
clock cycle state machine, with the first few clocks used for setting up
timing of data relative to chip select and the start of SPI clock.
May I address another issue?
Alright, thanks.

CS should be controlled by software, at least with the slaves I intend to use,
because only the master knows how many bytes or half-words it wants.

Most slaves see the CS falling edge as the beginning of the transaction,
and rising edge as the end. Between those two edges, as many words as
desired can be transmitted. For the case of the ENC28J60 or the SPI EEPROM,
one can dump the whole memory in one go if needed.
Emitting a new command word for every new byte is pure overhead and loss.
Then
there are a few CPU clocks on the back end for shutting off the chip select
and then of course the 1000 CPU clocks needed in order to generate the 100
kHz SPI clock itself. Any time the counter is greater than 0, the SPI
controller must be telling the CPU interface to 'wait' while it completes
the transfer.

In the system I build, the CPU MUST poll a "RDY" flag in the control register
before starting a new command; that's the simple software protocol I defined.
If the CPU starts a new transaction when RDY is cleared, then this is a deliberate error,
not the interface's fault. The interface FSM will be reset and restarted
automatically, and the aborted transaction will be lost, that's all.
And this is defined as an error, one that I expect will never happen in
well-formed SW.
You should make sure your design works in the above scenario as a test case.

The first simulations are encouraging,
but the use of Modelsim is a bit tedious.
I'm learning.
You haven't guarded against the CPU coming in and attempting to start a
second write while the first one is ongoing.
I don't guard with a handshake because this is the SW's responsibility in this case.
You need a handshake on the
CPU side to insert wait states while the controller is spitting out the bits.
Ouch, you're tough here!

In YASEP, there is a "Special Registers" area dedicated
to general HW configuration and to peripherals like the SPI master.
It is mapped as 2 registers: 0 is control/status, 1 is data.
Polling the RDY bit takes 2 instructions and 6 bytes:
label_poll
GET SPI_CTL, R0
JO label_poll ; RDY is in bit 0, which makes R0 Odd when not ready.
When you look at it from that perspective and design it correctly,
there will be no chance of any glitchy clocks or anything else. If you
don't have a 'wait' signal back to the CPU, then certainly you have an
interrupt that you can use to send back to the CPU to indicate that it
fouled up by writing too quickly...many possible solutions.

It feels like I'm reading one of those old books I read when I was 14,
about how to design a 6809-based system. Thank you, Technology, for the FPGA!

That makes no sense.

To you, it seems.

Breaking the thing into well-defined sub-blocks
communicating through signals has not only made synthesis more predictable
(IMHO, YMMV but IANAL), but has also made simulation easier (I can't see how
to enable visualisation of variables with Modelsim). So to me, it does make
some sense.

To quote another thread, I don't remember who said something like:
"I take the D in VHDL very seriously". While others concentrate on the 'L',
I "think" graphically, with diagrams, blocks and signal arrows...
just like with paper. Even after I tried other ways,
I feel more comfortable this way and it gets the job done.
What more do I need ?

I have a certain way of doing things, like anybody else.
I won't tell them that they make no sense.
I'll just sit there and learn, and when I face a design challenge,
I'll weigh the pros & cons of the solutions I already know,
and maybe even find a totally different solution.
Your proposed solution wouldn't solve anything. If you have a highly loaded
MISO this means you have a lot of loads (or the master and slave are
faaaaaaaar apart on separate boards). It also likely means you have a
highly loaded SPI clock since each slave device needs a clock. You likely
won't be able to find a driver capable of switching the SPI clock so that it
is monotonic at each of the loads (which is a requirement)
If the case appears, I have some 100s of 74-1G125 single-gate tristate
buffers; this could help. Or I can simply add other dedicated pins to the
FPGA (one per slave). Currently, I'll have 2 fast slaves and maybe one slow
remote sensor that could stand some signal margin (probably < 1 MHz).

By the way, I have not been able to find the SPI switching parameters
of ST's LIS3LV02DQ 3D accelerometer, but the transmission speed of this sensor
is not critical; I'll try 100 kHz if I can.
Far easier to simply sample MISO a tick or two later. For example, using
the 100 MHz/100kHz example you mentioned, the half way point would be 500,
but there is nothing to say that you can't sample it at 501, 502 or
whatever, it doesn't matter.

This more or less confirms what I was thinking.
It may be useful one day (as well as a good scope).

And now, let's see how I can create some cache memory
with just 3 512-byte reconfigurable memory blocks.
YG
 

KJ

The simulations have shown nothing
nasty, and I've learnt some tricks about asynchronous designs

- Logic simulations do not find async design problems.
- Timing analysis will find async design problems.
(I wonder why it's so scary for most people, I'm not doing
any FIFO-style magic).

- Async design is not 'scary'
Maybe my error was to think that more Usenet posters would
be open-minded, or at least curious, instead of rehashing
old techniques that I already know work.

Maybe Usenet posters are not as narrow-minded as you incorrectly seem
to believe them to be. Most people post questions or are looking for
better ways of doing something and the responses try to fulfill that
need.

Your posts on the other hand are long winded and only go to show how
wonderful and creative and refreshing you think you are...consider
using a blog instead.
Now, I realize
that according to some people, pushing the envelope is
not desirable.

If you think you've pushed any envelopes with your SPI
design...well...you haven't.
Ok, that's a legitimate point of view.
But does this choice force me to, for example, clock the CPU
with different clocks source, *just* so that the SPI master
interface works in the same clock domain as the CPU ?

No it does not force that at all.
I'm also concerned by that too.
I expect some small buffers here and there.
Fortunately, this is much simpler than I2C :)

You shouldn't need small buffers here and there.

Good luck on your design. You might also want to consider rethinking
the attitude you've shown in your postings and work to make the
content more relevant to the members of this newsgroup...just a
suggestion. I know I'm done with 'whygee' postings for a while.

KJ
 

Andy

Does somebody need more justifications ?
Shall I quote some nation's constitution ?
I hope not, thank you.

[snip]

Don't ask questions to which you don't want to know the answers. Why
did you post here in the first place? You've received excellent,
extremely patiently provided advice. My advice to you is to take it.
Thanks is optional.

One major reason to avoid multiple clock domains when possible is that
simulation (RTL or full-timing) rarely reveals the problems inherent
in crossing clock domains. Static timing analysis does not reveal them
either. Experienced designers know to avoid problems they don't need.
If you think complex code is hard to debug, you should try debugging
behavior that is not repeatable in simulation at all.


Andy
 

rickman

Hi !


I still don't understand why several people think I "must not" use 2 clocks.
I understand that "normally" this goes against common knowledge,
but I see no limitation, legal reason or technical issue that
could prevent me from doing this (Usenet misunderstandings are not,
IMHO, "limitations" :p). Furthermore, the first simulation results
are very encouraging.

I didn't say you "must not" use 2 clocks. It is just a PITA to use
more than one because of the synchronization issues. Often when the
system clock is significantly faster than the interface clock, the
interface clock can be used as a signal instead of a clock. The clock
can be sampled and the edge detected and used as an enable. It also
helps with the timing analysis a lot.
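A minimal sketch of that sample-and-edge-detect technique (the entity and
all names are mine; it assumes, as stated above, that sys_clk is more than
2x faster than the interface clock):

   library ieee;
   use ieee.std_logic_1164.all;

   entity clk_edge_enable is
     port (
       sys_clk : in  std_logic;
       itf_clk : in  std_logic;    -- slow interface clock, treated as data
       rise_en : out std_logic;    -- one sys_clk pulse per rising edge
       fall_en : out std_logic);   -- one sys_clk pulse per falling edge
   end entity;

   architecture rtl of clk_edge_enable is
     signal sync : std_logic_vector(2 downto 0) := (others => '0');
   begin
     process (sys_clk)
     begin
       if rising_edge(sys_clk) then
         sync <= sync(1 downto 0) & itf_clk;  -- bit 0 is the metastability guard
       end if;
     end process;

     -- edges are detected on the two oldest (already synchronized) samples
     rise_en <= '1' when sync(2 downto 1) = "01" else '0';
     fall_en <= '1' when sync(2 downto 1) = "10" else '0';
   end architecture;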
http://yasep.org (not up to date)
This is a soft core that I am developing, based on ideas dating back to 2002.
It now has a configurable 16-bit or 32-bit wide datapath with quite dumb
but unusual RISC instructions. I am writing the VHDL code since
the start of the summer, when I got my Actel eval kit.
Between 2000 and 2002, I had some other intensive VHDL experience, with other tools.



In your reasoning, are small companies small because they are small,
and big companies big because they are big?
And if I mentioned another large company, what would you have answered?

PLD companies spend more money on developing software than they do on the
hardware. At least this was stated by Xilinx some 4 or 5 years ago...
the rising costs of cutting edge ICs may have changed this. The point
is that if they sell fewer chips, they have a smaller budget for the
software. Since the software requirements are the same if they sell 1
chip or a billion chips, they can obviously afford to develop better
software if they sell more chips.

Furthermore, the more I use the A3Pxxx architecture, the more
I understand it and I don't find it stupid at all.
The "speed issue" seem to be because they use an older
silicon process than the "others". But the A3Pxxx are old,
there are newer versions (unfortunately under-represented,
overpriced and not well distributed). Once again,
commercial reasons, not technical.

No one said their chips are "stupid". The point is that their tools
will be less developed. Personally I think it is entirely feasible to
develop tools that work in a different domain. I remember that the
Atmel 6000 parts had tools that facilitated a building block
approach. But this requires a significant investment in building up
your own library in a reusable manner and most designers just want to
"get it done". So the tools moved more and more to a push button
approach. The push button approach is also very appealing to a
beginner and so opens up the market to a much wider range of
developers. Obviously this is the route that was taken by logic
companies and the result is complex, expensive (for the chip maker)
tools.

What do you mean by "clock domains that are larger than 1 FF in each direction"?

I mean for a serial interface, you only need one FF to synchronize
each input/output signal. This assumes you have a sufficient speed
difference in the I/O clock and the internal sys clock (> 2:1). Then
everything can be done in your sys clock domain and sync issues are
limited to just the I/O. This can be much more manageable than having
banks of logic with different clocks.

What I do (at a clock domain boundary) is quite simple:
- the FF is clocked by the data source's clock (only)
- the FF is read (asynchronously) by the sink.
Because of the inherent higher-level handshakes (for example, the
receive register is read only when a status flag is asserted and
detected in software), there is no chance that metastability can
last long enough to cause a problem.

There are other sync issues than metastability. I'm not saying it is
hard, I'm saying that everywhere you have this sort of interface, you
have to go through an analysis to verify that it will work under all
conditions. Again, it is the PITA factor that can be eliminated by
different means.

Rick
 

rickman

Don't ask questions to which you don't want to know the answers. Why
did you post here in the first place? You've received excellent,
extremely patiently provided advice. My advice to you is to take it.
Thanks is optional.

One major reason to avoid multiple clock domains when possible is that
simulation (RTL or full-timing) rarely reveals the problems inherent
in crossing clock domains. Static timing analysis does not reveal them
either. Experienced designers know to avoid problems they don't need.
If you think complex code is hard to debug, you should try debugging
behavior that is not repeatable in simulation at all.

Did you really need to quote his entire message?

I don't think it is appropriate to criticize the OP because he
received good advice but still wants to go his own way. It is
impossible (or at least very difficult) to articulate all the reasons
for doing something the way a designer wants. And sometimes it just
comes down to personal preference. I am not trying to tell the OP he
is wrong. I'm just trying to give him info on which to base his decision,
and to make sure he understands what I have said (and that I understand
what he has said).

But the final decision is his, and I would like to find out how it works
for him.

Rick
 

Mike Treseler

whygee said:
Freedom is wonderful and today's technology is empowering.
We are free to test, experiment, discover, learn and "invent".
[snip]

Adventure and discovery are the upside of randomness.
Crashing on the rocks is the downside, but this also
provides the most memorable lesson.

Once I know where some of the rocks are,
I am inclined to steer around those next time
even if the water happens to be running higher.

-- Mike Treseler
 

Andy

Did you really need to quote his entire message?

I most humbly offer my sincerest apology for my wanton disregard for
usenet etiquette...

We are all free to accept or disregard advice we receive via usenet.

Andy
 
