Very fast counter in VirtexII

Marty Ryba · Feb 21, 2009

Hi gang,

I have an idea for a tweak of my FPGA design that involves essentially
building a time interval counter. I found that there are some IP cores out
there that get as much as 100ps resolution, but before I go that route I
want to experiment with something "free" first, especially since I don't
need any bells and whistles like embedded bus protocols or programmable
timers. Neither of the signals I want to time between is synchronous with
my main clock, so I'm thinking of generating a new DCM just for this purpose
(I think I have a few left in my XC2V6000-5). Otherwise my fastest clock is
either 133 MHz or maybe 204.8 MHz coming from an outside clock chip (I might
be able to goose it to 409.6 MHz).

My question: is there any good "how to" on writing a counter so that it runs
at a maximum clock rate for my chip? I perused the Xilinx site, and there
were some very old articles on fast counters in antique chip architectures;
they provide OrCAD macros(?); not even VHDL.

So, do I just naively code the counter and pray that synthesis does the
right things (I don't need a huge number of bits; my maximum time interval
is maybe 80 ns), or are there some tricks needed to get optimum clock speed
(what could I rationally expect in this FPGA?)?

Thanks for your help,

Marty

Allan Herriman · Feb 21, 2009

The "naive counter" has no chance of giving you a resolution of better
than a ns with current FPGA technology.

I'm guessing that most of the IP cores that achieve better than 1ns
resolution do so by using a wider bus at a lower clock rate, e.g. a 100
bit bus at 100MHz. You use logic to locate the bit position on the bus
where a transition occurs. Each bit position in this contrived example
represents 100ps, and each word represents 10ns.

There are two basic ways of turning your 1 bit test signal into a wider
bus:

1. Use a SERDES. Most modern (larger) FPGAs have these built in, either
as a true transceiver (with PLLs and CDR, etc.), or as a simple SERDES
built into the IOB. The most recent FPGAs have on-board SERDES blocks that
can sample at 100ps intervals.

2. Use a (different) phase delay for each of the bits, and sample them
all with the word clock. This has the advantage that the word clock is
the highest frequency you need, however getting the phase delays right in
an FPGA might be tricky. (This method is better suited to ASIC
implementations.)

There are some tricks you can use that will get you part way to your
goal:
- Use both clock edges for sampling. This gives you a 2x speedup (but
requires a 50% duty cycle clock).
- Use multiple phases from a DCM or PLL. This can give you a 4x
speedup.
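
A minimal VHDL sketch of the multi-phase idea, under stated assumptions:
the entity name phase_sampler and all port names are hypothetical, and the
four phase clocks are assumed to come from a DCM that is not shown. The
input is asynchronous, so a real design would still need metastability
handling, and the path from the 270-degree flip-flop back to clk0 has only
a quarter period to settle, so it needs careful constraints or extra
retiming stages.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity phase_sampler is
  port (
    clk0, clk90, clk180, clk270 : in  std_logic;  -- assumed DCM phase outputs
    sig_in  : in  std_logic;                      -- asynchronous input to time
    samples : out std_logic_vector(3 downto 0)    -- bit 3 = earliest sample
  );
end entity phase_sampler;

architecture rtl of phase_sampler is
  signal s0, s90, s180, s270 : std_logic := '0';
begin
  -- One sampling flip-flop per clock phase.
  process (clk0) begin
    if rising_edge(clk0) then s0 <= sig_in; end if;
  end process;

  process (clk90) begin
    if rising_edge(clk90) then s90 <= sig_in; end if;
  end process;

  process (clk180) begin
    if rising_edge(clk180) then s180 <= sig_in; end if;
  end process;

  process (clk270) begin
    if rising_edge(clk270) then s270 <= sig_in; end if;
  end process;

  -- Gather the four quarter-period samples into one word per clk0 cycle.
  process (clk0) begin
    if rising_edge(clk0) then
      samples <= s0 & s90 & s180 & s270;
    end if;
  end process;
end architecture rtl;
```

With a 250 MHz clock, each bit of samples covers 1 ns, so a word such as
"0011" would place a rising edge between the 90-degree and 180-degree
sampling points.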

Regards,
Allan

-jg · Feb 21, 2009

Your title says fast counter, but the text says time interval.
They are not quite the same thing.

If you want to do precise interval timing, then multi-phase capture
and/or delay-line capture will give you time resolution finer than the
clock period.

What time precision do you actually need?
E.g. 250MHz with 4 phases resolves to 1ns.

I think I read that some of the very newest FPGAs can self-calibrate
their delay lines, which saves you the trouble.

-jg

Michael Brown · Feb 21, 2009

Note: Since Optus can't figure out how to run a Newsgroup server, the
original post hasn't appeared for me ...

What sort of resolution and dead time do you need? If you're willing to do a
bit of legwork with manual place'n'route, and consume a fair bit of
resources, you can get on the order of 10 ps or so resolution and accuracy
at the cost of tens of nanoseconds of dead time. See:
http://www-ppd.fnal.gov/EEDOffice-W/Projects/ckm/comadc/WaveletTDC.ppt
They use an Altera Cyclone II, but I've implemented a similar thing on a
Spartan 3E with reasonable success. I don't have good enough testing
apparatus to properly measure the resolution and accuracy though. And at the
10 ps level, you've got to think a bit about what's on the outside of the
FPGA too ...

The main downside to the Xilinx parts for this purpose is that you've only
got 4 elements on the carry chain per block, as opposed to 8 in the Altera.
You can also tweak out the dead time by throwing more resources at it
(basically loop the end of the carry chain around to the start where it xors
with the input, then do edge detection along the whole buffer and track the
edges). Of course, this is so far beyond the point of "supported" that using
it in a commercial project is debatable, but it's certainly a fun thing to
play with.

gabor · Feb 22, 2009

Virtex 2 doesn't have self-calibrating delay lines. However, I remember
seeing appnotes using carry chains as delay elements. You basically
run your input into the carry chain and then have a flip-flop
at each stage in the chain, all running on the master clock.
Ideally your output would look like "1110000000" for a single
transition, allowing you to interpolate between clock cycles.
I think the original appnote was for a serdes using Virtex E,
and the carry chain delay was used for phase adjustment
without the IDELAY elements available in the newer parts.
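
As a rough sketch of the decoding step only, the thermometer code from the
tap flip-flops can be turned into a number by counting the leading ones.
The entity name thermo_decode, the generic N_TAPS, and the port names are
all hypothetical. The carry chain itself (primitive instantiation and
placement constraints) is not shown, and a real chain can produce
"bubbles" such as "1101000000" that this simple decoder does not correct.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity thermo_decode is
  generic (N_TAPS : positive := 10);
  port (
    clk   : in  std_logic;
    -- Registered carry-chain taps; taps(N_TAPS-1) is the first stage.
    taps  : in  std_logic_vector(N_TAPS - 1 downto 0);
    count : out natural range 0 to N_TAPS   -- number of leading ones
  );
end entity thermo_decode;

architecture rtl of thermo_decode is
begin
  process (clk)
    variable n       : natural range 0 to N_TAPS;
    variable stopped : boolean;
  begin
    if rising_edge(clk) then
      n       := 0;
      stopped := false;
      -- Walk from the first stage and stop counting at the first non-'1'.
      for i in N_TAPS - 1 downto 0 loop
        if taps(i) = '1' and not stopped then
          n := n + 1;
        else
          stopped := true;
        end if;
      end loop;
      count <= n;
    end if;
  end process;
end architecture rtl;
```

For the "1110000000" pattern above, count would be 3, meaning the edge had
propagated through three stages when the master clock sampled the chain.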

Regards,
Gabor

goouse · Feb 23, 2009

Hi Marty,
the general way to a fast design is to reduce combinatorial logic. For
counters (or FSMs) that means using shift-register-based designs.
Depending on the number of clock cycles you want to count, you can
either design a simple FSM with one-hot encoding, build a
Johnson counter with a small ripple generator (e.g. an edge-detecting
monoflop), or use an LFSR structure.
All of these give you maximum speed.
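
For illustration, a Johnson counter can be sketched in VHDL as below. The
entity name johnson_counter, the generic N, and the port names are
hypothetical, and the ripple generator is left out. The only logic between
flip-flops is a single inverter on the feedback bit, which is why this
style can be clocked quickly.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity johnson_counter is
  generic (N : integer range 2 to 64 := 4);  -- N stages give 2*N states
  port (
    clk : in  std_logic;
    rst : in  std_logic;                     -- synchronous reset
    q   : out std_logic_vector(N - 1 downto 0)
  );
end entity johnson_counter;

architecture rtl of johnson_counter is
  signal r : std_logic_vector(N - 1 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        r <= (others => '0');
      else
        -- Shift left and feed the inverted top bit back into bit 0.
        r <= r(N - 2 downto 0) & not r(N - 1);
      end if;
    end if;
  end process;

  q <= r;
end architecture rtl;
```

With N = 4 the register steps through 0000, 0001, 0011, 0111, 1111, 1110,
1100, 1000 and then repeats, so only one bit changes per clock.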

Have a nice synthesis
Eilert

jprovidenza · Feb 23, 2009

Michael -

Would you be willing to share your design? I'm curious to see what is
involved.

John Providenza

Marty Ryba · Feb 24, 2009

Thanks for the useful tips; it seems the primary approach is to "stretch"
the signals of interest into fast elements like shift registers and/or carry
chains, and then count these up at some leisure later (how?). That sounds
like it takes a lot of resources (e.g., 16 ticks per slice if I use a
SRL16E). This could explain why some of the papers I've glanced at seem to
take pretty much an entire chip to make a couple of these high-end delay
measuring devices. For now, since it seems feasible to run a small (8 bit)
counter at 204.8 MHz, I'll try that route. 4.883 ns of precision is about
1.5 meters when you multiply by c, so that's still useful to me. Once I get
the basic structure figured out I can look at speeding it up. Today I got
the input logic figured out (what signal is my start condition, and what is
my stop). Since I'm using these signals to calibrate out differences between
identical bitstreams on separated boards inside a common chassis, the
differential delays inside the logic should mostly wash out.
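
A minimal sketch of such a counter, assuming the start and stop conditions
have already been synchronized to the counting clock and arrive as
single-cycle pulses. The entity name interval_counter and all port names
are hypothetical. An 80 ns interval is only about 16 ticks at 204.8 MHz,
so 8 bits leaves plenty of headroom; the count simply wraps if stop never
arrives.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity interval_counter is
  port (
    clk      : in  std_logic;                 -- e.g. the 204.8 MHz clock
    start    : in  std_logic;                 -- assumed synchronous to clk
    stop     : in  std_logic;                 -- assumed synchronous to clk
    interval : out unsigned(7 downto 0) := (others => '0');
    valid    : out std_logic := '0'           -- one-cycle strobe per result
  );
end entity interval_counter;

architecture rtl of interval_counter is
  signal running : std_logic := '0';
  signal count   : unsigned(7 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      valid <= '0';
      if running = '0' then
        if start = '1' then
          running <= '1';
          count   <= to_unsigned(1, 8);  -- first tick after start
        end if;
      elsif stop = '1' then
        running  <= '0';
        interval <= count;               -- clock edges from start to stop
        valid    <= '1';
      else
        count <= count + 1;
      end if;
    end if;
  end process;
end architecture rtl;
```

If stop is seen on the clock edge right after start, interval reads 1; each
further clock before stop adds one tick.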

One "newbie" question: I notice you can't use an output pin signal to drive
internal logic (at least Modelsim barfs on it). I ended up for now declaring
a signal and copying some of the code that generates that output pin to
generate my signal as well. Is there a "smarter" way?

Thanks again,

Marty

Ken Cecka · Feb 24, 2009

You shouldn't need to duplicate any logic; just create an intermediate signal.

For example this code would fail:

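LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY divider IS
  PORT
  (
    clk : IN STD_LOGIC;
    div : OUT STD_LOGIC
  );
END divider;

ARCHITECTURE model OF divider IS
BEGIN
  PROCESS (clk)
  BEGIN
    IF (clk'EVENT) AND (clk = '1') THEN
      -- Fails before VHDL-2008: an OUT port cannot be read.
      div <= NOT div;
    END IF;
  END PROCESS;
END model;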

But can be fixed by using an intermediate signal:

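LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY divider IS
  PORT
  (
    clk   : IN STD_LOGIC;
    div_o : OUT STD_LOGIC
  );
END divider;

ARCHITECTURE model OF divider IS
  -- Internal copy that can be both read and written.
  -- The initial value keeps simulation from sticking at 'U'.
  SIGNAL div : STD_LOGIC := '0';
BEGIN
  PROCESS (clk)
  BEGIN
    IF (clk'EVENT) AND (clk = '1') THEN
      div <= NOT div;
    END IF;
  END PROCESS;

  -- Drive the output pin from the internal signal.
  div_o <= div;
END model;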

Ken

OutputLogic · May 22, 2009

Online LFSR Counter generator

There is an online tool that generates Verilog code for LFSR counters of any value up to 31 bits wide. It's on "http OutputLogic dot com" [sorry, this site doesn't let me post a link in a regular way]
LFSR counters are much smaller than regular ones and can therefore run faster. The catch is that they count only to a predefined value.
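
To show the idea, here is a small hand-written sketch in VHDL (to match the other code in this thread), not output from the tool. The entity name lfsr_counter, the constant SEED, and the port names are hypothetical. It uses the 4-bit polynomial x^4 + x^3 + 1, which steps through all 15 non-zero states; a counter for a shorter count would compare against a different precomputed state instead of SEED.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity lfsr_counter is
  port (
    clk  : in  std_logic;
    rst  : in  std_logic;   -- synchronous reset to the seed state
    tick : out std_logic    -- high for one clock in every 15
  );
end entity lfsr_counter;

architecture rtl of lfsr_counter is
  -- Any non-zero seed works; all zeros would lock up an XOR LFSR.
  constant SEED : std_logic_vector(3 downto 0) := "0001";
  signal r : std_logic_vector(3 downto 0) := SEED;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        r <= SEED;
      else
        -- Shift left; the new bit is the XOR of the two top bits.
        r <= r(2 downto 0) & (r(3) xor r(2));
      end if;
    end if;
  end process;

  -- The sequence returns to SEED once per full period.
  tick <= '1' when r = SEED else '0';
end architecture rtl;
```

The next-state logic is a single XOR gate regardless of how far the counter has run, which is where the speed advantage over a binary counter's carry logic comes from.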

Hope that helps

JohnDuq · May 29, 2009

I like Ken's solution of an intermediate signal for accessing the output pin.

Changing the port definition from OUT to INOUT is an option for some devices too.
