Best CPU platform(s) for FPGA synthesis

jjohnson

Yo, Adrian! ;) and Paul and everyone else, that's some great info and
is very much appreciated.

Since quartus_fit is dominating my runtime (EP2S180 and HC230), and
quartus_fit gains the most from extra CPUs, it makes sense for me to
go at least to 4 CPUs (I currently only have dual-processor boxes,
thus the need to go shopping). Do you know if the HardCopyII fitter
also makes use of multiple processors?

When Quartus does spawn jobs off to up to 4 processors, can each one
of those spawned jobs use up to 4GB?

In the case of Quartus supporting a max of 4 processors, at the very
least an 8-processor box would allow me to run two copies of Quartus
at the same time (e.g., different designs, or different flavors of the
same design). 8 processors on 64-bit Linux w/ 16GB of RAM with 32-bit
Quartus would seem to be a well-balanced setup if most Quartus jobs
remain under 2GB, correct?
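
As a rough sanity check of the sizing question above, here is a minimal Python sketch. It assumes the only limits are core count and physical RAM, and that each Quartus run stays under the stated per-job memory figure; the function name and the numbers passed in are illustrative, not measured.

```python
def concurrent_jobs(total_cores, total_ram_gb, cores_per_job, ram_per_job_gb):
    """Return how many jobs fit at once, limited by cores and by RAM."""
    by_cores = total_cores // cores_per_job
    by_ram = int(total_ram_gb // ram_per_job_gb)
    return min(by_cores, by_ram)


# 8 cores and 16 GB of RAM; each run uses 4 cores and is ASSUMED to stay under 2 GB.
print(concurrent_jobs(8, 16, 4, 2))
```

With these inputs the core count, not the RAM, is the limiting factor, so the extra memory mostly buys headroom for larger designs.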

Since memory access is such a big part of the overall runtime,
obviously the faster memory buses on newer machines will help. (Good
thing, because the clock speed difference alone from an Opteron 250 to
a newer Opteron 2220 isn't much of an increase: 2.4GHz to 2.8GHz).

Since the databases for big chips get so large (and memory accesses
apparently so random), does a larger data cache buy you much? The L1
I&D caches are relatively small on both AMD and Intel, although
Opteron is 2x (64K Instr, 64K Data) larger than Intel's.

For the L2 cache, Intel's is 2x larger than AMD's on a per-core basis.
Since Intel shares each L2 cache between two neighboring cores (as you say,
1&2 or 3&4 can share quickly, but sharing is slow from 1/3 and 2/4), whereas
Opterons have a dedicated cache per core, would Opterons see a speedup
from less contention for the cache, or a slowdown from having to go
outside the local caches in order to share data? (I guess a function
of how often the quartus_fit algorithms need to share data, right?)

If I were trying to run two Quartus jobs simultaneously on one 8-CPU
machine (with NUM_PARALLEL_CPUS = 4 for each run), I would expect
competition for external memory to be huge, and thus statistically
some benefit to Intel's larger cache. And with more "stuff" cached,
the higher clock speeds on current Intel CPUs might give the
runtime advantage to Intel. On the other hand, AMD has the Direct
Connect Architecture and HyperTransport, so...

I know you vendor guys are reluctant to publish benchmark info, but
from the currently-available, mainstream, small-server perspective
with 8 processors, I'm kind of pushed toward the following CPU
choices:

4 dual-core Opteron 2218's (2.6 GHz, 90nm process, 2MB L2 cache as 1MB dedicated per core)
4 dual-core Opteron 2220's (2.8 GHz, 90nm process, 2MB L2 cache as 1MB dedicated per core)
4 dual-core Intel 5160's (3.0 GHz, 65nm process, 1333 MHz FSB, 4MB shared L2 cache)
2 quad-core Intel X5355's (2.66 GHz, 65nm process, 1333 MHz FSB, 8MB L2 cache, shared 4MB per core pair)

Of those, is there an obvious bang for the buck advantage (weighted
more toward bang than buck) for any one of those in particular?

-------
P.S. Those QX6850's are hard to come by; Dell's overclocked XPS720's
look sweet, but my company won't spring for overclocked boxes...


Thanks again, very very much!
 
Wei Wang

Did you change the setting "use up to x number of CPUs" (don't remember
the exact name) somewhere in the project settings?

Is there such a setting for Xilinx ISE as well?

thx, -wei
 
Wei Wang

If cost is no object, then go with the Intel quad-core running at 3
GHz : QX6850. Each core has 2 MB of L2 cache (8MB total), which is,
according to several reports in this forum, the single most important
factor.

I would say go with 4GB of RAM, although if you're using the biggest
chips, you might need more. Keep in mind that 32-bit Windows will only
be able to use 3GB max of this 4GB, and each application will only be
able to access 2GB max. So you might consider 64-bit Windows or 64-bit
Linux if necessary.

Patrick

Why only 3GB max of 4GB? thanks, -Wei
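
The 32-bit limits in the quoted text can be illustrated with some address-space arithmetic. This is only a sketch: the 1 GiB reserved for device address ranges is an assumed, machine-dependent figure, and the function name is made up for the example. The 2 GiB per-process figure reflects the default user/kernel split on 32-bit Windows.

```python
GIB = 2**30


def windows32_limits(installed_ram_bytes, device_reserved_bytes):
    """Estimate usable RAM and per-process address space on 32-bit Windows."""
    address_space = 2**32  # bytes addressable with 32-bit pointers
    # Device address ranges (PCI, video, firmware) occupy part of the 4 GiB
    # physical map, so RAM behind those addresses is not visible to the OS.
    usable_ram = min(installed_ram_bytes, address_space - device_reserved_bytes)
    # Default split: half of the virtual address space for the application,
    # half for the kernel.
    per_process = address_space // 2
    return usable_ram, per_process


# ASSUMPTION: about 1 GiB reserved for devices; the real figure varies by machine.
usable, per_process = windows32_limits(4 * GIB, 1 * GIB)
print(usable / GIB, per_process / GIB)
```

Under that assumption the sketch gives roughly 3 GiB of usable RAM and 2 GiB per application, matching the figures quoted above.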
 
MM

Hi Steve,

Could you give us (Xilinx users) some more detailed recommendations on what
would be the best platform to run ISE/EDK tools when working on midsize to
big designs? Tell us what you are using @ Xilinx? :)



Thanks,
/Mikhail
 
Guest

I can give you some general recommendations. For the best place and route
runtimes, use a 64bit Linux system. If your design is small enough to fit
into 4G of memory (LX110 or smaller), and you are not programming devices
(the 32bit cable drivers don't work on a 64bit system), you can use the
32bit executables to save memory. Otherwise, go ahead and use the 64bit
executables. They use more memory and the runtime is similar.

As mentioned earlier, synthesis, map, place and route do not use
multithreading, so you will not get an advantage using multiple processors
for a single design. However, ProjNav is multithreaded so if you are doing
different tasks, other processors will be used. In addition, upcoming
software releases will use those processors.

Steve
 
Eric Smith

Steve said:
I can give you some general recommendations. For the best place and
route runtimes, use a 64bit Linux system. If your design is small
enough to fit into 4G of memory (LX110 or smaller), and you are not
programming devices (the 32bit cable drivers don't work on a 64bit
system), you can use the 32bit executables to save memory.
Otherwise, go ahead and use the 64bit executables. They use more
memory and the runtime is similar.

Note that it works just fine to install 32-bit ISE on a 64-bit Linux
system, and to install the 64-bit cable drivers.

In my experience, the open source user-space-only cable interface works
far better than the Xilinx-supplied cable drivers anyhow:

http://www.rmdir.de/~michael/xilinx/
 
MM

Steve said:
I can give you some general recommendations. For the best place and route
runtimes, use a 64bit Linux system. If your design is small enough to fit
into 4G of memory (LX110 or smaller), and you are not programming devices
(the 32bit cable drivers don't work on a 64bit system), you can use the
32bit executables to save memory. Otherwise, go ahead and use the 64bit
executables. They use more memory and the runtime is similar.

Is there a 64-bit version of EDK ? If not, can I mix 64 bit ISE with 32 bit
EDK?

Thanks,
/Mikhail
 
Wei Wang

Steve said:
I can give you some general recommendations. For the best place and route
runtimes, use a 64bit Linux system. If your design is small enough to fit
into 4G of memory (LX110 or smaller), and you are not programming devices
(the 32bit cable drivers don't work on a 64bit system), you can use the
32bit executables to save memory. Otherwise, go ahead and use the 64bit
executables. They use more memory and the runtime is similar.

As mentioned earlier, synthesis, map, place and route do not use
multithreading, so you will not get an advantage using multiple processors
for a single design. However, ProjNav is multithreaded so if you are doing
different tasks, other processors will be used. In addition, upcoming
software releases will use those processors.

Steve

What I found was very interesting: it was taking me 12 hours to run
the MAP process before, but yesterday it took only ~3 hours to run
MAP, and PAR took only ~40 mins as well.

I was trying to figure out the reasons, then found in the *.map and *.mrp
files that there was always a MAP phase which took as long as
~10+ hours, and that phase was always very memory hungry. I was using
Linux64 with 2GB real memory and 4GB swap memory, and I just found that
the real 2GB memory was much smaller than the required peak memory of
10.6GB. Yesterday, I ran ISE 9.1i for the XC5VLX330 on another
Linux64 machine with 11GB real memory and 8GB swap memory, and there
wasn't any MAP phase which took a ridiculous ~10+ hours.

Can Xilinx guys shed some more light on the runtime of MAP and PAR,
wrt different memory sizes and CPU cores?
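
To put the two machines side by side, here is a small Python sketch that compares the reported peak memory with physical RAM. The function name is hypothetical, and treating a ratio above 1 as "will page to swap" is a simplifying assumption; it says nothing about how Xilinx MAP actually allocates memory.

```python
def ram_oversubscription(peak_gb, ram_gb):
    """Ratio of peak memory demand to physical RAM; above 1.0 implies paging."""
    return peak_gb / ram_gb


# Figures taken from the post above: 10.6 GB peak, 2 GB vs 11 GB of real memory.
for ram_gb in (2, 11):
    ratio = ram_oversubscription(10.6, ram_gb)
    status = "expect heavy swapping" if ratio > 1.0 else "fits in RAM"
    print(f"{ram_gb} GB RAM: ratio {ratio:.2f}, {status}")
```

The first machine needs more than five times its physical memory at peak, which is consistent with the long, memory-hungry MAP phase described above.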
 
Patrick Dubois

P.S. Those QX6850's are hard to come by; Dell's overclocked XPS720's
look sweet, but my company won't spring for overclocked boxes...

Polywell has some desktop computers with QX6850 available. Although
since you're looking at an 8-way workstation (!), QX6850 is probably
not an option. Polywell has AMD or Intel workstations with the CPUs
you're looking at as well.

For one socket, Intel clearly has the edge over AMD I think. For multi-
socket workstations/servers however, I'm not so sure. Benchmarks are
harder to find. I would suspect that the HyperTransport bus would help
AMD close the gap with Intel a little. Their integrated memory
controller probably helps as well in a multi-socket machine.

I searched for benchmarks for the newest 90-nm Opteron but couldn't
find any unfortunately...

Patrick
 
Guest

Can Xilinx guys shed some more light on the runtime of MAP and PAR,
wrt different memory sizes and CPU cores?

Even though our memory requirement table lists devices, memory is more
dependent on the design and the timing constraints. Since we can't predict
what is in your design, we just give you the typical and max numbers from
our collected test cases.

One example of a constraint change that will reduce memory: instead of
creating a bunch of individual from-to timespecs, you can create timegroups
with the endpoints, then put one timespec on those groups.
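
The timegroup suggestion can be sketched as follows. This Python snippet only generates illustrative UCF-style text to compare how many timespecs each approach produces. The instance names, group names, and the 10 ns value are hypothetical, the individual case assumes each endpoint already has a time group of the same name, and the exact syntax should be checked against the Xilinx Constraints Guide.

```python
def individual_from_to(sources, dests, delay_ns):
    """One FROM-TO timespec per (source, destination) pair."""
    return [
        f'TIMESPEC "TS_{s}_to_{d}" = FROM "{s}" TO "{d}" {delay_ns} ns;'
        for s in sources
        for d in dests
    ]


def grouped_from_to(sources, dests, delay_ns):
    """Collect endpoints into two timegroups, then use a single timespec."""
    lines = [f'INST "{s}" TNM = "src_grp";' for s in sources]
    lines += [f'INST "{d}" TNM = "dst_grp";' for d in dests]
    lines.append(
        f'TIMESPEC "TS_src_to_dst" = FROM "src_grp" TO "dst_grp" {delay_ns} ns;'
    )
    return lines


# Hypothetical endpoint names, for illustration only.
srcs = ["reg_a", "reg_b", "reg_c"]
dsts = ["reg_x", "reg_y"]
print(len(individual_from_to(srcs, dsts, 10)))
print(sum(line.startswith("TIMESPEC") for line in grouped_from_to(srcs, dsts, 10)))
```

The pairwise approach grows with the product of the two endpoint counts, while the grouped approach always ends in a single timespec.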

Also, ISE 9.2i is getting an average of 27% improvement in memory
utilization.

I don't have any data regarding runtime of different CPU cores.

Steve
 
