speed performance / hardware / cpu


antoine

Hello,

I'm developing / supporting a Java "client" application (running on PCs
with XP pro, JRE 1.5) which is a high-performance trading client. It
receives market updates, displays them on screen (Swing), does a series
of computations, and performs several actions based on the computed
values (order sending, cancellation, etc.). It is designed to run for 8
hours straight without interruption, does not access any database, only
uses socket-based I/O, and is correctly multi-threaded.

I'm looking at upgrading our workstations to hopefully get a speed
increase. Currently, our "base computation" routine takes around 5ms on
average, and I'm looking at reducing this number (I'm also looking at
improving CODE performance, but this post is about hardware).

Currently we're running on dual-CPU Intel Xeon 2.8GHz machines, roughly
3 years old, with 1GB RAM. Virtual memory usage is around 128MB, so I
believe RAM is not an issue.

Which kind of upgrade would sound smart to you? I've seen technologies
like:
- all the "dual core" family
- 64-bit architecture (although there's no JVM for Intel on XP pro 64-bit)
- simply pushing the frequency to 3.6GHz...

Does 64-bit make sense? Or is it only for memory-intensive applications
(we're more concerned with execution speed)?

Any insight or a link to an informative page would be most welcome!

thanks

-Antoine
 

Daniel Pitts

antoine said:
[...]

It's hard to say without testing your particular code on the different
types of upgrades.
If your code can utilize multithreading effectively, I would probably
look into dual-core (or quad-processor) technologies.
There is a 64-bit JVM for Linux, and it might provide a boost.
Actually, just moving to Linux may provide a slight boost.
It's really hard to say exactly what will be the most cost-effective
upgrade without actually testing the results. I suggest you talk your
distributor into allowing you to run some custom benchmarks. Of course,
the benchmarks should accurately reflect the profile of your
application.
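
A minimal sketch of the kind of custom benchmark Daniel describes,
with a hypothetical baseComputation() standing in for the real routine;
the warm-up loop matters, because HotSpot on 1.5 only JIT-compiles the
hot path after it has run for a while:

    public class MiniBench {
        static void baseComputation() {
            // hypothetical stand-in for the real 5ms routine
        }

        public static void main(String[] args) {
            // warm up so HotSpot has compiled the code before timing
            for (int i = 0; i < 10000; i++) {
                baseComputation();
            }
            final int runs = 1000;
            long start = System.nanoTime();
            for (int i = 0; i < runs; i++) {
                baseComputation();
            }
            long avgNanos = (System.nanoTime() - start) / runs;
            System.out.println("avg per call: " + (avgNanos / 1000) + " us");
        }
    }

Run the same class on each candidate machine and compare the averages.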
 

Arne Vajhøj

antoine said:
[...]

I agree that more memory will probably not help.

Higher frequency will almost certainly help (3.6/2.8 would be +28%,
though performance rarely scales linearly with clock speed).

Switching to 64-bit does not in itself increase calculation speed, but
because x86-64 has more registers than x86 it may actually gain you
some (like +10%).

If you can parallelize to 4 execution units, then 2 dual-core CPUs will
certainly give a huge jump compared with the current 2 single-core
CPUs (like +80%).

If your app frequently accesses data in the few-MB range, then the
extra L2 cache in newer CPUs may also help.

Arne
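
On the "parallelize to 4 execution units" point, a minimal sketch with
the java.util.concurrent classes already in 1.5, assuming the global
recomputation can be split into independent Callable chunks (the
decomposition itself is the hard, application-specific part):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ParallelRecompute {
        // one worker per core/execution unit on the machine
        private final ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());

        // farm the independent chunks out to the pool and wait for all
        void recompute(List<Callable<Void>> chunks) throws Exception {
            List<Future<Void>> results = new ArrayList<Future<Void>>();
            for (Callable<Void> chunk : chunks) {
                results.add(pool.submit(chunk));
            }
            for (Future<Void> f : results) {
                f.get(); // blocks until that chunk is done
            }
        }
    }

This only pays off if the chunks really are independent; if they share
mutable data, the locking can eat the gain.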
 

Eric Sosman

antoine said:
[...]

Not to be unduly harsh, but all these upgrades would be
STUPID! -- until you've measured what's happening on the current
hardware. You've already made a start by measuring the memory
usage, and that's good. Now measure the other components that
you might upgrade: How much CPU are you using, how much time do
you spend waiting for disk I/O, are you drowning in cache misses,
and so on and so on. If the real problem is (for example) a
network transaction, making the client machine faster just makes
it wait faster.

Continue as you have begun: Measure the consumption of the
different resources, and try to determine what is holding you
back. From your description there's an excellent chance that the
scarce resource is in fact CPU power -- but won't you feel silly
(and impoverished) if you spend a lot of money upgrading the CPUs
only to discover that the real bottleneck was the El Cheapo
graphics card?

Carpenters have a motto: "Measure twice, cut once." For some
reason, computerfolk seem resistant to that wisdom. Buck the trend,
and spend your money ("cut") only after you've measured. Be a
carpenter!
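
A cheap first measurement along these lines, assuming the suspect code
runs on a single thread: compare wall-clock time with the CPU time the
thread actually consumed (java.lang.management is in 1.5). If wall time
is much larger than CPU time, the thread is waiting on something, and a
faster CPU won't buy much:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadMXBean;

    public class CpuVsWall {
        public static void main(String[] args) {
            ThreadMXBean tmx = ManagementFactory.getThreadMXBean();
            long wall0 = System.nanoTime();
            // returns -1 if the VM can't measure thread CPU time
            long cpu0 = tmx.getCurrentThreadCpuTime();

            doWork(); // hypothetical stand-in for the code under test

            long cpuMs = (tmx.getCurrentThreadCpuTime() - cpu0) / 1000000;
            long wallMs = (System.nanoTime() - wall0) / 1000000;
            System.out.println("wall " + wallMs + " ms, cpu " + cpuMs + " ms");
        }

        static void doWork() {
            // replace with the real routine
        }
    }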
 

bugbear

antoine said:
[...]

I'm looking at upgrading our workstations to hopefully get a speed
increase. Currently, our "base computation" routine takes around 5ms on
average, and I'm looking at reducing this number (I'm also looking at
improving CODE performance, but this post is about hardware).

If this is the "limiting path" (I assume you've profiled(*)), then at
5ms per pass you could update at 200 FPS, which is faster than a normal
screen refresh.

So I don't see your problem.

BugBear

(*) if you haven't profiled, you're wasting your time and your
employer's hardware money.
 

Nigel Wade

antoine said:
[...]

Firstly, I'd verify that the code is really taking advantage of both
CPUs and is not wasting cycles by having threads waiting on locks, or
thrashing the cache by having each thread/CPU modify the same data
concurrently. Secondly, you need to determine the major bottleneck in
the current system. Is it CPU, memory bandwidth, PCI/graphics latency,
network latency, etc.? Until you know this you have no idea where to
spend your money effectively.

The only way to really know what is best is to actually run your code
on various systems. It's not just about raw CPU cycles, GHz, etc. It's
also about the support chipsets, how well the motherboard is put
together, how well multiple processors manage cache coherency, and
other very esoteric hardware issues.
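
For the "threads waiting on locks" part, the 1.5 management API can
give a rough picture without a full profiler; a sketch (contention
monitoring has to be switched on explicitly and adds some overhead):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class ContentionDump {
        // call once at application startup
        public static void enable() {
            ThreadMXBean tmx = ManagementFactory.getThreadMXBean();
            if (tmx.isThreadContentionMonitoringSupported()) {
                tmx.setThreadContentionMonitoringEnabled(true);
            }
        }

        // call from the running app after it has been busy a while;
        // getBlockedTime() needs enable() to have been called first
        public static void dump() {
            ThreadMXBean tmx = ManagementFactory.getThreadMXBean();
            for (long id : tmx.getAllThreadIds()) {
                ThreadInfo ti = tmx.getThreadInfo(id);
                if (ti != null) {
                    System.out.println(ti.getThreadName()
                            + ": blocked " + ti.getBlockedCount()
                            + " times, " + ti.getBlockedTime() + " ms total");
                }
            }
        }
    }

High blocked counts and times on the compute threads would point at
lock contention rather than raw CPU speed.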
 

Tris Orendorff

bugbear said:
If this is the "limiting path" (I assume you've profiled(*)), then at
5ms per pass you could update at 200 FPS, which is faster than a normal
screen refresh. So I don't see your problem.

You are assuming only one stock is being monitored. If 200 are being
monitored at 5ms each, one full pass takes a second (200 x 5ms = 1s),
and 1 Hz may be too slow. Note: I am assuming the base computation is
for one stock and the times for multiple stocks simply add up.

bugbear said:
(*) if you haven't profiled, you're wasting your time and your
employer's hardware money.

Agreed! You must profile before you can make proper decisions.
 

antoine

- I've profiled, and the methods seem to be running efficiently; no
single one takes up most of the processing time (well, one does, but
that's expected behavior, and it's not 99.999% against 0.001% for the
others). No huge memory waste anywhere. The goal here would be to
improve core speed.

- There are around 100 financial instruments monitored concurrently.
That means market prices (not all updated at the same time, as those
are callbacks from the "market"), but also theoretical variables
computed locally from other market values (themselves updated through
callbacks).

- Basically it works like this: one specific market update triggers a
callback, which triggers a global recomputation; variables are tested
and may or may not trigger the sending of an order. I'm trying to
reduce the time between the reception of the update and the release of
the new order (it's currently between 2 and 8ms).

- I'm working on improvements on the code side, but feel I've reached a
limit. The last time I changed my hardware was 2.5 years ago, and even
though Moore's law hasn't been feeling too well recently, I still
believe there's some good to be had from upgrading hardware. I'm just
trying to figure out which element to focus on the most. For the moment
it appears CPU & GPU count and clock speed could help, but hey, maybe
simply "getting a regular, faster machine" will do the trick :)

thanks for everybody's comments...

-Antoine
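
Since the interesting number here is the 2-8ms spread rather than the
average, a small sketch of instrumentation for the update-to-order
path (the two hook points are hypothetical, to be called from the
market callback and just before the order goes out):

    public class LatencyTracker {
        private long min = Long.MAX_VALUE, max, total, count;

        // call when the market update callback fires; keep the stamp
        public long updateReceived() {
            return System.nanoTime();
        }

        // call just before releasing the order, with the earlier stamp
        public synchronized void orderSent(long receivedAtNanos) {
            long ns = System.nanoTime() - receivedAtNanos;
            if (ns < min) min = ns;
            if (ns > max) max = ns;
            total += ns;
            count++;
        }

        public synchronized String report() {
            if (count == 0) return "no samples";
            return "n=" + count + " min=" + min / 1000000.0 + "ms"
                    + " avg=" + (total / count) / 1000000.0 + "ms"
                    + " max=" + max / 1000000.0 + "ms";
        }
    }

Knowing whether the 8ms worst case comes from GC pauses, lock waits or
the recomputation itself is what tells you which upgrade can help.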
 

Eric Sosman

antoine said:
- I've profiled, and the methods seem to be running efficiently; no
single one takes up most of the processing time [...] The goal here
would be to improve core speed.

Something still puzzles me: If the CPU is not 100% utilized,
or very nearly so, it follows that something else is retarding
the progress of your code. I/O, memory, internecine contention
for locks, you name it: But if there's spare CPU capacity lying
about unused, speeding up the CPU just leads to spending even
more time in the idle loop.

     But perhaps I've misunderstood your description of what's
been measured. (You're clearly measuring things, and that's good,
but I'm not confident that I've grasped the measurements.)

antoine said:
- There are around 100 financial instruments monitored concurrently [...]

- Basically it works like this: one specific market update triggers a
callback, which triggers a global recomputation; variables are tested
and may or may not trigger the sending of an order. I'm trying to
reduce the time between the reception of the update and the release of
the new order (it's currently between 2 and 8ms).

Knowing nothing about how your application is organized, and
at the risk of making an even greater fool of myself: Is a "global
recomputation" necessary? Consider the lowly spreadsheet: it toils
not, neither doth it spin, yet it's smart enough to react to a change
in cell C6 by recomputing only those other cells for which C6 is an
input, and those further cells that depend on the recomputed cells,
and so on. Perhaps you could keep track of these dependencies and
replace a "global recomputation" with a "local recomputation." Might
a tree of dependencies reduce the workload?
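
A minimal sketch of that spreadsheet-style idea, with hypothetical
names: each value knows its dependents, and an update recomputes only
the affected part of the graph instead of everything:

    import java.util.ArrayList;
    import java.util.List;

    public abstract class Cell {
        private final List<Cell> dependents = new ArrayList<Cell>();

        public void addDependent(Cell c) {
            dependents.add(c);
        }

        // recompute this cell, then push the change downstream;
        // assumes the dependency graph is acyclic
        public void invalidate() {
            recompute();
            for (Cell d : dependents) {
                d.invalidate();
            }
        }

        protected abstract void recompute();
    }

(A real version would de-duplicate cells reachable through several
paths, e.g. mark dirty first and then recompute in dependency order,
so nothing is computed twice.)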

antoine said:
- I'm working on improvements on the code side, but feel I've reached a
limit [...] maybe simply "getting a regular, faster machine" will do
the trick :)

It might. In fact, it probably will improve matters to some
extent. But to what extent? 500% better, or just 50%, or merely 5%?
An informed economic decision requires that you have some idea of how
much bang your buck will buy.
 
