Increase WinXP/jre CPU usage?

Steve Brecher

I have a compute-intensive program with a simple console user interface.
While the program is running (number crunching), WinXP's Task Manager's CPU
usage for it never goes above 50%. I'd like to use the other half of my CPU
:) I've tried, via a separate SetPriority utility, setting the java (JVM)
process priority to 256 (max; real time) and all of its threads' priorities
to 15 (max). This causes the JVM's Base Prio entry in Task Manager to
become Real Time -- but CPU usage remains at 50%.

The code that is running does no I/O, only calculation.
 
Thomas Kellerer

Steve Brecher wrote on 14.11.2006 00:47:
I have a compute-intensive program with a simple console user interface.
While the program is running (number crunching), WinXP's Task Manager's CPU
usage for it never goes above 50%. I'd like to use the other half of my CPU
:) I've tried, via a separate SetPriority utility, setting the java (JVM)
process priority to 256 (max; real time) and all of its threads' priorities
to 15 (max). This causes the JVM's Base Prio entry in Task Manager to
become Real Time -- but CPU usage remains at 50%.

The code that is running does no I/O, only calculation.
Do you happen to have a dual processor/dual core computer? If so, then I suspect
your calculation is running in a single thread only, which will not make use of
the second processor, and thus your overall CPU load will not exceed 50%.

Thomas
 
Patricia Shanahan

Steve said:
I have a compute-intensive program with a simple console user interface.
While the program is running (number crunching), WinXP's Task Manager's CPU
usage for it never goes above 50%. I'd like to use the other half of my CPU
:) I've tried, via a separate SetPriority utility, setting the java (JVM)
process priority to 256 (max; real time) and all of its threads' priorities
to 15 (max). This causes the JVM's Base Prio entry in Task Manager to
become Real Time -- but CPU usage remains at 50%.

The code that is running does no I/O, only calculation.

Is there any possibility that you have a dual processor, possibly two
cores in one chip?

Utilization freezing at close to 50%, even at very high priority, for a
compute intensive job is typical of running a single threaded
application on a dual processor.

If that is what is going on, you should be able to run two copies of the
job (if it does not use too much memory) at the same time almost as fast
as one copy. If so, look at parallelizing the compute-bound portion of
the job.

What dominates the computation? Some algorithms are easier to
parallelize than others.

Patricia
 
Steve Brecher

Thomas Kellerer said:
Steve Brecher wrote on 14.11.2006 00:47:
Do you happen to have a dual processor/dual core computer? If so,
then I suspect your calculation is running in a single thread only,
which will not make use of the second processor, and thus your overall
CPU load will not exceed 50%.

(Thanks also to Patricia, who responded similarly.)

It's a Pentium 4 (3.4G) vintage early 2004. If it's dual, I never knew it!
Might it be?

The calculation is definitely single-thread.
 
Luc The Perverse

Steve Brecher said:
I have a compute-intensive program with a simple console user interface.
While the program is running (number crunching), WinXP's Task Manager's CPU
usage for it never goes above 50%. I'd like to use the other half of my
CPU :) I've tried, via a separate SetPriority utility, setting the java
(JVM) process priority to 256 (max; real time) and all of its threads'
priorities to 15 (max). This causes the JVM's Base Prio entry in Task
Manager to become Real Time -- but CPU usage remains at 50%.

The code that is running does no I/O, only calculation.

Do you by chance have a dual core system?
 
Arne Vajhøj

Steve said:
(Thanks also to Patricia, who responded similarly.)

It's a Pentium 4 (3.4G) vintage early 2004. If it's dual, I never knew it!
Might it be?

The calculation is definitely single-thread.

No, it is single core.

BUT it has hyperthreading.

Which in WinXP Task Manager looks like 2 CPUs!

And to utilize HT you still need to multithread.

Arne
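
A quick way to check what the JVM itself sees is the sketch below;
availableProcessors() counts logical processors, so a hyperthreaded P4
reports 2, just as Task Manager shows two CPUs.

public class CpuCount {
    public static void main(String[] args) {
        // Logical processors as seen by the JVM; a hyperthreaded
        // Pentium 4 prints 2 here even though it has a single core.
        System.out.println(Runtime.getRuntime().availableProcessors());
    }
}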
 
Luc The Perverse

Patricia Shanahan said:
Is there any possibility that you have a dual processor, possibly two
cores in one chip?

Utilization freezing at close to 50%, even at very high priority, for a
compute intensive job is typical of running a single threaded
application on a dual processor.

If that is what is going on, you should be able to run two copies of the
job (if it does not use too much memory) at the same time almost as fast
as one copy. If so, look at parallelizing the compute-bound portion of
the job.

What dominates the computation? Some algorithms are easier to
parallelize than others.

Utilizing multiple processors/cores to do tasks which seem to be iterative
(I'm sure there is probably a more formal/correct way to say this) is a very
active and fun area of computer science right now!
 
Steve Brecher

Patricia Shanahan said:
Is there any possibility that you have a dual processor, possibly two
cores in one chip?

It seems I do, virtually speaking -- Pentium 4, apparently with
Hyper-Threading Technology (thanks to Arne Vajhøj, who pointed that out
elsewhere in this thread).

Utilization freezing at close to 50%, even at very high priority, for
a compute intensive job is typical of running a single threaded
application on a dual processor.

If that is what is going on, you should be able to run two copies of
the job (if it does not use too much memory) at the same time almost
as fast as one copy. If so, look at parallelizing the compute-bound
portion of the job.

What dominates the computation? Some algorithms are easier to
parallelize than others.

It's nested loops enumerating cases; there's a computation for each case,
i.e., inside the innermost loop, and the computation results are
accumulated.

It should be possible to dual-thread it, e.g., one thread doing the odd
cases, so to speak, and the other the even ones. I could synchronize access
to the accumulation structures, or perhaps have two of them. For generality
maybe I can even N-thread it. I'll have to think about this...
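
As a rough illustration, the odd/even split Steve describes might look
something like the sketch below; computeCase() and the problem size are
hypothetical stand-ins for the real work, and each thread gets its own
accumulator so nothing needs locking until the final combine.

public class OddEvenSplit {
    // Hypothetical stand-in for the real per-case computation.
    static long computeCase(int i) {
        return (long) i * i;
    }

    public static void main(String[] args) throws InterruptedException {
        final int cases = 1000000;          // hypothetical problem size
        final long[] partial = new long[2]; // one accumulator per thread

        Thread[] workers = new Thread[2];
        for (int t = 0; t < 2; t++) {
            final int offset = t;           // 0 = even cases, 1 = odd cases
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    long acc = 0;
                    for (int i = offset; i < cases; i += 2)
                        acc += computeCase(i);
                    partial[offset] = acc;  // each thread owns its own slot
                }
            });
            workers[t].start();
        }
        for (int t = 0; t < 2; t++)
            workers[t].join();              // join() makes the writes visible
        System.out.println(partial[0] + partial[1]);
    }
}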
 
Patricia Shanahan

Steve Brecher wrote:
....
It's nested loops enumerating cases; there's a computation for each case,
i.e., inside the innermost loop, and the computation results are
accumulated.

It should be possible to dual-thread it, e.g., one thread doing the odd
cases, so to speak, and the other the even ones. I could synchronize access
to the accumulation structures, or perhaps have two of them. For generality
maybe I can even N-thread it. I'll have to think about this...

Given trends in computer architecture, I suggest N-threading it while
you are about it. When you go to replace that computer, you may find
yourself getting something with multiple cores, each multi-threaded.

For reduction problems (problems that take a long vector and produce a
single answer, such as adding things up), it is generally better, if
permitted by the problem, to have an accumulator for each thread, and
only add them at the end. The less synchronization in the middle of the
problem, the faster it will go.

Consider organizing the work so that each thread operates on a
contiguous chunk of data, in case they get assigned to separate
processors with their own caches.

However, I would go for simplicity, within the at-least-dual requirement.

Patricia
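
In practice Patricia's two suggestions combine naturally: N threads,
each taking a contiguous chunk of the case range, each with a private
accumulator that is combined only after the joins. A sketch, with
computeCase() again a hypothetical stand-in:

public class ChunkedSum {
    static long computeCase(int i) { return (long) i * i; } // stand-in

    public static void main(String[] args) throws InterruptedException {
        final int cases = 1000000;                // hypothetical size
        final int n = Runtime.getRuntime().availableProcessors();
        final long[] partial = new long[n];       // per-thread accumulators
        Thread[] workers = new Thread[n];

        for (int t = 0; t < n; t++) {
            final int id = t;
            final int lo = (int) ((long) cases * t / n);       // contiguous
            final int hi = (int) ((long) cases * (t + 1) / n); // chunk [lo,hi)
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    long acc = 0;                 // private accumulator:
                    for (int i = lo; i < hi; i++) // no locking inside
                        acc += computeCase(i);    // the hot loop
                    partial[id] = acc;
                }
            });
            workers[t].start();
        }

        long total = 0;
        for (int t = 0; t < n; t++) {
            workers[t].join();                    // then combine once,
            total += partial[t];                  // at the end
        }
        System.out.println(total);
    }
}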
 
Chris Uppal

Arne said:
No it is single core.

BUT it has hyperthreading.

Which in WinXP task manager looks like 2 CPU's !

And to utilize HT you still need to multithread.

But don't assume that making the application use the other "cpu" will
necessarily speed anything up. HT is (for most purposes) better regarded as a
cheap marketing gimmick than a valid technology.

Or -- to put it another way -- the CPU usage reported by TaskManager is
misleading. It suggests that 50% of your available horse-power is
unused. My bet would be that it's more like 5% -- if not actually zero.

-- chris
 
Patricia Shanahan

Chris said:
But don't assume that making the application use the other "cpu" will
necessarily speed anything up. HT is (for most purposes) better regarded as a
cheap marketing gimmick than a valid technology.

Or -- to put it another way -- the CPU usage reported by TaskManager is
misleading. It suggests that 50% of your available horse-power is
unused. My bet would be that it's more like 5% -- if not actually zero.

Here's a suggestion for a cheap test:

1. Add, if the application does not already contain it, some performance
statistics collection keeping track of how much elapsed time it takes to
do a given quantity of the compute intensive work.

2. Run one copy of the application. Record the statistics.

3. Run two copies of the application, simultaneously. Record the statistics.

If it is likely to benefit from multi-threading, the total work rate
will be significantly higher with two copies than with one. If it is the
sort of case Chris is talking about, each copy will run at slightly
better than half the speed of the single copy.

This test automatically takes into account questions such as how much
time a thread of your application spends waiting for memory, which can
affect how much you gain from hyperthreading.

Patricia
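
Step 1 might be as simple as the sketch below, where doUnitOfWork() is a
hypothetical placeholder for the real computation:

public class TimingHarness {
    // Hypothetical stand-in for one unit of the compute-intensive work.
    static double doUnitOfWork() {
        double x = 0;
        for (int i = 1; i < 5000000; i++)
            x += 1.0 / i;
        return x;
    }

    public static void main(String[] args) {
        final int units = 20;
        double sink = 0;               // keep the JIT from dropping the work
        long start = System.nanoTime();
        for (int u = 0; u < units; u++)
            sink += doUnitOfWork();
        long elapsed = System.nanoTime() - start;
        System.out.println("ms per unit: " + (elapsed / 1.0e6 / units)
                + "   (checksum " + sink + ")");
    }
}

Run one copy and note the per-unit figure; then start two copies side by
side. If each unit takes roughly twice as long, the second "CPU" is
contributing essentially nothing.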
 
Luc The Perverse

Chris Uppal said:
But don't assume that making the application use the other "cpu" will
necessarily speed anything up. HT is (for most purposes) better regarded as a
cheap marketing gimmick than a valid technology.

Or -- to put it another way -- the CPU usage reported by TaskManager is
misleading. It suggests that 50% of your available horse-power is
unused. My bet would be that it's more like 5% -- if not actually zero.

Hey, I heard people were getting upwards of 5% increases in... things
 
Steve Brecher

Patricia Shanahan said:
Chris Uppal wrote:

in an article not as yet presented by my news server :( hence quoted
indirectly...
[...]
But don't assume that making the application use the other "cpu" will
necessarily speed anything up. HT is (for most purposes) better
regarded as a cheap marketing gimmick than a valid technology.

OK. Actually, I am using my P4 system only for development. The real
target is some hardware yet to be acquired which will undoubtedly be
dual-core. The current project is a self-tutorial; it's my first Java and
Eclipse IDE experience; it's a port from C.

I'm curious about why that would be, but as implied above it's rather idle
curiosity.

[now quoting Patricia]
Here's a suggestion for a cheap test:

1. Add, if the application does not already contain it, some
performance statistics collection keeping track of how much elapsed
time it takes to do a given quantity of the compute intensive work.

2. Run one copy of the application. Record the statistics.

Already done.
3. Run two copies of the application, simultaneously. Record the
statistics.

How important is (almost) exact simultaneity? Would starting one manually
via console, then another be sufficient? This would mean a delay of several
seconds; the run time is 3+ minutes. If not, it would be reasonably easy to
multi-thread the code if the total workload didn't have to be
apportioned among the threads.
If it is likely to benefit from multi-threading, the total work rate
will be significantly higher with two copies than with one. If it is
the sort of case Chris is talking about, each copy will run at
slightly better than half the speed of the single copy.

This test automatically takes into account questions such as how much
time a thread of your application spends waiting for memory, which can
affect how much you gain from hyperthreading.

Would there be reasons other than data caching that each copy would run at
better than half the speed of a single copy?
 
Patricia Shanahan

Steve said:
Patricia Shanahan said:
Chris Uppal wrote:

in an article not as yet presented by my news server :( hence quoted
indirectly...
[...]
But don't assume that making the application use the other "cpu" will
necessarily speed anything up. HT is (for most purposes) better
regarded as a cheap marketing gimmick than a valid technology.

OK. Actually, I am using my P4 system only for development. The real
target is some hardware yet to be acquired which will undoubtedly be
dual-core. The current project is a self-tutorial; it's my first Java and
Eclipse IDE experience; it's a port from C.

In that case, you should probably use the opportunity to practice
parallelizing the job, so that you know how to take advantage of a
dual-core processor.
I'm curious about why that would be, but as implied above it's rather idle
curiosity.

[now quoting Patricia]
Here's a suggestion for a cheap test:

1. Add, if the application does not already contain it, some
performance statistics collection keeping track of how much elapsed
time it takes to do a given quantity of the compute intensive work.

2. Run one copy of the application. Record the statistics.

Already done.
3. Run two copies of the application, simultaneously. Record the
statistics.

How important is (almost) exact simultaneity? Would starting one manually
via console, then another be sufficient? This would mean a delay of several
seconds; the run time is 3+ minutes. If not, it would be reasonably easy to
multi-thread the code if the total workload didn't have to be
apportioned among the threads.

I would think that would be close enough. We are trying to tell the
difference between a throughput change that would justify programming
effort and a few percentage point change.
Would there be reasons other than data caching that each copy would run at
better than half the speed of a single copy?

Very few jobs really use ALL the cycles when they are "running" on a
processor, so giving a hyperthreaded processor a second job should
produce some increase in total throughput.

Patricia
 
Steve Brecher

Patricia Shanahan said:
I would think that would be close enough. We are trying to tell the
difference between a throughput change that would justify programming
effort and a few percentage point change.

The single-job calculation takes about 185 sec. Running two of them
manually took 456 and 461 sec., with Task Manager reporting 49-50% CPU usage
for each. So the average of the two was about 2.5x the single-job time!
 
Arne Vajhøj

Chris said:
But don't assume that making the application use the other "cpu" will
necessarily speed anything up. HT is (for most purposes) better regarded as a
cheap marketing gimmick than a valid technology.

Or -- to put it another way -- the CPU usage reported by TaskManager is
misleading. It suggests that 50% of your available horse-power is
unused. My bet would be that it's more like 5% -- if not actually zero.

Intel claims HT = 1.3 CPU.

I have seen code that does show the +30%.

Arne
 
Patricia Shanahan

Steve said:
The single-job calculation takes about 185 sec. Running two of them
manually took 456 and 461 sec., with Task Manager reporting 49-50% CPU usage
for each. So the average of the two was about 2.5x the single-job time!

I would look at it in terms of throughput. A single thread does one job
per 185 seconds. Two copies do about 2 jobs per 460 seconds = 1 job per
230 seconds.

The two job throughput is 230/185 = 1.24 times the single thread
throughput, a 24% gain.

You need to decide whether that is enough to justify the work of
parallelizing the job. However, I think you said you will be moving to
dual core, and the trend seems to be towards multiprocessing, so it
might be worth the investment even for a relatively small gain.

Patricia
 
Chris Uppal

Patricia said:
I would look at it in terms of throughput. Single thread does one job
per 185 seconds. Two jobs does about 2 jobs per 460 seconds = 1 job per
230 seconds.

The two job throughput is 230/185 = 1.24 times the single thread
throughput, a 24% gain.

Loss ;-)

-- chris
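
Working Steve's figures through makes the point explicit:

    one copy alone:       1 job per 185 s
    two copies together:  2 jobs per ~460 s  =  1 job per 230 s

    throughput ratio = (2/460) / (1/185) = 370/460 = ~0.80

The 1.24 is the factor by which the wall time per completed job grew
(230/185); total throughput actually fell by about 20%.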
 
Chris Uppal

Steve said:
I'm curious about why that would be, but as implied above it's rather idle
curiosity.

Well, the generally reported figure is in that ball-park.

As for explaining it, I should first warn you that I'm not especially
knowledgeable about hardware/chip design, and I'm also relying on a (possibly
faulty) memory, so take all of the following with the usual pinch of salt, and
verify (or refute) it for yourself before depending on it.

That said, my understanding is that, although the Intel HT stuff duplicates
enough registers to allow two independent execution streams, it does /not/
duplicate the ALUs, or the instruction decode pipeline. So the actual
processing power is shared between the two threads, or the two "CPU"s running
them. That means that the HT architecture only provides a benefit when one
thread is stalled on a cache read, or otherwise has nothing in its instruction
pipeline, /and/ the other thread /does/ have all the data and decoded
instructions necessary to proceed. Since the two threads are competing for the
cache space (and in any case most programs spend a lot of time stalled one way
or another) that doesn't happen very often.

There /are/ programs which benefit usefully from HT, but the general experience
seems to be that they are not common. The ideal case (I think) would be when
the two threads were executing the same (fairly small) section of code and (not
too big) section of data (so the instruction pipeline and cache would serve
both as well as the same circuitry could serve either one); and the mix of data
accesses is such that the interval between stalls for a cache refill is
approximately equal to the time taken for a cache refill. The less the actual
mix of instructions seen by each CPU resembles that case, the more the whole
system will degrade towards acting like one CPU time-sliced between the two
threads.

Note that, in the worst case, the cache behaviour of the two threads executing
at the same time may be worse than it would be if the same two threads were
time-sliced at coarse intervals by the OS but had the whole of the cache
available to each thread at a time.

-- chris
 
