Chris said:Well, the generally reported figure is in that ball-park.
As for explaining it, I should first warn you that I'm not especially
knowledgeable about hardware/chip design, and I'm also relying on a (possibly
faulty) memory, so take all of the following with the usual pinch of salt, and
verify (or refute) it for yourself before depending on it.
That said, my understanding is that, although the Intel HT stuff duplicates
enough registers to allow two independent execution streams, it does /not/
duplicate the ALUs, or the instruction decode pipeline. So the actual
processing power is shared between the two threads, or the two "CPU"s running
them. That means that the HT architecture only provides a benefit when one
thread is stalled on a cache read, or otherwise has nothing in its instruction
pipeline, /and/ the other thread /does/ have all the data and decoded
instructions necessary to proceed. Since the two threads are competing for the
cache space (and in any case most programs spend a lot of time stalled one way
or another), that doesn't happen very often.
There /are/ programs which benefit usefully from HT, but the general experience
seems to be that they are not common. The ideal case (I think) would be when
the two threads were executing the same (fairly small) section of code and (not
too big) section of data (so the instruction pipeline and cache would serve
both as well as the same circuitry could serve either one); and the mix of data
accesses is such that the interval between stalls for a cache refill is
approximately equal to the time taken for a cache refill. The less the actual
mix of instructions seen by each CPU resembles that case, the more the whole
system will degrade towards acting like one CPU time-sliced between the two
threads.
Note that, in the worst case, the cache behaviour of the two threads executing
at the same time may be worse than it would be if the same two threads were
time-sliced at coarse intervals by the OS but had the whole of the cache
available to each thread at a time.
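One way to check the figure on your own machine is to time the same CPU-bound work on one thread and then on two. The following is only a rough sketch, not a rigorous benchmark: the class name HtScalingSketch, the kernel() recurrence and the iteration count are made up for illustration, and it assumes Java 8 or later. A ratio near 2.0 suggests two real cores; a ratio somewhere between 1.0 and 1.3 would be consistent with the description above of one core shared by two hardware threads.

```java
final class HtScalingSketch {
    // Hypothetical CPU-bound kernel: a small integer recurrence that touches no memory.
    static long kernel(long seed, int iterations) {
        long x = seed;
        for (int i = 0; i < iterations; i++) {
            x = x * 6364136223846793005L + 1442695040888963407L;
            x ^= (x >>> 29);
        }
        return x;
    }

    // Runs `threads` copies of the kernel concurrently; returns elapsed nanoseconds.
    static long timeRun(int threads, int iterations) throws InterruptedException {
        Thread[] workers = new Thread[threads];
        long[] sink = new long[threads]; // keeps the results live so the work is not optimized away
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> sink[id] = kernel(id + 1, iterations));
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        int iterations = 200_000_000;
        timeRun(1, iterations); // warm-up pass so the JIT has compiled kernel()
        long one = timeRun(1, iterations);
        long two = timeRun(2, iterations);
        // Two threads do twice the work, so the throughput ratio is 2 * one / two.
        System.out.printf("logical CPUs reported: %d%n",
                Runtime.getRuntime().availableProcessors());
        System.out.printf("throughput ratio, 2 threads vs 1: %.2f%n", 2.0 * one / two);
    }
}
```

This kernel stays in registers, so it competes for the execution units and not for the cache; a kernel that walks a large array would exercise the cache-sharing case instead.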
Chris Uppal said:Arne Vajhøj wrote:
[me:]Intel claims HT = 1.3 CPU.
"Yeah, right"
;-)
I have seen code that do show the +30%.
Undoubtedly such code does exist. I'm only claiming that it's not the
norm.
Luc said: Chris Uppal said: Arne Vajhøj wrote:
I have seen code that do show the +30%.
[me:] Undoubtedly such code does exist. I'm only claiming that it's not the
norm. Or -- to put it another way -- the CPU usage reported by TaskManager is
misleading. It suggests that 50% of your available horse-power is
unused. My bet would be that it's more like 5% -- if not actually
zero.
It was no doubt a hand-made assembly algorithm specifically designed to take
advantage of hyperthreading's limited abilities.
Patricia Shanahan said:Correct.
Steve said:Patricia Shanahan said:Correct.
OK, so I won't hope for improvement by multi-threading on my P4 with
"Hyper-Threading Technology."
But looking ahead to other hardware...
In a routine called from inner loops -- this routine is called 800 million
times in the timing test case I've been using -- I have something like this
(schematically):
for (int i = 0; i < n; i++) { //n is typically a single-digit value (min 2)
result = AStaticMethod(arg);
...
}
What would be the lowest-overhead way to multi-thread the executions of
AStaticMethod?
Patricia said: 2. Can the two threads share all the caches, branch predictors etc.
without getting in each other's way? That can happen either if they are
happy with the same cache contents or if they don't need much cache.
From this point of view, the two job test may have been unfair, because
two independent jobs are less likely to do a good job of cache sharing
than two threads in the same job.
Patricia Shanahan said: Steve Brecher wrote: ....
Also, n becomes known at program startup, and its maximum value is a ...
result = AStaticMethod(arg); AStaticMethod returns a primitive type.
...
}
What would be the lowest-overhead way to multi-thread the executions
of AStaticMethod?
Rule #1 for optimizing loop nests, commonly followed by optimizing
compilers:
*** Examine the whole loop nest as a unit. ***
An innermost loop with small iteration count is not usually the best
place to begin optimization.
Are you using a "client" or "server" version of Java? My understanding
is that the "server" JVMs do more routine optimizations than the
"client" versions.
...
Even without going multi-threaded, many loop nests can be made more
efficient by changing the order of the loops, loop unrolling etc.
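To illustrate the kind of transformation meant by changing loop order and unrolling, here is a minimal sketch. The class LoopOrderSketch and both methods are invented for this example and have nothing to do with the code under discussion; the sketch assumes a non-empty rectangular int[][].

```java
final class LoopOrderSketch {
    // Column-first order: the inner loop jumps to a different row array on
    // every step, so consecutive accesses are far apart in memory.
    static long sumColumnFirst(int[][] a) {
        long sum = 0;
        int rows = a.length;
        int cols = a[0].length;
        for (int j = 0; j < cols; j++) {
            for (int i = 0; i < rows; i++) {
                sum += a[i][j];
            }
        }
        return sum;
    }

    // Interchanged order: the inner loop walks one row array sequentially.
    // The inner loop is also unrolled by two, with a tail loop for odd lengths.
    static long sumRowFirst(int[][] a) {
        long sum = 0;
        for (int[] row : a) {
            int j = 0;
            for (; j + 1 < row.length; j += 2) {
                sum += row[j];
                sum += row[j + 1];
            }
            for (; j < row.length; j++) {
                sum += row[j];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[][] a = new int[2000][2000];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a[i].length; j++) {
                a[i][j] = i ^ j;
            }
        }
        // Both orders compute the same sum; only the memory access pattern differs.
        System.out.println(sumColumnFirst(a) == sumRowFirst(a));
    }
}
```

Whether the interchanged version is actually faster depends on the array size, the JVM and the hardware, so it is worth timing both.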
(For readers just joining us: I am a Java newbie.)
With respect and thanks, optimization is not the issue. The code is a
port of long-standing C code that is highly optimized -- I'm very familiar
with optimization techniques; actually, that is my specialty. I'd like to
try multi-threading it.
The loops enumerate cases. For each case, there are "n" significant
computations, each accomplished in AStaticMethod (excuse the initial
upper-case). Multi-threading the executions of that method would be a
very easy way to begin. Partitioning the enumeration of cases, on the other
hand, would be difficult -- at this writing, I don't have a scheme to do
that.
So far my knowledge of Java multi-threading is based on a rapid pass through
the relevant material in Flanagan's "Java in a Nutshell" and Sun's "The Java
Tutorials."
If possible, I would like a way to do the multi-threading that creates no
objects for each execution of AStaticMethod. Currently the code, after
startup, creates no objects and incurs no GC.
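One possible shape for an allocation-free scheme is sketched below, under several assumptions: n is fixed at startup, AStaticMethod takes and returns a long (the real signature is not shown in the thread), and Java 9 or later is available for Thread.onSpinWait. The class SpinFanOut, its fields and the stand-in aStaticMethod are all hypothetical. Helper threads are created once; each call only writes to preallocated arrays and bumps a counter, so nothing is allocated per call.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

final class SpinFanOut {
    final long[] args;     // caller fills one slot per lane before run()
    final long[] results;  // lane i writes results[i]
    private final AtomicInteger generation = new AtomicInteger();
    private final AtomicIntegerArray doneGen; // last generation each helper finished
    private volatile boolean running = true;

    // Hypothetical stand-in for the real AStaticMethod.
    static long aStaticMethod(long arg) {
        return arg * arg + 1;
    }

    SpinFanOut(int lanes) {
        args = new long[lanes];
        results = new long[lanes];
        doneGen = new AtomicIntegerArray(lanes);
        for (int lane = 1; lane < lanes; lane++) { // lane 0 runs on the caller
            final int id = lane;
            Thread t = new Thread(() -> helperLoop(id));
            t.setDaemon(true);
            t.start();
        }
    }

    private void helperLoop(int lane) {
        int seen = 0;
        while (running) {
            int g = generation.get(); // volatile read: args written before the bump are visible
            if (g == seen) {
                Thread.onSpinWait();
                continue;
            }
            results[lane] = aStaticMethod(args[lane]);
            seen = g;
            doneGen.set(lane, g); // volatile write: publishes results[lane]
        }
    }

    // Computes results[i] = aStaticMethod(args[i]) for every lane; allocates nothing.
    void run() {
        int g = generation.incrementAndGet(); // releases the helpers
        results[0] = aStaticMethod(args[0]);
        for (int lane = 1; lane < args.length; lane++) {
            while (doneGen.get(lane) != g) {
                Thread.onSpinWait();
            }
        }
    }

    void shutdown() {
        running = false;
    }

    public static void main(String[] unused) {
        SpinFanOut f = new SpinFanOut(4);
        for (int i = 0; i < 4; i++) {
            f.args[i] = i;
        }
        f.run();
        for (int i = 0; i < 4; i++) {
            System.out.println(f.results[i]);
        }
        f.shutdown();
    }
}
```

The helpers spin and do not block, so each one occupies a logical CPU even when idle; that is the price of avoiding lock and wake-up overhead. Whether this beats the plain loop depends on how long AStaticMethod runs compared with the cost of handing work between cores, so with 800 million calls it needs to be measured and not assumed.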