How many threads?

R

Roedy Green

I am about to do some multithreading code to get some parallelism when
waiting for Internet links to respond. The question comes How many
threads? Well it depend on too many things. How much ram, how fast the
connections, what else is going on in the machine.

The optimal choice could change over time, even within a single run.

What I need is some sort of homing mechanism that homes in on the
optimal number. Surely this is a common problem. Are there classes to
handle this automatically?
 
S

shakah

I am about to do some multithreading code to get some parallelism when
waiting for Internet links to respond. The question comes How many
threads? Well it depend on too many things. How much ram, how fast the
connections, what else is going on in the machine.

The optimal choice could change over time, even within a single run.

What I need is some sort of homing mechanism that homes in on the
optimal number. Surely this is a common problem. Are there classes to
handle this automatically?

I think that your problem calls for a blocking-queue/thread-pool
solution, with possibly an upper-bound to the number of threads --
IMHO, you're unlikely to exhaust a local resource (e.g. CPU or memory,
assuming a reasonably up-to-date box) given the amount of idle time
you'll encounter waiting for I/O while making asynch requests over the
Internet.
 
M

Mark Space

shakah said:
On Jul 30, 7:28 pm, Roedy Green <[email protected]>
wrote:
I think that your problem calls for a blocking-queue/thread-pool
solution, with possibly an upper-bound to the number of threads --
IMHO, you're unlikely to exhaust a local resource (e.g. CPU or memory,
assuming a reasonably up-to-date box) given the amount of idle time
you'll encounter waiting for I/O while making asynch requests over the
Internet.

Besides making use of the API in java.util.concurrent, I think you've
identified one of the requirements yourself in the line I quoted above:
change.

Think about designing for change, and which design patterns support high
change situations. The Strategy Pattern, for example. If you make your
"threading strategy" something that can be configured on the fly, then I
think you might have more ability to change things as the code (and the
user requirements) evolve.

You can start off with an ExecutorService from java.util.concurrent like
ThreadPoolExecutor, and then as you learn more substitute your own
custom, well-tuned implementation.
 
A

Arne Vajhøj

Roedy said:
I am about to do some multithreading code to get some parallelism when
waiting for Internet links to respond. The question comes How many
threads? Well it depend on too many things. How much ram, how fast the
connections, what else is going on in the machine.

The optimal choice could change over time, even within a single run.

What I need is some sort of homing mechanism that homes in on the
optimal number. Surely this is a common problem. Are there classes to
handle this automatically?

I think standard practice is no make it configurable to allow
the admin to test and configure what is optimal for that site.

That is of course a Java EE way of thinking.

For your app where the actually work is relative small compared
to how long it will block, then you should go for a pretty
big number of threads. As many as the OS will still perform
well with.

Arne
 
M

Mark Space

Arne said:
I think standard practice is no make it configurable to allow
the admin to test and configure what is optimal for that site.

That is of course a Java EE way of thinking.

I don't think it's just "Java EE". I've seen C++ programmers go through
complex gyrations to implement something similar to Java's reflection in
C++, just because the Strategy Pattern and Data Driven Design are that
important. It something that should be considered for almost every
design. Change is the most fundamental, common aspect of all software
engineering, after all.
 
A

Arne Vajhøj

Mark said:
I don't think it's just "Java EE". I've seen C++ programmers go through
complex gyrations to implement something similar to Java's reflection in
C++, just because the Strategy Pattern and Data Driven Design are that
important. It something that should be considered for almost every
design. Change is the most fundamental, common aspect of all software
engineering, after all.

It is not just Java EE, but it requires a context where there are
somebody that knows about config files, threads and performance testing.

For a Java EE server there should be somebody that knows that.

For the desktop app used by Joe Average in Smalltown it is often
not the case.

Not that Java has much market share in that space anyway.

Arne
 
R

Roger Lindsjö

Roedy said:
I am about to do some multithreading code to get some parallelism when
waiting for Internet links to respond. The question comes How many
threads? Well it depend on too many things. How much ram, how fast the
connections, what else is going on in the machine.

The optimal choice could change over time, even within a single run.

What I need is some sort of homing mechanism that homes in on the
optimal number. Surely this is a common problem. Are there classes to
handle this automatically?

You you really need many threads? How about using nio and poll to have a
single thread manage many connections at once? This way it could be
possible just enqueue the request you need to be done and handle many of
them in parallel with a single (or few) threads.

But hundreds is often no problem. I have written a engine that emulates
user requests against a server. A "user" is one of many javascript that
can do (among other things) open(url), openLink(id), openLink(name),
sleep(time) etc. Assuming the server takes tens to hundreds of
milliseconds to respond I can usually run several hundreds of threds at
once. Thousands of threads seems to be fine if the response time goes
above one or a few seconds.

Note: I have run this on Dual and Quad core Intel and AMD on Linux, not
Windows, so there could be a difference.
 
T

Tom Anderson

But hundreds is often no problem.

Note: I have run this on Dual and Quad core Intel and AMD on Linux, not
Windows, so there could be a difference.

I believe that linux's threading is much, much faster than windows'. But a
few hundred threads could still be fine on windows.

tom
 
C

Christian

Tom said:
I believe that linux's threading is much, much faster than windows'. But
a few hundred threads could still be fine on windows.

tom

Does this believe have any base? Or is it just the usual Unix/Windows flame?
 
R

Roedy Green

You you really need many threads? How about using nio and poll to have a
single thread manage many connections at once? This way it could be
possible just enqueue the request you need to be done and handle many of
them in parallel with a single (or few) threads.

There are two issues:

1. the general one of multithread apps that don't know the optimal
number of threads, and would like some general mechanism to optimise
it dynamically to adjust for changing conditions, e.G. other tasks
running in the same machine soaking up RAM/CPU. The idea is your would
write some custom routine that measured productivity. e.G.
transactions/second, bytes processed per minute etc. The manager would
raise and lower the max threads and collect data, and try to fit it
say to a Cheyenne polynomial to try to predict the optimal value. You
would weight your readings to give more recent ones more weight in the
polynomial evaluation. It might just up the count and down the count
by one every so often just to notice if conditions are changing.


2. my specific one of improving my back end to Xenu link checker that
rechecks links in a clever way to avoid needless work. Eventually I
hope to get rid of Xenu entirely. It is too many flaws.
I am sending HEAD and GET posts out to the Internet mainly to see the
status code that comes back. I gather this nio approach works at the
socket level or lower, rather than HTTP.
 
T

Tom Anderson

Does this believe have any base? Or is it just the usual Unix/Windows
flame?

Stuff i read on the linux kernel mailing list a few years back. Which is
not necessarily an affirmative answer to your first question!

I can't cite proper data on this, i have to confess. We ought to extend
this thread to a linux and a windows newsgroup ...

tom

--
Imagine a city where graffiti wasn't illegal, a city where everybody
could draw wherever they liked. Where every street was awash with a
million colours and little phrases. Where standing at a bus stop was never
boring. A city that felt like a living breathing thing which belonged to
everybody, not just the estate agents and barons of big business. Imagine
a city like that and stop leaning against the wall - it's wet. -- Banksy
 
T

Tom Anderson

and try to fit it say to a Cheyenne polynomial to try to predict the
optimal value.

What's a Cheyenne polynomial? I don't think i've come across those.

tom

--
Imagine a city where graffiti wasn't illegal, a city where everybody
could draw wherever they liked. Where every street was awash with a
million colours and little phrases. Where standing at a bus stop was never
boring. A city that felt like a living breathing thing which belonged to
everybody, not just the estate agents and barons of big business. Imagine
a city like that and stop leaning against the wall - it's wet. -- Banksy
 
K

Knute Johnson

Tom said:
Stuff i read on the linux kernel mailing list a few years back. Which is
not necessarily an affirmative answer to your first question!

I can't cite proper data on this, i have to confess. We ought to extend
this thread to a linux and a windows newsgroup ...

tom

Sorry I came to this thread late but I remember here a while back I did
some experiments on the number of threads on Windows XP. I found once
you got past 75-100 things got really slow. That was on my dual core
machine and one with more processors would probably do much better.

I think in many cases you might do much better with schedulers and fewer
threads. Less system overhead and memory use.
 
R

Roedy Green

Oh, i see. That i can find on wikipedia. Although it still doesn't make a
huge amount of sense to me!

Chebychev polynomial approximation is sort of polynomial curve fit,
but better behaved than ordinary polynomials. They behave more like
real-world curves do.

The problem with curve fitting is in tends to work well for
interpolation but goes wildly up or down off the ends for
extrapolation. Probably reading up on mathematical techniques for
safe extrapolation might be in order.

My idea is thought that if you make an "error", adjusting the number
of threads in a way that makes throughput worse, feedback will soon
pull you back. You can consider any failure a worthwhile data
gathering expedition.

Probably almost any heuristic will work fine so long as you don't
increase or decrease the thread count by too much at a pop.
 
J

Joshua Cranmer

Roedy said:
Chebychev polynomial approximation is sort of polynomial curve fit,
but better behaved than ordinary polynomials. They behave more like
real-world curves do.

To be more precise, Chebyshev polynomials spread out the interpolation
points to favor the end more, as regular polynomial interpolation near
the ends of the range tends to oscillate wildly. That phenomenon is
Runge's Phenomenon. Wikipedia has a good diagram of that, but it doesn't
have one of the Chebyshev interpolation...
My idea is thought that if you make an "error", adjusting the number
of threads in a way that makes throughput worse, feedback will soon
pull you back. You can consider any failure a worthwhile data
gathering expedition.

Doesn't sound like it would work if the were several local extrema. Then
again, the graph (ignoring noise) would probably have only one extrema
to begin with...
 
J

John B. Matthews

Joshua Cranmer said:
To be more precise, Chebyshev polynomials spread out the interpolation
points to favor the end more, as regular polynomial interpolation near
the ends of the range tends to oscillate wildly. That phenomenon is
Runge's Phenomenon. Wikipedia has a good diagram of that, but it doesn't
have one of the Chebyshev interpolation...
[...]

Thanks, I hadn't seen this compelling graph before:

<http://en.wikipedia.org/wiki/Runge's_phenomenon>

From there, I see that "roots of the Chebyshev polynomial ... are often
used as nodes in polynomial interpolation ... to minimize the problem of
Runge's phenomenon":

<http://en.wikipedia.org/wiki/Chebyshev_nodes>

Which leads to an interesting contrast here:

<http://en.wikipedia.org/wiki/Approximation_theory>

[I'm still laughing over my earnest search for "Cheyenne polynomials,"
now enshrined in my browser's history!]
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,969
Messages
2,570,161
Members
46,710
Latest member
bernietqt

Latest Threads

Top