Thread question

Roedy Green

The application: I have a list of 1200 books and a list of 21 online
bookstores. I want to find out which bookstores carry each book. I
have code now that, given a book and a bookstore, finds out whether the
store carries it and records the result. It probes the appropriate page
and scans for clues, positive and negative.

The process is very slow because it mostly spends its time waiting for
the bookstore to respond. I figured I could fairly easily make one
book-x-store probe into a Runnable. Happily there are only very simple
interactions between Runnables.

You could imagine setting all the Runnables loose at once.

I need two features:

1. some sort of throttle on releasing them so that I don't swamp the JVM.
2. some way of knowing when the last one completed.

I could do this by having my Runnables increment and decrement global
counts, however I suspect there is something built in to handle this
flawlessly.

There are so many tools. I wonder if anyone would like to point me to
the most appropriate one for this task.

If I get this working, I would like to add similar logic to the
BrokenLinks link checker.
--
Roedy Green Canadian Mind Products
http://mindprod.com
For me, the appeal of computer programming is that
even though I am quite a klutz,
I can still produce something, in a sense
perfect, because the computer gives me as many
chances as I please to get it right.
 
Roedy Green

There are so many tools. I wonder if anyone would like to point me to
the most appropriate one for this task.

ThreadPoolExecutor is looking promising, though overly elaborate. In
my case threads can die as soon as there is no more work for them and
there will be no need to dynamically grow or shrink the pool.
Now I have to figure out what flavour of BlockingQueue to use.

I'll be curious to find out the optimal number of threads. I don't
like to pester the book stores. I can debug this with 2 threads
without doing too much pestering, but repeated runs to binary search
the optimum number of threads... maybe I should use the fancy
features to dynamically adjust it via a feedback mechanism.

 
Roedy Green

Now I have to figure out what flavour of BlockingQueue to use.

It looks like a LinkedBlockingQueue will do well, with the advantage
that it will scale well for my other project. This is beginning to look a
heck of a lot simpler than I thought it would be. Most of the work
will be refactoring to bundle up all that needs to be done in a
Runnable.

There are so many tools that I got the idea that to do anything you had
to use 90% of them in some complicated way. In reality you can do
something useful with just a couple.
 
Eric Sosman

I need two features:

1. some sort of throttle on releasing them so that I don't swamp the JVM.

Use a java.util.concurrent.ExecutorService. There's a large
number of ways to configure them, but I'd suggest starting out
simple, with a fixed-size thread pool plucking tasks from a queue.

2. some way of knowing when the last one completed.

A simple counter will do it. Note that you'll likely want to
be able to proceed without waiting for every last probe to complete:
If you've got twenty answers but qcvfl.com[*] appears hung, it may be
best to abandon the lallygagger and proceed with what you've got. So,
consider having a time-out as an alternative "completion" signal.

[*]"Quaint and Curious Volumes of Forgotten Lore."
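
A minimal sketch of the two suggestions above (a fixed-size pool plus a counter with a time-out), assuming Java 9 or later. The class `ProbeRunner`, the `probe` method, and the store names are hypothetical stand-ins, not the original poster's code, and a `CountDownLatch` is used here as one possible form of the "simple counter":

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ProbeRunner {
    /** Hypothetical placeholder for the real book-x-store probe. */
    static boolean probe(String book, String store) {
        return (book.length() + store.length()) % 2 == 0;
    }

    /**
     * Runs one probe per store on a fixed-size pool and waits at most
     * timeoutMillis for all of them; returns the results that arrived in time.
     */
    static Map<String, Boolean> probeAll(String book, List<String> stores,
                                         int threads, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch remaining = new CountDownLatch(stores.size());
        Map<String, Boolean> results = new ConcurrentHashMap<>();
        for (String store : stores) {
            pool.execute(() -> {
                try {
                    results.put(store, probe(book, store));
                } finally {
                    remaining.countDown(); // count every completion, even a failed one
                }
            });
        }
        try {
            // The time-out acts as the alternative "completion" signal.
            if (!remaining.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                System.err.println("Timed out with " + remaining.getCount()
                        + " probes unfinished");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow(); // abandon any stragglers
        }
        return new HashMap<>(results); // snapshot of what has arrived
    }

    public static void main(String[] args) {
        List<String> stores = List.of("a.example", "b.example", "c.example");
        System.out.println(probeAll("Dune", stores, 2, 5000));
    }
}
```

With the time-out in place, one hung store delays the run by at most `timeoutMillis` rather than indefinitely.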
 
markspace

1. some sort of throttle on releasing them so that I don't swamp the JVM.

I wouldn't bother. It's actually well known that for fast efficient IO
you should start as many threads as possible. 21 threads aren't going
to swamp anything.

If you start to have in excess of say 100 to 1000 threads, then maybe
you can think about a throttle. Until there's a chance of that many
threads, you're just gold-plating your software.

<http://en.wikipedia.org/wiki/Gold_plating_(software_engineering)>
 
markspace

Reading the docs carefully, it looks like Executors::newFixedThreadPool
might allocate new threads only as it needs them, up to a maximum,
rather than allocating the specified maximum number of threads
immediately. If it does, this would be ideal for throttling your tasks.

Even if it doesn't, a relatively small number of threads, say
"newFixedThreadPool(50)" is probably much much easier than trying to
invent some throttling mechanism yourself.


<http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Executors.html#newFixedThreadPool(int)>
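
One way to check the lazy-allocation guess is to ask the pool how many threads it holds before and after submitting work. This is only a sketch: `LazyPoolDemo` and `poolSizes` are hypothetical names, and the cast to `ThreadPoolExecutor` relies on what current JDKs return from `newFixedThreadPool`, which the documented return type (`ExecutorService`) does not promise.

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

class LazyPoolDemo {
    /** Returns {pool size before any task, pool size after `tasks` submissions}. */
    static int[] poolSizes(int maxThreads, int tasks) {
        ExecutorService service = Executors.newFixedThreadPool(maxThreads);
        // Assumption: the JDK in use hands back a ThreadPoolExecutor here.
        ThreadPoolExecutor pool = (ThreadPoolExecutor) service;
        int before = pool.getPoolSize();

        // Hold every task open so the threads stay busy while we count them.
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        int after = pool.getPoolSize();

        release.countDown(); // let the tasks finish
        pool.shutdown();
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(poolSizes(50, 3)));
        System.out.println(Arrays.toString(poolSizes(2, 5)));
    }
}
```

If threads are created on demand, the first number is zero and the second is the smaller of the task count and the pool limit.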
 
Roedy Green

ThreadPoolExecutor is looking promising, though overly elaborate.

I got my code to compile. Then I discovered something quite idiotic
about the class. Quoting Goetz et al.: "There is no predefined
saturation policy to make "execute" block when the work queue is
full."

Huh? Is that not the most common case?
So it looks like I have to compose a custom blocking policy handler.

 
markspace

Huh? Is that not the most common case?

No.

So it looks like I have to compose a custom blocking policy handler.

Read it again, or just use a fixed size thread pool from Executors.
It'll work fine.
 
Paul Cager

On 11/30/2011 2:32 AM, Roedy Green wrote:
...


Here's a basic approach to picking a thread count:
...

In this case it's not just your own resources you have to take into
consideration. Some web sites won't like having 20 simultaneous
connections from the same IP scraping the site.
 
Daniel Pitts

Look into ExecutorService and Executors in java.util.concurrent. They
have exactly what you want.

<http://docs.oracle.com/javase/6/docs/api/index.html?java/util/concurrent/ExecutorService.html>

<http://docs.oracle.com/javase/6/docs/api/index.html?java/util/concurrent/Executors.html>

The idea is that you submit jobs to the ExecutorService (which is
basically some sort of Thread Pool), and you can wait for the "Future"
result of those jobs.

You might even consider turning your BookStoreProbe into a Callable:

<http://docs.oracle.com/javase/6/docs/api/index.html?java/util/concurrent/Callable.html>


More useful things in package summary:
<http://docs.oracle.com/javase/6/docs/api/index.html?java/util/concurrent/package-summary.html>
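
A rough sketch of the Callable idea, assuming Java 8 or later. `BookStoreProbe` takes its name from the post above, but its fields, its placeholder `call()` logic, and the `CallableDemo` helper are all invented for illustration. `invokeAll` submits every task and returns only once all of them are done, which also covers the "when did the last one finish" question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BookStoreProbe implements Callable<Boolean> {
    private final String book;
    private final String store;

    BookStoreProbe(String book, String store) {
        this.book = book;
        this.store = store;
    }

    @Override
    public Boolean call() {
        // Placeholder: a real probe would fetch the store's page for `book`.
        return !book.isEmpty() && store.length() % 2 == 0;
    }
}

class CallableDemo {
    /** Probes every store for one book and counts the stores that carry it. */
    static int countCarrying(String book, List<String> stores, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<BookStoreProbe> probes = new ArrayList<>();
            for (String store : stores) {
                probes.add(new BookStoreProbe(book, store));
            }
            int carrying = 0;
            // invokeAll blocks until every probe has completed.
            for (Future<Boolean> result : pool.invokeAll(probes)) {
                if (result.get()) {
                    carrying++;
                }
            }
            return carrying;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> stores = Arrays.asList("ab.example", "abc.example", "abcd.example");
        System.out.println(countCarrying("Dune", stores, 2));
    }
}
```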
 
Daniel Pitts

I got my code to compile. Then I discovered something quite idiotic
about the class. Quoting Goetz et al.: "There is no predefined
saturation policy to make "execute" block when the work queue is
full."

Huh? Is that not the most common case?
So it looks like I have to compose a custom blocking policy handler.

There is a somewhat useful alternative, which is the CallerRunsPolicy.

Also look at the implementations (in ThreadPoolExecutor) of
RejectedExecutionHandler:

http://docs.oracle.com/javase/6/doc...util/concurrent/RejectedExecutionHandler.html

<http://docs.oracle.com/javase/6/docs/api/index.html?java/util/concurrent/ThreadPoolExecutor.html>
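
For illustration, a small sketch of a pool with a bounded queue and `ThreadPoolExecutor.CallerRunsPolicy`, assuming Java 8 or later. `CallerRunsDemo`, `runAll`, and the 10 ms sleep are made up for the example. When the queue is full, the submitting thread runs the task itself, which slows submission down without a custom handler; note that this is not the same as blocking, since the caller does real work in the meantime.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class CallerRunsDemo {
    /** Stand-in for a slow probe. */
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Pushes `tasks` jobs through a small pool; returns how many completed. */
    static int runAll(int tasks, int threads, int queueCapacity) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads,                   // fixed number of threads
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            // If threads and queue are full, this call runs the task on the
            // submitting thread instead of throwing RejectedExecutionException.
            pool.execute(() -> {
                pause(10);
                done.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println(runAll(20, 2, 4));
    }
}
```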
 
Roedy Green

I used a semaphore to block when the queue and threads are full. Even
with 5 threads it is way faster than before.

I see triples
finishing a
starting b
submitting c

pause

It does not dilly dally recycling a thread.

The big problem is the docs. They are aimed at advanced users. They
expect you to already understand the basic stuff. I am irritated at
the way you have to kludge in blocking. What were they thinking?
It is like selling a car without wheels.
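
The actual code was not posted, so the following is only a guess at the shape of a semaphore-based throttle, assuming Java 8 or later. `BoundedSubmitter`, its methods, and the permit count (threads plus allowed pending tasks) are assumptions made for this sketch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedSubmitter {
    private final ExecutorService pool;
    private final Semaphore permits;

    BoundedSubmitter(int threads, int maxPending) {
        pool = Executors.newFixedThreadPool(threads);
        // One permit per task that may be running or waiting at any moment.
        permits = new Semaphore(threads + maxPending);
    }

    /** Blocks the caller until a slot is free, then hands the task to the pool. */
    void submit(Runnable task) throws InterruptedException {
        permits.acquire();
        try {
            pool.execute(() -> {
                try {
                    task.run();
                } finally {
                    permits.release(); // free the slot when the task ends
                }
            });
        } catch (RuntimeException e) {
            permits.release(); // the pool refused the task; give the slot back
            throw e;
        }
    }

    /** Stops accepting work and waits for the tasks already submitted. */
    boolean shutdownAndWait(long timeout, TimeUnit unit) throws InterruptedException {
        pool.shutdown();
        return pool.awaitTermination(timeout, unit);
    }

    /** Demo: pushes `tasks` trivial jobs through and returns how many completed. */
    static int runDemo(int tasks, int threads, int maxPending) {
        BoundedSubmitter submitter = new BoundedSubmitter(threads, maxPending);
        AtomicInteger completed = new AtomicInteger();
        try {
            for (int i = 0; i < tasks; i++) {
                submitter.submit(() -> completed.incrementAndGet());
            }
            submitter.shutdownAndWait(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            submitter.pool.shutdown(); // harmless if already shut down
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(100, 5, 5));
    }
}
```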
 
Eric Sosman

I'd recommend the simple built-in method "ExecutorService::shutdown()".
It waits until all tasks have completed.

ITYM awaitTermination(), a method I hadn't noticed but which
does as you say *and* has a useful timeout. shutdown(), however,
does not wait: It just "initiates an orderly shutdown" and returns
with the shutdown still (potentially) in progress.
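
The difference is easy to see in a small sketch (hypothetical class and method names, Java 8 or later). A latch holds one task open so that the state right after `shutdown()` can be observed before the task is allowed to finish.

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ShutdownDemo {
    /** Returns {terminated right after shutdown(), result of awaitTermination()}. */
    static boolean[] shutdownThenAwait() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch release = new CountDownLatch(1);
        pool.execute(() -> {
            try {
                release.await(); // stay busy until the main thread says go
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        pool.shutdown(); // returns at once; only stops new submissions
        boolean afterShutdown = pool.isTerminated(); // the task is still pending

        release.countDown(); // now let the task finish
        boolean afterAwait = false;
        try {
            // Blocks until all tasks are done or the time-out expires.
            afterAwait = pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new boolean[] { afterShutdown, afterAwait };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(shutdownThenAwait()));
    }
}
```

The usual pattern is therefore the pair: `shutdown()` to stop accepting work, then `awaitTermination()` with a time-out to wait for what is already queued.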
 
markspace

ITYM awaitTermination(), a method I hadn't noticed but which
does as you say *and* has a useful timeout. shutdown(), however,
does not wait: It just "initiates an orderly shutdown" and returns
with the shutdown still (potentially) in progress.


You're probably right, it's been a while since I actually used an
executor service.
 
