concurrent 1.5


Homer

Hi All,

I am trying to write multithreaded code to read a directory and move
all files to another directory (well, and some other stuff, but let's talk
about the core part). What I am trying to do is:

1- Read the directory.
2- Create threads for moving each file.
3- Each thread will put the file name somewhere so other threads won't try to
move the same file.
4- Limit the number of threads to 10, but don't kill/create threads when
they are done (recycle them).


Is the concurrent package good for something like that, or had I better
write my own?

Any examples, classes, or pointers on where to start?
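For reference, steps 1 to 4 map fairly directly onto java.util.concurrent. The sketch below is an illustration only, not code from this thread; it assumes Java 8 or later (for java.nio.file and lambdas), and the class name PooledMover is hypothetical. Executors.newFixedThreadPool(10) keeps ten worker threads alive and reuses them for every task, and because each file is submitted exactly once, the shared "already taken" list from step 3 isn't needed.

```java
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: move every regular file in `source` into `target`
// using a fixed pool of worker threads that are reused between tasks.
class PooledMover {
    static int moveAll(Path source, Path target, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> results = new ArrayList<>();
            // Step 1: read the directory once, on the calling thread.
            try (DirectoryStream<Path> dir = Files.newDirectoryStream(source)) {
                for (Path file : dir) {
                    if (!Files.isRegularFile(file)) continue;
                    // Steps 2 and 3: each file becomes exactly one task,
                    // so no two threads are ever given the same file.
                    results.add(pool.submit(
                            () -> Files.move(file, target.resolve(file.getFileName()))));
                }
            }
            for (Future<Path> f : results) f.get(); // waits; rethrows a failed move
            return results.size();
        } finally {
            // Step 4: the same pool threads ran every task; stop them only now.
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int moved = moveAll(Paths.get(args[0]), Paths.get(args[1]), 10);
        System.out.println("moved " + moved + " files");
    }
}
```

Usage: java PooledMover <sourceDir> <targetDir>.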


Thanks in advance,

Homer
 

Remon van Vliet


I can't really think of a good reason to do this with more than one "move my
files" thread. Why do you require 10? It certainly won't speed up the
process. Anyway, if you insist, simply have all your move threads
take filenames off a single blocking queue filled by your "main" thread (in
other words, fill it with the results from reading the directory). This will
ensure they won't move a file twice or skip files. The
java.util.concurrent package provides some utility classes for thread
concurrency, but it's not directly related to the solution for your problem.
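Remon's suggestion might look roughly like the following sketch. It is not from the thread; it assumes Java 8+, and QueueMover and the POISON end marker are hypothetical names. The main thread fills a LinkedBlockingQueue from the directory listing, BlockingQueue.take() hands each path to exactly one worker (so no file is moved twice or skipped), and one "poison pill" entry per worker tells the workers to exit.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of "one blocking queue, several mover threads".
class QueueMover {
    // End-of-work marker, compared by identity; one is queued per worker.
    private static final Path POISON = Paths.get("");

    static int moveAll(Path source, Path target, int workers) throws Exception {
        BlockingQueue<Path> queue = new LinkedBlockingQueue<>();
        AtomicInteger moved = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> {
                try {
                    // take() blocks, and gives each queued path to exactly one worker.
                    for (Path file = queue.take(); file != POISON; file = queue.take()) {
                        try {
                            Files.move(file, target.resolve(file.getFileName()));
                            moved.incrementAndGet();
                        } catch (IOException e) {
                            System.err.println("could not move " + file + ": " + e);
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            threads.add(t);
        }
        // The "main" thread fills the queue from the directory listing.
        try (DirectoryStream<Path> dir = Files.newDirectoryStream(source)) {
            for (Path file : dir) {
                if (Files.isRegularFile(file)) queue.put(file);
            }
        } finally {
            for (int i = 0; i < workers; i++) queue.put(POISON); // let every worker exit
        }
        for (Thread t : threads) t.join();
        return moved.get();
    }
}
```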

Remon van Vliet
 

Matt Humphrey

Remon van Vliet said:
I cant really think of a good reason to do this with more than 1 "move my
files" thread. Why do you require 10? It certainly wont speed up the
process.

I'm curious what you're basing this assertion on. Even though the operation
is disk-bound, it seems to me that giving the disk scheduler multiple
requests may reduce latency and keep the disk saturated.

Cheers,

Matt Humphrey (e-mail address removed) http://www.iviz.com/
 

Alex Hunsley

(piggy-backing on Remon's reply, as I can't see the original post...)


Don't do this. This is a classic example of thread overkill - there is
no benefit to using multiple threads to move a load of files. In fact,
as you have noticed, it only adds hassle - you have to worry about
concurrency issues.
What do you imagine that multiple threading will add to this task?

Also, moving multiple files simultaneously will probably be *slower*
than moving them one at a time from one thread. Why is this?
Well, even just on a physical level, you will have the read/write head
of the hard disk jumping about, reading and writing itty-bitty bits of
files for each thread[1]. Whereas if you just had it doing one file at a
time, it could handle decent-sized chunks of a file at a time - it would be
quicker (less seek time) and would also entail less wear and
tear on your hard disk's seek mechanism.
Also, at the Java level, the JVM keeps extra state and wastes
time context switching between threads, so it makes everything
less efficient that way.

Maybe you're thinking of doing this because you've heard that threads
are often used to do I/O - this is correct. But that tends to be just
one thread, doing some I/O on a disk, while other threads are free to do
computation and generally orchestrate things.
So, in essence, it might make sense for your program to have multiple
threads, but I would limit it to only one thread that actually does the
file moving.
A good example of when your program would use multiple threads: suppose
you want your file mover program to have a GUI that shows progress. In
this case, you'd want a separate thread to do the file moving, or
else you'd clog up the GUI (event dispatch thread) and your app would
become unresponsive.
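The "one thread does all the moving" arrangement can be sketched with Executors.newSingleThreadExecutor(), which runs submitted tasks one at a time, in submission order. This is a hypothetical illustration (Java 8+; SingleMover and moveLater are invented names): the caller, for instance a GUI event handler, gets a Future back immediately and is never blocked by the I/O. In a Swing application, javax.swing.SwingWorker (Java 6+) is the more usual tool for the progress-reporting side.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: all file moves happen on one dedicated I/O thread.
class SingleMover implements AutoCloseable {
    // One worker thread; queued tasks run sequentially, so moves never overlap.
    private final ExecutorService ioThread = Executors.newSingleThreadExecutor();

    // Returns immediately; the move itself happens later on the I/O thread.
    Future<Path> moveLater(Path file, Path targetDir) {
        return ioThread.submit(() -> Files.move(file, targetDir.resolve(file.getFileName())));
    }

    @Override
    public void close() {
        ioThread.shutdown(); // moves already queued still run to completion
    }
}
```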

One of the other classic misuses of threads is when people want to write
a game, and they think: "oh, games involve independent entities doing
their own thing in a world, so I should use a thread for each game
entity". Threads aren't really applicable here though. I think the
misguided person is thinking that because game entities have their own
properties (sense of state) and they all do things simultaneously in the
game, the magic word "threads" pops up. But it's usually a mistake to go
down this route. In particular, a game engine usually wants each entity
to have its own 'go' at moving and updating its state, all in strict
lock-step, so that at the end of each game 'update' loop, every entity
has had exactly one chance of updating itself. If you did try to use
multiple threads for different game entities, various entities would
possibly proceed through the world at different rates (depending on how
lengthy their update code was!). To stop this you could use synchronization
to make everything work in lock-step. But then why bother with
threading each entity in the first place?
For a regular game, this would just be wrong.
For some things it might be more applicable - e.g. simulating lifeforms,
and wanting the slowness of their 'thought processes' (i.e. update code
length) to be reflected in the simulation. But this is not usually the
case...
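For contrast, the single-threaded lock-step loop described above could be as small as this sketch (LockStepLoop, Entity and tick are hypothetical names, not any particular engine's API). Every entity gets exactly one update call per tick, in a fixed order, however long its own update code takes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a single-threaded, lock-step game update loop.
class LockStepLoop {
    interface Entity {
        void update(long tick); // the entity's one 'go' for this tick
    }

    private final List<Entity> entities = new ArrayList<>();

    void add(Entity e) { entities.add(e); }

    // One game update: every entity updates exactly once before the next tick.
    void tick(long tick) {
        for (Entity e : entities) e.update(tick);
    }

    void run(int ticks) {
        for (long t = 0; t < ticks; t++) tick(t);
    }
}
```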

Oops, I'm going off on one. Enough.


[1] ... maybe. Some of it could be affected by optimisations that could
be made anywhere from the JVM to the hard disk controller....
 

Alex Hunsley

Matt said:
I'm curious on what you're basing this assertion. Even though the operation
is disk-bound, it seems to me that giving the disk scheduler multiple
requests may reduce latency and maintain saturation.

It may reduce latency, but you may end up with more seeks, with each seek
resulting in only a small number of bytes being accessed. It all, of course,
depends on what optimisations are happening unseen in your particular
system...
I see what you're getting at though. But even with that in mind, 10
threads wouldn't be helping much. 2 threads perhaps (if > 1 were
beneficial). I personally think 1 thread is best though, unless you have
particular knowledge about what is going on with disks etc...
 

Alex Hunsley

Little postscript here - I actually thought, wrongly, that the OP was
talking about either copying files, or moving files to a different hard
disk. In other words, something that would involve reading and writing
lots of bytes for some files. But if it's a file move from one place to
another on the same disk/partition, most systems don't actually move any
data; they only change where the link to that data is in the file
system. (Like link/unlink in the *nix world.)
Anyway, I'm still not convinced that > 1 thread would help much...
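From Java 7 on (so not in the J2SE 5.0 this thread is about), java.nio.file.Files.move makes this distinction visible: with StandardCopyOption.ATOMIC_MOVE the move must be a single file-system operation, and it throws AtomicMoveNotSupportedException when that isn't possible, for example when the target is on a different file store. The helper below is a hypothetical sketch (RenameFirst is an invented name) that tries the cheap rename first and falls back to a plain Files.move, which may copy the data and then delete the source.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: prefer a metadata-only rename, fall back to copy+delete.
class RenameFirst {
    // Returns true if the move was done as an atomic rename.
    static boolean move(Path source, Path target) throws IOException {
        try {
            Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
            return true;
        } catch (AtomicMoveNotSupportedException e) {
            Files.move(source, target); // may copy the bytes, then delete the source
            return false;
        }
    }
}
```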
 

Homer

Here is the story:

The source and destination of each file are different in my case (or can
be), and some are mapped drives (possibly over a slow connection). Some of those
files are huge (200 MB) and some very small (<5 KB). Unfortunately,
delivery of the small files is time sensitive. They arrive around 3:30 PM
and need to be transferred for another process in less than two
minutes. Now imagine I have a single-threaded program with a couple of
those big ones in the queue and one small one behind all of them.

That's why I am interested in Multi-Threading in this case.
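One way to stop a small, urgent file from waiting behind the 200 MB ones, before adding any threads, is to let the queue order the work. This hypothetical sketch (Java 8+; SmallestFirst and Job are invented names) uses java.util.concurrent.PriorityBlockingQueue with a comparator on file size, so whichever mover is free next always takes the smallest waiting file. It does not interrupt a large transfer that is already running, and a steady stream of small files could delay the large ones indefinitely.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical sketch: a work queue that always yields the smallest file first.
class SmallestFirst {
    // A queued file plus its size, read once when the file is queued.
    static final class Job {
        final Path file;
        final long size;
        Job(Path file, long size) { this.file = file; this.size = size; }
    }

    private final PriorityBlockingQueue<Job> queue =
            new PriorityBlockingQueue<>(16, Comparator.comparingLong((Job j) -> j.size));

    void add(Path file) throws IOException {
        queue.put(new Job(file, Files.size(file)));
    }

    // Blocks until something is queued, then returns the smallest waiting file.
    Path takeSmallest() throws InterruptedException {
        return queue.take().file;
    }
}
```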
 

Homer

I should also add that a while ago I used a similar strategy to write an
FTP gateway. I know that this method makes far more sense for FTP
than for a hard drive, but I was wondering if I can use the same idea in this
case too. The FTP gateway code was quite successful, even though I was
not aware of the concurrent package at that time.
 

Alex Hunsley


Ah, ok, this makes more sense.
So your requirement is that the system can be copying files happily, but
then notice that a high-priority file has appeared, and pause the other
non-essential copying activity until the important stuff is done.

First idea that comes to mind:
Have two threads. The system can be copying normal files across from a
file queue in a 'main' thread. As soon as an important file appears, the
main thread pauses its activity (in mid file-copy, if need be)[1], and a
'priority' thread starts copying the important files across. When the
priority thread is finished, the 'main' thread can continue.
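The pause-in-mid-copy part could work by having the 'main' copier copy in chunks and check a shared gate between chunks. A minimal sketch, assuming Java 7+; PausableCopy, PauseGate and copyPausable are hypothetical names and the 64 KB chunk size is arbitrary. The priority thread would call pause() before copying an urgent file and resume() afterwards; a counter rather than a boolean flag lets several priority copies overlap safely.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: a chunked copy that a priority thread can pause.
class PausableCopy {
    // Shared between the 'main' and 'priority' copier threads.
    static final class PauseGate {
        private int pauseCount; // > 0 while at least one priority copy is running

        synchronized void pause() { pauseCount++; }

        synchronized void resume() {
            if (pauseCount > 0) pauseCount--;
            if (pauseCount == 0) notifyAll(); // wake a copier waiting in awaitIfPaused()
        }

        synchronized void awaitIfPaused() throws InterruptedException {
            while (pauseCount > 0) wait();
        }
    }

    // Copies in chunks, stopping between chunks for as long as the gate is paused.
    static void copyPausable(Path source, Path target, PauseGate gate)
            throws IOException, InterruptedException {
        byte[] buffer = new byte[64 * 1024];
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(target)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                gate.awaitIfPaused();
                out.write(buffer, 0, n);
            }
        }
    }
}
```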

You could always have three threads - the extra one to be watching the
source folder, and controlling what the two copying threads get up to.
This would have the extra benefit of moving controlling logic out of a
file copier thread, and then your two copying threads ('main' and
'priority') could be instances of the same object, which copies files in
a file queue.
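A rough sketch of the three-thread layout, under stated assumptions: QueueCopier and Watcher are hypothetical names, the "under 5 KB means urgent" rule is only a stand-in based on Homer's file sizes, and the directory watching itself is left out. The same QueueCopier class is instantiated twice ('main' and 'priority'), each draining its own queue on its own thread, while the watcher only decides which queue a new file joins. Pausing the main copier would still need something extra, such as a shared gate checked between chunks.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// The same copier class serves as both the 'main' and the 'priority' thread.
class QueueCopier implements Runnable {
    final BlockingQueue<Path> queue = new LinkedBlockingQueue<>();
    private final Path targetDir;

    QueueCopier(Path targetDir) { this.targetDir = targetDir; }

    @Override
    public void run() {
        try {
            while (true) {
                Path file = queue.take(); // blocks until the watcher hands over work
                try {
                    Files.copy(file, targetDir.resolve(file.getFileName()),
                               StandardCopyOption.REPLACE_EXISTING);
                } catch (IOException e) {
                    System.err.println("copy failed for " + file + ": " + e);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // interrupting the thread stops the copier
        }
    }
}

// Controlling logic lives here, not in the copiers.
class Watcher {
    static final long URGENT_LIMIT = 5 * 1024; // hypothetical "important file" rule

    private final QueueCopier main, priority;

    Watcher(QueueCopier main, QueueCopier priority) {
        this.main = main;
        this.priority = priority;
    }

    // Called by the watcher thread for each new file it finds in the source folder.
    QueueCopier route(Path file) throws IOException {
        QueueCopier chosen = Files.size(file) < URGENT_LIMIT ? priority : main;
        chosen.queue.add(file);
        return chosen;
    }
}
```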

Of course, another option is to have just one thread at any one time, and
give it the ability to remember where it was in the non-priority file
being copied. You could do this in a cheap and cheeky way by using
recursion (i.e. the file copier calls itself when it notices a more
important file, and once done with that, it bottoms out and resumes the
original non-priority file). I don't think this is a good design though;
it's messy, with too many concerns all in one place.


[1] A file being paused in mid-copy may cause confusion - with partially
copied files existing and maybe messing things up. You may want to have
a policy of copying a file to a modified destination name, and only on
completion of the copy rename the file to its regular name.
For example, while a file 'data.txt' is being copied, it is copied to a
destination filename called 'data.txt-PARTIAL'. Upon copy completion,
the destination file is renamed by the program to 'data.txt'.
This way it's clear to humans (and to scripts, if need be) which are
non-complete files.
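With java.nio.file (Java 7+), the -PARTIAL policy is a copy followed by a rename. A minimal sketch, where PartialCopy and copyViaPartial are hypothetical names and the suffix comes from the example above: the data is written to data.txt-PARTIAL, and only after the copy completes is that file renamed to data.txt, so a half-written file never appears under its final name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: copy under a temporary name, rename once complete.
class PartialCopy {
    static final String SUFFIX = "-PARTIAL";

    static Path copyViaPartial(Path source, Path targetDir) throws IOException {
        String name = source.getFileName().toString();
        Path partial = targetDir.resolve(name + SUFFIX); // e.g. data.txt-PARTIAL
        Path finalName = targetDir.resolve(name);        // e.g. data.txt
        Files.copy(source, partial, StandardCopyOption.REPLACE_EXISTING);
        // Only reached after the copy finished; same directory, so normally just a rename.
        return Files.move(partial, finalName, StandardCopyOption.REPLACE_EXISTING);
    }
}
```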
 

Alex Hunsley

Yup, a multithreaded strategy does make sense for an FTP/multiuser comms
application!
I wouldn't just throw lots of threads at your file copying problem - I
would use two or three, in a structured way, as described in t'other
message.
 

Alex Hunsley

Btw Homer, the concurrent package is pretty handy, yes. Note that J2SE 5.0
includes a java.util.concurrent package, which I think is based on Doug
Lea's package; if you're using J2SE 5.0, just use java.util.concurrent.
See the note at the top of this page:
http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/intro.html
 
