rsync and network file transfer speeds

Joshua Jung

Hi, I appreciate the note to research rsync on my last post (thanks Rogan
and Suken!).

I've done some research on rsync and even read the doctoral thesis that
Tridgell wrote on his rsync algorithm. While I now understand how the
rsync algorithm works, I am unsure whether it will work for our
application.

Our program concept involves regularly transferring huge numbers of files
from our client to our server for special processing and storage. We want
the process to be transparent (at least on the client), and we do not
want duplicate files or file blocks stored on the server.

This is where rsync *could* be beneficial. Unfortunately, besides the
currently *unsupported* jarsync, I have been unable to find any Java
implementations of rsync. My research also suggests that rsync was
originally designed for low-bandwidth, high-latency links (like dial-up
connections). Assuming our network is going to be high-bandwidth and
low-latency (i.e. broadband based), will rsync make any sense? It seems
to me that the rolling signature algorithm only pays off if computing the
signatures is faster than simply sending the data over the connection.
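
For anyone weighing the CPU cost, here is a minimal Java sketch of the weak
rolling checksum as I understand it from Tridgell's thesis: two 16-bit sums
that can be updated in constant time when the window slides by one byte. The
class and method names (RollingChecksum, roll, value) are made up for
illustration and are not taken from jarsync or rsync itself, and real rsync
pairs this weak checksum with a strong hash, which is not shown here.

```java
/** Sketch of a weak rolling checksum over a fixed-length window (names are hypothetical). */
final class RollingChecksum {
    private static final int MOD = 1 << 16;
    private final int blockLen;
    private int a; // plain sum of the bytes in the window, mod 2^16
    private int b; // position-weighted sum (first byte weighs blockLen, last weighs 1), mod 2^16

    /** Computes the checksum of data[offset .. offset + blockLen - 1] from scratch. */
    RollingChecksum(byte[] data, int offset, int blockLen) {
        this.blockLen = blockLen;
        for (int i = 0; i < blockLen; i++) {
            int x = data[offset + i] & 0xFF; // treat bytes as unsigned
            a = (a + x) % MOD;
            b = (b + (blockLen - i) * x) % MOD;
        }
    }

    /** Slides the window one byte to the right: drops 'out', appends 'in'. Constant time. */
    void roll(byte out, byte in) {
        int xOut = out & 0xFF;
        int xIn = in & 0xFF;
        a = Math.floorMod(a - xOut + xIn, MOD);
        b = Math.floorMod(b - blockLen * xOut + a, MOD); // uses the already-updated a
    }

    /** Packs both sums into one 32-bit value. */
    int value() {
        return a | (b << 16);
    }
}
```

The point of the construction is that calling roll(data[k], data[k + blockLen])
on a checksum for the window starting at k gives the same value as building a
fresh RollingChecksum at k + 1, so scanning every byte offset of a file costs
one cheap update per byte rather than one full block sum per byte.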

The short and sweet of my questions is this:

Assuming our transfer speeds are broadband level, will it be faster to
run the standard rsync algorithm or just do a quick check on the time
stamps of the files on client and server and just upload the entire file
(with zipping of course) if the time-stamps are different?

Any website links or data on the speed of rsync on current
machines/connections would be greatly appreciated. Also, if there is any
other option besides rsync, that would be sweet as well!

Josh <><

[P.S. I'd love to test out rsync myself, but I'd like some more advice
before diving in that direction :) ]
 
Dimitri Maziuk

Joshua Jung sez:
> ....
> The short and sweet of my questions is this:
>
> Assuming our transfer speeds are broadband level, will it be faster to
> run the standard rsync algorithm or just do a quick check on the time
> stamps of the files on client and server and just upload the entire file
> (with zipping of course) if the time-stamps are different?

If you can guarantee that the clocks on both ends are always in sync
(ntp will do that, most of the time), and both ends are in the same
time (and DST) zone, yes: a size + timestamp check is going to be
faster. In general you cannot trust a random internet host's clock,
hence the clever algorithms.
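
A size + timestamp check of this kind could be sketched in Java roughly as
follows. This is only an illustration under assumptions: the QuickCheck class,
the needsUpload methods and the toleranceMillis parameter are invented names,
and the sketch assumes the server can somehow report the size and modification
time it holds for each file. The tolerance is there because timestamps rarely
match to the millisecond across file systems (FAT, for example, stores them
with 2-second granularity).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch of a size + timestamp "quick check" (all names are hypothetical). */
final class QuickCheck {

    /** Returns true when the local file looks different from the server's copy. */
    static boolean needsUpload(long localSize, long localMtimeMillis,
                               long remoteSize, long remoteMtimeMillis,
                               long toleranceMillis) {
        if (localSize != remoteSize) {
            return true; // a size mismatch is conclusive on its own
        }
        // Same size: fall back to the modification time, allowing a small skew.
        return Math.abs(localMtimeMillis - remoteMtimeMillis) > toleranceMillis;
    }

    /** Convenience overload that reads the local metadata from disk. */
    static boolean needsUpload(Path local, long remoteSize, long remoteMtimeMillis,
                               long toleranceMillis) throws IOException {
        long localSize = Files.size(local);
        long localMtime = Files.getLastModifiedTime(local).toMillis();
        return needsUpload(localSize, localMtime, remoteSize, remoteMtimeMillis, toleranceMillis);
    }
}
```

Note the blind spot: a file rewritten with the same size and a timestamp
inside the tolerance is treated as unchanged, so this trades some safety for
speed.
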

> Any website links or data on the speed of rsync on current
> machines/connections would be greatly appreciated. Also, if there is any
> other option besides rsync, that would be sweet as well!

We're using rsync routinely on a 100Mb/s LAN, with a couple of pretty
slow machines (e.g. a Sun Ultra 10). The only problem is that rsync
seems to be very sensitive to network glitches -- most protocols will
recover from 1-2 sec. loss of connectivity just fine, rsync usually
doesn't. (Not what you'd expect from something designed for dial-up
connections.)

Dima
 
Joshua Jung

Dimitri Maziuk wrote:
> Joshua Jung sez:
> ...
>
> If you can guarantee that the clocks on both ends are always in sync
> (ntp will do that, most of the time), and both ends are in the same
> time (and DST) zone, yes: a size + timestamp check is going to be
> faster. In general you cannot trust a random internet host's clock,
> hence the clever algorithms.
>
> We're using rsync routinely on a 100Mb/s LAN, with a couple of pretty
> slow machines (e.g. a Sun Ultra 10). The only problem is that rsync
> seems to be very sensitive to network glitches -- most protocols will
> recover from 1-2 sec. loss of connectivity just fine, rsync usually
> doesn't. (Not what you'd expect from something designed for dial-up
> connections.)
>
> Dima

Are there any other options out there besides the rsync algorithm? We've
noticed that a lot of backup companies use a feature that backs up only
the changes at the byte or block level to reduce overhead, and we are
curious whether there are algorithms that could do this kind of file
comparison. Obviously the easy way is to store two copies on the client
machine and diff them, but that is quite intensive and there really isn't
any added benefit to having two copies of each file on the client machine!
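
To make the question concrete, one alternative to keeping a second copy would
be to keep only a list of per-block hashes from the last upload and compare
against it. The Java sketch below is an assumption about how that might look,
not a description of what any backup product does; BlockSignatures, hashBlocks
and changedBlocks are invented names, the block size is arbitrary, and
readNBytes needs Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of fixed-size block change detection (all names are hypothetical). */
final class BlockSignatures {

    /** Hashes the stream in consecutive blocks of blockSize bytes (the last block may be shorter). */
    static List<byte[]> hashBlocks(InputStream in, int blockSize)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        List<byte[]> hashes = new ArrayList<>();
        byte[] buf = new byte[blockSize];
        int n;
        // readNBytes fills the buffer unless the stream ends first; it returns 0 at end of stream.
        while ((n = in.readNBytes(buf, 0, blockSize)) > 0) {
            md.update(buf, 0, n);
            hashes.add(md.digest()); // digest() also resets md for the next block
        }
        return hashes;
    }

    /** Returns the indexes of blocks in 'current' that are new or differ from 'previous'. */
    static List<Integer> changedBlocks(List<byte[]> previous, List<byte[]> current) {
        List<Integer> changed = new ArrayList<>();
        for (int i = 0; i < current.size(); i++) {
            if (i >= previous.size() || !Arrays.equals(previous.get(i), current.get(i))) {
                changed.add(i);
            }
        }
        return changed;
    }
}
```

The client would store the hash list from the previous upload (32 bytes per
block) instead of the file itself and send only the blocks reported by
changedBlocks, plus the new file length so the server can truncate. The
weakness is that blocks sit at fixed offsets: inserting a single byte near
the start of a file shifts every later block and marks them all as changed,
which is the case rsync's rolling checksum is meant to handle.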

Appreciate any help! Thanks so much.

Josh <><
 
