Efficient file downloading

Kyle Hunter · Feb 22, 2008

Hello,

I'm using open-uri to download files through a buffer. It seems very
inefficient in terms of resource usage (CPU usage is around 10-20%).

If possible, I'd like some suggestions for downloading a file in a way
that names the output file the same as the URL, and does not write
anything if the request returns a 404 (or some other exception is raised).

Current code:
require 'open-uri'

BUFFER_SIZE = 4096

def download(url)
  # On Ruby 3.0+, Kernel#open no longer handles URLs; use URI.open(url) there.
  from = open(url)
  if (buffer = from.read(BUFFER_SIZE))
    puts "Downloading #{url}"
    File.open(url.split('/').last, 'wb') do |file|
      begin
        file.write(buffer)
      end while (buffer = from.read(BUFFER_SIZE))
    end
  end
end

Kyle Hunter · Feb 22, 2008

To clarify, I mean the file name should be the same as it is on the web,
not the same as the whole URL.

James Tucker · Feb 22, 2008

Kyle said:
BUFFER_SIZE=4096

Try making that a lot lot bigger.

Kyle Hunter · Feb 22, 2008

James said:
Try making that a lot lot bigger.

Doh! Thanks James. That brings CPU usage down to a much more reasonable
level. I totally overlooked how small the buffer size was. Thanks.
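
For anyone reading along, here is a minimal sketch of why the buffer size matters. The 1 MiB value and the copy_in_chunks helper are illustrative assumptions, not the value or code Kyle actually ended up with. It uses StringIO in place of a real download so it runs without a network, and it returns the number of read calls so the difference is easy to see.

```ruby
require 'stringio'

# Assumed value for illustration only; the thread does not say what size was used.
BUFFER_SIZE = 1024 * 1024 # 1 MiB

# Hypothetical helper: copy everything from one IO-like object to another
# in fixed-size chunks and return how many reads it took.
def copy_in_chunks(from, to, buffer_size = BUFFER_SIZE)
  reads = 0
  # IO#read(n) returns nil at end of input, which ends the loop.
  while (chunk = from.read(buffer_size))
    to.write(chunk)
    reads += 1
  end
  reads
end
```

With a 4096-byte buffer, a 10,000-byte source takes three reads; with the larger buffer it takes one. Fewer trips through the loop should mean less per-call overhead, which is presumably where the CPU time was going.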

fedzor · Feb 22, 2008

$ sudo gem install snoopy
$ snoopy http://en.wikipedia.org/wiki/Main_Page
=> file Main_Page

Ta-da! There's a lot of magic behind it right now, and torrents
don't work yet (fixed on my machine, but I still need to release it).
It does segmented downloading, which is ideal for large files. For
smaller ones, it still works fine.

The problem with open-uri is this: it downloads the whole thing to
your tmp directory before handing it back to you, so reading it in
BUFFER_SIZE chunks afterwards won't actually help with the download itself.

Note that snoopy will still write the file if there's an error.
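
If you'd rather stay in the standard library, here is a rough sketch of one way to stream straight to disk with Net::HTTP instead of going through open-uri's temp file. The filename_for helper and the 'index.html' fallback are my own assumptions, not anything from the original code. The output file is only opened after the status check passes, so a 404 raises before anything is written. It does not follow redirects (a 3xx also raises), and a failure partway through the body would still leave a partial file behind.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: use the last path segment of the URL as the file name.
# The 'index.html' fallback for URLs with no path is an assumption.
def filename_for(url)
  name = File.basename(URI.parse(url).path)
  name.empty? || name == '/' ? 'index.html' : name
end

def download(url)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    # With a block, request_get yields the response before the body is read.
    http.request_get(uri.request_uri) do |response|
      # Net::HTTPResponse#value raises an HTTP error unless the status is 2xx,
      # so the file below is never created for a 404.
      response.value
      File.open(filename_for(url), 'wb') do |file|
        # read_body with a block yields the body in chunks as it arrives.
        response.read_body { |chunk| file.write(chunk) }
      end
    end
  end
end
```

Calling download('http://en.wikipedia.org/wiki/Main_Page') would write to a file named Main_Page in the current directory, the same name the split('/').last approach gives.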

-------------------------------------------------------|
~ Ari
Some people want love
Others want money
Me... Well...
I just want this code to compile
