Luke Burton
It seems that net/http's implementation is extremely inefficient when
it comes to dealing with large files.
I think this is something worth fixing in subsequent versions. It
shouldn't be as bad as it is. I would also appreciate any hints or
advice on working around the problem.
Below I have attached a test suite that illustrates the problem. I used
WEBrick as the server.
"Host: localhost, port: 12000, request_uri: /ten-meg.bin"
                  user     system      total        real
TCPSocket     0.030000   0.150000   0.180000 (  0.468867)
net/http     10.620000   8.630000  19.250000 ( 21.787785)
LB net/http  10.870000   8.900000  19.770000 ( 22.259448)
open-uri     16.400000  11.900000  28.300000 ( 39.834555)
As you can see, a raw TCPSocket is far faster than net/http and
friends: about 46 times faster in wall-clock time, and it uses over
100 times less CPU time. Note that I'm already using read_body and
receiving the data in chunks, so I would have expected much better
performance. We're talking roughly 21 MB/s for TCPSocket versus
roughly 470 KB/s for net/http.
What's happening here?
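To make those throughput figures easy to check, here is a minimal
sketch that derives them from the "real" column of the output above. It
assumes the body is exactly 10240 x 1024 bytes, as produced by the dd
command in the script; the constant and method names are made up for
illustration only.

```ruby
# Body size implied by the dd command: 10240 blocks of 1024 bytes.
BYTES = 10_240 * 1024

# Wall-clock ("real") seconds copied from the benchmark output.
REAL_SECONDS = {
  "TCPSocket"   => 0.468867,
  "net/http"    => 21.787785,
  "LB net/http" => 22.259448,
  "open-uri"    => 39.834555,
}

# Throughput in MB/s, counting 1 MB as 1024 * 1024 bytes.
def throughput_mb_per_s(bytes, seconds)
  bytes / seconds / (1024.0 * 1024.0)
end

REAL_SECONDS.each do |name, secs|
  printf("%-12s %8.2f MB/s\n", name, throughput_mb_per_s(BYTES, secs))
end
```

By this measure TCPSocket comes out at about 21 MB/s and net/http at
about 470 KB/s.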
Any help appreciated.
Regards,
Luke.

#!/usr/bin/ruby
require 'net/http'
require 'open-uri'
require 'benchmark'
require 'webrick'
include WEBrick

uri = URI.parse("http://localhost:12000/ten-meg.bin")
sourceFolder = "/tmp/"
# create a 10 MB test file of random data
Kernel.system("dd if=/dev/random of=/tmp/ten-meg.bin bs=1024 count=10240")
port = 12000
server = HTTPServer.new(:Port => port, :DocumentRoot => sourceFolder)
# trap the signal for shutdown
trap("INT"){ server.shutdown }
# run the server in a child process, logging to files
pid = Kernel.fork {
  $stdout.reopen('/tmp/WEBrick.stdout')
  $stderr.reopen('/tmp/WEBrick.stderr')
  server.start
}
at_exit { Process.kill("INT", pid) }
# give the server a moment to start listening
Kernel.sleep 1
p "Host: #{uri.host}, port: #{uri.port}, request_uri: #{uri.request_uri}"

Benchmark.bm(10) do |time|
  # raw socket: read the whole response, then strip the headers
  out = File.new("/tmp/tcp.tar.bz2", "w")
  time.report("TCPSocket") do
    s = TCPSocket.open uri.host, uri.port
    s.write "GET #{uri.request_uri} HTTP/1.0\r\nHost: #{uri.host}\r\n\r\n"
    temp = s.read.split("\r\n\r\n", 2).last
    s.close
    out.write(temp)
  end
  out.close

  # net/http with an explicit session, streaming the body in segments
  out = File.new("/tmp/net.tar.bz2", "w")
  time.report("net/http") do
    Net::HTTP.start uri.host, uri.port do |http|
      http.request_get(uri.request_uri) do |response|
        response.read_body do |segment|
          out.write(segment)
        end
      end
    end
  end
  out.close

  # net/http without an explicit start, also streaming in segments
  out = File.new("/tmp/luke.out", "w")
  time.report("LB net/http") do
    http = Net::HTTP.new(uri.host, uri.port)
    http.request_get(uri.path) { |response|
      response.read_body { |segment|
        out.write(segment)
      }
    }
  end
  out.close

  # open-uri: reads the whole body before writing it out
  out = File.new("/tmp/uri.tar.bz2", "w")
  time.report("open-uri") do
    uri.open do |x|
      out.write(x.read)
    end
  end
  out.close
end
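
As a possible stopgap while the cause is unknown, the raw-socket case
above could stream the body to disk in chunks instead of reading the
whole response into memory. The following is only a sketch under
assumptions: copy_http_body is a hypothetical helper (not part of
net/http), it assumes an HTTP/1.0 request so that the server closes the
connection at the end of the body, and it does not handle chunked
transfer encoding or redirects. It does not explain why net/http is
slow.

```ruby
require 'stringio'

# Hypothetical helper: copies an HTTP response body from `io` to `out`
# in fixed-size chunks, after discarding the status line and headers.
# Returns the number of body bytes written.
def copy_http_body(io, out, chunk_size = 16 * 1024)
  # The status line and headers end at the first blank line.
  while (line = io.gets)
    break if line == "\r\n"
  end
  total = 0
  # IO#read(n) returns nil at end of stream, which ends the loop.
  while (chunk = io.read(chunk_size))
    out.write(chunk)
    total += chunk.bytesize
  end
  total
end

# Intended use with the script above (left commented out, since it
# needs the test server to be running):
#
#   s = TCPSocket.open(uri.host, uri.port)
#   s.write "GET #{uri.request_uri} HTTP/1.0\r\nHost: #{uri.host}\r\n\r\n"
#   File.open("/tmp/tcp.out", "wb") { |out| copy_http_body(s, out) }
#   s.close
```

The 16 KB chunk size is an arbitrary choice for the sketch, not a
measured optimum.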