ruby from command line timing out?

  • Thread starter Jason N.Perkins

Jason N.Perkins

I'm running a script from the command line that's going to take a
couple of hours to complete. Between 15 and 20 minutes into its run,
the script throws an execution expired (Timeout::Error). Is there an
environment variable that I should be looking at modifying? The error
message in its entirety is:

/usr/local/lib/ruby/1.8/timeout.rb:42:in `new': execution expired
(Timeout::Error)
from ./spider.rb:6334:in `join'
from ./spider.rb:6334
from ./spider.rb:6334:in `each'
from ./spider.rb:6334

Francis Hwang

Is it safe to guess, based on the name of the script, that it spiders
web pages? If so, Timeout::Errors are going to happen quite frequently,
whenever a particular web page loads too slowly.
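A minimal sketch of the per-page handling Francis describes, assuming each page load gets its own time limit via Ruby's Timeout.timeout. load_page, the URLs and the 0.2-second limit are hypothetical, and sleep stands in for a slow server so the sketch runs offline:

```ruby
require 'timeout'

# Hypothetical stand-in for fetching one page; `delay` simulates a slow server.
def load_page(url, delay)
  sleep delay
  "<html>#{url}</html>"
end

# Give each page its own time limit and rescue the timeout per page,
# so one slow site does not stop the whole run.
def load_all(pages, limit)
  pages.map do |url, delay|
    begin
      Timeout.timeout(limit) { load_page(url, delay) }
    rescue Timeout::Error => e
      "#{url}: #{e.message}"
    end
  end
end

results = load_all([["http://fast.example/", 0], ["http://slow.example/", 1]], 0.2)
results.each { |line| puts line }
```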

Francis Hwang
http://fhwang.net/

Jason N.Perkins

> Is it safe to guess, based on the name of the script, that it spiders
> web pages? If that's the case, Timeout::Errors are going to happen
> quite frequently as a particular web page loads too slowly.

I'm catching those errors without any problem using a 'rescue'. This
one seems to be specific to the script itself.
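One possible explanation for why the rescue doesn't help (an editorial note; the thread doesn't spell this out): in the script below, the begin/rescue sits inside the block passed to open, so it only covers code that runs after open has fetched the page and yielded. An error raised while open is still connecting happens before the block runs, escapes to the thread, and is then re-raised by join. The sketch uses a hypothetical fake_open in place of open-uri so it runs offline:

```ruby
require 'timeout'

# Hypothetical stand-in for open-uri's open: a "slow" URL fails before
# the block is ever yielded to, as a connection timeout would.
def fake_open(url)
  raise Timeout::Error, "execution expired" if url.include?("slow")
  yield "<html>#{url}</html>"
end

# Rescue inside the block: only protects the block body.
def inner_rescue(url)
  fake_open(url) do |page|
    begin
      page.length
    rescue Exception
      :rescued_inside
    end
  end
end

# Rescue around the call: also catches errors raised by fake_open itself.
def outer_rescue(url)
  fake_open(url) { |page| page.length }
rescue Timeout::Error => e
  "#{url}: #{e.message}"
end

escaped = begin
  inner_rescue("http://slow.example/")
  false
rescue Timeout::Error
  true
end
puts "inner rescue let the error escape: #{escaped}"
puts outer_rescue("http://slow.example/")
```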

Jason N.Perkins

> Can you post the code?

Sure. The blogs variable is an array of blog URLs; I intend to
eventually store these URLs in MySQL, but for now an array works.
I emptied that array so that the sites in it don't get hit by too
many people trying to help out. The threading is derived from a
sample in "Programming Ruby." I'd love any additional feedback
beyond the timeout issue.


#! /usr/local/bin/ruby -w

require 'open-uri'
require 'thread'

blogs = [ ]

buffer = Queue.new

# load the blogs into the queue
blogs.each do |blog|
  buffer.enq( blog )
end

# each consumer takes URLs off the queue until it dequeues :END_OF_WORK
consumers = (1..150).map do |i|
  Thread.new("consumer #{i}") do |name|
    begin
      blog = buffer.deq
      unless blog == :END_OF_WORK
        open( blog ) do |page|
          # this rescue covers reading and parsing the page only; errors
          # raised by open() itself (e.g. while connecting) are not caught here
          begin
            # scan returns one-element arrays; flatten gives plain strings
            metas = page.read.scan( /<meta([^>]*)>/m ).flatten.uniq
            metas.each do |current_meta|
              if current_meta =~ /\s+name\s*=\s*[\"']([^\"']+)[\"']/
                meta_name = $1
                current_meta =~ /\s+content\s*=\s*[\"']([^\"']+)[\"']/
                meta_content = $1

                case meta_name
                when "geo.position", "ICBM"
                  print "#{blog} \t #{meta_content} \n"
                end
              end
            end
          rescue Exception
            print "#{blog}: #{$!} \n"
          end
        end
      end
    end until blog == :END_OF_WORK
  end
end

begin
  consumers.size.times{ buffer.enq(:END_OF_WORK) }
  consumers.each{|th| th.join}
rescue Exception
  print $!
end

Francis Hwang

Jason,

Is the line 6334 that shows up in the traceback this line:
consumers.each{|th| th.join}

And one tip, which may not have anything to do with this problem but
might make your code easier to understand and/or debug: Since threading
is so bloody difficult, I try to make it affect as little of the
program as possible. In a case like your code, for example, I would've
let the threaded part simply handle loading the web pages, and let the
parsing happen afterward, once all the threads have been joined again.
This is how FeedBlender (http://feedblender.rubyforge.org/) does it;
that way, if there's a bug, I can tell whether it's caused by the
threading or not.
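A rough sketch of that split, assuming a hypothetical FAKE_FETCH lambda in place of open-uri (so it runs offline) and a non-blocking Queue#pop to drain the work queue instead of an :END_OF_WORK marker. Threads only fetch and record each result or error; parsing runs single-threaded after every thread has been joined. This is an illustration, not FeedBlender's actual code:

```ruby
require 'thread'

# Hypothetical fetcher standing in for open-uri: returns canned HTML so the
# sketch runs offline. Real fetching code would go here.
FAKE_FETCH = lambda do |url|
  %(<html><meta name="ICBM" content="50.1, 8.6"><p>#{url}</p></html>)
end

# Phase 1 (threaded): only fetch pages. Errors are recorded, not raised,
# so join has nothing to re-raise.
def fetch_all(urls, workers, fetch)
  queue = Queue.new
  urls.each { |url| queue << url }
  results = Queue.new

  threads = Array.new(workers) do
    Thread.new do
      loop do
        begin
          url = queue.pop(true) # non-blocking; raises ThreadError once empty
        rescue ThreadError
          break # queue drained, this worker is done
        end
        begin
          results << [url, fetch.call(url)]
        rescue StandardError => e # Timeout::Error is a StandardError in current Ruby
          results << [url, e]
        end
      end
    end
  end
  threads.each(&:join)
  Array.new(results.size) { results.pop }
end

# Phase 2 (single-threaded): parse after every thread has been joined.
def parse_geo(pages)
  pages.each_with_object({}) do |(url, html), found|
    next unless html.is_a?(String) # skip pages that failed to load
    html.scan(/<meta([^>]*)>/m).flatten.each do |attrs|
      name = attrs[/\bname\s*=\s*["']([^"']+)["']/, 1]
      content = attrs[/\bcontent\s*=\s*["']([^"']+)["']/, 1]
      found[url] = content if %w[geo.position ICBM].include?(name)
    end
  end
end

pages = fetch_all(%w[http://a.example/ http://b.example/], 2, FAKE_FETCH)
geo = parse_geo(pages)
geo.each { |url, position| puts "#{url}\t#{position}" }
```

Because failures are stored as values rather than raised, a parsing bug can be reproduced by calling parse_geo on saved pages, with no threads involved.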




Francis Hwang
http://fhwang.net/

Carlos

> begin
>   consumers.size.times{ buffer.enq(:END_OF_WORK) }
>   consumers.each{|th| th.join}
> rescue Exception
>   print $!
> end

I think that when the thread being joined raises a timeout error, join
re-raises it in the main thread, the program finishes, and the other
threads are never joined. Maybe you should put the begin...rescue around
the join (inside the each).
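Carlos's suggestion might look like the sketch below; join_all and the simulated threads are hypothetical, and rescuing Exception mirrors the original script. The report_on_exception line only matters on Ruby 2.4 and later:

```ruby
require 'timeout'

# Let join report thread failures instead of Ruby also printing them (Ruby 2.4+).
Thread.report_on_exception = false if Thread.respond_to?(:report_on_exception=)

# Join every thread, rescuing around each join so one failed thread
# does not stop the remaining threads from being joined.
def join_all(threads)
  failures = []
  threads.each do |th|
    begin
      th.join # re-raises whatever exception killed the thread
    rescue Exception => e # as in the original script, catch everything
      failures << e
    end
  end
  failures
end

workers = [
  Thread.new { :ok },
  Thread.new { raise Timeout::Error, "execution expired" }, # simulated timeout
  Thread.new { :also_ok }
]
failures = join_all(workers)
puts "#{failures.size} thread(s) failed: #{failures.map(&:message).inspect}"
```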

Hope this helps. Good luck.

Jason N.Perkins

> Jason,
>
> Is the line 6334 that shows up in the traceback this line:
> consumers.each{|th| th.join}

Yeah, that's the line that's timing out, which is why I was wondering
whether there's a global timeout value for the script that I can either
raise or turn off completely.

> And one tip, which may not have anything to do with this problem but
> might make your code easier to understand and/or debug: Since
> threading is so bloody difficult, I try to make it affect as little of
> the program as possible. In a case like your code, for example, I
> would've let the threaded part simply handle the loading of the web
> pages, but let the parsing happen afterward when all the threads have
> been joined again. This is how FeedBlender
> (http://feedblender.rubyforge.org/) does it, so that way if there's a
> bug I can figure out if it's because of the threading or not.

OK, I'll give that a try. Thanks, Francis!

Eric Hodel

> Yeah, that's the line that's timing out and why I was wondering if
> there's a global timeout value for the script that I can either modify
> up or turn off completely.

Timeout::Error comes from timeout.rb.

Your Timeout::Error probably comes out of net/http; open-uri doesn't
require timeout itself and has no timeout blocks of its own.
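If the timeouts do come from net/http, the limits live on the connection object rather than in an environment variable. A minimal sketch against current Ruby's Net::HTTP (the 1.8 library in the traceback predates the Net::OpenTimeout and Net::ReadTimeout class names); the host and the 10/30-second values are placeholders, and no connection is actually opened:

```ruby
require 'net/http'
require 'timeout'

# Creating the object does not connect; example.com is just a placeholder host.
http = Net::HTTP.new("example.com", 80)
http.open_timeout = 10 # seconds allowed to open the connection
http.read_timeout = 30 # seconds allowed for each read from the socket

# In current Ruby these are the errors Net::HTTP raises when the limits
# above are exceeded; both subclass Timeout::Error, so an existing
# `rescue Timeout::Error` still catches them.
NET_HTTP_TIMEOUTS = [Net::OpenTimeout, Net::ReadTimeout]
NET_HTTP_TIMEOUTS.each { |klass| puts "#{klass} < Timeout::Error: #{klass < Timeout::Error}" }
```

Recent versions of open-uri also accept :open_timeout and :read_timeout options, which it passes on to Net::HTTP.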

Try Thread.abort_on_exception = true at the top of your script, and
remove the begin/end block inside the thread.
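A small demonstration of the abort_on_exception suggestion, assuming a thread that fails with a simulated Timeout::Error: with Thread.abort_on_exception = true, the exception that kills the thread is re-raised in the main thread right away instead of surfacing later from join. The report_on_exception line is only needed on Ruby 2.4 and later, and the setting is restored afterwards:

```ruby
require 'timeout'

# Ruby 2.4+ also prints a report for dying threads; turn that off so the
# only signal is the re-raised exception.
Thread.report_on_exception = false if Thread.respond_to?(:report_on_exception=)
Thread.abort_on_exception = true

caught = nil
begin
  Thread.new { raise Timeout::Error, "execution expired" } # simulated failure
  sleep 5 # the main thread is interrupted long before this finishes
rescue Timeout::Error => e
  caught = e
ensure
  Thread.abort_on_exception = false
end
puts "main thread saw: #{caught.class}: #{caught.message}"
```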

--
Eric Hodel - http://segment7.net