Best strategy for overcoming excessive gethostbyname timeout.


r0g

Hi,

I'm writing a reliability monitoring app but I've run into a problem. I
was hoping to keep it very simple and single-threaded at first but
that's looking unlikely now! The crux of it is this...

gethostbyname ignores setdefaulttimeout.

It seems gethostbyname asks the OS to resolve the address and the OS
uses its own timeout value (25 seconds) rather than the one provided
in setdefaulttimeout. 25 seconds of blocking is way too long for me; I
want the response within 5 seconds or not at all, but I can see no
reasonable way to do this without messing with the OS, which naturally I
am loath to do!

The two ideas I've had so far are...

Implement a cache. For this to work I'd need to avoid issuing
gethostbyname until the cached address fails. Of course it will fail a
little bit further down in the calling code when the app tries to use it
and I'd then need it to refresh the cache and try again. That seems very
kludgey to me :/
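
For illustration, the cache idea might look something like the minimal Python 3 sketch below. `HostCache`, its `resolver` argument, and `invalidate` are hypothetical names made up for this example, not part of any library; the resolver defaults to `socket.gethostbyname` and can be swapped out for testing.

```python
import socket


class HostCache:
    """Remember resolved addresses; resolve again only after invalidation."""

    def __init__(self, resolver=socket.gethostbyname):
        self._resolver = resolver      # injectable so the cache can be tested offline
        self._addresses = {}

    def get(self, hostname):
        # Hit the (possibly slow) resolver only on a cache miss.
        if hostname not in self._addresses:
            self._addresses[hostname] = self._resolver(hostname)
        return self._addresses[hostname]

    def invalidate(self, hostname):
        # Call this when a connection to the cached address fails.
        self._addresses.pop(hostname, None)
```

Note that a refresh after `invalidate` still blocks for as long as the underlying resolver does, so a cache like this reduces the number of slow lookups rather than bounding their duration.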

A pure python DNS lookup. This seems better but is an unknown quantity.
How big a job is it to use non-blocking sockets to write a DNS lookup
function with a customisable timeout? A few lines? A few hundred? I'd
only need to resolve v4 addresses for the foreseeable.
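
To give a rough sense of scale, here is a Python 3 sketch of an A-record lookup over UDP with a caller-chosen timeout. It is only an outline under several assumptions: the function names are invented for this example, the DNS server address must be passed in (nothing reads resolv.conf), and there are no retries, no TCP fallback for truncated replies, and no IPv6 support.

```python
import random
import socket
import struct


def build_query(hostname, query_id):
    """Build a DNS query packet asking for the A (IPv4) record of hostname."""
    # Header: id, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = hostname.rstrip(".").split(".")
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in labels
    ) + b"\x00"
    # QTYPE 1 = A, QCLASS 1 = IN.
    return header + qname + struct.pack("!HH", 1, 1)


def skip_name(packet, offset):
    """Return the offset just past an encoded (possibly compressed) name."""
    while True:
        length = packet[offset]
        if length == 0:
            return offset + 1
        if length & 0xC0 == 0xC0:      # compression pointer: two bytes, ends the name
            return offset + 2
        offset += 1 + length


def parse_a_records(packet, query_id):
    """Extract IPv4 addresses from a DNS response packet."""
    ident, flags, qdcount, ancount = struct.unpack("!HHHH", packet[:8])
    if ident != query_id:
        raise ValueError("response id does not match query")
    if flags & 0x000F:
        raise ValueError("server returned error code %d" % (flags & 0x000F))
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(packet, offset) + 4       # skip QTYPE and QCLASS
    addresses = []
    for _ in range(ancount):
        offset = skip_name(packet, offset)
        rtype, rclass, _ttl, rdlength = struct.unpack(
            "!HHIH", packet[offset:offset + 10])
        offset += 10
        if rtype == 1 and rclass == 1 and rdlength == 4:
            addresses.append(socket.inet_ntoa(packet[offset:offset + 4]))
        offset += rdlength
    return addresses


def resolve_a(hostname, server, timeout=5.0):
    """Ask one DNS server for hostname over UDP, giving up after timeout seconds."""
    query_id = random.randint(0, 0xFFFF)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.settimeout(timeout)       # recvfrom raises socket.timeout if no reply
        sock.sendto(build_query(hostname, query_id), (server, 53))
        packet, _ = sock.recvfrom(512)
        return parse_a_records(packet, query_id)
    finally:
        sock.close()
```

Something like `resolve_a("example.com", "192.0.2.53", timeout=5)` would then either return a list of addresses or raise within about five seconds; the server address here is a placeholder, not a real resolver. So the answer is probably "a few dozen lines" for the bare minimum, with an existing DNS library being the safer choice for anything beyond that.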

Any comments on these strategies, or any suggestions of methods you
think might work better or be a lot easier to implement, warmly received.

Roger.
 

John Bokma

r0g said:
It seems gethostbyname asks the OS to resolve the address and the OS
uses its own timeout value (25 seconds) rather than the one provided
in setdefaulttimeout. 25 seconds of blocking is way too long for me; I
want the response within 5 seconds or not at all, but I can see no
reasonable way to do this without messing with the OS, which naturally I
am loath to do!

use signal.alarm(time) to send SIGALRM to your process:
http://docs.python.org/library/signal.html#signal.alarm
See example at bottom.

John
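
A minimal Python 3 sketch of the signal.alarm approach, for reference. `Timeout` and `call_with_alarm` are hypothetical names for this example. It assumes a Unix platform and that the call is made from the main thread, and it relies on the blocked call actually being interruptible: Python runs signal handlers only when the interpreter regains control, so a call that stays blocked inside C code may delay the exception.

```python
import signal
import time


class Timeout(Exception):
    """Raised by the SIGALRM handler when the time limit expires."""


def _on_alarm(signum, frame):
    raise Timeout("operation timed out")


def call_with_alarm(func, seconds, *args):
    """Run func(*args), raising Timeout if it takes longer than `seconds`."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)              # whole seconds only
    try:
        return func(*args)
    finally:
        signal.alarm(0)                # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)
```

For example, `call_with_alarm(time.sleep, 1, 5)` raises `Timeout` after about one second instead of sleeping for five.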
 

Gabriel Genellina

gethostbyname ignores setdefaulttimeout.

How big a job is it to use non-blocking sockets to write a DNS lookup
function with a customisable timeout? A few lines? A few hundred? I'd
only need to resolve v4 addresses for the foreseeable.

This guy reports good results using GNU adns to perform asynchronous
queries:
http://www.catonmat.net/blog/asynchronous-dns-resolution/

Also, look for pydns. Don't be afraid of its age; it always worked fine
for me.
 

r0g

John said:
use signal.alarm(time) to send SIGALRM to your process:
http://docs.python.org/library/signal.html#signal.alarm
See example at bottom.

John


Ahh so close. I set the alarm for 3 seconds and it raises the exception,
but only after spending 25 seconds seemingly blocked in gethostbyname.

Here's a snippet, just in case I'm doing it wrong!...

import signal, socket

def dns_timeout(a, b):
    raise Exception("DNS timeout")

def send_one_ping(my_socket, dest_addr, ID):
    signal.signal(signal.SIGALRM, dns_timeout)
    signal.alarm(3)
    try:
        dest_addr = socket.gethostbyname(dest_addr)
    except Exception, exc:
        print "Exception caught:", exc
    signal.alarm(0)


Oh well, even if it doesn't work in this case it's really useful to know
about the signal module, I'd never stumbled across it til now, thanks!

Roger.
 

MRAB

r0g said:
Hi,

I'm writing a reliability monitoring app but I've run into a problem. I
was hoping to keep it very simple and single-threaded at first but
that's looking unlikely now! The crux of it is this...

gethostbyname ignores setdefaulttimeout.

It seems gethostbyname asks the OS to resolve the address and the OS
uses its own timeout value (25 seconds) rather than the one provided
in setdefaulttimeout. 25 seconds of blocking is way too long for me; I
want the response within 5 seconds or not at all, but I can see no
reasonable way to do this without messing with the OS, which naturally I
am loath to do!

The two ideas I've had so far are...

Implement a cache. For this to work I'd need to avoid issuing
gethostbyname until the cached address fails. Of course it will fail a
little bit further down in the calling code when the app tries to use it
and I'd then need it to refresh the cache and try again. That seems very
kludgey to me :/

A pure python DNS lookup. This seems better but is an unknown quantity.
How big a job is it to use non-blocking sockets to write a DNS lookup
function with a customisable timeout? A few lines? A few hundred? I'd
only need to resolve v4 addresses for the foreseeable.

Any comments on these strategies, or any suggestions of methods you
think might work better or be a lot easier to implement, warmly received.

How about something like this:

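import Queue, socket, threading

def get_host_by_name(hostname, timeout):
    def proc():
        try:
            q.put(socket.gethostbyname(hostname))
        except (socket.gaierror, socket.timeout):
            # A failed lookup puts nothing on the queue, so the caller
            # sees it as a timeout.
            pass

    q = Queue.Queue()
    t = threading.Thread(target=proc)
    t.daemon = True  # don't keep the process alive for an abandoned lookup
    t.start()
    try:
        return q.get(True, timeout)
    except Queue.Empty:
        raise socket.timeout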
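
For anyone reading this on Python 3, the same idea might be sketched roughly as follows. Two things here are assumptions of this example rather than part of the code above: the `resolver` parameter (added so the function can be exercised without a network), and the choice to re-raise lookup errors straight away instead of letting them run out the timeout.

```python
import queue
import socket
import threading


def get_host_by_name(hostname, timeout, resolver=socket.gethostbyname):
    """Resolve hostname in a daemon thread, waiting at most `timeout` seconds."""
    results = queue.Queue()

    def worker():
        try:
            results.put((True, resolver(hostname)))
        except OSError as exc:         # socket.gaierror is a subclass of OSError
            results.put((False, exc))

    threading.Thread(target=worker, daemon=True).start()
    try:
        ok, value = results.get(timeout=timeout)
    except queue.Empty:
        # The worker is still blocked; it is abandoned, not killed.
        raise socket.timeout("lookup of %r timed out" % hostname)
    if not ok:
        raise value                    # the lookup itself failed
    return value
```

The caller gets control back after `timeout` seconds, but the abandoned thread keeps waiting on the OS resolver in the background until it returns, so a monitor that polls frequently could accumulate a few of these during an outage.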
 

r0g

Gabriel said:
This guy reports good results using GNU adns to perform asynchronous
queries:
http://www.catonmat.net/blog/asynchronous-dns-resolution/

Also, look for pydns. Don't be afraid of its age; it always worked fine
for me.

Thanks Gabriel, that worked a treat when combined with John's SIGALRM
solution :)

For posterity here's the code...

import signal, socket
try:
    import DNS  # pydns
except ImportError:
    DNS = False


def DNSResolve( s ):
    if DNS:
        DNS.ParseResolvConf() # Windows?
        r = DNS.DnsRequest(name=s,qtype='A')
        a = r.req()
        return a.answers[0]['data']
    else:
        return socket.gethostbyname( s )


def dns_timeout(a,b):
    raise Exception("Oh Noes! a DNS lookup timeout!")


def canIHasIP( domain_name, timeout=3 ):
    signal.signal(signal.SIGALRM, dns_timeout)
    signal.alarm( timeout )
    try:
        ip = DNSResolve( domain_name )
    except Exception, exc:
        print exc
        return False
    finally:
        signal.alarm(0)  # always cancel the alarm, even if the lookup failed
    return ip

Usage: canIHasIP(domain_name, timeout_in_seconds), e.g.
canIHasIP("google.com", 5)


Thanks guys! :D

Roger.
 

r0g

<snip>

As usual, everything is working beautifully until I try to make it work
with windows!

Turns out signal.SIGALRM is Unix only and I want to run on both
platforms so I have done as the docs suggested and tried to convert the
code to use threading.Timer to trigger an exception if the DNS lookup is
taking too long.

Its behaviour seems quite odd.

The exception is raised on schedule after a couple of seconds and
appears in the terminal (despite me not printing it to the terminal!).
There's then a pause of nearly 30 seconds before the try/except catches
it and proceeds as normal. I can see no reason for a delay between the
error being raised and caught, unless whatever it's trying to interrupt
is blocking, in which case it shouldn't have been interrupted by the
signal either, no? So anyway my questions are...

Why the error message and can it be suppressed?
Why the 30 second delay, what's it doing during that time?

Code...

timeout_timer = threading.Timer(DNS_TIMEOUT, dns_timeout)
timeout_timer.start()
try:
    dest_addr = DNSResolve( dest_addr )
except Exception:
    print "Caught an exception, returning False"
    try:
        timeout_timer.cancel()
    except:
        pass
    return False

Other code as before.

On the face of it it looks like it might be dwelling in the .req()
method of the pydns library but it can get confusing quickly with
threading so I'm far from sure.

On top of that I'm not sure if my threads are dying and being cleared up
properly. Each Exception I raise shows the Thread number getting higher
e.g. "Exception in thread Thread-8:". Do I need to be doing anything to
clear these up or does Python just use a monotonic counter for thread names?

Is there a less messy way to use signals on both Unix and Windows? The
docs seem to suggest they've kludged the signal module into a workable
form on most platforms, but on XP I get 'module' object has no attribute
'SIGALRM'

:(

Roger.
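
On the first question, one likely explanation is that threading.Timer runs its callback in a separate thread. An exception raised there is reported by that thread (hence the unrequested "Exception in thread Thread-N" traceback) and never reaches a try/except in the thread doing the lookup. The Python 3 sketch below illustrates this under stated assumptions: `time.sleep` stands in for the blocking lookup, `run_demo` is a made-up name, and `threading.excepthook` needs Python 3.8 or later.

```python
import threading
import time


def dns_timeout():
    raise Exception("DNS timeout")     # raised inside the Timer's own thread


def blocking_lookup():
    time.sleep(0.5)                    # stand-in for a lookup that blocks
    return "192.0.2.1"


def run_demo():
    seen_in_timer_thread = []
    # Record the timer thread's exception instead of printing a traceback.
    original_hook = threading.excepthook
    threading.excepthook = lambda args: seen_in_timer_thread.append(args.exc_value)
    timer = threading.Timer(0.1, dns_timeout)
    timer.start()
    result = None
    caught_in_main = False
    try:
        result = blocking_lookup()     # not interrupted by the timer
    except Exception:
        caught_in_main = True
    finally:
        timer.cancel()
        timer.join()
        threading.excepthook = original_hook
    return result, caught_in_main, seen_in_timer_thread
```

Here the lookup runs to completion and the main thread's `except` never fires; the exception exists only in the timer thread. As for the rising thread numbers, the default "Thread-N" names come from a counter that only ever increments, so a growing N does not by itself mean finished threads are piling up.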
 

r0g

r0g said:
<snip>

As usual, everything is working beautifully until I try to make it work
with windows!

Turns out signal.SIGALRM is Unix only and I want to run on both
platforms so I have done as the docs suggested and tried to convert the
code to use threading.Timer to trigger an exception if the DNS lookup is
taking too long.

Actually none of that was necessary in the end. Digging into the pydns
source to debug the 30 second pause I happened across the timeout parameter!

Phew, that simplifies things a lot! :)

Roger.
 
