HTTPSConnection script fails, but only on some servers (long)

Paul Winkler

This is driving me up the wall... any help would be MUCH appreciated.
I have a module that I've whittled down into a 65-line script in
an attempt to isolate the cause of the problem.

(Real domain names have been removed in everything below.)

SYNOPSIS:

I have 2 target servers, at https://A.com and https://B.com.
I have 2 clients, wget and my python script.
Both clients are sending GET requests with exactly the
same urls, parameters, and auth info.

wget works fine with both servers.
The python script works with server A, but NOT with server B.
On server B, it provokes a "Bad Gateway" error from Apache.
In other words, the problem seems to depend on both the client
and the server. Joy.

Logs on server B show malformed URLs ONLY when the client
is my python script, which suggests the script is broken...
but logs on server A show no such problem, which suggests
the problem is elsewhere.

DETAILS

Note, the module was originally written for the express
purpose of working with B.com; A.com was added as a point of reference
to convince myself that the script was not totally insane.
Likewise, wget was tried when I wanted to see if it might be
a client problem.

Note the servers are running different software and return different
headers. wget -S shows this when it (successfully) hits url A:

1 HTTP/1.1 200 OK
2 Date: Tue, 12 Apr 2005 05:23:54 GMT
3 Server: Zope/(unreleased version, python 2.3.3, linux2) ZServer/1.1
4 Content-Length: 37471
5 Etag:
6 Content-Type: text/html;charset=iso-8859-1
7 X-Cache: MISS from XXX.com
8 Keep-Alive: timeout=15, max=100
9 Connection: Keep-Alive

.... and this when it (successfully) hits url B:

1 HTTP/1.1 200 OK
2 Date: Tue, 12 Apr 2005 04:51:30 GMT
3 Server: Jetty/4.2.9 (Linux/2.4.26-g2-r5-cti i386 java/1.4.2_03)
4 Via: 1.0 XXX.com
5 Content-Length: 0
6 Connection: close
7 Content-Type: text/plain

The only things notable to me, apart from the server software, are the
"Via:" and "Connection:" headers. The "Content-Length: 0" from B is also
odd, but that doesn't seem to be a problem when the client is wget.

Sadly I don't grok HTTP well enough to spot anything really
suspicious.

The Apache SSL request log on server B is very interesting.
When my script hits it, the logged request looks like:

A.com - - [01/Apr/2005:17:04:46 -0500] "GET https://A.com/SkinServlet/zopeskin?action=updateSkinId&facilityId=1466&skinId=406 HTTP/1.1" 502 351

.... which, apart from the 502, I thought reasonable until I realized
there's not supposed to be a protocol or domain in there at all. So this
is clearly wrong. When the client is wget, the log shows something more
sensible, like:

A.com - - [01/Apr/2005:17:11:04 -0500] "GET /SkinServlet/zopeskin?action=updateSkinId&facilityId=1466&skinId=406 HTTP/1.0" 200 -

.... which looks identical except for not including the spurious
protocol and domain, and the response looks as expected (200 with size
0).
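
For comparison, the two logged request lines differ only in the request
target: the failing one carries the full URL, the working one only the
path and query. httplib writes the second argument of request() into the
request line unchanged, so passing a full URL there would produce the
first form. Whether that is the whole story here is not confirmed in this
thread. A minimal sketch of splitting a URL so that only the path and
query are sent; it assumes Python 3's urllib.parse (not the Python 2.3
modules used in this thread), and split_target and request_line are
hypothetical helper names:

```python
from urllib.parse import urlsplit


def split_target(url):
    """Return (host, port, request_target) for an http(s) URL.

    Hypothetical helper: request_target is the path plus query string,
    which is what normally goes on the request line when talking
    directly to an origin server.
    """
    parts = urlsplit(url)
    default_port = 443 if parts.scheme == "https" else 80
    port = parts.port or default_port
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # Note: urlsplit() lower-cases the hostname.
    return parts.hostname, port, target


def request_line(method, target, version="HTTP/1.1"):
    """Format a request line the way it would appear in a server log."""
    return "%s %s %s" % (method, target, version)


url = ("https://B.com/SkinServlet/zopeskin"
       "?action=updateSkinId&facilityId=1466&skinId=406")
host, port, target = split_target(url)

# Full URL as the target: the shape of the failing log entry.
print(request_line("GET", url))
# Path and query only: the shape of the wget log entry.
print(request_line("GET", target))
```

With the pieces separated like this, host and port would go to the
connection constructor and target to request().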

So, that log appears to be strong evidence that the problem is in my
client script, right? The failing request is coming in with some bad crap
in the path, which JBoss can't handle, so it barfs and Apache responds
with Bad Gateway. Right?

So why does the same exact client code work when hitting server A??
No extra gunk in the logs there. AFAICT there is nothing in the script
that could lead to such an odd request only on server B.


THE SCRIPT

#!/usr/bin/python2.3

from httplib import HTTPSConnection
from urllib import urlencode
import re
import base64

url_re = re.compile(r'^([a-z]+)://([A-Za-z0-9._-]+)(:[0-9]+)?')

target_urls = {
    'B': 'https://B/SkinServlet/zopeskin',
    'A': 'https://A/zope/manage_main',
}

auth_info = {'B': ('userXXX', 'passXXX'),
             'A': ('userXXX', 'passXXX'),
             }

def doRequest(target, **kw):
    """Provide a trivial interface for doing remote calls.
    Keyword args are passed as query parameters.
    """
    url = target_urls[target]
    user, passwd = auth_info[target]
    proto, host, port = url_re.match(url).groups()
    if port:
        port = int(port[1:])  # remove the ':' ...
    else:
        port = 443
    creds = base64.encodestring("%s:%s" % (user, passwd))
    headers = {"Authorization": "Basic %s" % creds}
    params = urlencode(kw).strip()
    if params:
        url = '%s?%s' % (url, params)
    body = None  # only needed for POST
    args = ('GET', url, body, headers)
    print "ARGS: %s" % str(args)
    conn = HTTPSConnection(host)
    conn.request(*args)
    response = conn.getresponse()
    data = response.read()
    if response.status >= 300:
        print
        msg = '%i ERROR reported by remote system %s\n' % (response.status,
                                                           url)
        msg += data
        raise IOError, msg
    print "OK!"
    return data

if __name__ == '__main__':
    print "attempting to connect..."
    result1 = doRequest('A', skey='id', rkey='id')
    result2 = doRequest('B', action='updateSkinId',
                        skinId='406', facilityId='1466')
    print "done!"


# EOF
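
A side note on the Authorization header built in the script: the
line-wrapping base64 encoders (encodestring in Python 2, encodebytes in
Python 3) append a trailing newline, which then ends up inside the header
value. That may or may not matter for the problem described in this post.
A minimal sketch of building the header without the newline, assuming
Python 3's base64 module; basic_auth_header is a hypothetical helper name:

```python
import base64


def basic_auth_header(user, password):
    """Build an HTTP Basic Authorization header (hypothetical helper).

    b64encode() returns the encoded bytes with no trailing newline,
    unlike the line-wrapping encodebytes().
    """
    raw = ("%s:%s" % (user, password)).encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": "Basic " + token}


headers = basic_auth_header("userXXX", "passXXX")

# The line-wrapping encoder ends its output with a newline.
wrapped = base64.encodebytes(b"userXXX:passXXX")
print(repr(wrapped[-1:]))
print(repr(headers["Authorization"]))
```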


So... what the heck is wrong here?

at-wits-end-ly y'rs,

Paul Winkler
 
Steve Holden

Paul:

I don't claim to have analyzed exactly what's going on here, but the
most significant difference between the two is that you are accessing
site B using HTTP 1.1 via an HTTP 1.0 proxy (as indicated by the "Via:"
header).

Whether this is a clue or a red herring time alone will tell.

It's possible that wget and your client code aren't using the same proxy
settings, for example.

regards
Steve
 
andreas

Well HTTPSConnection does not support proxies. (HTTP/CONNECT + switch to HTTPS)

And it hasn't ever. Although the code seems to make sense there is
no support for handling that switch. Probably a good thing to complain
about (file a new bug report).
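
For readers on newer Pythons: the CONNECT-then-switch-to-HTTPS sequence
described above is available in later versions of the standard library
through set_tunnel(). A minimal sketch, assuming Python 3's http.client
(not the Python 2.3 httplib discussed in this thread); the proxy host and
port are made-up placeholders and tunnelled_connection is a hypothetical
helper name:

```python
from http.client import HTTPSConnection


def tunnelled_connection(proxy_host, proxy_port, target_host, target_port=443):
    """Prepare an HTTPS connection that goes through an HTTP proxy.

    The connection object points at the proxy. set_tunnel() records the
    real destination, so that on connect a CONNECT request is sent to the
    proxy first and the TLS handshake with the target happens afterwards.
    Nothing is sent over the network until a request is made.
    """
    conn = HTTPSConnection(proxy_host, proxy_port)
    conn.set_tunnel(target_host, target_port)
    return conn


# Placeholder proxy address, not taken from the thread.
conn = tunnelled_connection("proxy.example", 3128, "B.com")
# conn.request("GET", "/SkinServlet/zopeskin") would open the tunnel
# and then send the request to B.com over TLS.
```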

In the meantime you should take a look at cURL and pycurl, which do support
all kinds of more extreme HTTP (FTP, etc.) handling, like using HTTPS over
a proxy.

Andreas

 
Paul Winkler

Thanks for the replies, Steve and Andreas! I will check out pycurl,
thanks very much for the tip.

Meanwhile, I'm trying to prepare a bug report re. httplib and get as
much information as possible.

Something I neglected to mention: when the script hits the problematic
server, it always takes about 3 minutes to get the Bad Gateway
response. Don't know if that's indicative of anything.

I added a bunch of blather to httplib.py to see at what point things
are waiting, or if it was stuck in a loop or what. The result is pretty
clear: we get as far as this point in SSLFile:

def _read(self):
    buf = ''
    # put in a loop so that we retry on transient errors
    while True:
        try:
            buf = self._ssl.read(self._bufsize)

.... at which point we simply wait for the server for three minutes.
Then a response finally comes back, no exceptions are caught or raised
within _read(), and finally _read() returns buf. I can't easily trace
any deeper because self._ssl apparently comes from _ssl.so and I don't
fancy hacking at the C code.

Do these observations seem consistent with the hypothesis that
HTTPSConnection is failing to handle the HTTP 1.0 proxy?

I will also see what else I can find out from the admin. Maybe there's
more useful info in the logs somewhere. Unfortunately IIRC our jboss
log is always clogged with a few zillion irrelevant messages ... that
should be fun.

-PW
 
pyguy2

I have a couple of recipes at the Python Cookbook site that allow
Python to do proxy auth and SSL. The easiest one is:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/301740

john

 
Paul Winkler

pyguy2 said:
I have a couple of recipes at the Python Cookbook site that allow
Python to do proxy auth and SSL. The easiest one is:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/301740

Thanks for that John!
I gave it a whirl, changed the user, passwd, host, and phost and gave
it a run.
It instantly barfs with this:

Traceback (most recent call last):
  File "testYetAnotherHttpsClient.py", line 25, in ?
    ssl = socket.ssl(proxy, None, None)
  File "/usr/lib/python2.3/socket.py", line 73, in ssl
    return _realssl(sock, keyfile, certfile)
socket.sslerror: (8, 'EOF occurred in violation of protocol')

Hmm. On reflection, I don't think the problem solved by your script is
the same as mine. As I understand it, your script connects to an
SSL-protected server on port 443 by going through a plain HTTP proxy on
port 80? That's not the case for me. The server on port 80 is behind the
server on port 443.
 
Paul Winkler

Andreas said:
In the meantime you should take a look at cURL and pycurl, which do support
all kinds of more extreme HTTP (FTP, etc.) handling, like using HTTPS over
a proxy.

Well, I got a pycurl solution working very nicely in twenty minutes,
including the time it took to read the libcurl docs and

I still wish I understood the original problem better. I'd like to
file a bug report against HTTPSConnection, but I'm afraid that without
understanding the server config better it might just be noise in the
collector. Should I go ahead anyway?

Thanks everybody!

-PW
 
