Unpythonic Python


David Abrahams

I started having some weird problems with Python recently; they're so
weird that I can't begin to explain them. All I can do is describe
the symptoms and hope someone else has a clue. So here goes:

FreeBSD 4.2, Python 2.2.2.

I have a nightly cron job that downloads the boost cvs tarball from
SourceForge and bunzip2s it. For about a year everything worked with
no problems. About a month ago the download started getting truncated
with no error reported. Then bunzip2 reports corruption, of course.

I took the salient part of the download script, and added a reporthook
(undocumented in urllib, BTW) to the urlretrieve call:
--
import urllib
import os

def dump(*args):
    # reporthook: receives (block count, block size, total size)
    print args

#print 'downloading...'
os.chdir('/tmp')
urllib.urlretrieve('http://cvs.sourceforge.net/cvstarballs/boost-cvsroot.tar.bz2',
                   'boost-cvsroot.tar.bz2', dump)
---

When a recent download was truncated, the last lines of the dump were:

(1014, 8192, 34441987)
(1015, 8192, 34441987)
(1016, 8192, 34441987)
(1017, 8192, 34441987)
(1018, 8192, 34441987)
(1019, 8192, 34441987)
(1020, 8192, 34441987)
(1021, 8192, 34441987)
(1022, 8192, 34441987)
(1023, 8192, 34441987)

Is 1023 a coincidence? Maybe; here's the tail of another failure:

(2439, 8192, 34455413)
(2440, 8192, 34455413)
(2441, 8192, 34455413)
(2442, 8192, 34455413)
(2443, 8192, 34455413)
(2444, 8192, 34455413)
(2445, 8192, 34455413)
(2446, 8192, 34455413)
(2447, 8192, 34455413)
(2448, 8192, 34455413)
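
For reference, each tuple the hook prints is (blocks transferred so far, block size in bytes, total size reported by the server). A minimal sketch, written for current Python 3 rather than the 2.2.2 used here, turns the last tuple of each failed run into an approximate byte count. The helper name `bytes_received` is made up for illustration, and the figure is an upper bound because the final block may be short:

```python
def bytes_received(block_count, block_size, total_size):
    """Upper bound on bytes transferred when the hook was last called."""
    received = block_count * block_size
    # total_size is -1 when the server sends no Content-Length
    if total_size >= 0:
        received = min(received, total_size)
    return received

# Last reporthook calls from the two truncated runs
for last_call in [(1023, 8192, 34441987), (2448, 8192, 34455413)]:
    got = bytes_received(*last_call)
    total = last_call[2]
    print("%d of %d bytes (%.1f%%)" % (got, total, 100.0 * got / total))
```

By this estimate the two runs stopped at roughly a quarter and a bit over half of the advertised size, so the cut-off point does not look fixed.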

So I figured maybe we needed a newer version of Python. I asked my
sysadmin at stlport.com to upgrade Python to the most recent release,
and all of a sudden my incoming mail started looping (see below).

I am classifying spam with SpamBayes and on my system the only way to
get it sorted into IMAP folders after classification is to send it to
myself. Only messages lacking an X-Spambayes-Classification get
classified and sent back out, so I guess when Python was upgraded the
classification stopped adding the headers? My sysadmin rolled Python
back to 2.2.2 and the mail problems stopped. But I still have the
truncated download problem.
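
The loop described here hinges on a single header test. A rough sketch of that test in current Python 3 follows. It is not SpamBayes code: the function name `needs_classification` and the sample messages are invented for illustration, and only the header name X-Spambayes-Classification comes from the description above.

```python
from email import message_from_string

HEADER = "X-Spambayes-Classification"

def needs_classification(raw_message):
    """True if the message has not been through the classifier yet."""
    msg = message_from_string(raw_message)
    # Header lookup in the email package is case-insensitive.
    return HEADER not in msg

# Hypothetical sample messages
unclassified = "From: a@example.com\nSubject: hi\n\nbody\n"
classified = "X-Spambayes-Classification: ham\n" + unclassified

print(needs_classification(unclassified))
print(needs_classification(classified))
```

If the classifier stops adding the header, every resent message looks unclassified again and goes back out, which would be consistent with the "too many hops" bounce.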

Any clues?
Thanks in advance!

-Dave

--

From: <[email protected]>
Subject: Undeliverable mail: RE: What's wrong with this?
To: <[email protected]>
Date: Mon, 23 Aug 2004 17:08:45 -0700

Failed to deliver to 'dave'
mail loop: too many hops (too many 'Received:' header fields)


Reporting-MTA: dns; stlport.com

Original-Recipient: rfc822;<dave>
Final-Recipient: system;<dave>
Action: failed
Status: 5.0.0
[3. text/rfc822-headers]

Received: by stlport.com (CommuniGate Pro PIPE 4.2)
with PIPE id 817189; Mon, 23 Aug 2004 17:08:45 -0700
Received: by stlport.com (CommuniGate Pro PIPE 4.2)
with PIPE id 817183; Mon, 23 Aug 2004 17:08:26 -0700
<snip>
Received: from [12.163.41.8] (HELO expressmail.office.meta)
by stlport.com (CommuniGate Pro SMTP 4.2)
with SMTP id 817122 for (e-mail address removed); Mon, 23 Aug 2004
17:04:16 -0700
Received-SPF: error
receiver=stlport.com; client-ip=12.163.41.8;
[email protected]
Received: by expressmail.office.meta with Internet Mail Service
(5.5.2653.19)
id <RGXKFLXQ>; Mon, 23 Aug 2004 19:03:42 -0500
Message-ID: <[email protected]>
From: Aleksey Gurtovoy <[email protected]>
To: 'David Abrahams' <[email protected]>
Subject: RE: What's wrong with this?
Date: Mon, 23 Aug 2004 19:03:42 -0500
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2653.19)
Content-Type: text/plain;
charset="iso-8859-1"
 

Thomas Heller

I started having some weird problems with Python recently; they're so
weird that I can't begin to explain them. All I can do is describe
the symptoms and hope someone else has a clue. So here goes:

FreeBSD 4.2, Python 2.2.2.

I have a nightly cron job that downloads the boost cvs tarball from
SourceForge and bunzip2s it. For about a year everything worked with
no problems. About a month ago the download started getting truncated
with no error reported.

There were some problems with anonymous CVS on sourceforge, which also
affected the nightly CVS tarballs. Can it have to do with this?
I also had problems downloading the CVS tarball for ctypes - but it
seems now repaired.

http://sourceforge.net/docman/display_doc.php?docid=2352&group_id=1#1093021394
Then bunzip2 reports corruption, of course.
Maybe you don't get a bz2 file, but an HTML error message instead?

Only speculating,

Thomas
 

David Abrahams

Thomas Heller said:
There were some problems with anonymous CVS on sourceforge, which also
affected the nightly CVS tarballs. Can it have to do with this?
I also had problems downloading the CVS tarball for ctypes - but it
seems now repaired.

http://sourceforge.net/docman/display_doc.php?docid=2352&group_id=1#1093021394

That's not the problem. I can download the file reliably from other machines.
Maybe you don't get a bz2 file, but an HTML error message instead?

No, it's a truncated bz2. I can use bzip2recover and get some of the contents back.

Thanks, though.
 

Rob Williscroft

David Abrahams wrote in comp.lang.python:

At the same time, using http ?
Actually it appears that urllib is having some problem on Unix, at
least the one from Python-2.2.x. This fails on both FreeBSD and
Linux:

urllib.urlretrieve(
'http://cvs.sourceforge.net/cvstarballs/boost-cvsroot.tar.bz2',
'boost-cvsroot.tar.bz2')

Trying again with Python 2.3 on Cygwin.

Is it possible the file is being (re) uploaded (via cvs) during your
cron job's download, thus truncating your download ?

Perhaps you should change to cvs:

os.system( 'cvs ... ' )

FWIW, I tried downloading with IE using the link above; I got a
truncated 6-and-a-bit MB (16:15 BST (UTC +0100)).

Rob.
 

David Abrahams

Rob Williscroft said:
David Abrahams wrote in comp.lang.python:


At the same time, using http ?

I can download the file reliably using IE from my WinXP box.

I can download the file reliably using urllib from Cygwin Python 2.3.2

The 2nd element returned by urlretrieve is

'Date: Wed, 25 Aug 2004 14:50:17 GMT\r\nServer: Apache/2.0.40 (Red Hat Linux)\r\nLast-Modified: Wed, 25 Aug 20
2 GMT\r\nETag: "b63d5b-20ec84b-18057e80"\r\nAccept-Ranges: bytes\r\nContent-Length: 34523211\r\nContent-Type:
n/x-bzip2\r\nConnection: close\r\n'

As you can see from the above, it works. Is there a known urllib bug
in earlier Pythons?
Is it possible the file is being (re) uploaded (via cvs) during your
cron job's download, thus truncating your download ?

I don't think so.
Perhaps you should change to cvs:

os.system( 'cvs ... ' )

The problem with that is that I want to capture the whole CVS
history, not just today's state.
FWIW, I tried downloading with IE using the link above; I got a
truncated 6-and-a-bit MB (16:15 BST (UTC +0100)).
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Sorry, what does that mean? Did it show that message in a dialog,
or...?
 

Rob Williscroft

David Abrahams wrote in comp.lang.python:
I can download the file reliably using IE from my WinXP box.

I can download the file reliably using urllib from Cygwin Python 2.3.2

The 2nd element returned by urlretrieve is

Which version, the one that works or the one that doesn't ?
'Date: Wed, 25 Aug 2004 14:50:17 GMT\r\nServer: Apache/2.0.40 (Red
Hat Linux)\r\nLast-Modified: Wed, 25 Aug 20 2 GMT\r\nETag:

Something is missing here:

Last-Modified: Wed, 25 Aug 20 2 GMT

Contrast:

Wed, 25 Aug 2004 14:50:17 GMT
"b63d5b-20ec84b-18057e80"\r\nAccept-Ranges: bytes\r\nContent-Length:
34523211\r\nContent-Type: n/x-bzip2\r\nConnection: close\r\n'

34 MB (I got 6 MB)
As you can see from the above, it works. Is there a known urllib bug
in earlier Pythons?

Sorry I don't know, but I've seen the same truncation with no python,
and no unix.
I don't think so.

Can you test whether or not this is happening? I.e. if you don't
get the full 34523211 bytes, re-download and compare the above
Length, ETag and Last-Modified.
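
A sketch of that check, assuming current Python 3 and that the headers from each attempt are available as a mapping (as the second element returned by urlretrieve is). The names `is_complete` and `same_file` are hypothetical:

```python
import os

FIELDS = ("Content-Length", "ETag", "Last-Modified")

def is_complete(path, headers):
    """Compare the size on disk with the Content-Length header."""
    expected = headers.get("Content-Length")
    if expected is None:
        return None  # server gave no length, so nothing to compare
    return os.path.getsize(path) == int(expected)

def same_file(headers_a, headers_b):
    """True if both responses describe the same version of the file."""
    return all(headers_a.get(f) == headers_b.get(f) for f in FIELDS)

# Header values taken from the response quoted in this thread
first = {"Content-Length": "34523211", "ETag": '"b63d5b-20ec84b-18057e80"'}
second = dict(first)
print(same_file(first, second))
```

If `is_complete` is false and `same_file` is also false for the retry, the file was probably replaced on the server between the two attempts. From Python 2.5 onward urlretrieve itself raises ContentTooShortError when fewer bytes arrive than Content-Length promised; the 2.2.2 run here reported no error, so a manual check along these lines would be needed there.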
The problem with that is that I want to capture the whole CVS
history, not just today's state.

I was suggesting you get the tarball via cvs, though presumably
sourceforge don't give you the option. http has the problem that
the server will just truncate the download if the source file
gets replaced.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Sorry, what does that mean? Did it show that message in a dialog,
or...?

No, I got a "download complete", but the file was only 6 MB and bzip2 -t
told me it was truncated. The (16:15 ...) is the time I tried
downloading; BST = British Summer Time, though you wouldn't know it
from the weather :).
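
The same kind of check that bzip2 -t performs can be approximated from Python. This is a minimal sketch for current Python 3 using the standard bz2 module; the function name `bz2_is_complete` is made up, and it only examines the first bzip2 stream in the file:

```python
import bz2

def bz2_is_complete(path, chunk_size=64 * 1024):
    """Return True if path holds a complete, decodable bzip2 stream."""
    decomp = bz2.BZ2Decompressor()
    try:
        with open(path, "rb") as f:
            while not decomp.eof:
                chunk = f.read(chunk_size)
                if not chunk:
                    break  # ran out of input before end-of-stream
                decomp.decompress(chunk)  # output is discarded
    except OSError:
        return False  # unreadable file or data that is not valid bzip2
    return decomp.eof
```

A truncated download ends before the end-of-stream marker, so `decomp.eof` stays false and the function returns False without needing the expected size.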

Further I just ran:

import urllib

filename, headers = \
urllib.urlretrieve(
'http://cvs.sourceforge.net/cvstarballs/boost-cvsroot.tar.bz2',
'boost-cvsroot.tar.bz2')

print filename

print headers

boost-cvsroot.tar.bz2
Date: Wed, 25 Aug 2004 16:53:20 GMT
Server: Apache/2.0.40 (Red Hat Linux)
Last-Modified: Wed, 25 Aug 2004 14:14:02 GMT
ETag: "b63d5b-20ec84b-18057e80"
Accept-Ranges: bytes
Content-Length: 34523211
Content-Type: application/x-bzip2
Connection: close

The script ended at 17:59 BST. Note the difference between the two
times in the headers, suggesting the file was modified about 1 h 45 min
ago, around the same time my attempted download with IE failed.
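
The gap between the two header times can be checked mechanically. A small sketch in current Python 3, using email.utils.parsedate_to_datetime from the standard library; the function name `header_age` is made up. Both header values are in GMT, so no BST adjustment enters the subtraction:

```python
from email.utils import parsedate_to_datetime

def header_age(date_header, last_modified_header):
    """Time between Last-Modified and the server's Date header."""
    served = parsedate_to_datetime(date_header)
    modified = parsedate_to_datetime(last_modified_header)
    return served - modified

age = header_age("Wed, 25 Aug 2004 16:53:20 GMT",
                 "Wed, 25 Aug 2004 14:14:02 GMT")
print(age)  # prints 2:39:18
```

By this calculation the file was last modified 2 h 39 min before the response was served.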

Rob.
 

David Abrahams

Rob Williscroft said:
David Abrahams wrote in comp.lang.python:


Which version, the one that works or the one that doesn't ?

The one that works.
Something is missing here:

Last-Modified: Wed, 25 Aug 20 2 GMT

Contrast:

Wed, 25 Aug 2004 14:50:17 GMT

Where did that come from, what do you think is missing, and why?
34 MB (I got 6 MB)

It's 34MB.
Sorry I don't know, but I've seen the same truncation with no python,
and no unix.
Argh.


Can you test whether or not this is happening? I.e. if you don't
get the full 34523211 bytes, re-download and compare the above
Length, ETag and Last-Modified.

I did some tests, but didn't come up with anything conclusive. I set
my cron job to start 3 hours later. We'll see.
I was suggesting you get the tarball via cvs, though presumably
sourceforge don't give you the option.

No they don't.
http has the problem that
the server will just truncate the download if the source file
gets replaced.


No, I got a "download complete", but the file was only 6 MB and bzip2 -t
told me it was truncated. The (16:15 ...) is the time I tried
downloading; BST = British Summer Time, though you wouldn't know it
from the weather :).

Further I just ran:

import urllib

filename, headers = \
urllib.urlretrieve(
'http://cvs.sourceforge.net/cvstarballs/boost-cvsroot.tar.bz2',
'boost-cvsroot.tar.bz2')

print filename

print headers

boost-cvsroot.tar.bz2
Date: Wed, 25 Aug 2004 16:53:20 GMT
Server: Apache/2.0.40 (Red Hat Linux)
Last-Modified: Wed, 25 Aug 2004 14:14:02 GMT
ETag: "b63d5b-20ec84b-18057e80"
Accept-Ranges: bytes
Content-Length: 34523211
Content-Type: application/x-bzip2
Connection: close

The script ended at 17:59 BST. Note the difference between the two
times in the headers, suggesting the file was modified about 1 h 45 min
ago, around the same time my attempted download with IE failed.

That's odd! Your (failed) download modified the file being
downloaded??
 
