ftplib limitations?


durumdara

Hi!

See this code:

----------------

import os, sys, ftplib

from ftplib import FTP

ftp = FTP()
ftp.connect('ftp.anything.hu', 2121)
ftp.login('?', '?')
print ftp.getwelcome()
ftp.set_pasv(False)
ls = ftp.nlst()

for s in ls:
    print "\nFilename:", '"%s"' % s,
    fsize = ftp.size(s)
    print "Size:", fsize
    print "..download:",

    # Collect the file contents in memory and print progress in 10% steps.
    d = {}
    d['buffer'] = []
    d['size'] = 0
    d['lastpercentp10'] = 0

    def CallBack(Data):
        d['size'] = d['size'] + len(Data)
        d['buffer'].append(Data)
        percent = (d['size'] / float(fsize)) * 100
        percentp10 = int(percent / 10)
        if percentp10 > d['lastpercentp10']:
            d['lastpercentp10'] = percentp10
            print str(percentp10 * 10) + "%",

    ftp.retrbinary("retr " + s, CallBack)
    print ""
    print "..downloaded, joining"
    dbuffer = "".join(d['buffer'])
    adir = os.path.abspath("b:\\_BACKUP_")
    newfilename = os.path.join(adir, s)
    print "..saving into", newfilename
    f = open(newfilename, "wb")
    f.write(dbuffer)
    f.close()
    print "..saved"
    print "..delete from the server"
    ftp.delete(s)
    print "..deleted"
    #sys.exit()

print "\nFinished"

----------------

This code logs into a site, then downloads and deletes all files.

I experienced a problem.
The server is Windows with FileZilla, the client is Win7 with Python 2.6.
When I fetch a file with a size of 1,303,318,662 bytes, Python halts on
the "retrbinary" line every time.
It downloads all of the file (100%), but the next line is never reached.
I got some error, but that was yesterday and I don't remember the text
of the error.

I want to ask: does Python 2.6's ftplib have some size limitation?

I remember that zipfile has a 2 GB limitation; a bigger archive
causes an infinite loop.

Maybe ftplib also has this, and that causes the problem... Or do I need
to add a "NOOP" command in the callback?

Thanks for your help:
dd
 

Stefan Schwarzer

Hi durumdara,

> def CallBack(Data):
>     d['size'] = d['size'] + len(Data)
>     d['buffer'].append(Data)
>     percent = (d['size'] / float(fsize)) * 100
>     percentp10 = int(percent / 10)
>     if percentp10 > d['lastpercentp10']:
>         d['lastpercentp10'] = percentp10
>         print str(percentp10 * 10) + "%",
>
> ftp.retrbinary("retr " + s, CallBack)
> print ""
> print "..downloaded, joining"
> dbuffer = "".join(d['buffer'])
> [...]
> This code logs into a site, then downloads and deletes all files.
>
> I experienced a problem.
> The server is Windows with FileZilla, the client is Win7 with Python 2.6.
> When I fetch a file with a size of 1,303,318,662 bytes, Python halts on
> the "retrbinary" line every time.

So if I understand correctly, the script works well on
smaller files but not on the large one?
> It downloads all of the file (100%), but the next line is never reached.

_Which_ line is never reached? The `print` statement after
the `retrbinary` call?
> I got some error, but that was yesterday and I don't remember the text
> of the error.

Can't you reproduce the error by executing the script once
more? Can you copy the file to another server and see if the
problem shows up there, too?

I can imagine the error message (a full traceback if
possible) would help to say a bit more about the cause of
the problem and maybe what to do about it.

Stefan
 

Stefan Schwarzer

Hi durumdara,

> So if I understand correctly, the script works well on
> smaller files but not on the large one?

I just did an experiment in the interpreter which
corresponds to this script:

import ftplib

of = open("large_file", "wb")

def callback(data):
    of.write(data)

ftp = ftplib.FTP("localhost", userid, passwd)
ftp.retrbinary("RETR large_file", callback)

of.close()
ftp.close()

The file is 2 GB in size and is fully transferred, without
blocking or an error message. The status message from the
server is '226-File successfully transferred\n226 31.760
seconds (measured here), 64.48 Mbytes per second', so this
looks ok, too.

I think your problem is related to the FTP server or its
configuration.

Have you been able to reproduce the problem?

Stefan
 

durumdara

Hi!
> So if I understand correctly, the script works well on
> smaller files but not on the large one?

Yes. 500-800 MB is OK, but over 1 GB is not.
> _Which_ line is never reached? The `print` statement after
> the `retrbinary` call?

Yes, the print.
> Can't you reproduce the error by executing the script once
> more? Can you copy the file to another server and see if the
> problem shows up there, too?

I get it every time, but I don't have another server to test it on.
> I can imagine the error message (a full traceback if
> possible) would help to say a bit more about the cause of
> the problem and maybe what to do about it.

This was:

Filename: "Repositories 20100824_101805 (Teljes).zip" Size: 1530296127
...download: 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Traceback (most recent call last):
File "C:\D\LocalBackup\ftpdown.py", line 31, in <module>
ftp.retrbinary("retr " + s, CallBack)
File "C:\Python26\lib\ftplib.py", line 401, in retrbinary
return self.voidresp()
File "C:\Python26\lib\ftplib.py", line 223, in voidresp
resp = self.getresp()
File "C:\Python26\lib\ftplib.py", line 209, in getresp
resp = self.getmultiline()
File "C:\Python26\lib\ftplib.py", line 195, in getmultiline
line = self.getline()
File "C:\Python26\lib\ftplib.py", line 182, in getline
line = self.file.readline()
File "C:\Python26\lib\socket.py", line 406, in readline
data = self._sock.recv(self._rbufsize)
socket.error: [Errno 10054] A lÚtez§ kapcsolatot a tßvoli ßllomßs
kÚnyszerÝtette
n bezßrta

So this message means that the remote host forcibly closed the
existing connection.

Now I'm trying to save the file into a temporary file instead of
holding it in memory.

Thanks:
dd
 

durumdara

Hi!

> The file is 2 GB in size and is fully transferred, without
> blocking or an error message. The status message from the
> server is '226-File successfully transferred\n226 31.760
> seconds (measured here), 64.48 Mbytes per second', so this
> looks ok, too.
>
> I think your problem is related to the FTP server or its
> configuration.
>
> Have you been able to reproduce the problem?

Yes. I tried saving to a file, but I also got this error.
But: Total Commander CAN download the file, and ncftpget can also
download it without a problem...

Hmmmmm... :-(

Thanks:
dd
 

Stefan Schwarzer

Hi durumdara,

> I can imagine the error message (a full traceback if
> possible) would help to say a bit more about the cause of
> the problem and maybe what to do about it.

> This was:
>
> Filename: "Repositories 20100824_101805 (Teljes).zip" Size: 1530296127
> ..download: 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
> Traceback (most recent call last):
>   File "C:\D\LocalBackup\ftpdown.py", line 31, in <module>
>     ftp.retrbinary("retr " + s, CallBack)
>   File "C:\Python26\lib\ftplib.py", line 401, in retrbinary
>     return self.voidresp()
>   File "C:\Python26\lib\ftplib.py", line 223, in voidresp
>     resp = self.getresp()
>   File "C:\Python26\lib\ftplib.py", line 209, in getresp
>     resp = self.getmultiline()
>   File "C:\Python26\lib\ftplib.py", line 195, in getmultiline
>     line = self.getline()
>   File "C:\Python26\lib\ftplib.py", line 182, in getline
>     line = self.file.readline()
>   File "C:\Python26\lib\socket.py", line 406, in readline
>     data = self._sock.recv(self._rbufsize)
> socket.error: [Errno 10054] A létező kapcsolatot a távoli állomás
> kényszerítetten bezárta
>
> So this message means that the remote host forcibly closed the
> existing connection.

The file transfer protocol uses two connections for data
transfers, a control connection to send commands and
responses, and a data connection for the data payload
itself.

Now it may be that the data connection, after having started
the transfer, works as it should, but the control connection
times out because the duration of the transfer is too long.
A hint at this is that the traceback above contains
`getline` and `readline` calls which strongly suggest that
this socket was involved in some text transfer (presumably
for a status message).
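
Roughly, `retrbinary` works like this (a simplified sketch, assuming an
already connected `ftplib.FTP` object named `ftp`; "large_file" is a
placeholder name):

conn = ftp.transfercmd("RETR large_file")  # opens the separate *data* connection
while True:
    chunk = conn.recv(8192)                # payload arrives on the data socket
    if not chunk:
        break
    # ... each chunk is handed to your callback here ...
conn.close()
ftp.voidresp()  # the final "226 ..." reply is read on the *control* connection

Your traceback fails in `voidresp`, that is, exactly when reading that
final reply from the control connection.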

Most FTP servers are configured for a timeout of 5 or 10
minutes. If you find that the file transfers don't fail
reproducibly for a certain size limit, it's probably not the
size of the file that causes the problem but some timing
issue (see above).

What to do about it? One approach is to try to get the
timeout value increased. Of course that depends on the
relation between you and the party running the server.
Another approach is to catch the exception and ignore it.
To make sure you only ignore timeout messages, you may want
to check the status code at the start of the error message
and re-raise the exception if it's not the status expected
for a timeout. Something along the lines of:

try:
    # transfer involving `retrbinary`
    ftp.retrbinary("retr " + s, CallBack)
except socket.error, exc:  # needs "import socket" at the top of the script
    if str(exc).startswith("[Errno 10054] "):
        pass
    else:
        raise

Note, however, that this is a rather brittle way to handle
the problem, as the status code or format of the error
message may depend on the platform your program runs on,
library versions, etc.

In any case you should close and re-open the FTP connection
after you got the error from the server.
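
For example, a minimal sketch of that re-connect step (untested; it reuses
the server address and dummy credentials from your script):

def reconnect():
    # Open a fresh control connection after the old one was dropped.
    ftp = ftplib.FTP()
    ftp.connect('ftp.anything.hu', 2121)
    ftp.login('?', '?')
    ftp.set_pasv(False)
    return ftp

# ... after handling the socket.error for one file:
ftp.close()
ftp = reconnect()
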
> Now I'm trying to save the file into a temporary file instead of
> holding it in memory.

If my theory holds, that shouldn't make a difference. But
maybe my theory is wrong. :)

Could you do me a favor and try your download with ftputil
[1]? The code should be something like:

import ftputil

host = ftputil.FTPHost(server, userid, passwd)
for name in host.listdir(host.curdir):
    host.download(name, name, 'b')
host.close()

There's neither a need nor - at the moment - a possibility
to specify a callback if you just want the download. (I'm
working on the callback support though!)

For finding the error, it's of course better to just use the
download command for the file that troubles you.
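
For instance, with the file name from your traceback (downloading it under
the same name into the current directory):

host.download("Repositories 20100824_101805 (Teljes).zip",
              "Repositories 20100824_101805 (Teljes).zip", 'b')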

I'm the maintainer of ftputil and if you get the same or
similar error here, I may find a workaround for ftputil. As
it happens, someone reported a similar problem (_if_ it's
the same problem in your case) just a few days ago. [2]

[1] http://ftputil.sschwarzer.net
[2] http://www.mail-archive.com/[email protected]/msg00141.html

Stefan
 

Stefan Schwarzer

Hi durumdara,

> Yes. I tried saving to a file, but I also got this error.
> But: Total Commander CAN download the file, and ncftpget can also
> download it without a problem...

I suppose they do the same as in my former suggestion:
"catching" the error and ignoring it. ;-)

After all, if I understood you correctly, you get the
complete file contents, so with ftplib the download
succeeds as well (in a way).

You might want to do something like (untested):

import os
import socket

import ftputil

def my_download(host, filename):
    """Some intelligent docstring."""
    # Need timestamp to check if we actually have a new
    # file after the attempted download
    try:
        old_mtime = os.path.getmtime(filename)
    except OSError:
        old_mtime = 0.0
    try:
        host.download(filename, filename, 'b')
    except socket.error:
        is_rewritten = (os.path.getmtime(filename) != old_mtime)
        # If you're sure that suffices as a test
        is_complete = (host.path.getsize(filename) ==
                       os.path.getsize(filename))
        if is_rewritten and is_complete:
            # Transfer presumably successful, ignore error
            pass
        else:
            # Something else went wrong
            raise

def main():
    host = ftputil.FTPHost(...)
    my_download(host, "large_file")
    host.close()

If you don't want to use an external library, you can use
`ftplib.FTP`'s `retrbinary` and check the file size with
`ftplib.FTP.size`. This size command requires support for
the SIZE command on the server, whereas ftputil parses the
remote directory listing to extract the size and so doesn't
depend on SIZE support.
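
A rough sketch of that ftplib-only variant (untested; `ftp` is assumed to
be an already connected `ftplib.FTP` instance):

import os
import socket

def download_with_size_check(ftp, filename):
    """Fetch `filename` with RETR; ignore a dropped control connection
    as long as the local file has the expected size afterwards."""
    remote_size = ftp.size(filename)   # needs SIZE support on the server
    f = open(filename, "wb")
    error = None
    try:
        try:
            ftp.retrbinary("RETR " + filename, f.write)
        except socket.error, exc:
            error = exc
    finally:
        f.close()
    if error is not None and os.path.getsize(filename) != remote_size:
        # The payload is incomplete, so this really was a failed transfer.
        raise error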

Stefan
 

Lawrence D'Oliveiro

> Now it may be that the data connection, after having started
> the transfer, works as it should, but the control connection
> times out because the duration of the transfer is too long.

It might not be the fault of the FTP server. If you’re going through a
router doing NAT, that could be where the timeout is happening.
 

Stefan Schwarzer

Hi Lawrence,

> It might not be the fault of the FTP server. If you’re going through a
> router doing NAT, that could be where the timeout is happening.

Good point, thanks! That may explain why it's a low-level
socket error instead of a 4xx timeout message from the
server which I would have expected.

If it's the router, the OP might try to change their router
settings to get rid of the problem.

Stefan
 

Lawrence D'Oliveiro

> Good point, thanks! That may explain why it's a low-level
> socket error instead of a 4xx timeout message from the
> server which I would have expected.

The reason why I thought of it was because it kept happening to me back when
I was using a D-Link DSL-500 to provide my ADSL connection. Unfortunately...
> If it's the router, the OP might try to change their router
> settings to get rid of the problem.

... if they’re using a typical consumer ADSL router box like the above, they
may not have any NAT table timeout settings to play with to cure the
problem. I certainly couldn’t find any in mine.

In my case, I solved the problem by using a Linux box as my router.
 
