long lists

Merrigan

Hi All,

Firstly - thank you Sean for the help and the guideline to get the
size comparison, I will definitely look into this.

At the moment I actually have 2 bigger issues that need sorting...

1. I have the script putting all the files that need to be checked
into a list, and have it parsing the list for everything... Now the
problem is this: the server needs to check (at the moment) 375 files
and eliminate those that don't need reuploading. This number will
obviously get bigger and bigger as more files get uploaded. The
problem that I'm having is that the script is taking forever to parse
the list and give the final result. How can I speed this up?

2. This issue is actually caused by the first one. While the script
is parsing the lists and files, the connection to the FTP server times
out, and I honestly must say that it is quite annoying. I know I can
set the function to reconnect if it cannot find a connection, but
wouldn't it be easier just to keep the connection alive? Any idea
how I can keep the connection alive?

Thanks for all the help folks, I really appreciate it!
 
Steven D'Aprano

1. I have the script putting all the files that need to be checked into
a list, and have it parsing the list for everything... Now the problem is
this: the server needs to check (at the moment) 375 files and eliminate
those that don't need reuploading. This number will obviously get bigger
and bigger as more files get uploaded. The problem that I'm having
is that the script is taking forever to parse the list and give the
final result. How can I speed this up?

By writing faster code???

It's really hard to answer this without more information. In particular:

- what's the format of the list and how do you parse it?

- how does the script decide what files need uploading?
 
Merrigan

By writing faster code???

It's really hard to answer this without more information. In particular:

- what's the format of the list and how do you parse it?

- how does the script decide what files need uploading?

Hi, Thanx for the reply,

The script is available at this URL: http://www.lewendewoord.co.za/theScript.py

P.S. I know it looks like crap, but I'm a n00b, and not yet through
the OOP part of the tutorial.

Thanx in advance!
 
Marc 'BlackJack' Rintsch

The script is available at this URL: http://www.lewendewoord.co.za/theScript.py

P.S. I know it looks like crap, but I'm a n00b, and not yet through
the OOP part of the tutorial.

One spot of really horrible runtime is the `comp_are()` function; it has
quadratic runtime. Why the funny spelling BTW?

Why are you binding the objects to new names all the time and calling
`str()` repeatedly on string objects? The names `a`, `b` and `fn2up` are
unnecessary, you can use `file1`, `file2` and `filename` instead. And
``str(str(b))`` on a string object is a no-operation. It's the same as
simply writing ``b``.

Those two nested ``for``-loops can be replaced by converting both lists
into `set()` objects, calculating the difference and converting back to a
sorted list:

def compare(remote, local):
    return sorted(set(local) - set(remote))
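
As a rough illustration of why this helps, here is a sketch that puts a nested-loop comparison next to the set version. The file names are made up, and `compare_slow` is a hypothetical stand-in for the nested-loop approach, not the code from the original script. It is written for a current Python 3.

```python
def compare_slow(remote, local):
    """Nested-loop version: each local name is checked against each remote name."""
    missing = []
    for name in local:
        found = False
        for other in remote:
            if name == other:
                found = True
                break
        if not found and name not in missing:
            missing.append(name)
    return sorted(missing)


def compare(remote, local):
    """Set version: membership tests on a set do not scan the whole collection."""
    return sorted(set(local) - set(remote))


# Made-up example data, for illustration only.
remote = ["a.mp3", "b.mp3"]
local = ["a.mp3", "b.mp3", "c.mp3", "d.mp3"]

result = compare(remote, local)
print(result)
```

Both functions return the same list; the difference is that the nested loops do work proportional to len(local) * len(remote), while building two sets and subtracting them grows roughly with len(local) + len(remote).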

Ciao,
Marc 'BlackJack' Rintsch
 
Gabriel Genellina

The script is available at this URL:
http://www.lewendewoord.co.za/theScript.py

I understand this as a learning exercise, since there are a lot of utilities
for remote syncing.

Some comments:
- use os.path.join to build file paths, instead of concatenating strings.
- instead of reassigning sys.stdout before the call to retrlines, use the
callback:

saveinfo = sys.stdout
fsock = open(tempDir + "remotelist.txt", "a")
sys.stdout = fsock
ftpconn.cwd(remotedir) # This changes to the remote directory
ftpconn.retrlines("LIST") # This gets a complete list of everything in the directory
sys.stdout = saveinfo
fsock.close()

becomes:

fsock = open(os.path.join(tempDir, "remotelist.txt"), "a")
ftpconn.cwd(remotedir) # This changes to the remote directory
# This gets a complete list of everything in the directory. retrlines strips
# the line ending before calling the callback, so add it back when writing.
ftpconn.retrlines("LIST", lambda line: fsock.write(line + "\n"))
fsock.close()

(Why mode="a"? Shouldn't it be "w"? Isn't the listing for a single
directory?)
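
If the listing is only needed in memory, the callback can just as well append to a list, which avoids the temporary file altogether. A minimal sketch, assuming the callback receives each line without its trailing CRLF (which is how `ftplib` documents `retrlines`). `FakeFTP` and `get_remote_listing` are made-up names; the fake class only exists so the example runs without a server, and a real `ftplib.FTP` object would be passed in its place.

```python
class FakeFTP:
    """Hypothetical stand-in mimicking the two ftplib.FTP methods used below."""

    def __init__(self, listing):
        self.listing = listing
        self.cwd_calls = []

    def cwd(self, dirname):
        # Record the directory change instead of talking to a server.
        self.cwd_calls.append(dirname)

    def retrlines(self, cmd, callback=None):
        # Like ftplib, hand over one line at a time, without line endings.
        if callback is None:
            callback = print
        for line in self.listing:
            callback(line)


def get_remote_listing(ftpconn, remotedir):
    """Return the LIST output of remotedir as a list of strings."""
    lines = []
    ftpconn.cwd(remotedir)
    ftpconn.retrlines("LIST", lines.append)
    return lines


# Made-up listing lines, for illustration only.
fake = FakeFTP([
    "-rw-r--r-- 1 user group 1024 Jan 01 12:00 a.mp3",
    "-rw-r--r-- 1 user group 2048 Jan 02 12:00 b.mp3",
])
listing = get_remote_listing(fake, "/uploads")
print(listing)
```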

- Saving both file lists may be useful, but why do you read them again? If
you already have a list of local filenames and remote filenames, why read
them from the saved copy?
- It's very confusing having "filenames" ending with "\n" - strip that as
you read it. You can use fname = fname.rstrip()
- If you are interested in filenames with a certain extension, only
process those files. That is, filter them *before* the processing begins.
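
The last two points can be sketched together as follows. The `.mp3` extension, the sample names and the function name `clean_names` are assumptions made for the example; they are not taken from the script.

```python
def clean_names(lines, extension=".mp3"):
    """Strip trailing whitespace/newlines and keep only names with the extension."""
    names = []
    for line in lines:
        fname = line.rstrip()
        if fname.endswith(extension):
            names.append(fname)
    return names


# Made-up input, shaped like lines read back from a saved listing file.
raw = ["sermon1.mp3\n", "notes.txt\n", "sermon2.mp3\n", "\n"]
wanted = clean_names(raw)
print(wanted)
```

Note that `rstrip()` with no argument removes any trailing whitespace, not only the newline, which is usually what you want for file names.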

- The time-consuming part appears to be this:

def comp_are():
    global toup
    temptoup = []
    for file1 in remotefiles:
        a = file1
        for file2 in localfiles:
            b = file2
            if str(a) == str(b):
                pass
            if str(b) != str(a):
                temptoup.append(str(str(b)))
    toup = list(sets.Set(temptoup))
    for filename in remotefiles:
        fn2up = filename
        for item in toup:
            if fn2up == item:
                toup.remove(item)
            else:
                pass
    toup.sort()

(It's mostly nonsense... what do you expect from str(str(b)) different
from str(b)? and the next line is just a waste of time, can you see why?)
I think you want to compare two lists of filenames, and keep the elements
that are in the "localfiles" list but not in the other. As you appear to
know about sets: it's the set difference between "localfiles" and
"remotefiles". Keeping the same "globalish" thing:

def comp_are():
    global toup
    toup = list(sets.Set(localfiles) - sets.Set(remotefiles))
    toup.sort()

Since Python 2.4, set is a builtin type, and you have sorted(), so you
could write:

def comp_are():
    global toup
    toup = sorted(set(localfiles) - set(remotefiles))

- Functions may have parameters and return useful things :)
That is, you may write, for example:

remotefiles = getRemoteFiles(host, remotedir)
localfiles = getLocalFiles(localdir)
newfiles = findNewFiles(localfiles, remotefiles)
uploadFiles(host, newfiles)
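
To make that concrete, here is one possible sketch of two of those functions, written for a current Python 3. Only the function names come from the outline above; the bodies are assumptions about what they might do, and the remote list is a made-up literal standing in for whatever getRemoteFiles(host, remotedir) would return.

```python
import os
import tempfile


def getLocalFiles(localdir):
    """Return the names of the regular files directly inside localdir."""
    return [name for name in os.listdir(localdir)
            if os.path.isfile(os.path.join(localdir, name))]


def findNewFiles(localfiles, remotefiles):
    """Return, sorted, the names that exist locally but not remotely."""
    return sorted(set(localfiles) - set(remotefiles))


# Demo with a throwaway directory and made-up file names.
with tempfile.TemporaryDirectory() as localdir:
    for name in ("a.mp3", "b.mp3", "c.mp3"):
        open(os.path.join(localdir, name), "w").close()
    localfiles = getLocalFiles(localdir)

remotefiles = ["a.mp3"]  # stand-in for the result of getRemoteFiles(...)
newfiles = findNewFiles(localfiles, remotefiles)
print(newfiles)
```

Because each function takes its inputs as parameters and returns its result, no `global` statement is needed and each piece can be tried out on its own.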
 
Merrigan

Hmmm, thanks a lot. This has really been helpful. I have tried putting
it in the set, and whoops, it works. Now, I think I need to start
learning some more.

now the script is running a lot slower...
Now to get the rest of it up and running...

Thanx for the help!
 
