file locking...

bruce

Hi.

Got a bit of a question/issue that I'm trying to resolve. I'm asking this in
a few groups, so bear with me.

I'm considering a situation where I have multiple processes running, and
each process is going to access a number of files in a dir. Each process
accesses a unique group of files, and then writes that group of files to
another dir. I can easily handle this with a form of locking, where each
process locks/reads a lockfile and only accesses a group of files in the
dir depending on whether the lockfile is locked or free.
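For comparison, the lockfile approach described above might look roughly like this on a Unix system. This is a minimal sketch, not Bruce's actual code: the helper name `claim_next_group`, the JSON state file, and the group names are all hypothetical, and `fcntl.flock` is only available on Unix. Every worker blocks on the same exclusive lock, which is why the scheme ends up serialized.

```python
import fcntl
import json
import os


def claim_next_group(lock_path, state_path, groups):
    """Return the first group no other process has claimed yet, or None.

    Hypothetical helper: `groups` is a list of group names and `state_path`
    is a JSON file listing the groups that have already been claimed.
    """
    with open(lock_path, "a") as lock_file:
        # Blocks until this process holds the exclusive lock, so only one
        # worker at a time can read and update the claim list.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            claimed = []
            if os.path.exists(state_path):
                with open(state_path) as f:
                    claimed = json.load(f)
            for group in groups:
                if group not in claimed:
                    claimed.append(group)
                    with open(state_path, "w") as f:
                        json.dump(claimed, f)
                    return group
            return None  # every group is already taken
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Each worker would call this in a loop, process the group it gets back, and stop when it receives None.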

However, the issue with this approach is that it's essentially serialized,
since every process has to wait its turn on the lockfile. I'm looking for
something more asynchronous/parallel, where multiple processes can each grab
a unique group of files from the given dir as fast as possible.

So... any thoughts/pointers/comments would be greatly appreciated. Pointers
to academic research, etc. would also be useful.

thanks

zugnush

You could do something like this, so that every process knows whether a
file "belongs" to it without any prior coordination. It does mean a lot
of redundant hashing, though.

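In [36]: import glob, hashlib  # the old md5 module is gone in Python 3

In [37]: pool = 11  # total number of worker processes

In [38]: process = 5  # this worker's index, 0 <= process < pool

In [39]: [f for f in glob.glob('*')
    ...:  if int(hashlib.md5(f.encode()).hexdigest(), 16) % pool == process]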
Out[39]:
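A self-contained Python 3 version of the same idea follows. It is a sketch under a few assumptions: `hashlib` replaces the old `md5` module, filenames are encoded as UTF-8 before hashing, and the names `owner` and `files_for_process` are hypothetical. Because every process computes the same hash for the same name, the groups are disjoint and together cover every file, with no coordination needed.

```python
import glob
import hashlib


def owner(name, pool):
    """Map a filename to a process number in range(pool) via its MD5 hash."""
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return int(digest, 16) % pool


def files_for_process(pattern, pool, process):
    """Return the files matching `pattern` that belong to `process`."""
    return [f for f in glob.glob(pattern) if owner(f, pool) == process]
```

Process 5 of 11 would call `files_for_process('*', 11, 5)`.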

Nigel Rantor

zugnush said:
You could do something like this so that every process will know if
the file "belongs" to it without prior coordination, it means a lot
of redundant hashing though.

You're also relying on the hashing being perfectly evenly distributed;
otherwise some processes won't be doing useful work even though there is
useful work left to do.
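A quick way to see the effect is to count how many names land on each process. This is an illustrative sketch with a hypothetical helper, `bucket_sizes`, using the same MD5-modulo rule as the snippet quoted above. With fewer files than processes, some processes are guaranteed to get nothing at all, and even with many files the counts will generally differ.

```python
import hashlib
from collections import Counter


def bucket_sizes(names, pool):
    """Count how many names the MD5-modulo rule assigns to each process."""
    counts = Counter(
        int(hashlib.md5(n.encode("utf-8")).hexdigest(), 16) % pool for n in names
    )
    # Include processes that received nothing, so idle workers show up as 0.
    return [counts.get(p, 0) for p in range(pool)]


if __name__ == "__main__":
    names = [f"file{i}.dat" for i in range(5)]
    print(bucket_sizes(names, 11))  # at least 6 of the 11 entries are 0
```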

In other words, why would you rely on a scheme that limits some
processes to certain parts of the data? If we're already trying to get
away without a global lock for synchronisation, this seems to go against
the original intent of the problem...

n

Lawrence D'Oliveiro

Nigel said:
In other words, why would you rely on a scheme that limits some
processes to certain parts of the data?

That could be part of the original requirements; it's not clear from the
description so far.

Thomas Guettler

Hi Bruce,

You can do it the way Maildir [1] does: move files or directories with os.rename().

Maybe something like this: you have three directories, "todo", "in-process" and "done".
A process tries to os.rename() an item from todo to in-process. If that fails, some other
process has already claimed it. When the process is done, it moves the file/directory
to "done".
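A minimal sketch of that todo / in-process / done scheme might look like this. The directory names come from the description above; the function names, the `root` layout, and the `process_item` callback are hypothetical. It relies on `os.rename()` being atomic when source and destination are on the same filesystem (as POSIX `rename()` guarantees), so when several processes race for the same item exactly one rename succeeds and the others get `FileNotFoundError`.

```python
import os


def claim(root, name):
    """Try to move `name` from todo/ to in-process/; True if we got it."""
    try:
        os.rename(os.path.join(root, "todo", name),
                  os.path.join(root, "in-process", name))
        return True
    except FileNotFoundError:
        # Another process renamed it first, so it is no longer in todo/.
        return False


def finish(root, name):
    """Move a finished item from in-process/ to done/."""
    os.rename(os.path.join(root, "in-process", name),
              os.path.join(root, "done", name))


def work_loop(root, process_item):
    """Claim and process items until todo/ is empty."""
    while True:
        names = os.listdir(os.path.join(root, "todo"))
        if not names:
            break
        for name in names:
            if claim(root, name):
                process_item(os.path.join(root, "in-process", name))
                finish(root, name)
```

No process ever waits on another: a failed rename just means "skip this one and try the next".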

To avoid putting too many entries in any one directory, it might be good to use subdirectories
like todo/NN/MM/. I think git (the version control system created by Linus Torvalds)
does something like this.
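One way to build such a fan-out path is to take the first hex digits of a hash of the name, similar to how git stores objects under `.git/objects/` in subdirectories named after the first two hex digits of each object's SHA-1. This sketch is an assumption, not something specified above: it uses MD5, two levels of two hex digits each, and a hypothetical `fanout_path` helper.

```python
import hashlib
import os


def fanout_path(root, name):
    """Return root/NN/MM/name, where NN and MM come from the name's MD5 hash."""
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # Two levels of 256 subdirectories each keeps any single directory small.
    return os.path.join(root, digest[:2], digest[2:4], name)
```

Create the parent directories with `os.makedirs(os.path.dirname(path), exist_ok=True)` before writing. Since the same name always maps to the same subdirectory, the todo and in-process trees can share the layout and a rename still moves an item between matching paths.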


Thomas

[1] http://wiki.dovecot.org/MailboxFormat/Maildir
This page describes Maildir, including some parts of the specification that aren't needed here.