out of memory with processing module

Brian

I'm using the third-party "processing" module in Python 2.5 (it became
the "multiprocessing" module in Python 2.6) to speed up a computation
that takes over a week to run. The code in question may turn out not to
matter to the problem, but here it is:

    q1, q2 = processing.Queue(), processing.Queue()
    p1 = processing.Process(target=_findMaxMatch,
                            args=(reciprocal, user,
                                  clusters[1:(numClusters - 1)/2],
                                  questions, copy.copy(maxMatch), q1))
    p2 = processing.Process(target=_findMaxMatch,
                            args=(reciprocal, user,
                                  clusters[(numClusters - 1)/2:],
                                  questions, copy.copy(maxMatch), q2))
    p1.start()
    p2.start()
    maxMatch1 = q1.get()[0]
    maxMatch2 = q2.get()[0]
    p1.join()
    p2.join()
    if maxMatch1[1] > maxMatch2[1]:
        maxMatch = maxMatch1
    else:
        maxMatch = maxMatch2

This code just splits the calculation of the cluster that best matches
'user' across two for loops, each in its own process, instead of one.
(What the clusters are isn't important here.)

The error I get is:

[21661.903889] Out of memory: kill process 14888 (python) score 610654 or a child
[21661.903930] Killed process 14888 (python)
Traceback (most recent call last):
....etc. etc. ...

Running the program from tty1 rather than under GNOME on my Ubuntu
Hardy system let the execution get a little further.

The error was surprising because with just 1 GB of memory and a single
for loop I didn't run into the error, but with 5 GB and two processes,
I do. I believe that in the 1 GB case there was just a lot of
painfully slow swapping going on that allowed it to continue.
'processing' appears to throw its hands up immediately, instead.
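
To illustrate what I suspect is happening, here is a minimal
standalone sketch, separate from my program and only my guess at the
mechanism: on Linux each child process starts as a copy-on-write image
of the parent, but CPython's reference counting writes to those pages
as soon as the child touches the objects, so resident memory roughly
multiplies by the number of processes.

    import os
    import resource

    def report(label):
        # On Linux, ru_maxrss is reported in kilobytes.
        rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print '%s: max RSS %d KB' % (label, rss)

    # Allocate a few hundred MB of small objects in the parent.
    big = [float(i) for i in xrange(10 * 1000 * 1000)]
    report('parent before fork')

    pid = os.fork()
    if pid == 0:
        # Child: merely iterating updates every object's refcount,
        # dirtying the copied-on-write pages and forcing real copies.
        total = sum(big)
        report('child after touching the data')
        os._exit(0)
    else:
        os.waitpid(pid, 0)
        report('parent after child exited')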

Why does the program fail with 'processing' but not without it? Do you
have any ideas for resolving the problem? Thanks for your help.
 
alessiogiovanni.baroni

If your program crashes with more than one process, maybe you aren't
handling the Queue objects properly? If you can, post the code of
_findMaxMatch.
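
For example, a classic mistake is to join() a worker before draining
its queue: queue items are fed into the underlying pipe by a background
thread, so if the child's put() data is still buffered, join() can
deadlock. A minimal sketch of the safe ordering, which your snippet
does already follow ('worker' here is just a made-up stand-in for
_findMaxMatch):

    import processing  # renamed 'multiprocessing' in Python 2.6

    def worker(q):
        q.put(['a', 'large', 'result'])

    if __name__ == '__main__':
        q = processing.Queue()
        p = processing.Process(target=worker, args=(q,))
        p.start()
        result = q.get()  # drain the queue first...
        p.join()          # ...then join, so the feeder thread can finish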
 
Brian

Thanks for your interest. Here's _findMaxMatch:

def _findMaxMatch(reciprocal, user, clusters, sources, maxMatch, queue):
    # maxMatch is a [clusterNumber, score] pair; track the best seen.
    for clusternumminusone, cluster in enumerate(clusters):
        clusterFirstData, clusterSecondData = cluster.getData(sources)
        aMatch = gum.calculateMatchGivenData(user.data, None, None, None,
                                             user2data=clusterSecondData)[2]
        if reciprocal:
            maxMatchB = gum.calculateMatchGivenData(clusterFirstData, None,
                            None, None, user2data=user.secondUserData)[2]
            aMatch = float(aMatch + maxMatchB) / 2
        if aMatch > maxMatch[1]:
            maxMatch = [clusternumminusone + 1, aMatch]
    # Send the winner back to the parent through the queue.
    queue.put([maxMatch])
 
 
alessiogiovanni.baroni

Can you post the entire error message and the full traceback?
 
