Object cleanup


psaffrey

I am writing a screen scraping application using BeautifulSoup:

http://www.crummy.com/software/BeautifulSoup/

(which is fantastic, by the way).

I have an object that has two methods, each of which loads an HTML document and scrapes out some information, putting strings from the HTML documents into lists and dictionaries. I have a set of these objects from which I am aggregating and returning data.

With a large number of these objects, the memory footprint is very large. The "soup" object is a local variable to each scraping method, so I assumed it would be cleaned up after the method had returned. However, I've found that using guppy, after the methods have returned most of the memory is being taken up with BeautifulSoup objects of one type or another. I'm not declaring BeautifulSoup objects anywhere else.
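One likely culprit, offered as an assumption rather than a diagnosis: the BeautifulSoup documentation warns that the strings it hands back (NavigableString) are string subclasses that keep a reference to the parse tree, so storing them in lists or dictionaries keeps the whole soup alive after the scraping method returns. Converting them with unicode() (Python 2) or str() (Python 3) stores a plain copy instead. The sketch below does not use BeautifulSoup; Node, TreeString and scrape are hypothetical stand-ins that model a string carrying a parent link.

```python
import gc
import weakref

class Node:
    """Hypothetical stand-in for a parsed tag (not BeautifulSoup's API)."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # child -> parent link
        self.children = []        # parent -> child links
        if parent is not None:
            parent.children.append(self)

class TreeString(str):
    """Hypothetical stand-in for a string that remembers its parent,
    modelled on what the docs say about NavigableString."""
    pass

def scrape(keep_tree_string):
    """Build a tiny tree, pull one string out, and return it together with
    a weak reference to the root so we can see whether the tree survives."""
    root = Node("html")
    body = Node("body", parent=root)
    text = TreeString("Hello")
    text.parent = body
    body.children.append(text)
    result = text if keep_tree_string else str(text)  # str() makes a plain copy
    return result, weakref.ref(root)

kept, kept_root = scrape(keep_tree_string=True)
copied, copied_root = scrape(keep_tree_string=False)
gc.collect()  # parent/child links form cycles, so refcounting alone won't free them

print(kept_root() is None)    # False: the TreeString keeps the whole tree alive
print(copied_root() is None)  # True: only a plain str escaped
```

If this matches what the scraping methods do, converting every extracted string before storing it should let each soup be reclaimed once its method returns.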

I've tried assigning None into the "soup" objects at the end of the method calls and calling garbage collection manually, but this doesn't seem to help. I'd like to find out exactly what object "owns" the various BeautifulSoup structures, but I'm quite a new guppy user and I can't figure out how to do this.

How do I force the memory for these soup objects to be freed? Is there anything else I should be looking at to find out the cause of these problems?

Peter
 

Steven D'Aprano

> However, I've found that using guppy, after the methods have returned
> most of the memory is being taken up with BeautifulSoup objects of one
> type or another. I'm not declaring BeautifulSoup objects anywhere else.

What's guppy?

> I've tried assigning None into the "soup" objects at the end of the
> method calls and calling garbage collection manually, but this doesn't
> seem to help. I'd like to find out exactly what object "owns" the
> various BeautifulSoup structures, but I'm quite a new guppy user and I
> can't figure out how to do this.

> How do I force the memory for these soup objects to be freed? Is there
> anything else I should be looking at to find out the cause of these
> problems?

If objects aren't being garbage collected, you probably have cycles of
objects with __del__ methods. Python's reference counting can't free
objects that are part of a cycle, and the cyclic garbage collector can't
free cycles containing objects with __del__ methods; it leaves them in
gc.garbage instead. (Python 3.4 and later, via PEP 442, can collect such
cycles.)

Try removing the __del__ methods, and see if that fixes the problem.

Also, see the gc module.

http://docs.python.org/library/gc.html
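A minimal sketch of the difference between reference counting and the gc module's cycle detector, assuming CPython (other implementations manage memory differently). Node and make_cycle are hypothetical names used only for the demonstration; the weak reference lets us check whether the object is still alive without keeping it alive ourselves.

```python
import gc
import weakref

class Node:
    """Hypothetical object used only to build a reference cycle."""
    pass

def make_cycle():
    a = Node()
    b = Node()
    a.partner = b
    b.partner = a          # a -> b -> a: neither refcount can drop to zero
    return weakref.ref(a)  # a weak reference does not keep `a` alive

gc.disable()  # stop automatic collection so the effect is visible
try:
    ref = make_cycle()
    survived_refcounting = ref() is not None
    unreachable = gc.collect()  # run the cycle detector explicitly
    freed_by_gc = ref() is None
finally:
    gc.enable()

print(survived_refcounting)  # True: refcounting alone cannot free a cycle
print(freed_by_gc)           # True: gc.collect() reclaimed it
print(unreachable)           # number of unreachable objects found (varies)
```

If gc.collect() frees the soup objects in your program, the problem was cycles; if they are still alive afterwards, something reachable is still holding them.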
 

Steven D'Aprano



> Have you tried deleting them, using the "del" command?

del doesn't actually delete objects, it deletes names. The object won't
be deleted until the last name (or other reference) to the object is
gone. There is no way to force Python to delete an object while it is
still in use.

Normally you don't notice the difference because most objects only have a
single reference:

py> class K:
...     def __del__(self):
...         print("Goodbye cruel world!")
...
py> k = K()
py> del k
Goodbye cruel world!


But see what happens when there are multiple references to the object:

py> k = K()
py> x = k
py> y = [1, 2, x, 3]
py> z = {'spam': y}
py> del k
py> del x
py> del y
py> del z
Goodbye cruel world!

The destructor doesn't get called until the last reference is gone.
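If you only want to observe when an object is reclaimed, weakref.finalize (Python 3.4 and later) does it without defining __del__, which on older Pythons is exactly what stopped cycles from being collected. A sketch mirroring the session above, assuming CPython's immediate reference-counting cleanup; the events list is a hypothetical stand-in for the print.

```python
import weakref

class K:
    """Same idea as the session above, but with no __del__ method."""
    pass

events = []
k = K()
# Register a callback that runs once k has been reclaimed; the callback
# must not refer to k itself, or it would keep k alive.
weakref.finalize(k, events.append, "Goodbye cruel world!")

x = k
y = [1, 2, x, 3]
z = {'spam': y}
del k, x, y
print(events)  # []: z -> y -> the object is still a live path
del z
print(events)  # ['Goodbye cruel world!']
```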
 

psaffrey

Thanks for all the responses.

It looks like none of the BeautifulSoup objects have __del__ methods, so I don't think that can be the problem.

To answer your other question, guppy (more specifically its Heapy component) was the best match I came up with when looking for a memory profiler for Python:

http://guppy-pe.sourceforge.net/#Heapy

> The destructor doesn't get called until the last reference is gone.

That makes sense, so now I need to track down why there are references to the object when I don't think there should be. Are there any systematic methods for doing this?
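One systematic approach, sketched under assumptions rather than as a guppy recipe: take an object you expect to be gone, ask gc.get_referrers() what still points at it, and walk upward from there. The helper find_referrers and the Soup class are hypothetical; get_referrers only reports containers tracked by the garbage collector, is slow, and the Python docs recommend it for debugging only.

```python
import gc
import inspect

class Soup:
    """Hypothetical stand-in for a parse tree that should have been freed."""
    pass

def find_referrers(obj):
    """Return the objects that still hold a reference to obj.

    Hypothetical debugging helper built on gc.get_referrers()."""
    gc.collect()  # clear unreachable garbage so it isn't reported
    referrers = []
    this_frame = inspect.currentframe()
    try:
        for r in gc.get_referrers(obj):
            # Skip this helper's own frame, which refers to obj via its argument.
            if r is not this_frame:
                referrers.append(r)
    finally:
        del this_frame  # avoid a frame -> local -> frame reference cycle
    return referrers

cache = {}
soup = Soup()
cache['page'] = soup  # the "unexpected" owner we want to discover

owners = find_referrers(soup)
print(any(r is cache for r in owners))  # True: cache still owns the soup
```

Repeating the call on each owner you find (for example a list, then the object whose attribute holds that list) traces the chain back to whatever is keeping the soup alive.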

Peter
 
