Semi-newbie, rolling my own __deepcopy__

ladasky

Hi, folks,

First, the obligatory cheerleading -- then, my questions...

I love Python! I am only an occasional programmer. Still, the logic
of the language is clear enough that I can retain pretty much all that
I have learned from one infrequent programming session to the next.
That's quite an accomplishment for a language this powerful. Also, I'm
finally beginning to grasp OOP. I could never quite get the hang of it
in C++ or Java. Recently, I discovered __getitem__ and pickle. Oh,
yeah.

Anyway, my present problem is that I want to make copies of instances
of my own custom classes. I'm having a little trouble understanding
the process. Not that I think that it matters -- but in case it does,
I'll tell you that I'm running Python 2.3.4 on a Win32 machine.

I started naively, thinking that I could just call copy.deepcopy() and
be done with it. After getting a TypeError from the interpreter, I
read the deepcopy docs and discovered that I need to implement a
__deepcopy__ method in my class. But the docs are a bit vague here.
What exactly should this __deepcopy__ do? I tried looking for examples
of __deepcopy__ code on the Net, but I'm not quite understanding what
I'm finding there. I guess that I'm getting deeper into the guts of
Python than I planned.

AFAIK, I'm supposed to add a "def __deepcopy__(self, memo):" to my
class definition. This will get called when I invoke
copy.deepcopy(myObject). The object to be copied is self, I presume.
What exactly is memo? The docs say that it's a dictionary which "keeps
track of what has already been copied." Somewhere I remember reading
that the namespace of an object is a dictionary. So is memo identical
to the dictionary of the new object that I'm trying to create? What
exactly do I add to memo? I think that I should make shallow copies of
methods, but deep copies of data structures (the contents of which I'm
likely to change). Do I iterate through and copy the items in
dir(self)? Do I update memo manually, or does passing memo into copy()
or deepcopy() automatically update memo's contents? Are there any
items that I *shouldn't* copy from self to memo? Should __deepcopy__
return memo?

Sorry for all the confusion -- and thanks for your help!
 
Michael Spencer

ladasky said:
[snip: original post quoted in full]

--
Rainforest laid low.
"Wake up and smell the ozone,"
Says man with chainsaw.
John J. Ladasky Jr., Ph.D.

If you google for:
python __deepcopy__ cookbook
you will find a couple of examples of this method in use, among them:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/259179

class deque(object):

    def __init__(self, iterable=()):
        if not hasattr(self, 'data'):
            self.left = self.right = 0
            self.data = {}
        self.extend(iterable)

    [...snip methods...]

    def __deepcopy__(self, memo={}):
        from copy import deepcopy
        result = self.__class__()
        memo[id(self)] = result
        result.__init__(deepcopy(tuple(self), memo))
        return result

HTH
Michael
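
A side note for later readers, not part of the original reply: the job of memo is easiest to see with an object graph that contains a cycle. The sketch below uses current Python 3 syntax rather than the 2.3 discussed in this thread, and the class name Node and its attributes are invented for illustration. memo maps id(original) to the copy already made, so meeting the same original a second time returns the existing copy instead of recursing forever.

```python
import copy


class Node:
    """Hypothetical example class: a node that can be part of a cycle."""

    def __init__(self, label, neighbours=None):
        self.label = label
        self.neighbours = neighbours if neighbours is not None else []

    def __deepcopy__(self, memo):
        # Build the new object without running __init__.
        result = type(self).__new__(type(self))
        # Register the copy *before* copying children, keyed by the id of
        # the original, so that cycles resolve to this same copy.
        memo[id(self)] = result
        result.label = copy.deepcopy(self.label, memo)
        result.neighbours = copy.deepcopy(self.neighbours, memo)
        return result


a = Node("a")
b = Node("b", [a])
a.neighbours.append(b)      # a -> b -> a, a two-node cycle

a2 = copy.deepcopy(a)
b2 = a2.neighbours[0]
```

Note that __deepcopy__ returns the new object, not memo, and that memo is not the attribute dictionary of the copy; it is only bookkeeping shared by every deepcopy call in one copying operation.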
 
Steven Bethard

Michael said:
    def __deepcopy__(self, memo={}):
        from copy import deepcopy
        result = self.__class__()
        memo[id(self)] = result
        result.__init__(deepcopy(tuple(self), memo))
        return result

I know this is not your recipe, but is there any reason to use
self.__class__()
instead of
type(self)()
if you know you're inside a new-style class?

STeVe
 
Michael Spencer

Steven said:
Michael said:
    def __deepcopy__(self, memo={}):
        from copy import deepcopy
        result = self.__class__()
        memo[id(self)] = result
        result.__init__(deepcopy(tuple(self), memo))
        return result


I know this is not your recipe, but is there any reason to use
self.__class__()
instead of
type(self)()
if you know you're inside a new-style class?

STeVe

I don't know - aren't they identical? I would write self.__class__ (without
claiming that that's better).

BTW, I had a different question about the method:

wouldn't:
    result = self.__class__.__new__(self.__class__)
or in your form:
    result = type(self).__new__(type(self))

be better (i.e., clearer and possibly safer) than calling __init__ twice? (But I
haven't tried it!)

Michael
 
ladasky

Michael said:
(e-mail address removed) wrote: [snip]
Anyway, my present problem is that I want to make copies of instances
of my own custom classes. I'm having a little trouble understanding
the process. Not that I think that it matters -- but in case it does,
I'll tell you that I'm running Python 2.3.4 on a Win32 machine.
[snip]
If you google for:
python __deepcopy__ cookbook
you will find a couple of examples of this method in use, among them:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/259179

 1  class deque(object):
 2
 3      def __init__(self, iterable=()):
 4          if not hasattr(self, 'data'):
 5              self.left = self.right = 0
 6              self.data = {}
 7          self.extend(iterable)

    [...snip methods...]

 8      def __deepcopy__(self, memo={}):
 9          from copy import deepcopy
10          result = self.__class__()
11          memo[id(self)] = result
12          result.__init__(deepcopy(tuple(self), memo))
13          return result

HTH
Michael

Hi, Michael,

I was indeed finding code like this in my web searches, though not this
particular example. I'm glad that you cut out code that is irrelevant
to the deepcopy operation. Still, I want to understand what is going
on here, and I don't. I've numbered the lines in your example.

So, entering deepcopy, I encounter the first new concept (for me) on
line 10. We obtain the class/type of self. On line 11 we create a
dictionary item in memo, [id(self):type(self)]. So now I'm confused as
to the purpose of memo. Why should it contain the ID of the *original*
object?

Things get even stranger for me on line 12. Working from the inside of
the parentheses outward, an attempt is made to convert self to a tuple.
Shouldn't this generate a TypeError when given a complex, non-iterable
item like the deque class? I just tried running one of my programs,
which assigns the name "x" to one of my custom objects, and when I
execute tuple(x), a TypeError is what I get.

Anyway, I like what I think I see when you finally call __init__ for
the copied object, on lines 4-6. If __deepcopy__ calls __init__, then
the new deque object should already have a "data" attribute. So "data"
is only initialized if it doesn't already exist.

I do not understand whether the "iterable" variable, which appears on
lines 3 and 7, is relevant to the copying operation. Line 12 calls
__init__ with a single argument, not two, so iterable should be
declared as an empty tuple. Again I see what I think should generate a
TypeError on line 7, when an attempt is made to extend the non-sequence
object "self" with the iterable tuple.

Is there another section of the Python docs that will clarify all this
for me? I got hung up on all the "static", "public", etc. declarations
in Java early on. Python has taken me an awful lot farther, but
perhaps I'm hitting the conceptual wall again.
 
Steven Bethard

ladasky said:
So, entering deepcopy, I encounter the first new concept (for me) on
line 10. We obtain the class/type of self. On line 11 we create a
dictionary item in memo, [id(self):type(self)]. So now I'm confused as
to the purpose of memo. Why should it contain the ID of the *original*
object?

Note that you obtain *and then call* the class/type of self:

py> i, s = 100, 'string'
py> i.__class__, s.__class__
(<type 'int'>, <type 'str'>)
py> i.__class__(), s.__class__()
(0, '')

So you've created a new object calling the constructor with no
arguments. So 'result' is a new instance of the class.

ladasky said:
Things get even stranger for me on line 12. Working from the inside of
the parentheses outward, an attempt is made to convert self to a tuple.
Shouldn't this generate a TypeError when given a complex, non-iterable
item like the deque class? I just tried running one of my programs,
which assigns the name "x" to one of my custom objects, and when I
execute tuple(x), a TypeError is what I get.

tuple(x) works on any object that defines __iter__:

py> class C(object):
...     def __iter__(self):
...         yield 1
...         yield 2
...         yield 5
...
py> tuple(C())
(1, 2, 5)

So line 12 is making a tuple of the values in the object, and then
calling result's __init__ again, this time with (a deepcopied) tuple as
an argument. This adds the content to the previously uninitialized object.

HTH,

STeVe
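
For anyone reading along later, here is a small sketch of the two points above, written in current Python 3 rather than 2.3. Opaque and Bag are made-up classes, not the cookbook deque: the first has no __iter__, so tuple() rejects it, while the second defines __iter__ and can therefore use the same "copy a tuple of my contents, then re-run __init__" pattern as the recipe.

```python
import copy


class Opaque:
    """Hypothetical class with neither __iter__ nor __getitem__."""


class Bag:
    """Hypothetical container that supports iteration."""

    def __init__(self, iterable=()):
        self.data = list(iterable)

    def __iter__(self):
        return iter(self.data)

    def __deepcopy__(self, memo):
        result = type(self)()          # new, empty instance
        memo[id(self)] = result
        # tuple(self) works only because Bag defines __iter__.
        result.__init__(copy.deepcopy(tuple(self), memo))
        return result


try:
    tuple(Opaque())
    opaque_is_iterable = True
except TypeError:
    opaque_is_iterable = False

original = Bag([[1, 2], [3]])
clone = copy.deepcopy(original)
clone.data[0].append(99)       # must not leak into the original
```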
 
Michael Spencer

I see Steve Bethard has answered most of the points in your last e-mail.

(e-mail address removed) wrote:
On line 11 we create a
dictionary item in memo, [id(self):type(self)]... So now I'm confused as
to the purpose of memo. Why should it contain the ID of the *original*
object?

No, you create memo[id(self)] = type(self)(), i.e., a mapping from the id of the
old object to a new object of the same type. This new object does not yet contain
the data of the old one, because you called its __init__ method with no arguments.

BTW, as I mentioned in a previous comment, I believe this would be more plainly
written as type(self).__new__(type(self)), to emphasize that you are constructing
the object without initializing it. (There is an explanation of __new__'s behaviour
at http://www.python.org/2.2/descrintro.html#__new__). This works only for
new-style classes (also explained in the same reference). For old-style
'classic' classes you would use new types.InstanceType.

(e-mail address removed) wrote:
Is there another section of the Python docs that will clarify all this
for me? I got hung up on all the "static", "public", etc. declarations
in Java early on. Python has taken me an awful lot farther, but
perhaps I'm hitting the conceptual wall again.

I assume you have seen http://docs.python.org/lib/module-copy.html which
explains this topic and, in particular, why the memo dict is passed around.

Hang in there. It's not an easy topic, and this example contains several
subtleties as you have discovered.

If you want further help, why not post the actual class you are working with,
and your __deepcopy__ attempt? It would be much easier to react to something
concrete than to a general how-to.

Michael
 
Steven Bethard

Michael said:
BTW, as I mentioned in a previous comment, I believe this would be more
plainly written as type(self).__new__(type(self)), to emphasize that you are
constructing the object without initializing it. (There is an
explanation of __new__'s behaviour at
http://www.python.org/2.2/descrintro.html#__new__).

There is also now documentation in the standard location:

http://docs.python.org/ref/customization.html

And just to clarify Michael's point here, writing this as __new__ means
that __init__ is not called twice:

py> class C(object):
...     def __new__(cls):
...         print '__new__'
...         return super(C, cls).__new__(cls)
...     def __init__(self):
...         print '__init__'
...
py> c = C()
__new__
__init__
py> c2 = type(c)(); c2.__init__()
__new__
__init__
__init__
py> c3 = type(c).__new__(C); c3.__init__()
__new__
__init__

But definitely check the docs for more information on __new__. Some of
the inner workings are kind of subtle.

STeVe
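
The same comparison can be made without print statements, which may be handy on current Python 3. This is only a sketch; the Counted class and its init_calls attribute are invented here to count how often __init__ runs.

```python
class Counted:
    """Hypothetical class that records how many times __init__ has run."""

    def __init__(self):
        self.init_calls = getattr(self, "init_calls", 0) + 1


template = Counted()

# Calling the class runs __new__ and then __init__ ...
via_call = type(template)()
via_call.__init__()                 # ... so this is a second __init__ call.

# Calling __new__ directly creates the object but skips __init__.
via_new = type(template).__new__(type(template))
has_attr_before_init = hasattr(via_new, "init_calls")
via_new.__init__()                  # first and only __init__ call
```

After this runs, via_call.init_calls is 2 and via_new.init_calls is 1, and via_new had no init_calls attribute at all until __init__ was called on it.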
 
ladasky

I'm back...

Thanks to Michael Spencer and Steven Bethard for their excellent help.
It has taken me a few sessions of reading, and programming, and I've
had to pick up the exploded fragments of my skull from time to time.
But I now have succeeded in making deepcopy work for a simple class
that I wrote myself. Unfortunately, I'm still getting errors when I
try to copy the object I really want. This will be a long post
including code examples and tracebacks, so please bear with me.

First things first. For days, I didn't realize or appreciate that
there is a difference between "class C:" and "class C(object):".
Making this one change made the difference between working code and
incomprehensible exceptions being thrown by the interpreter. Where can
I read a description of the 'object' class? If subclassing 'object' is
necessary to make common functions like __new__ and deepcopy work out
of the box, why doesn't Chapter 9 of the Python tutorial discuss this?

Now, here's some working code, with minimal distractions from the main
point, which is the deepcopy process:

----------------------------------------------------------------------------

from copy import deepcopy
from random import randint
requiredItems = ["a", "b", "c"]

class Test(object):

    def __init__(self, items = {}):
        self.addedItems = []
        for key in items:
            if not hasattr(self, key):
                setattr(self, key, items[key])
                if key not in requiredItems:
                    self.addedItems.append(key)
        for var in requiredItems:
            if not hasattr(self, var):
                setattr(self, var, randint(1,5))

    def contents(self):
        items = {}
        for key in (requiredItems + self.addedItems):
            items[key] = getattr(self, key)
        return items

    def __deepcopy__(self, memo={}):
        newTestObj = Test.__new__(Test)
        memo[id(self)] = newTestObj
        newTestObj.__init__(deepcopy(self.contents(), memo))
        return newTestObj

    def show(self):
        report = ""
        for x in (requiredItems + self.addedItems):
            report=report+" "+str(x)+" = "+str(getattr(self,x))
        return report

foo = Test()
bar = Test({"c":10})
snafu = Test({"c":20, "d":30})
print "foo:", foo.show()
print "bar:", bar.show()
print "snafu:", snafu.show()
clone = deepcopy(foo)
print "clone, should be deepcopy of foo:", clone.show()
clone.b = 40
print "clone, should be changed:", clone.show()
print "foo, should NOT change:", foo.show()

----------------------------------------------------------------------------

Here's a sample output from the above program:
foo: a = 2 b = 5 c = 2
bar: a = 3 b = 1 c = 10
snafu: a = 2 b = 2 c = 20 d = 30
clone, should be deepcopy of foo: a = 2 b = 5 c = 2
clone, should be changed: a = 2 b = 40 c = 2
foo, should NOT change: a = 2 b = 5 c = 2
Exit code: 0

You can see that I'm playing with the extension of the objects by
passing dictionaries containing novel attributes. You can ignore that,
though doing so was in fact my stepping stone to the useful 'contents'
method. I've come to appreciate that there are several ways to pass
attributes between objects. I like this dictionary approach, because
it explicitly passes the names of the attributes along with their
values.

It took me a while to figure out this expression...

newTestObj = Test.__new__(Test)

That's interesting, and also a bit scary. It suggests that you can ask
the Test object/class to create a new object of some class *other than
Test*! Is this because '__new__' is actually a method of the 'object'
class, and not of 'Test'? Or are there some strange games to be played
here? (What the heck are metaclasses? Are they relevant here? Do I
*really* want to know?)

Anyway, this test program works fine when the attributes that I want to
copy are numeric objects, lists or strings. But the real object that I
want to copy also contains *arrays* as attributes. When I change the
program thus...

Near the top with the other import statements, add:
from array import array
from random import random

Change the last line of __init__ to read:
setattr(self, var, array("d", [random() for x in range(3)]))

... I am provoking an exception from deep within Python's guts. New
instances are created fine, but they aren't getting deepcopied:
foo: a = array('d', [0.52605955021044, 0.48584632687459, etc.
bar: a = array('d', [0.91072903604066, 0.63424430516644, etc.
snafu: a = array('d', [0.63255804677449, 0.67492348886257, etc.

Traceback (most recent call last):
  File "deepcopy experiment.py", line 66, in ?
    clone = deepcopy(foo)
  File "C:\Program Files\Python2_3\lib\copy.py", line 190, in deepcopy
    y = copier(memo)
  File "deepcopy experiment.py", line 43, in __deepcopy__
    newTestObj.__init__(deepcopy(self.contents(), memo))
  File "C:\Program Files\Python2_3\lib\copy.py", line 179, in deepcopy
    y = copier(x, memo)
  File "C:\Program Files\Python2_3\lib\copy.py", line 270, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "C:\Program Files\Python2_3\lib\copy.py", line 206, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "C:\Program Files\Python2_3\lib\copy.py", line 338, in _reconstruct
    y = callable(*args)
  File "C:\Program Files\Python2_3\lib\copy_reg.py", line 92, in __newobj__
    return cls.__new__(cls, *args)
TypeError: array() takes at least 1 argument (0 given)
Exit code: 1

I have also tried this with an object with mixed attributes, which are
bundled into a dictionary with numerics first, and then an array (code
not shown). In this case, deepcopy works fine on the numerics, and
does not throw the TypeError until the array is reached.

Does the array class not implement deepcopy correctly? Is this a
language bug, or do I need to do something extra with arrays? Maybe I
should just tolerate the overhead, junk the arrays, and revert to using
lists?

I'm SO close to having this work -- awaiting your sage advice once
again...
 
Michael Spencer

ladasky said:
I'm back...
[wondering why copy.deepcopy barfs on array instances]

http://www.python.org/doc/2.3.3/lib/module-copy.html
deepcopy:
...
This version does not copy types like module, class, function, method, stack
trace, stack frame, file, socket, window, *array*, or any similar types.
...

which you can see by doing:
>>> a = array("d",[random() for x in range(3)])
>>> b = deepcopy(a)
Traceback (most recent call last):
  File "<input>", line 1, in ?
  File "C:\Python24\lib\copy.py", line 172, in deepcopy
    y = copier(memo)
TypeError: __deepcopy__() takes no arguments (1 given)

In any case there is no difference between deep-copying and shallow-copying an
array, since it can contain only scalars, rather than compound objects:

http://www.python.org/doc/2.3.3/lib/module-copy.html
The difference between shallow and deep copying is only relevant for
compound objects (objects that contain other objects, like lists or
class instances).


array does have a __deepcopy__ method, albeit not compatible with copy.deepcopy.
You can use this to make the (shallow) copy.


Michael
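
A quick way to check the "shallow is enough for arrays" point, sketched in current Python 3 (where copy.copy accepts array instances); the variable names are just for illustration. Both a shallow copy and a new array built from the old one own their own storage, so changing them leaves the original alone.

```python
import copy
from array import array

weights = array("d", [0.1, 0.2, 0.3])

by_copy = copy.copy(weights)                        # shallow copy
by_constructor = array(weights.typecode, weights)   # rebuilt from its items

# Modify the copies only.
by_copy[0] = 9.0
by_constructor[1] = 9.0
```

Because an array holds plain numbers rather than references to other objects, there is nothing further down for a deep copy to duplicate.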
 
Michael Spencer

Michael said:
http://www.python.org/doc/2.3.3/lib/module-copy.html
deepcopy:
...
This version does not copy types like module, class, function, method,
stack trace, stack frame, file, socket, window, *array*, or any similar
types.
...

On reflection, I realize that this says that the array type is not
deep-copyable, not array instances. Still, it does appear that array instances
don't play nicely with deepcopy.

It appears that if you want to deepcopy an object that may contain arrays,
you're going to have to 'roll your own' deep copier. Something like this would
do it:

from array import array
from copy import copy, deepcopy

class Test1(object):

    def __init__(self, **items):
        self.items = items

    def __deepcopy__(self, memo={}):
        mycls = self.__class__
        newTestObj = mycls.__new__(mycls)
        memo[id(self)] = newTestObj

        # We need a deep copy of a dictionary, that may contain
        # items that cannot be deep copied. The following code
        # emulates copy._deepcopy_dict, so it should act the same
        # way as deepcopy does.

        x = self.items
        y = {}
        memo[id(x)] = y
        for key, value in x.iteritems():
            try:
                newkey = deepcopy(key, memo)
            except TypeError:
                newkey = copy(key)
            try:
                newvalue = deepcopy(value, memo)
            except TypeError:
                newvalue = copy(value)
            y[newkey] = newvalue

        newTestObj.__init__(**y)
        return newTestObj

    def __repr__(self):
        return '%s object at %s: %s' % (self.__class__.__name__
                , hex(id(self)), self.items)

>>> t = Test1(a=array("d",[1,2,3]))
>>> t
Test1 object at 0x196c7f0: {'a': array('d', [1.0, 2.0, 3.0])}
>>> t1 = deepcopy(t)
>>> t1
Test1 object at 0x1b36b50: {'a': array('d', [1.0, 2.0, 3.0])}
>>>

BTW: are you sure you really need to copy those arrays?

Michael
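
For readers on current Python 3, the same "try deepcopy, fall back to copy" idea might look roughly like the sketch below. Everything named here is hypothetical: LegacyBuffer is a stand-in that imitates the problem seen in this thread (a __deepcopy__ that does not accept memo), and copy_with_fallback and Holder are illustration names, not anything from the copy module.

```python
import copy


class LegacyBuffer:
    """Hypothetical type whose __deepcopy__ has an incompatible signature."""

    def __init__(self, values):
        self.values = list(values)

    def __copy__(self):
        return LegacyBuffer(self.values)

    def __deepcopy__(self):             # deliberately lacks the memo argument
        return LegacyBuffer(self.values)


def copy_with_fallback(value, memo):
    """Try a deep copy first; fall back to copy.copy() on TypeError."""
    try:
        return copy.deepcopy(value, memo)
    except TypeError:
        return copy.copy(value)


class Holder:
    """Hypothetical container, playing the role of Test1 above."""

    def __init__(self, **items):
        self.items = items

    def __deepcopy__(self, memo):
        result = type(self).__new__(type(self))
        memo[id(self)] = result
        result.items = {
            copy_with_fallback(key, memo): copy_with_fallback(value, memo)
            for key, value in self.items.items()
        }
        return result


original = Holder(buf=LegacyBuffer([1.0, 2.0]), tags=["x"])
clone = copy.deepcopy(original)
clone.items["buf"].values.append(3.0)
clone.items["tags"].append("y")
```

One caveat with this design: catching TypeError this broadly also hides a TypeError raised for some unrelated reason inside a nested __deepcopy__, so it is worth keeping the fallback as narrow as the data allows.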
 
ladasky

Michael said:
It appears that if you want to deepcopy an object that may contain arrays,
you're going to have to 'roll your own' deep copier. Something like this
would do it:

[method deleted]

Whew! Michael, that was way more than I bargained for. New issues for
me: the "**" notation, and the __repr__ method. I'll have a look at
this -- but meanwhile, I converted my arrays over to lists. My program
is now working. Although it's a bit slower than it would be with
arrays, I'll live with the performance hit for now, because I can do
the deepcopy operation without further fuss.

Michael said:
BTW: are you sure you really need to copy those arrays?

Yes, I do. The arrays (now lists) contain the weights for a customized
neural net class that I wrote. The process of evolving the neural nets
is to mutate copies of them. I want to keep the original nets until
I'm sure I want to discard them, so copying is required.
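
The copy-then-mutate step described above could be sketched as follows in current Python 3. This is not the actual neural net code, which was never posted: Net, mutated_copy, the scale parameter, and the nested-list weight layout are all assumptions made for illustration.

```python
import copy
import random


class Net:
    """Hypothetical stand-in for the net class; weights are nested lists."""

    def __init__(self, weights):
        self.weights = weights


def mutated_copy(net, scale=0.1, rng=random):
    """Return a deep copy of net with a small random nudge on every weight."""
    child = copy.deepcopy(net)
    for layer in child.weights:
        for i in range(len(layer)):
            layer[i] += rng.uniform(-scale, scale)
    return child


parent = Net([[0.5, -0.2], [0.1, 0.9]])
child = mutated_copy(parent, rng=random.Random(0))   # seeded for repeatability
```

With plain lists of numbers the default copy.deepcopy already does the right thing, so no custom __deepcopy__ is needed in this sketch; the parent's weights stay untouched while the child is mutated.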
 
