deep cloning, how?

R

Rajinder Yadav

I am trying to figure out how to perform a deep clone


class A
attr_accessor :name
end

a1 = A.new
a1.name = "yoyoma"
a2 = a1.dup
a1.name.chop!
puts a2.name


I found the following way to write a deep clone method

class A
attr_accessor :name
def dup
Marshal::load(Marshal.dump(self))
end
end

If I wanted to write my own specialize deep_cloner, how would I do this. If I
try the obvious way to do it,

def dup
@name = self.name.dup
end

I get an error when 'puts a2.name' is executed saying:

NoMethodError: undefined method `name' for "yoyom":String


Can someone explain what's going on when dup is called. What gets passes to dup
(i assume self) and why my code is wrong?


--
Kind Regards,
Rajinder Yadav

http://DevMentor.org
Do Good ~ Share Freely
 
B

Brian Candler

Rajinder said:
Can someone explain what's going on when dup is called. What gets passes
to dup
(i assume self) and why my code is wrong?

dup is a method of your existing object, and should return the new
object instance.

class A
attr_accessor :name
def dup
res = self.class.new
res.name = name.dup
res
end
end

a1 = A.new
a1.name = "yoyoma"
a2 = a1.dup
a1.name.chop!
puts a2.name

There is a subtle distinction between 'dup' and 'clone' which I'll leave
someone else to explain...
 
R

Robert Klemme

2009/10/14 Rajinder Yadav said:
I am trying to figure out how to perform a deep clone


class A
=A0 attr_accessor :name
end

a1 =3D A.new
a1.name =3D "yoyoma"
a2 =3D a1.dup
a1.name.chop!
puts a2.name


I found the following way to write a deep clone method

class A
=A0attr_accessor :name
=A0def dup
=A0 =A0 Marshal::load(Marshal.dump(self))
=A0end
end

If I wanted to write my own specialize deep_cloner, how would I do this. = If
I try the obvious way to do it,

def dup
=A0 @name =3D self.name.dup
end

No, this is by far not the obvious way since you would at least have
to make sure there is a copy of self and this is returned from dup. I
would rather do

def dup
copy =3D super
copy.name =3D @name.dup
copy
end

or even

def dup
self.class.new(@name.dup)
end

Although that approach is fragile depending on the code in #initialize.
I get an error when 'puts a2.name' is executed saying:

NoMethodError: undefined method `name' for "yoyom":String


Can someone explain what's going on when dup is called. What gets passes = to
dup (i assume self) and why my code is wrong?

Hope the above sheds some light.

Kind regards

robert

--=20
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 
C

Caleb Clausen

No, this is by far not the obvious way since you would at least have
to make sure there is a copy of self and this is returned from dup. I
would rather do

def dup
copy = super
copy.name = @name.dup
copy
end

or even

def dup
self.class.new(@name.dup)
end

Although that approach is fragile depending on the code in #initialize.

The documentation of Object#dup seems to suggest that subclasses
should not override dup, preferring to override clone instead. I'm not
sure why this should be or why overriding dup would be bad. But
anyway, I would suggest this:

def deep_clone
copy=clone
[email protected]
copy
end

just so you can keep the existing semantics of clone as a shallow copy.

I'm really not sure why there are 2 methods to create shallow copies
in ruby and what all the differences are supposed to be. Other than
not overriding dup(?), the only other difference between them that I
can discover is that clone copies the metaclass of the object, whereas
dup reverts the copy's metaclass to being just its class. I've been
wondering about the difference between the 2 recently; I hope someone
out there can provide some enlightenment on why there are 2 and what
the differences are.
 
R

Robert Klemme

The documentation of Object#dup seems to suggest that subclasses
should not override dup, preferring to override clone instead.

Where do you take that from? In the docs referenced below I cannot find
anything like that. The only indication I can see is that #dup uses
#initialize_copy and we should probably override that instead of #dup
itself.
I'm not
sure why this should be or why overriding dup would be bad. But
anyway, I would suggest this:

def deep_clone
copy=clone
[email protected]
copy
end

just so you can keep the existing semantics of clone as a shallow copy.

I'm really not sure why there are 2 methods to create shallow copies
in ruby and what all the differences are supposed to be. Other than
not overriding dup(?), the only other difference between them that I
can discover is that clone copies the metaclass of the object, whereas
dup reverts the copy's metaclass to being just its class. I've been
wondering about the difference between the 2 recently; I hope someone
out there can provide some enlightenment on why there are 2 and what
the differences are.

There are more differences namely in the area of frozen and tainted state.

http://www.ruby-doc.org/core/classes/Object.html#M000351
http://www.ruby-doc.org/core/classes/Object.html#M000352

Kind regards

robert
 
P

Paul Smith

Where do you take that from? =A0In the docs referenced below I cannot fin= d
anything like that. =A0The only indication I can see is that #dup uses
#initialize_copy and we should probably override that instead of #dup
itself.


There are more differences namely in the area of frozen and tainted state=
 
R

Rajinder Yadav

Brian said:
dup is a method of your existing object, and should return the new
object instance.

class A
attr_accessor :name
def dup
res = self.class.new
res.name = name.dup
res
end
end

Brain, I am starting to see where I went wrong, this clears it up, thanks!
a1 = A.new
a1.name = "yoyoma"
a2 = a1.dup
a1.name.chop!
puts a2.name

There is a subtle distinction between 'dup' and 'clone' which I'll leave
someone else to explain...


--
Kind Regards,
Rajinder Yadav

http://DevMentor.org
Do Good ~ Share Freely
 
R

Rajinder Yadav

Caleb said:
The documentation of Object#dup seems to suggest that subclasses
should not override dup, preferring to override clone instead. I'm not
sure why this should be or why overriding dup would be bad. But
anyway, I would suggest this:

def deep_clone
copy=clone
[email protected]
copy
end

cool, do a shallow clone and then do a specialized deep cloning, i like this! I
did not get as far as you did about dup and clone and not to redefine dup, this
is news to me but something worth looking into.
just so you can keep the existing semantics of clone as a shallow copy.

I'm really not sure why there are 2 methods to create shallow copies
in ruby and what all the differences are supposed to be. Other than
not overriding dup(?), the only other difference between them that I
can discover is that clone copies the metaclass of the object, whereas
dup reverts the copy's metaclass to being just its class. I've been
wondering about the difference between the 2 recently; I hope someone
out there can provide some enlightenment on why there are 2 and what
the differences are.


--
Kind Regards,
Rajinder Yadav

http://DevMentor.org
Do Good ~ Share Freely
 
R

Rajinder Yadav

Robert said:
Where do you take that from? In the docs referenced below I cannot find
anything like that. The only indication I can see is that #dup uses
#initialize_copy and we should probably override that instead of #dup
itself.


There are more differences namely in the area of frozen and tainted state.

http://www.ruby-doc.org/core/classes/Object.html#M000351
http://www.ruby-doc.org/core/classes/Object.html#M000352

Thanks for the links and solutions Robert. This one got some good replies.
Kind regards

robert


--
Kind Regards,
Rajinder Yadav

http://DevMentor.org
Do Good ~ Share Freely
 
C

Caleb Clausen

Where do you take that from? In the docs referenced below I cannot find
anything like that. The only indication I can see is that #dup uses
#initialize_copy and we should probably override that instead of #dup
itself.

I'm looking at these 2 sentences:

In general, +clone+ and +dup+ may have different
semantics in descendent classes. While +clone+ is used to duplicate
an object, including its internal state, +dup+ typically uses the
class of the descendent object to create the new instance.

Frankly, I've never been real sure what this is supposed to mean, so
my reading may well be wrong. In fact, it probably is.

initialize_copy apparently is used by both dup and clone. You're
right, that should be defined (overridden?) instead of dup/clone
themselves. I rarely remember that.

Ah, yes. But only frozen state, not tainted.
 
R

Robert Klemme

I'm looking at these 2 sentences:

In general, +clone+ and +dup+ may have different
semantics in descendent classes. While +clone+ is used to duplicate
an object, including its internal state, +dup+ typically uses the
class of the descendent object to create the new instance.

Frankly, I've never been real sure what this is supposed to mean, so
my reading may well be wrong. In fact, it probably is.

initialize_copy apparently is used by both dup and clone. You're
right, that should be defined (overridden?) instead of dup/clone
themselves. I rarely remember that.

Mee, too. :) Just for the reference

irb(main):019:0> class X
irb(main):020:1> def initialize_copy(*a) p [self,a] end
irb(main):021:1> end
=> nil
irb(main):022:0> x=X.new
=> #<X:0x8670108>
irb(main):023:0> x.dup
[#<X:0x865d9cc>, [#<X:0x8670108>]]
=> #<X:0x865d9cc>
irb(main):024:0> x.clone
[#<X:0x864ddb0>, [#<X:0x8670108>]]
=> #<X:0x864ddb0>

And it's noteworthy that frozen state is only established _after_
initialize_copy has returned:

irb(main):025:0> class X
irb(main):026:1> def initialize_copy(old)
irb(main):027:2> @x=1
irb(main):028:2> end
irb(main):029:1> end
=> nil
irb(main):030:0> X.new.freeze.clone
=> # said:
Ah, yes. But only frozen state, not tainted.

Right you are.

Kind regards

robert
 
R

Rajinder Yadav

Robert said:
I'm looking at these 2 sentences:

In general, +clone+ and +dup+ may have different
semantics in descendent classes. While +clone+ is used to duplicate
an object, including its internal state, +dup+ typically uses the
class of the descendent object to create the new instance.

Frankly, I've never been real sure what this is supposed to mean, so
my reading may well be wrong. In fact, it probably is.

initialize_copy apparently is used by both dup and clone. You're
right, that should be defined (overridden?) instead of dup/clone
themselves. I rarely remember that.

Mee, too. :) Just for the reference

irb(main):019:0> class X
irb(main):020:1> def initialize_copy(*a) p [self,a] end
irb(main):021:1> end
=> nil
irb(main):022:0> x=X.new
=> #<X:0x8670108>
irb(main):023:0> x.dup
[#<X:0x865d9cc>, [#<X:0x8670108>]]
=> #<X:0x865d9cc>
irb(main):024:0> x.clone
[#<X:0x864ddb0>, [#<X:0x8670108>]]
=> #<X:0x864ddb0>

And it's noteworthy that frozen state is only established _after_
initialize_copy has returned:

Celeb, Robert thanks for bringing this point home. I keep having to update my
Ruby notes =) .... let me try the initialize_copy way!
irb(main):025:0> class X
irb(main):026:1> def initialize_copy(old)
irb(main):027:2> @x=1
irb(main):028:2> end
irb(main):029:1> end
=> nil
irb(main):030:0> X.new.freeze.clone


Right you are.

Kind regards

robert


--
Kind Regards,
Rajinder Yadav

http://DevMentor.org
Do Good ~ Share Freely
 
B

Brian Candler

Caleb Clausen wrote:0
In general, +clone+ and +dup+ may have different
semantics in descendent classes. While +clone+ is used to duplicate
an object, including its internal state, +dup+ typically uses the
class of the descendent object to create the new instance.

Frankly, I've never been real sure what this is supposed to mean

I *think* what it means is: clone just copies all the instance
variables, whilst dup calls self.class.new().

It's quite common for initialize() to have all sorts of side effects,
creating new objects and so on. So you can expect dup to do all this,
whilst you can expect clone to create an identical object with all the
instance variables pointing at the same objects.

Nothing enforces that of course, so it's just a convention.

The only *real* differences I can see are:
- clone also copies the frozen state of the object
- clone makes a copy of the singleton class

(whereas in dup, by default the newly-created object has an empty
singleton class; it's assumed that if there are any methods to be added
to that, your own dup method will do that for you, possibly with the
assistance of your initialize method)
initialize_copy apparently is used by both dup and clone. You're
right, that should be defined (overridden?) instead of dup/clone
themselves. I rarely remember that.

I'm not sure I agree with that. The *default* implementation of both dup
and clone does this, as it's the only reasonable thing for Object to do
without any knowledge of its subclasses. But I think the spirit of dup
described above is that dup defined in a subclass should initialize it
using its constructor.

Since I never use clone, it's a moot point for me as to what it should
do in a subclass.

Regards,

Brian.
 
L

lith

But I think the spirit of dup
described above is that dup defined in a subclass should initialize it
using its constructor.

I'd understand the description in such a way that user should
override
neither #dup not #clone but instead create a #initialize_copy method
to
implement anything class-specific (including a non-shallow copy).
Since
that method is called by #clone and #dup and the frozen/tainted state
could be easily reset, I personally still don't quite understand why
there are two methods.
 
R

Robert Klemme

Brian, I disagree. The proper way is to implement #initialize_copy.
That way you can make sure you do not get aliasing effects even if
source and copy are frozen because in #initialize_copy frozen state is
not applied.
I'd understand the description in such a way that user should
override
neither #dup not #clone but instead create a #initialize_copy method
to
implement anything class-specific (including a non-shallow copy).

Also for shallow copy in order to avoid aliasing! IMHO a proper setup
looks like this:

class A
attr_reader :x
attr_accessor :y

def initialize
@x = []
@y = 10
end

def initialize_copy(source)
super
# p self
@x = source.x.dup
end
end

class B < A
attr_accessor :z

def initialize
super()
@z = {}
end

def initialize_copy(source)
super
@z = source.z.dup
end
end

Note that the copy is initialized with the same set of references when
entering #initialize_copy so you need only deal with members that
could cause aliasing issues (unfrozen strings and collections for
example).
Since
that method is called by #clone and #dup and the frozen/tainted state
could be easily reset, I personally still don't quite understand why
there are two methods.

You cannot reset frozen state - for good reasons.

Kind regards

robert
 
C

Caleb Clausen

Caleb Clausen wrote:0

I *think* what it means is: clone just copies all the instance
variables, whilst dup calls self.class.new().

It's quite common for initialize() to have all sorts of side effects,
creating new objects and so on. So you can expect dup to do all this,
whilst you can expect clone to create an identical object with all the
instance variables pointing at the same objects.

Object#dup does not call new; I think it's more like:
self.class.allocate.initialize_copy(self). See what happens here:

irb(main):001:0> class K
irb(main):002:1> def initialize
irb(main):003:2> p :initialize
irb(main):004:2> end
irb(main):005:1> end
=> nil
irb(main):006:0> k=K.new
:initialize
=> #<K:0xb7ce8ee0>
irb(main):008:0> k2=k.dup
=> #<K:0xb7ce0f38>

My reading of those 2 sentences I quoted has now changed. Now I
believe that all it's saying is that clone copies the singleton class
whereas dup reverts the copy to the object's original class. Tho I
still don't fully understand what 'internal state' is supposed to
mean. Are instance variables not part of the internal state? Yet both
dup and clone copy them.
I'm not sure I agree with that. The *default* implementation of both dup
and clone does this, as it's the only reasonable thing for Object to do
without any knowledge of its subclasses. But I think the spirit of dup
described above is that dup defined in a subclass should initialize it
using its constructor.

Since I never use clone, it's a moot point for me as to what it should
do in a subclass.

I never used to use clone either, til I discovered a case where I
needed to copy the singleton class. Now I'm of the opinion that one
should default to clone when a copy is needed, and fall back to dup
only when clone is unsuitable.
 
R

Rajinder Yadav

lith said:
I'd understand the description in such a way that user should
override
neither #dup not #clone but instead create a #initialize_copy method
to
implement anything class-specific (including a non-shallow copy).
Since
that method is called by #clone and #dup and the frozen/tainted state
could be easily reset, I personally still don't quite understand why
there are two methods.

This is exactly the approach I am now following after the various discussions
and insights. I just leave dup and clone to keep their *default* behavior
intact, so they both end up calling initialize_copy and you don't get some
bizarre Frankenstein clone if you were to redefine dup or clone. I am thinking
about someone else using my code and what will cause less headache for them in
the end.

--
Kind Regards,
Rajinder Yadav

http://DevMentor.org
Do Good ~ Share Freely
 
R

Rajinder Yadav

Robert said:
Brian, I disagree. The proper way is to implement #initialize_copy.
That way you can make sure you do not get aliasing effects even if
source and copy are frozen because in #initialize_copy frozen state is
not applied.


Also for shallow copy in order to avoid aliasing! IMHO a proper setup
looks like this:

Robert, I like this setup, thanks for the sample code to look over, just
discovered why adding 'super' is important, which was missing from my notes and
an oversight on my part.

It is sufficient to call 'super' and not 'super source'? if you are passing
stuff up the hierarchy construction chain.

I am going to conjecture 'super' ends up becoming 'super self', which make sense
because the parent constructor don't care about sub class data members. Does
that make any sense to you?

class A
attr_reader :x
attr_accessor :y

def initialize
@x = []
@y = 10
end

def initialize_copy(source)
super
# p self
@x = source.x.dup
end
end

class B < A
attr_accessor :z

def initialize
super()
@z = {}
end

def initialize_copy(source)
super
@z = source.z.dup
end
end

Note that the copy is initialized with the same set of references when
entering #initialize_copy so you need only deal with members that
could cause aliasing issues (unfrozen strings and collections for
example).
Since
that method is called by #clone and #dup and the frozen/tainted state
could be easily reset, I personally still don't quite understand why
there are two methods.

You cannot reset frozen state - for good reasons.

Kind regards

robert


--
Kind Regards,
Rajinder Yadav

http://DevMentor.org
Do Good ~ Share Freely
 
R

Robert Klemme

Robert, I like this setup, thanks for the sample code to look over, just
discovered why adding 'super' is important, which was missing from my notes and
an oversight on my part.

It is sufficient to call 'super' and not 'super source'? if you are passing
stuff up the hierarchy construction chain.

You seem to be mixing two things: super in #initialize and
#initialize_copy. In #initialize_copy you can simply write "super"
(without brackets) because that will make sure the argument list is
propagated. You can do this because #initialize_copy will always only
have one argument, the object that was duped / cloned.

In the constructor I explicitly wrote "super()" because the super class
#initialize does not have arguments and "super" will break as soon as
you add parameters to the sub class constructor. Of course, if you
change both classes in parallel you can stick with "super".
I am going to conjecture 'super' ends up becoming 'super self', which make sense
because the parent constructor don't care about sub class data members. Does
that make any sense to you?

No. Neither for #initialize nor for #initialize_copy you want self as
argument to super.

Kind regards

robert
 
R

Rick DeNatale

Object#dup does not call new; I think it's more like:
self.class.allocate.initialize_copy(self). See what happens here:

irb(main):001:0> class K
irb(main):002:1> def initialize
irb(main):003:2> p :initialize
irb(main):004:2> end
irb(main):005:1> end
=> nil
irb(main):006:0> k=K.new
:initialize
=> #<K:0xb7ce8ee0>
irb(main):008:0> k2=k.dup
=> #<K:0xb7ce0f38>

And clone doesn't call initialize EITHER:

class A
def initialize(iv)
@iv = iv
puts "initialize called"
end

def initialize_copy(arg)
puts "initialize copy called, my iv is #{@iv}"

end
end

puts "Creating original"
a = A.new(42)
puts "calling dup"
a1 = a.dup
puts "calling clone"
a2 = a.clone

outputs

Creating original
initialize called
calling dup
initialize copy called, my iv is 42
calling clone
initialize copy called, my iv is 42

It you look at the source code in object.c It becomes apparent that
Object#dup and Object#clone do pretty much the same thing except for
propagating the frozen bit and singleton classes:

VALUE
rb_obj_clone(obj)
VALUE obj;
{
VALUE clone;

if (rb_special_const_p(obj)) {
rb_raise(rb_eTypeError, "can't clone %s", rb_obj_classname(obj));
}
clone = rb_obj_alloc(rb_obj_class(obj));
RBASIC(clone)->klass = rb_singleton_class_clone(obj);
RBASIC(clone)->flags = (RBASIC(obj)->flags | FL_TEST(clone,
FL_TAINT)) & ~(FL_FREEZE|FL_FINALIZE);
init_copy(clone, obj);
RBASIC(clone)->flags |= RBASIC(obj)->flags & FL_FREEZE;

return clone;
}

VALUE
rb_obj_dup(obj)
VALUE obj;
{
VALUE dup;

if (rb_special_const_p(obj)) {
rb_raise(rb_eTypeError, "can't dup %s", rb_obj_classname(obj));
}
dup = rb_obj_alloc(rb_obj_class(obj));
init_copy(dup, obj);

return dup;
}
static void
init_copy(dest, obj)
VALUE dest, obj;
{
if (OBJ_FROZEN(dest)) {
rb_raise(rb_eTypeError, "[bug] frozen object (%s) allocated",
rb_obj_classname(dest));
}
RBASIC(dest)->flags &= ~(T_MASK|FL_EXIVAR);
RBASIC(dest)->flags |= RBASIC(obj)->flags & (T_MASK|FL_EXIVAR|FL_TAINT);
if (FL_TEST(obj, FL_EXIVAR)) {
rb_copy_generic_ivar(dest, obj);
}
rb_gc_copy_finalizer(dest, obj);
switch (TYPE(obj)) {
case T_OBJECT:
case T_CLASS:
case T_MODULE:
if (ROBJECT(dest)->iv_tbl) {
st_free_table(ROBJECT(dest)->iv_tbl);
ROBJECT(dest)->iv_tbl = 0;
}
if (ROBJECT(obj)->iv_tbl) {
ROBJECT(dest)->iv_tbl = st_copy(ROBJECT(obj)->iv_tbl);
}
}
rb_funcall(dest, id_init_copy, 1, obj);
}


This code is from 1.8.6 just cuz that's what I happened to grab.

In both cases the same subroutine is used to create the state of the
new object prior to calling intialize_copy and that subroutine
basically allocates the new object, copies instance variables "under
the table" and then invokes initialize_copy, no initialize method is
ever called on the result object.

Which makes me thing that the whole "+dup+ typically uses the class
of the descendent object to create the new instance" is meaningless,
or untrue. Probably this is a vestige of an older implementation.


--
Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,994
Messages
2,570,223
Members
46,813
Latest member
lawrwtwinkle111

Latest Threads

Top