deep cloning, how?

Robert Klemme

Object#dup does not call new; I think it's more like:
self.class.allocate.initialize_copy(self). See what happens here:

irb(main):001:0> class K
irb(main):002:1> def initialize
irb(main):003:2> p :initialize
irb(main):004:2> end
irb(main):005:1> end
=> nil
irb(main):006:0> k=K.new
:initialize
=> #<K:0xb7ce8ee0>
irb(main):008:0> k2=k.dup
=> #<K:0xb7ce0f38>
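
To make the "allocate, then initialize_copy" idea concrete, here is a rough pure-Ruby approximation of what #dup seems to do. This is only a sketch: rough_dup and Probe are made-up names, and the real implementation lives in C and handles more cases (flags, finalizers, special objects).

```ruby
# Hypothetical helper, for illustration only: approximates Object#dup.
def rough_dup(obj)
  copy = obj.class.allocate              # new instance, #initialize is NOT run
  obj.instance_variables.each do |name|  # shallow copy of instance variables
    copy.instance_variable_set(name, obj.instance_variable_get(name))
  end
  copy.send(:initialize_copy, obj)       # the hook is private, hence #send
  copy
end

# Hypothetical class that records every call to #initialize.
class Probe
  INIT_LOG = []

  def initialize
    INIT_LOG << :initialize
  end
end

k  = Probe.new
k2 = rough_dup(k)

p Probe::INIT_LOG   # [:initialize]  (logged once, by Probe.new only)
p k2.class          # Probe
```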

And clone doesn't call initialize EITHER:

class A
  def initialize(iv)
    @iv = iv
    puts "initialize called"
  end

  def initialize_copy(arg)
    puts "initialize copy called, my iv is #{@iv}"
  end
end

puts "Creating original"
a = A.new(42)
puts "calling dup"
a1 = a.dup
puts "calling clone"
a2 = a.clone

outputs

Creating original
initialize called
calling dup
initialize copy called, my iv is 42
calling clone
initialize copy called, my iv is 42

If you look at the source code in object.c it becomes apparent that
Object#dup and Object#clone do pretty much the same thing except for
propagating the frozen bit and singleton classes:

VALUE
rb_obj_clone(obj)
    VALUE obj;
{
    VALUE clone;

    if (rb_special_const_p(obj)) {
        rb_raise(rb_eTypeError, "can't clone %s", rb_obj_classname(obj));
    }
    clone = rb_obj_alloc(rb_obj_class(obj));
    RBASIC(clone)->klass = rb_singleton_class_clone(obj);
    RBASIC(clone)->flags = (RBASIC(obj)->flags | FL_TEST(clone, FL_TAINT))
                           & ~(FL_FREEZE|FL_FINALIZE);
    init_copy(clone, obj);
    RBASIC(clone)->flags |= RBASIC(obj)->flags & FL_FREEZE;

    return clone;
}

VALUE
rb_obj_dup(obj)
    VALUE obj;
{
    VALUE dup;

    if (rb_special_const_p(obj)) {
        rb_raise(rb_eTypeError, "can't dup %s", rb_obj_classname(obj));
    }
    dup = rb_obj_alloc(rb_obj_class(obj));
    init_copy(dup, obj);

    return dup;
}

static void
init_copy(dest, obj)
    VALUE dest, obj;
{
    if (OBJ_FROZEN(dest)) {
        rb_raise(rb_eTypeError, "[bug] frozen object (%s) allocated",
                 rb_obj_classname(dest));
    }
    RBASIC(dest)->flags &= ~(T_MASK|FL_EXIVAR);
    RBASIC(dest)->flags |= RBASIC(obj)->flags & (T_MASK|FL_EXIVAR|FL_TAINT);
    if (FL_TEST(obj, FL_EXIVAR)) {
        rb_copy_generic_ivar(dest, obj);
    }
    rb_gc_copy_finalizer(dest, obj);
    switch (TYPE(obj)) {
      case T_OBJECT:
      case T_CLASS:
      case T_MODULE:
        if (ROBJECT(dest)->iv_tbl) {
            st_free_table(ROBJECT(dest)->iv_tbl);
            ROBJECT(dest)->iv_tbl = 0;
        }
        if (ROBJECT(obj)->iv_tbl) {
            ROBJECT(dest)->iv_tbl = st_copy(ROBJECT(obj)->iv_tbl);
        }
    }
    rb_funcall(dest, id_init_copy, 1, obj);
}


This code is from 1.8.6 just cuz that's what I happened to grab.

In both cases the same subroutine is used to create the state of the
new object prior to calling initialize_copy and that subroutine
basically allocates the new object, copies instance variables "under
the table" and then invokes initialize_copy, no initialize method is
ever called on the result object.
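
The two differences named above (the frozen bit and the singleton class) can be observed from plain Ruby. A small sketch, where the singleton method greet is just an illustrative name:

```ruby
original = Object.new

# A singleton method lives in the object's singleton class.
def original.greet
  "hello"
end

original.freeze

via_dup   = original.dup
via_clone = original.clone

p via_dup.frozen?                # false: dup drops the frozen state
p via_clone.frozen?              # true:  clone keeps it
p via_dup.respond_to?(:greet)    # false: dup ignores the singleton class
p via_clone.respond_to?(:greet)  # true:  clone copies it
```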

Which makes me think that the whole "+dup+ typically uses the class
of the descendent object to create the new instance" is meaningless,
or untrue. Probably this is a vestige of an older implementation.

That's what I'd guess, too. Basically the documentation should state
for #dup and #clone something like this: "It is not normally necessary
to override this method in subclasses. Customization of copying is done
via method #initialize_copy."
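
A minimal sketch of what such customization could look like. Playlist and its tracks attribute are invented for the example; the point is only that #dup and #clone stay untouched and the hook gives the copy its own array.

```ruby
# Hypothetical class used only to illustrate the hook.
class Playlist
  attr_reader :tracks

  def initialize(tracks)
    @tracks = tracks
  end

  # Invoked on the NEW object by both #dup and #clone;
  # +source+ is the object being copied.
  def initialize_copy(source)
    super
    @tracks = source.tracks.dup   # avoid sharing the array with the original
  end
end

a = Playlist.new(%w[foo bar])
b = a.dup
c = a.clone

b.tracks << "baz"

p a.tracks   # ["foo", "bar"]
p b.tracks   # ["foo", "bar", "baz"]
p c.tracks   # ["foo", "bar"]
```

Note that the strings inside the array are still shared; this is one level of copying, not a deep copy.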

Kind regards

robert
 
Rajinder Yadav

Rick said:
And clone doesn't call initialize EITHER:

I had made this assumption, otherwise it would have made more sense to
overload initialize to accept the source object that gets passed to
initialize_copy ... the code would be ugly as you would need to do type checking
at runtime ( if iv.class == A using your sample ) to execute the correct code.

my c++ and copy constructor concept got in the way earlier, ruby doesn't quite
do what a c++ developer would expect, the initialize_copy is a cleaner way to do
this lacking static type checking
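
For comparison, this is roughly what the rejected alternative would look like if one constructor had to serve both purposes. It is a hypothetical design, not how Ruby works, and Box is an invented class:

```ruby
# Hypothetical: #initialize doing double duty as a "copy constructor".
class Box
  attr_reader :iv

  def initialize(arg)
    if arg.is_a?(Box)   # runtime type check picks the "copy" branch
      @iv = arg.iv
    else                # ordinary construction
      @iv = arg
    end
  end
end

original = Box.new(42)
copy     = Box.new(original)

p copy.iv   # 42
```

Besides the boilerplate, this makes it impossible to build a Box that holds another Box, which is one more argument for a separate hook.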

thanks for the sample code to validate this point

--
Kind Regards,
Rajinder Yadav

http://DevMentor.org
Do Good ~ Share Freely
 
Brian Candler

Robert said:
Brian, I disagree. The proper way is to implement #initialize_copy.
That way you can make sure you do not get aliasing effects even if
source and copy are frozen because in #initialize_copy frozen state is
not applied.

I don't understand what you mean by that. If #dup calls self.class.new
then you obviously get a new and hence unfrozen object.

It is certainly true that the *default* implementation of both #dup and
#clone (defined in Object) calls initialize_copy. A generic #dup must
behave this way; it doesn't know what the new() method arguments are in
any particular subclass of Object. I don't think this should be taken as
necessarily implying that you are expected to leave #dup alone in your
own classes, and only override #initialize_copy instead.

The way I read the documentation implies to me that #dup in user defined
classes *should* call new. Silly example:

class NewsReader
  def initialize(url, state_filename)
    @url = url
    @http_client = HTTPClient.new(@url)
    @state_filename = state_filename
    @state_file = File.open(@state_filename)
  end

  def dup
    self.class.new(@url, @state_filename.dup)
  end
end

Here the logic of how to build a NewsReader, including building all the
associated helper objects, is built into the #initialize method. I don't
think you would want to duplicate all this logic in #initialize_copy.
Furthermore, I think I would expect #clone only to copy the top object,
and leave all the instance variables aliased.

Obviously there are no hard-and-fast rules here, and with Ruby there are
many ways to achieve the same goal.

I'd certainly agree this is an area where Ruby's documentation falls
short.

Taking another example: I don't think you'll disagree that 99% of the
time you are expected to leave Object.new alone and instead define
#initialize in your own classes. But you wouldn't find that out from the
documentation:

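$ ri Object.new
------------------------------------------------------------ Object::new
Object::new()
------------------------------------------------------------------------
Not documented

$ ri Object#initialize
Nothing known about Object#initialize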
 
Robert Klemme

I don't understand what you mean by that. If #dup calls self.class.new
then you obviously get a new and hence unfrozen object.

It is certainly true that the *default* implementation of both #dup and
#clone (defined in Object) calls initialize_copy. A generic #dup must
behave this way; it doesn't know what the new() method arguments are in
any particular subclass of Object. I don't think this should be taken as
necessarily implying that you are expected to leave #dup alone in your
own classes, and only override #initialize_copy instead.

The way I read the documentation implies to me that #dup in user defined
classes *should* call new. Silly example:

class NewsReader
  def initialize(url, state_filename)
    @url = url
    @http_client = HTTPClient.new(@url)
    @state_filename = state_filename
    @state_file = File.open(@state_filename)
  end

  def dup
    self.class.new(@url, @state_filename.dup)
  end
end

Here the logic of how to build a NewsReader, including building all the
associated helper objects, is built into the #initialize method.

Brian, the approach shown above does not work well with subclasses. The
code attempts to be safe with regard to inheritance (by doing
self.class.new instead of NewsReader.new) but it will fail miserably as
soon as a sub class constructor has a different argument list (which is
not too uncommon).
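
A stripped-down sketch of that failure mode. Reader mirrors the shape of the NewsReader example but leaves out the HTTP client and the file handle so it runs on its own; DefaultReader and the URL are invented for the illustration.

```ruby
class Reader
  def initialize(url, state_filename)
    @url = url
    @state_filename = state_filename
  end

  # Same approach as in the example above: rebuild via the constructor.
  def dup
    self.class.new(@url, @state_filename.dup)
  end
end

# Hypothetical subclass whose constructor takes a single argument.
class DefaultReader < Reader
  def initialize(url)
    super(url, "default.state")
  end
end

reader = DefaultReader.new("http://example.invalid/feed")

begin
  reader.dup   # calls DefaultReader.new with two arguments
rescue ArgumentError => e
  puts "dup failed: #{e.message}"
end
```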

I completely agree with Rick here: the comment in Object#dup is probably
outdated. The most reasonable way to customize object cloning *and*
dupping is to implement #initialize_copy in a way to at least ensure no
aliasing of unfrozen members takes place.

I don't think you would want to duplicate all this logic in #initialize_copy.

You would not duplicate the logic from #initialize in #initialize_copy
because #initialize_copy does a completely different job: it copies
state of an instance which is known to be consistent and just needs to
ensure that aliasing of object references does not break your class
invariants later accidentally. This is the reason why in
#initialize_copy different logic should be applied - even for shallow
copies! Method #initialize OTOH needs to work with its arguments which
were provided from the outside (outside of this class that is) and may
not meet expectations or valid ranges.

Furthermore, I think I would expect #clone only to copy the top object,
and leave all the instance variables aliased.

As far as I can see both #clone and #dup are meant to do shallow copies
but I may be wrong here. At least this is what the contract of Object
promises and I tend to be cautious about changing such things. Even if
you redefine semantics to being deep copy for certain classes then
implementing it in #initialize_copy is superior to other approaches IMHO.

Obviously there are no hard-and-fast rules here, and with Ruby there are
many ways to achieve the same goal.

That's true. But I would say at least when considering inheritance some
ways are better than others. In fact I have been doing self.class.new
most of the time in #dup because I completely forgot about
#initialize_copy. But I will certainly change that habit from now on.

I'd certainly agree this is an area where Ruby's documentation falls
short.

Right.

Taking another example: I don't think you'll disagree that 99% of the
time you are expected to leave Object.new alone and instead define
#initialize in your own classes. But you wouldn't find that out from the
documentation:

$ ri Object.new
------------------------------------------------------------ Object::new
Object::new()
------------------------------------------------------------------------
Not documented

$ ri Object#initialize
Nothing known about Object#initialize

Funny that you mention it: #new and #initialize on one side and #dup /
#clone and #initialize_copy on the other side have one thing in common:
object allocation is separated from initialization. I believe this was
a wise decision because that way allocation policies can be implemented
easier than in languages like C++ and Java where both are inseparable.

For example, you can add your own #deep_dup to the language:

class Object
  def deep_dup
    cp = self.class.allocate

    # copy each instance variable under its own name
    instance_variables.each do |var|
      cp.instance_variable_set(var, instance_variable_get(var))
    end

    cp.initialize_deep_copy(self)

    cp
  end

  def initialize_deep_copy(source)
    # nothing to do here
  end
end

class String
  def initialize_deep_copy(source)
    replace source
  end
end

# note this implementation is not robust against
# circles in the object graph!
class Array
  def initialize_deep_copy(source)
    source.each do |y|
      self << y.deep_dup
    end
  end
end

a = %w{foo bar baz}
b = a.dup
b[2].replace "CHANGED"

p a, b

a = %w{foo bar baz}
b = a.deep_dup
b[2].replace "CHANGED"

p a, b
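
If circles in the object graph are a concern, a different and commonly used technique is a round trip through Marshal. This is not the #deep_dup approach above but a swapped-in alternative; marshal_deep_copy is an invented helper name, and the trick only works for objects Marshal can dump (no procs, IO objects or objects with singleton methods).

```ruby
# Hypothetical helper: deep copy by serializing and deserializing.
def marshal_deep_copy(obj)
  Marshal.load(Marshal.dump(obj))
end

a = %w[foo bar baz]
a << a                      # the array now contains itself

b = marshal_deep_copy(a)
b[2].replace "CHANGED"

p a[2]             # "baz"  (original string untouched)
p b[2]             # "CHANGED"
p b[3].equal?(b)   # true   (the cycle points at the copy, not at a)
```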


Kind regards

robert
 
Rajinder Yadav

Funny that you mention it: #new and #initialize on one side and #dup /
#clone and #initialize_copy on the other side have one thing in common:
object allocation is separated from initialization. I believe this was a
wise decision because that way allocation policies can be implemented easier
than in languages like C++ and Java where both are inseparable.

I wonder as I mention already maybe this design has more to do with
the fact that Ruby does not perform static type checking like C++ /
Java does at compile time. In C++ you just declare a copy constructor
(initialize/constructor), if you have other (overloaded) constructor
code, then static type checking ensures the correct code logic is
executed, thus allowing you to write a cleaner clone method. In Ruby
if initialize was called during cloning, you would need to add the
logic to perform the dynamic type checking test using Object.class.
Who would want to write this boilerplate code over and over? So Ruby's
way around this was to use initialize_copy as I am going to assume
here.

I think cloning in the initializer code would be a better design if
Ruby did static type checking. The fact Ruby still does (dynamic) type
checking at runtime, means Ruby code gets penalized for performance.

It seems the way Ruby does dup/clone/initialize/initialize_copy *throw
in subclassing* is a source of confusion for many and not really
intuitive, barring good or bad design. The length of this thread and
replies would seem to indicate this is a weakness in Ruby design, or I
am simply biased with my C++ background? Definitely better updated
documentation would help to ensure the correct policy to follow in
Ruby.

--
Kind Regards,
Rajinder Yadav

http://devmentor.org
Do Good! ~ Share Freely
 
Robert Klemme

I wonder as I mention already maybe this design has more to do with
the fact that Ruby does not perform static type checking like C++ /
Java does at compile time. In C++ you just declare a copy constructor
(initialize/constructor), if you have other (overloaded) constructor
code, then static type checking ensures the correct code logic is
executed, thus allowing you to write a cleaner clone method.

C++'s copy constructor is not a "clone method". For example, it will
happily "clone" any subclass instance. Cloning typically ensures at
least the class of the new instance is the same as for the original.
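
In Ruby terms, a small sketch of that property (Shape and Circle are invented classes): the caller copies a mixed collection without naming any class, and every copy has the class of its original.

```ruby
class Shape
  def initialize(name)
    @name = name
  end
end

# Subclass with a different constructor signature.
class Circle < Shape
  def initialize(radius)
    super("circle")
    @radius = radius
  end
end

shapes = [Shape.new("blob"), Circle.new(2)]
copies = shapes.map(&:dup)   # no class names needed at the call site

p copies.map(&:class)        # [Shape, Circle]
```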

Static typing is just one reason why Ruby and C++ differ here: another
important reason is the memory model of both languages. In Ruby you
only have object references which can only be copied by value. In C++
on the other hand you have a whole toolbox of options (value objects,
pointers, references - plus constant variants). You can see that when
looking at Java: it has static typing but just one way to access objects
- via references. This is the same model as in Ruby and alas, Java also
has a method clone() which behaves similarly (although the programming
model is different), i.e. it creates a new instance of the same class
with all members set to the same references as the original.

Side note: I find Java's cloning is broken in several ways. If you want
to make a class Cloneable you can only use "final" for primitive value
members because otherwise you cannot prevent aliasing between old and
new instance. Then, interface Cloneable does not contain method clone
which does not make the compiler catch a missing public method clone().
Lastly, I would have preferred the return type to be generic; although
I do have to admit that I did not think this through completely. I
guess Sun's engineers had good reasons not to change this.

In Ruby
if initialize was called during cloning, you would need to add the
logic to perform the dynamic type checking test using Object.class.

In other words: you would have to manually implement method overloading.

Who would want to write this boilerplate code over and over? So Ruby's
way around this was to use initialize_copy as I am going to assume
here.

You make it sound like a workaround but it isn't. For a language like
Ruby this is a good solution - and compared to Java's it's almost
perfect. It just lacks the public recognition. :)

I think cloning in the initializer code would be a better design if
Ruby did static type checking. The fact Ruby still does (dynamic) type
checking at runtime, means Ruby code gets penalized for performance.

I don't follow you here. If you want a language with static type
checking you'll have to look elsewhere. We don't have static type
checking in Ruby - in fact it's one of the core assets of the language.
Ruby with static typing would not be Ruby. Reasoning about which
approach would be best if Ruby had static typing is pretty useless.

It seems the way Ruby does dup/clone/initialize/initialize_copy *throw
in subclassing* is a source of confusion for many and not really
intuitive, barring good or bad design. The length of this thread and
replies would seem to indicate this is a weakness in Ruby design, or I
am simply biased with my C++ background? Definitely better updated
documentation would help to ensure the correct policy to follow in
Ruby.

I would attribute this confusion to the documentation and to the fact
that this is a rare topic to come up. I cannot remember a "how to
properly clone objects" thread in the last years that would have covered
the topic as thoroughly as we did here.

I don't think we are facing a weakness in Ruby's design here. C++
cannot be a role model for Ruby (regardless of whether you consider
C++'s approach good or bad) because both languages are very different as
I have tried to show above. It may be that your "C++ background" clouds
your view on Ruby. :)

Thanks for the interesting discussion!

Kind regards

robert
 
Robert Klemme


He must have copied it from me. :)

Seriously, although I agree with almost everything he says I would like to
add that cloning (done properly, for example as done in Ruby) does have
advantages over copy construction as well (just to name the most
prominent one: you do not need to know the class of the object to
clone). In fact, they are two different concepts and sometimes one is
more appropriate and sometimes the other one.

Cheers

robert
 
