String doesn't auto dup on modification

Stefan Lang

2009/1/21 Tom Cloyd said:
If this is an utterly dumb question, just ignore it. However, I AM perplexed
by this response. Here's why:

I thought it was OK for an object to receive input, and output a modified
version of same. If they don't get to do that, their use seems rather
limited. In my current app, I create a log object, and various classes write
to it. I don't create new objects every time I want to add a log entry. Why
would I do that? Makes no sense to me. I might want to do exactly the same
thing to a string. You seem to be saying this is bad form. I can see that
there are cases where you want the string NOT to be modified, but you seem to
be saying that to modify the original string at all is bad.

It makes perfect sense to me to pass an object (string, in this case) across
an encapsulation boundary specifically to modify it.

What am I missing here?

There's nothing wrong with it if the purpose of the method
is to manipulate the string and it's documented clearly.

Every rule has exceptions :)

Stefan
 
Robert Klemme

As and when I discover such bugs in my code, I start adding dup(), and
yes, sometimes these lines bomb when another datatype is passed (I had
asked this in a thread recently: respond_to? dup was passing, but the
dup was failing).
Anyway, I realize it's more my incompetence, and I must be careful with
destructive methods, but I just thought maybe there's some other way to
do this, so I am not leaving it to my memory.

I would not call this "incompetence": we live and learn. Basically this
is a typical trade off issue: you trade efficiency (no copy) for safety
(no aliasing). As often with trade offs there is no clear 100% rule
which exactly tells you what is *always* correct. Instead you have to
think about it - when creating something like a library even more so -
and then deliberately decide which way you go.
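
To make the trade-off concrete, here is a minimal sketch. The helper names shout! and shout are hypothetical, made up for illustration; the first mutates the string it receives (no copy, but every alias sees the change), the second pays for a copy and leaves the caller's object alone.

```ruby
# Hypothetical helper: mutates its argument in place (efficient, but aliased).
def shout!(text)
  text << "!"
end

# Hypothetical helper: copies first, so the caller's object is untouched.
def shout(text)
  text.dup << "!"
end

original  = String.new("hello")  # explicitly mutable string
alias_ref = original             # second reference to the SAME object

shout!(alias_ref)
p original   # the change made through alias_ref is visible here too

safe = shout(original)
p original   # unchanged by the copying version
p safe       # the modified copy
```

Which of the two is right depends on what the method is documented to do, as said above.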

Kind regards

robert
 
Robert Klemme

Or as was said earlier on: simply don't use destructive string methods
unless you really have to. Pretend that all strings are frozen.

You can enforce this easily enough in your unit tests: e.g. you could
pass in strings that really are frozen, and check that your code still
works :)

Well, this is only half of the story: it does not save you from outside
code changing the instance under your hands. Aliasing works both ways,
i.e. you can screw up the receiver but also the caller. :)
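
Both halves can be sketched in a few lines. The Label class below is hypothetical, and the example assumes Ruby 2.5 or later, where modifying a frozen string raises FrozenError (older versions raise a different exception class). The first part is the "pass in a frozen string" check; the second shows the caller changing the string after handing it over.

```ruby
# Hypothetical class that stores the string it is given without copying.
class Label
  attr_reader :text

  def initialize(text)
    @text = text          # shares the caller's object
  end

  def upcased
    @text.upcase          # non-destructive: returns a new string
  end

  def upcase!
    @text.upcase!         # destructive: modifies the shared object
    self
  end
end

# Half one: a frozen argument exposes destructive code paths.
frozen_label = Label.new("ok".freeze)
p frozen_label.upcased    # non-destructive call still works

raised = begin
  frozen_label.upcase!
  false
rescue FrozenError        # assumption: Ruby >= 2.5
  true
end
p raised

# Half two: the caller can still change the string under the receiver.
caller_string = String.new("draft")
label = Label.new(caller_string)
caller_string << " v2"
p label.text              # the label sees the caller's later edit
```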

Cheers

robert
 
Robert Dober

I'm writing my first largeish app. One issue that gets me frequently is
this:

I define a string in one class. Some other class references it, and
modifies it. I (somehow) expected that when another referer modifies the
reference, ruby would automatically dup() the string.

Anyway, through trial and error, I start dup()'ing strings myself. I am
aware of freeze().

But would like to know how others handle this generally in large apps.

- Do you keep freezing Strings you make in your classes to avoid
accidental change

- Do you habitually dup() your string?

I try to, and I try to get rid of all references to the original
string as soon as possible.
This is because incremental GC works so well nowadays and allows for
some clean code.
Freezing a string seems like a good idea sometimes, but if that means
holding on to the object longer than needed this might not be such a
good idea after all.

R.
 
Tom Cloyd

Robert said:
I try to, and I try to get rid of all references to the original
string as soon as possible.
This is because incremental GC works so well nowadays and allows for
some clean code.
Freezing a string seems like a good idea sometimes, but if that means
holding on to the object longer than needed this might not be such a
good idea after all.

R.
Robert, for those of us who are considerably more clueless, what is
"incremental GC"?

Thanks,

t.

--

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tom Cloyd, MS MA, LMHC - Private practice Psychotherapist
Bellingham, Washington, U.S.A: (360) 920-1226
<< (e-mail address removed) >> (email)
<< TomCloyd.com >> (website)
<< sleightmind.wordpress.com >> (mental health weblog)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
RK Sentinel

I try to, and I try to get rid of all references to the original
string as soon as possible.
This is because incremental GC works so well nowadays and allows for
some clean code.
Freezing a string seems like a good idea sometimes, but if that means
holding on to the object longer than needed this might not be such a
good idea after all.

R.

Interesting point. So freezing a string prevents collection as long as
there are referrers (obvious), while duping it helps release the original
one, but you still have a new string in memory. So on net you are still
using the same memory.

Is there a writeup on Ruby's GC? My knowledge of GC is Java-based, and
it is 5 years old (based on the Inside the VM book and various other
articles on sun.com). Is Ruby's GC "generational"? In that scheme, IIRC,
an older object would have moved to an older generation and be less
likely to be collected.

Any links to ruby's GC would be appreciated.
 
Mike Gold

RK said:
- Do you keep freezing Strings you make in your classes to avoid
accidental change

- Do you habitually dup() your string ?

One possibility is copy-on-write.

require 'delegate'

class CopyOnWriteString < DelegateClass(String)
  DESTRUCTIVE_METHODS =
    String.public_instance_methods(false).grep(/!/).map(&:to_sym) +
    [
      :[]=,
      :<<,
      :concat,
      :initialize_copy,
      :replace,
      :setbyte,
      # ... and probably others ...
    ]

  DESTRUCTIVE_METHODS.each { |m|
    define_method(m) { |*args, &block|
      __setobj__(__getobj__.dup)
      __getobj__.send(m, *args, &block)
    }
  }
end

class Person
  def initialize(name)
    @name = name
  end

  def name
    CopyOnWriteString.new(@name)
  end
end

person = Person.new("fred")
name = person.name

p name #=> "fred"
p person.name #=> "fred"

name << " flintstone"
p name #=> "fred flintstone"
p person.name #=> "fred"

(I've used some 1.8.7+ only features.)
 
Brian Candler

RK said:
Thanks for all the helpful replies. It's my first venture: a widget
library.

Here's an example: Sometimes a string is passed to a class, say, using
its set_buffer method (which is an _optional_ method).

set_buffer just assigns it to @buffer. But deep within the class this
variable is being edited using insert() or slice!() (and this IS
necessary) since the widget is an editing widget.

Thanks, so I guess the API is something like this:

edit_field.buffer = "foo"

... some time later, after user has clicked OK ...

f.write(edit_field.buffer)

Now, if there is a compelling reason for this object to perform
"in-place" editing on the buffer then by all means do, and document
this, but it will lead to the aliasing problems you describe.

It may be simpler and safer just to use non-destructive methods inside
your class.

# destructive: removes y characters starting at x
@buffer.slice!(x,y)
# non-destructive alternative: rebuild the string without that range
@buffer = @buffer[0,x] + @buffer[x+y..-1].to_s

# destructive
@buffer.insert(pos, text)
# non-destructive alternative
@buffer = @buffer[0,pos] + text + @buffer[pos..-1]

In effect, this is doing a 'dup' each time. It has to; since Ruby
doesn't do reference-counting it has no idea whether any other object in
the system is holding a reference to the original object or not.

The only problem with this is if @buffer is a multi-megabyte object and
you don't want to keep copying it. In this case, doing a single dup
up-front would allow you to use the destructive methods safely.

class Editor
  def buffer
    @buffer
  end

  def buffer=(x)
    @buffer = x.dup
  end
end

The overhead of a single copy is small, and in any case this is probably
what is needed here (e.g. if the user makes some edits but clicks
'cancel' instead of 'save' then you may want to keep the old string
untouched)

You could try deferring the dup until the first time you call a
destructive method on the string, but the complexity overhead is
unlikely to be worth it.
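
For comparison, a deferred dup might look roughly like the sketch below. LazyEditor, its @owned flag and the own_buffer! helper are hypothetical names, not part of the widget library being discussed. The extra state is the complexity overhead mentioned above.

```ruby
# Hypothetical editor that shares the caller's string until the first edit.
class LazyEditor
  def buffer=(str)
    @buffer = str
    @owned  = false       # still sharing the caller's object
  end

  def buffer
    @buffer
  end

  def insert(pos, text)
    own_buffer!           # copy once, right before the first mutation
    @buffer.insert(pos, text)
  end

  private

  # Replace the shared string with a private copy, at most once.
  def own_buffer!
    return if @owned
    @buffer = @buffer.dup
    @owned  = true
  end
end

original = "foo".freeze   # frozen, so any accidental in-place edit would raise
editor = LazyEditor.new
editor.buffer = original
shared_before_edit = editor.buffer.equal?(original)

editor.insert(3, "bar")
p editor.buffer           # edited private copy
p original                # caller's string is untouched
```

Note that this only protects the caller from the editor. Until the first edit, the editor still sees any change the caller makes to the shared string, and the reader hands out the internal object, so the up-front dup remains the simpler choice.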
 
RK Sentinel

Brian said:
The overhead of a single copy is small, and in any case this is probably
what is needed here (e.g. if the user makes some edits but clicks
'cancel' instead of 'save' then you may want to keep the old string
untouched)

You could try deferring the dup until the first time you call a
destructive method on the string, but the complexity overhead is
unlikely to be worth it.

Yes, I've got set_buffer doing a dup (if it's a string).

At the same time, get_buffer also does a dup, since often the Field
is created blank (I did mention that set_buffer is an optional method
for editing a default value, if present).

It's a real TextField or Field. So you would be typing away in the field.
Each character you type is inserted (or removed if it's Del or BS),
exactly as I am typing away in this edit box.

The CopyOnWriteString is impressive; it shows what can be done with
Ruby.
 
Mike Gold

Brian said:
You could try deferring the dup until the first time you call a
destructive method on the string, but the complexity overhead is
unlikely to be worth it.

Careful. Intuition is worse than useless here. The only way to know is
to measure the particular case in question.

class Person
  def initialize(name)
    @name = name
  end

  def name_cow
    CopyOnWriteString.new(@name)
  end

  def name_dup
    @name.dup
  end
end

require 'benchmark'

n = 10_000
sizes = [100, 1000, 10_000, 100_000]
objects = sizes.inject(Hash.new) { |acc, size|
  acc.merge!(size => Person.new("x"*size))
}

sizes.each { |size|
  object = objects[size]
  puts "-"*40
  puts "iterations: #{n} size: #{size}"
  Benchmark.bm { |x|
    x.report("cow w/o change") {
      n.times { object.name_cow }
    }
    x.report("dup w/o change") {
      n.times { object.name_dup }
    }
    x.report("cow w/ change") {
      n.times { object.name_cow << "y" }
    }
    x.report("dup w/ change") {
      n.times { object.name_dup << "y" }
    }
  }
}

ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-cygwin]
----------------------------------------
iterations: 10000 size: 100
                    user     system      total        real
cow w/o change  0.031000   0.000000   0.031000 (  0.031000)
dup w/o change  0.032000   0.000000   0.032000 (  0.031000)
cow w/ change   0.171000   0.000000   0.171000 (  0.172000)
dup w/ change   0.047000   0.000000   0.047000 (  0.047000)
----------------------------------------
iterations: 10000 size: 1000
                    user     system      total        real
cow w/o change  0.032000   0.000000   0.032000 (  0.031000)
dup w/o change  0.046000   0.000000   0.046000 (  0.047000)
cow w/ change   0.172000   0.000000   0.172000 (  0.172000)
dup w/ change   0.063000   0.000000   0.063000 (  0.062000)
----------------------------------------
iterations: 10000 size: 10000
                    user     system      total        real
cow w/o change  0.031000   0.000000   0.031000 (  0.032000)
dup w/o change  0.109000   0.000000   0.109000 (  0.109000)
cow w/ change   0.282000   0.000000   0.282000 (  0.281000)
dup w/ change   0.156000   0.000000   0.156000 (  0.156000)
----------------------------------------
iterations: 10000 size: 100000
                    user     system      total        real
cow w/o change  0.031000   0.000000   0.031000 (  0.032000)
dup w/o change  0.672000   0.000000   0.672000 (  0.672000)
cow w/ change   1.406000   0.000000   1.406000 (  1.406000)
dup w/ change   1.219000   0.000000   1.219000 (  1.219000)

Destructive methods are less common in real code, and especially so when
the string comes from an attr_reader method. It is likely that the case
to optimize is the non-destructive call (the first of each quadruplet
above). But we have to profile the specific situation.
 
Robert Dober

Basically yes.
But one has to be careful, as we somehow have the instinct not to
create lots of short-lived objects.

To Tom, sorry for the sloppy abbreviation, but it means incremental
Garbage Collector, applying different strategies of collection
depending on object age.
If you have 45 minutes to spare, there was a most interesting talk at
RubyConf 2008 by Glenn Vanderburg, have a look by all means:
http://rubyconf2008.confreaks.com/how-ruby-can-be-fast.html

Cheers
Robert
 
Tom Cloyd

Robert said:
Basically yes.
But one has to be careful, as we somehow have the instinct not to
create lots of short-lived objects.

To Tom, sorry for the sloppy abbreviation, but it means incremental
Garbage Collector, applying different strategies of collection
depending on object age.
If you have 45 minutes to spare, there was a most interesting talk at
RubyConf 2008 by Glenn Vanderburg, have a look by all means:
http://rubyconf2008.confreaks.com/how-ruby-can-be-fast.html

Cheers
Robert
Robert,

Thanks! I felt bad about the question, because reading on into a couple
of the posts following, I figured it out, and realized I DID know what
the abbreviation meant. The "incremental" threw me off a bit. Thanks for
the link. I'd love to check it out, and will likely do so
later today.

Thanks again for your help, as always. I'm endlessly grateful for the
helpfulness of the folk on this list.

t.

 
Simon Krahnke

* Mike Gold said:
String.public_instance_methods(false).grep(/!/).map(&:to_sym) +

Is that a new feature? make the to_sym method act as block. Which to_sym
method?
DESTRUCTIVE_METHODS.each { |m|
define_method(m) { |*args, &block|
__setobj__(__getobj__.dup)
__getobj__.send(m, *args, &block)

WTF? What's this __[gs]etobj__ about?
(I've used some 1.8.7+ only features.)

mfg, simon .... tia
 
Sebastian Hungerecker

Simon said:
Is that a new feature?

Symbol#to_proc is new in ruby core since 1.8.7+. It had been previously
defined by ActiveSupport and, I think, facets.
make the to_sym method act as block. Which to_sym method?

The to_sym method of each item in the array. Symbol#to_proc is defined
somewhat like:
lambda {|x, *args| x.send(self, *args)}
So foo.map(&:to_sym) behaves like foo.map {|x| x.send(:to_sym)}
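
A small sketch of that equivalence, assuming Ruby 1.8.7 or later so that Symbol#to_proc is available in core; the names array is just sample data:

```ruby
names = ["upcase!", "chop!"]      # sample data for illustration

# Shorthand: & calls Symbol#to_proc on :to_sym and passes the result as the block.
via_symbol = names.map(&:to_sym)

# Spelled-out block doing the same thing.
via_block = names.map { |x| x.send(:to_sym) }

# The proc can also be built explicitly and reused.
converter = :to_sym.to_proc
via_proc  = names.map(&converter)

p via_symbol
p via_symbol == via_block
p via_symbol == via_proc
```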

DESTRUCTIVE_METHODS.each { |m|
define_method(m) { |*args, &block|
__setobj__(__getobj__.dup)
__getobj__.send(m, *args, &block)

WTF? What's this __[gs]etobj__ about?

Those are methods defined by DelegateClass(). They are used to get and set the
object that is delegated to. In this case they cause the proxy-object to
delegate to a copy of the original string instead of the original string
itself when a destructive method is called.
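
As a rough illustration of those two methods on their own, without the copy-on-write logic: StringProxy below is a hypothetical empty subclass, used only to show where calls are forwarded before and after __setobj__.

```ruby
require 'delegate'

# Hypothetical proxy with no behaviour of its own; every String method
# is forwarded to whatever object __getobj__ currently returns.
class StringProxy < DelegateClass(String)
end

first = String.new("fred")
proxy = StringProxy.new(first)

p proxy.length                    # forwarded to first
p proxy.__getobj__.equal?(first)  # the proxy currently wraps first

# Point the proxy at a different string; first itself is not touched.
proxy.__setobj__(String.new("barney"))

p proxy.length                    # now forwarded to the new target
p first                           # still the original contents
```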

HTH,
Sebastian
 
Simon Krahnke

* Sebastian Hungerecker said:
Symbol#to_proc is new in ruby core since 1.8.7+. It had been previously
defined by ActiveSupport and, I think, facets.
WTF? What's this __[gs]etobj__ about?

Those are methods defined by DelegateClass(). [...]

Thanks, I never saw DelegateClass before, now I understand.

mfg, simon .... l
 
