[QUIZ] Twisting a Rope (#137)

R

Ruby Quiz

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone
on Ruby Talk follow the discussion. Please reply to the original quiz message,
if you can.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

by John Miller

This week's task is to implement the Rope data structure as a Ruby class. This
topic comes out of the ICFP programming competition
(http://www.icfpcontest.com/) which had competitors manipulating a 7.5 million
character string this year.

What is a Rope:

You may not realize it, but for many changes the content to a String, Ruby
creates a new copy of the original with the modifications applied. For small
strings that are created once and read often this is actually a very efficient
way to do thing, but what happens when the string starts to get long, and is
undergoing a lot of changes? First, the program will spend more and more of its
processing cycles just copying bits around. Second, the garbage collector will
be called more and more often to pick up the little stringy scraps you've left
all over the memory.

Ropes (the name is a pun for a heavy duty string) are tree structures where a
node represents the concatenation of its left branch with its right, and leaves
are flat strings. (This is a little sloppy. A rope may contain shared subtrees,
and is thus really a directed acyclic graph, where the out-edges of each vertex
are ordered. We will continue to be sloppy.) E.g. To prepend Text A to Text B,
one creates a Node (call it N1) with A as its left branch and B as its right.
To further append Text C create a new Node N2 with its left branch pointing to
N1 and its right to C. Easy, right? To find out more see Boehm, Atkinson and
Plass "Ropes: an Alternative to Strings" at:

http://rubyurl.com/2FRbO

The task comes in three parts, each increasing in difficulty:

Part one:

Create a Rope class that can do the following:

a. 'append' or 'prepend' a String or another Rope
(alias the << operator to the append function)
b. Return the fully concatenated text with 'to_s'
c. define 'slice' to call to_s.slice
d. provide a 'length' method

Part two:

Add the following:

a. Add the ability to 'index' a single character given a 0-based offset
from the beginning of the string.
b. Add the ability to 'shift' a single character from the front of a Rope.
(Remove and return the character)
c. Add your own 'slice' method that returns a String. Implement as many of
the String method's forms as possible. To run the example code this
function will only need to understand the slice(offset,length) form.
Major Bonus for Regex and Substring forms.
d. "Balance" the tree with a 'normalize' method.
(see Boehm, Atkinson and Plass 1319 Rebalancing)

Part three: (bonus)

Add the following:

a. Change the 'slice' method to return a Rope. Ideally this method should
do as little string copying as possible. (Doing this will well
dramatically increase the speed of the example code)
b. Allow 'shift' to optionally accept an integer number of characters to
remove and return.
c. modify the '<<' operator so that can efficiently append a few
characters at a time. (see Boehm, Atkinson and Plass 1318 para. 4)
d. *Major Bonus* Add the ability to append and prepend IO classes in a
lazy fashion. (see Boehm, Atkinson and Plass 1318 para. 2)

The following code may help you explore how efficient your code is and show
where Ropes are useful. `ruby -r /path/to/your/rope/class this_script.rb Rope`
will run the test with your code. Run the script without arguments to see how
well String does. Also play around with the SIZE and CHUNKS constants to get a
feel for how they affect performance.

require 'benchmark'

#This code make a String/Rope of CHUNCKS chunks of text
#each chunck is SIZE bytes long. Each chunck starts with
#an 8 byte number. Initially the chuncks are shuffled the
#qsort method sorts them into ascending order.
#
#pass the name of the class to use as a parameter
#ruby -r rope.rb this_file Rope

puts 'preparing data...'
TextClass = Object.const_get(ARGV.shift || :String)

def qsort(text)
return TextClass.new if text.length == 0
pivot = text.slice(0,8).to_s.to_i
less = TextClass.new
more = TextClass.new
offset = 8+SIZE
while (offset < text.length)
i = text.slice(offset,8).to_s.to_i
(i < pivot ? less : more) << text.slice(offset,8+SIZE)
offset = offset + 8+SIZE
end
print "*"
return qsort(less) << text.slice(0,8+SIZE) << qsort(more)
end

SIZE = 512 * 1024
CHUNCKS = 128
CHARS = %w[R O P E]
data = TextClass.new
bulk_string =
TextClass.new(Array.new(SIZE) { CHARS[rand(4)] }.join)
puts 'Building Text...'
build = Benchmark.measure do
(0..CHUNCKS).sort_by { rand }.each do |n|
data<< sprintf("%08i",n) << bulk_string
end
data.normalize if data.respond_to? :normalize
end
GC.start
sort = Benchmark.measure do
puts "Sorting Text..."
qsort(data)
puts"\nEND"
end

puts "Build: #{build}Sort: #{sort}"
 
J

John Miller

My apologies,

The first line of the quiz should read:
"You may not realize it, but for many changes to the content of a
String,"

I am really looking forward to seeing some ingenious solutions to this
problem. The concept seemed very simple to me but my first attempt kept
becoming mired down in edge cases. I hope everyone has a good weekend
and a good Labor Day weekend to those in the US (plenty of time to work
on this weeks quiz ;)

John Miller
 
J

James Edward Gray II

The first line of the quiz should read:
"You may not realize it, but for many changes to the content of a
String,"

I've fixed this on the web site. Thanks for pointing it out.

James Edward Gray II
 
C

Carl Porth

I've done parts one and two so far. I'll try to add more in the next
few days.

For simplicity and speed, append and prepend don't modify the
receiver.

If anyone has any questions about my code, I'll be happy to answer.

Carl

require "rubygems"
require "facets"
require "kernel/with"
require "symbol/to_proc"

class String
def shift
return nil if empty?
returning self[0].chr do
self[0] = ""
end
end
end

class Rope
attr_reader :left, :right, :length

def Rope.new(*args)
if args.size <= 2
super
else # build balanced tree
mid = args.size / 2
args[mid,2] = Rope.new(*args[mid,2])
Rope.new(*args)
end
end

def initialize(left="",right="")
@left, @right = left, right
@length = @left.length + @right.length
end

def replace(rope)
initialize(rope.left,rope.right)
self
end

# clean out empty strings and rebuild tree
def normalize
Rope.new(*flat_strings.reject(&:empty?))
end

def to_s
flat_strings.join('')
end

def append(str)
Rope.new(self,str)
end
alias_method :<<, :append

def prepend(str)
Rope.new(str,self)
end

def slice(start,length=@length-start)
if start > left.length # right
right.slice(start-left.length,length-left.length)
elsif left.length < length # left and right
left.slice(start,left.length) +
right.slice(start-left.length,length-left.length)
else # left
left.slice(start,length)
end
end

def shift
if letter = left.shift || right.shift
@length -= 1
letter
else
nil
end
end

def index(letter,start=0)
if start > left.length # right
left.length + right.index(letter,start - left.length)
else # left
left.index(letter,start) || left.length + right.index(letter)
end
rescue
nil
end

# TODO implement rest of methods
def method_missing(method, *args, &block)
result = to_s.send(method, *args, &block)
if result.is_a?(String)
if method.to_s =~ /!$/
replace(Rope.new(result))
else
Rope.new(result)
end
else
result
end
end

protected

# traverse the tree
def traverse(&block)
@left.is_a?(Rope) ? @left.traverse(&block) : block.call(@left)
@right.is_a?(Rope) ? @right.traverse(&block) : block.call(@right)
end

# collect all the flat strings
def flat_strings
returning [] do |ary|
traverse { |str| ary << str }
end
end

end


The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone
on Ruby Talk follow the discussion. Please reply to the original quiz message,
if you can.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- =-=-=

by John Miller

This week's task is to implement the Rope data structure as a Ruby class. This
topic comes out of the ICFP programming competition
(http://www.icfpcontest.com/) which had competitors manipulating a 7.5 million
character string this year.

What is a Rope:

You may not realize it, but for many changes the content to a String, Ruby
creates a new copy of the original with the modifications applied. For small
strings that are created once and read often this is actually a very efficient
way to do thing, but what happens when the string starts to get long, and is
undergoing a lot of changes? First, the program will spend more and more of its
processing cycles just copying bits around. Second, the garbage collector will
be called more and more often to pick up the little stringy scraps you've left
all over the memory.

Ropes (the name is a pun for a heavy duty string) are tree structures where a
node represents the concatenation of its left branch with its right, and leaves
are flat strings. (This is a little sloppy. A rope may contain shared subtrees,
and is thus really a directed acyclic graph, where the out-edges of each vertex
are ordered. We will continue to be sloppy.) E.g. To prepend Text A to Text B,
one creates a Node (call it N1) with A as its left branch and B as its right.
To further append Text C create a new Node N2 with its left branch pointing to
N1 and its right to C. Easy, right? To find out more see Boehm, Atkinson and
Plass "Ropes: an Alternative to Strings" at:

http://rubyurl.com/2FRbO

The task comes in three parts, each increasing in difficulty:

Part one:

Create a Rope class that can do the following:

a. 'append' or 'prepend' a String or another Rope
(alias the << operator to the append function)
b. Return the fully concatenated text with 'to_s'
c. define 'slice' to call to_s.slice
d. provide a 'length' method

Part two:

Add the following:

a. Add the ability to 'index' a single character given a 0-based offset
from the beginning of the string.
b. Add the ability to 'shift' a single character from the front of a Rope.
(Remove and return the character)
c. Add your own 'slice' method that returns a String. Implement as many of
the String method's forms as possible. To run the example code this
function will only need to understand the slice(offset,length) form.
Major Bonus for Regex and Substring forms.
d. "Balance" the tree with a 'normalize' method.
(see Boehm, Atkinson and Plass 1319 Rebalancing)

Part three: (bonus)

Add the following:

a. Change the 'slice' method to return a Rope. Ideally this method should
do as little string copying as possible. (Doing this will well
dramatically increase the speed of the example code)
b. Allow 'shift' to optionally accept an integer number of characters to
remove and return.
c. modify the '<<' operator so that can efficiently append a few
characters at a time. (see Boehm, Atkinson and Plass 1318 para. 4)
d. *Major Bonus* Add the ability to append and prepend IO classes in a
lazy fashion. (see Boehm, Atkinson and Plass 1318 para. 2)

The following code may help you explore how efficient your code is and show
where Ropes are useful. `ruby -r /path/to/your/rope/class this_script.rb Rope`
will run the test with your code. Run the script without arguments to see how
well String does. Also play around with the SIZE and CHUNKS constants to get a
feel for how they affect performance.

require 'benchmark'

#This code make a String/Rope of CHUNCKS chunks of text
#each chunck is SIZE bytes long. Each chunck starts with
#an 8 byte number. Initially the chuncks are shuffled the
#qsort method sorts them into ascending order.
#
#pass the name of the class to use as a parameter
#ruby -r rope.rb this_file Rope

puts 'preparing data...'
TextClass = Object.const_get(ARGV.shift || :String)

def qsort(text)
return TextClass.new if text.length == 0
pivot = text.slice(0,8).to_s.to_i
less = TextClass.new
more = TextClass.new
offset = 8+SIZE
while (offset < text.length)
i = text.slice(offset,8).to_s.to_i
(i < pivot ? less : more) << text.slice(offset,8+SIZE)
offset = offset + 8+SIZE
end
print "*"
return qsort(less) << text.slice(0,8+SIZE) << qsort(more)
end

SIZE = 512 * 1024
CHUNCKS = 128
CHARS = %w[R O P E]
data = TextClass.new
bulk_string =
TextClass.new(Array.new(SIZE) { CHARS[rand(4)] }.join)
puts 'Building Text...'
build = Benchmark.measure do
(0..CHUNCKS).sort_by { rand }.each do |n|
data<< sprintf("%08i",n) << bulk_string
end
data.normalize if data.respond_to? :normalize
end
GC.start
sort = Benchmark.measure do
puts "Sorting Text..."
qsort(data)
puts"\nEND"
end

puts "Build: #{build}Sort: #{sort}"
 
C

Carl Porth

I've modified my Rope so it runs with the benchmark (and fixed some
bugs).

http://pastie.caboo.se/93256

with the included benchmark:

badcarl@navi> ruby -r lib/rope.rb benchmark.rb String
Build: 0.150000 0.080000 0.230000 ( 0.234209)
Sort: 1.700000 1.850000 3.550000 ( 3.613295)

badcarl@navi> ruby -r lib/rope.rb benchmark.rb Rope
Build: 0.000000 0.000000 0.000000 ( 0.009324)
Sort: 0.280000 0.080000 0.360000 ( 0.372736)

I'm getting around 10x speedup on sorting.


I've done parts one and two so far. I'll try to add more in the next
few days.

For simplicity and speed, append and prepend don't modify the
receiver.

If anyone has any questions about my code, I'll be happy to answer.

Carl

require "rubygems"
require "facets"
require "kernel/with"
require "symbol/to_proc"

class String
def shift
return nil if empty?
returning self[0].chr do
self[0] = ""
end
end
end

class Rope
attr_reader :left, :right, :length

def Rope.new(*args)
if args.size <= 2
super
else # build balanced tree
mid = args.size / 2
args[mid,2] = Rope.new(*args[mid,2])
Rope.new(*args)
end
end

def initialize(left="",right="")
@left, @right = left, right
@length = @left.length + @right.length
end

def replace(rope)
initialize(rope.left,rope.right)
self
end

# clean out empty strings and rebuild tree
def normalize
Rope.new(*flat_strings.reject(&:empty?))
end

def to_s
flat_strings.join('')
end

def append(str)
Rope.new(self,str)
end
alias_method :<<, :append

def prepend(str)
Rope.new(str,self)
end

def slice(start,length=@length-start)
if start > left.length # right
right.slice(start-left.length,length-left.length)
elsif left.length < length # left and right
left.slice(start,left.length) +
right.slice(start-left.length,length-left.length)
else # left
left.slice(start,length)
end
end

def shift
if letter = left.shift || right.shift
@length -= 1
letter
else
nil
end
end

def index(letter,start=0)
if start > left.length # right
left.length + right.index(letter,start - left.length)
else # left
left.index(letter,start) || left.length + right.index(letter)
end
rescue
nil
end

# TODO implement rest of methods
def method_missing(method, *args, &block)
result = to_s.send(method, *args, &block)
if result.is_a?(String)
if method.to_s =~ /!$/
replace(Rope.new(result))
else
Rope.new(result)
end
else
result
end
end

protected

# traverse the tree
def traverse(&block)
@left.is_a?(Rope) ? @left.traverse(&block) : block.call(@left)
@right.is_a?(Rope) ? @right.traverse(&block) : block.call(@right)
end

# collect all the flat strings
def flat_strings
returning [] do |ary|
traverse { |str| ary << str }
end
end

end

The three rules of Ruby Quiz:
1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.
2. Support Ruby Quiz by submitting ideas as often as you can:

3. Enjoy!
Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone
on Ruby Talk follow the discussion. Please reply to the original quiz message,
if you can.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- =-=-=
by John Miller
This week's task is to implement the Rope data structure as a Ruby class. This
topic comes out of the ICFP programming competition
(http://www.icfpcontest.com/) which had competitors manipulating a 7.5 million
character string this year.
What is a Rope:
You may not realize it, but for many changes the content to a String, Ruby
creates a new copy of the original with the modifications applied. For small
strings that are created once and read often this is actually a very efficient
way to do thing, but what happens when the string starts to get long, and is
undergoing a lot of changes? First, the program will spend more and more of its
processing cycles just copying bits around. Second, the garbage collector will
be called more and more often to pick up the little stringy scraps you've left
all over the memory.
Ropes (the name is a pun for a heavy duty string) are tree structures where a
node represents the concatenation of its left branch with its right, and leaves
are flat strings. (This is a little sloppy. A rope may contain shared subtrees,
and is thus really a directed acyclic graph, where the out-edges of each vertex
are ordered. We will continue to be sloppy.) E.g. To prepend Text A to Text B,
one creates a Node (call it N1) with A as its left branch and B as its right.
To further append Text C create a new Node N2 with its left branch pointing to
N1 and its right to C. Easy, right? To find out more see Boehm, Atkinson and
Plass "Ropes: an Alternative to Strings" at:

The task comes in three parts, each increasing in difficulty:
Part one:
Create a Rope class that can do the following:
a. 'append' or 'prepend' a String or another Rope
(alias the << operator to the append function)
b. Return the fully concatenated text with 'to_s'
c. define 'slice' to call to_s.slice
d. provide a 'length' method
Part two:
Add the following:
a. Add the ability to 'index' a single character given a 0-based offset
from the beginning of the string.
b. Add the ability to 'shift' a single character from the front of a Rope.
(Remove and return the character)
c. Add your own 'slice' method that returns a String. Implement as many of
the String method's forms as possible. To run the example code this
function will only need to understand the slice(offset,length) form.
Major Bonus for Regex and Substring forms.
d. "Balance" the tree with a 'normalize' method.
(see Boehm, Atkinson and Plass 1319 Rebalancing)
Part three: (bonus)
Add the following:
a. Change the 'slice' method to return a Rope. Ideally this method should
do as little string copying as possible. (Doing this will well
dramatically increase the speed of the example code)
b. Allow 'shift' to optionally accept an integer number of characters to
remove and return.
c. modify the '<<' operator so that can efficiently append a few
characters at a time. (see Boehm, Atkinson and Plass 1318 para. 4)
d. *Major Bonus* Add the ability to append and prepend IO classes in a
lazy fashion. (see Boehm, Atkinson and Plass 1318 para. 2)
The following code may help you explore how efficient your code is and show
where Ropes are useful. `ruby -r /path/to/your/rope/class this_script.rb Rope`
will run the test with your code. Run the script without arguments to see how
well String does. Also play around with the SIZE and CHUNKS constants to get a
feel for how they affect performance.
require 'benchmark'
#This code make a String/Rope of CHUNCKS chunks of text
#each chunck is SIZE bytes long. Each chunck starts with
#an 8 byte number. Initially the chuncks are shuffled the
#qsort method sorts them into ascending order.
#
#pass the name of the class to use as a parameter
#ruby -r rope.rb this_file Rope
puts 'preparing data...'
TextClass = Object.const_get(ARGV.shift || :String)
def qsort(text)
return TextClass.new if text.length == 0
pivot = text.slice(0,8).to_s.to_i
less = TextClass.new
more = TextClass.new
offset = 8+SIZE
while (offset < text.length)
i = text.slice(offset,8).to_s.to_i
(i < pivot ? less : more) << text.slice(offset,8+SIZE)
offset = offset + 8+SIZE
end
print "*"
return qsort(less) << text.slice(0,8+SIZE) << qsort(more)
end
SIZE = 512 * 1024
CHUNCKS = 128
CHARS = %w[R O P E]
data = TextClass.new
bulk_string =
TextClass.new(Array.new(SIZE) { CHARS[rand(4)] }.join)
puts 'Building Text...'
build = Benchmark.measure do
(0..CHUNCKS).sort_by { rand }.each do |n|
data<< sprintf("%08i",n) << bulk_string
end
data.normalize if data.respond_to? :normalize
end
GC.start
sort = Benchmark.measure do
puts "Sorting Text..."
qsort(data)
puts"\nEND"
end
puts "Build: #{build}Sort: #{sort}"
 
E

Eric Mahurin

------=_Part_17535_22666276.1188802265486
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

This week's task is to implement the Rope data structure as a Ruby class.

My implementation is attached along with a modified test. I made a
few changes to what was asked, but this also gave more flexibility.
Here is what I did:

* Just as the implementation in the paper referenced in this quiz, my
implementation gives immutable ropes. #<< is non-destructive, so I
had to change the test to do <<= instead of just <<. Also, I didn't
implement #shift, because it must be destructive. The same
functionality can be achieved with 2 calls to #slice (one could be #at
if you only need to shift one element). There are very good reasons
to make ropes immutable. I'd imagine almost every other
implementation with modifiable ropes will fail when someone starts
modifying a sub-rope that is shared. You'd really need a good COW
(copy-on-write) scheme to allow both transparent sharing and
modification. I didn't see that it was worth the
effort/complexity/risk. I chose the simple functional programming
approach (immutable objects).

* I chose to automatically balance the ropes during concatenation (no
normalize). I used the same tree rotations that are used with AVL
trees. Another option could be to treat these as red-black trees
which might save on some rotations. One reason I automatically
balanced is that it simplifies the interface. The user of the API
doesn't have to worry about when to normalize. A second reason is
that every other rope operation is O(log(n)), so there probably isn't
much benefit in making only concatenation O(1).

* I don't allow ropes to directly use Strings. Instead, a StringRope
is used as a proxy for a String. To use a String directly, I would
have had to check the class, use respond_to, modify String to look
like a rope, etc. Instead, I just went with the pure duck-typing
approach and made multiple Rope-like classes that use different types
of data. My basis for these is the class ArrayRope. There is no
reason why a rope data-structure can't be used with any sequence of
objects instead of just characters. ArrayRope takes an Array-like
object. An rope built out of these is to Array as a conventional rope
is to String. I also added an IORope class to lazily work with files.
Using IORope you could make a text editor that didn't have to read
the whole file in at once. There is no reason you can't mix and match
any of these leaf rope classes (depth 0) within a rope tree.

* #each iterates through elements (i.e. characters) in the rope. It
annoys me that String#each (and IO#each for that matter) iterates over
lines - it should be characters (not necessarily bytes). All of the
Enumerable methods are accessible too.

* I used #at instead of #index because #index is used for searching in
String/Array. Array#at is an exact match.

The main thing I didn't do was implement any regex stuff. I don't see
how this is doable since all of the regex methods are completely tied
to String (not duck-typed). You'd have to convert the whole rope to
string to do anything (which defeats the purpose of the rope).

------=_Part_17535_22666276.1188802265486
Content-Type: application/x-ruby; name=quiz137.rb
Content-Transfer-Encoding: base64
X-Attachment-Id: f_f64jpfn2
Content-Disposition: attachment; filename="quiz137.rb"

CmNsYXNzIFJvcGUKICAgIGluY2x1ZGUgRW51bWVyYWJsZQogICAgIyBmb3JtIGEgYmluYXJ5IHRy
ZWUgZnJvbSB0d28gcm9wZXMgKHBvc3NpYmx5IHN1Yi10cmVlcykKICAgIGRlZiBpbml0aWFsaXpl
KGxlZnQscmlnaHQpCiAgICAgICAgQGxlZnQgPSBsZWZ0CiAgICAgICAgQHJpZ2h0ID0gcmlnaHQK
ICAgICAgICBAbGxlbmd0aCA9IEBsZWZ0Lmxlbmd0aAogICAgICAgIEBsZW5ndGggPSBAbGxlbmd0
aCtAcmlnaHQubGVuZ3RoCiAgICAgICAgQGRlcHRoID0gW2xlZnQuZGVwdGgsIHJpZ2h0LmRlcHRo
XS5tYXgrMQogICAgZW5kCiAgICAjIG51bWJlciBvZiBlbGVtZW50cyBpbiB0aGlzIHJvcGUKICAg
IGRlZiBsZW5ndGgKICAgICAgICBAbGVuZ3RoCiAgICBlbmQKICAgICMgZGVwdGggb2YgdGhlIHRy
ZWUgKHRvIGhlbHAga2VlcCB0aGUgdHJlZSBiYWxhbmNlZCkKICAgIGRlZiBkZXB0aAogICAgICAg
IEBkZXB0aAogICAgZW5kCiAgICAjIGxlZnQgcm9wZSAobm90IG5lZWRlZCB3aGVuIGRlcHRoPT0w
KQogICAgZGVmIGxlZnQKICAgICAgICBAbGVmdAogICAgZW5kCiAgICAjIHJpZ2h0IHJvcGUgKG5v
dCBuZWVkZWQgd2hlbiBkZXB0aD09MCkKICAgIGRlZiByaWdodAogICAgICAgIEByaWdodAogICAg
ZW5kCiAgICAjIGFwcGVuZGVkIHJvcGUgKG5vbi1tb2RpZnlpbmcpCiAgICBkZWYgPDwob3RoZXIp
CiAgICAgICAgIyBiYWxhbmNlIGFzIGFuIEFWTCB0cmVlCiAgICAgICAgYmFsYW5jZSA9IG90aGVy
LmRlcHRoLUBkZXB0aAogICAgICAgIGlmIGJhbGFuY2U+KzEKICAgICAgICAgICAgbGVmdCA9IG90
aGVyLmxlZnQKICAgICAgICAgICAgcmlnaHQgPSBvdGhlci5yaWdodAogICAgICAgICAgICBpZiBs
ZWZ0LmRlcHRoPnJpZ2h0LmRlcHRoCiAgICAgICAgICAgICAgICAjIHJvdGF0ZSBvdGhlciB0byBy
aWdodCBiZWZvcmUgcm90YXRpbmcgc2VsZitvdGhlciB0byBsZWZ0CiAgICAgICAgICAgICAgICAo
c2VsZiA8PCBsZWZ0LmxlZnQpIDw8IChsZWZ0LnJpZ2h0IDw8IHJpZ2h0KQogICAgICAgICAgICBl
bHNlCiAgICAgICAgICAgICAgICAjIHJvdGF0ZSBzZWxmK290aGVyIHRvIGxlZnQKICAgICAgICAg
ICAgICAgIChzZWxmIDw8IGxlZnQpIDw8IHJpZ2h0CiAgICAgICAgICAgIGVuZAogICAgICAgIGVs
c2lmIGJhbGFuY2U8LTEKICAgICAgICAgICAgaWYgQHJpZ2h0LmRlcHRoPkBsZWZ0LmRlcHRoCiAg
ICAgICAgICAgICAgICAjIHJvdGF0ZSBzZWxmIHRvIGxlZnQgYmVmb3JlIHJvdGF0aW5nIHNlbGYr
b3RoZXIgdG8gcmlnaHQKICAgICAgICAgICAgICAgIChAbGVmdCA8PCBAcmlnaHQubGVmdCkgPDwg
KEByaWdodC5yaWdodCA8PCBvdGhlcikKICAgICAgICAgICAgZWxzZQogICAgICAgICAgICAgICAg
IyByb3RhdGUgc2VsZitvdGhlciB0byByaWdodAogICAgICAgICAgICAgICAgQGxlZnQgPDwgKEBy
aWdodCA8PCBvdGhlcikKICAgICAgICAgICAgZW5kCiAgICAgICAgIyB1bmNvbW1lbnQgdGhpcyB0
byBhbGxvdyBzaG9ydCBzdHJpbmdzIHRvIGNvbmNhdCBmbGF0CiAgICAgICAgI2Vsc2lmIG90aGVy
LmRlcHRoLnplcm8/CiAgICAgICAgIyAgICAjIGFsbG93IEByaWdodCBhbmQgb3RoZXIgdG8gYmUg
bWVyZ2VkICh3aGVuIHRoZSBsZW5ndGggaXMgc2hvcnQpCiAgICAgICAgIyAgICBAbGVmdCA8PCAo
QHJpZ2h0IDw8IG90aGVyKQogICAgICAgIGVsc2UKICAgICAgICAgICAgc2VsZi5jbGFzcy5uZXco
c2VsZiwgb3RoZXIpCiAgICAgICAgZW5kCiAgICBlbmQKICAgICMgc2xpY2Ugb2YgdGhlIHJvcGUK
ICAgIGRlZiBzbGljZShzdGFydCwgbGVuKQogICAgICAgIHJldHVybiBzZWxmIGlmIHN0YXJ0Lnpl
cm8/IGFuZCBsZW49PUBsZW5ndGgKICAgICAgICByc3RhcnQgPSBzdGFydC1AbGxlbmd0aAogICAg
ICAgIHJldHVybiBAcmlnaHQuc2xpY2UocnN0YXJ0LCBsZW4pIGlmIHJzdGFydD4wCiAgICAgICAg
bGxlbiA9IEBsbGVuZ3RoLXN0YXJ0CiAgICAgICAgcmxlbiA9IGxlbi1sbGVuCiAgICAgICAgaWYg
cmxlbj4wCiAgICAgICAgICAgIEBsZWZ0LnNsaWNlKHN0YXJ0LCBsbGVuKSA8PCBAcmlnaHQuc2xp
Y2UoMCwgcmxlbikKICAgICAgICBlbHNlCiAgICAgICAgICAgIEBsZWZ0LnNsaWNlKHN0YXJ0LCBs
ZW4pCiAgICAgICAgZW5kCiAgICBlbmQKICAgICMgZWxlbWVudCBhdCBhIGNlcnRhaW4gcG9zaXRp
b24gaW4gdGhlIHJvcGUKICAgIGRlZiBhdChzdGFydCkKICAgICAgICByc3RhcnQgPSBzdGFydC1A
bGxlbmd0aAogICAgICAgIGlmIHJzdGFydDwwCiAgICAgICAgICAgIEBsZWZ0LmF0KHN0YXJ0KQog
ICAgICAgIGVsc2UKICAgICAgICAgICAgQHJpZ2h0LmF0KHJzdGFydCkKICAgICAgICBlbmQKICAg
IGVuZAogICAgIyBpdGVyYXRlIHRocm91Z2ggdGhlIGVsZW1lbnRzIGluIHRoZSByb3BlCiAgICBk
ZWYgZWFjaAogICAgICAgIEBsZWZ0LmVhY2ggeyB8eHwgeWllbGQgeCB9CiAgICAgICAgQHJpZ2h0
LmVhY2ggeyB8eHwgeWllbGQgeCB9CiAgICBlbmQKICAgICMgZmxhdHRlbiB0aGUgcm9wZSBpbnRv
IGEgc3RyaW5nIChvcHRpb25hbGx5IHN0YXJ0aW5nIHdpdGggYSBwcmVmaXgpCiAgICBkZWYgdG9f
cyhzPSIiKQogICAgICAgIEByaWdodC50b19zKEBsZWZ0LnRvX3MocykpCiAgICBlbmQKZW5kCgpF
bXB0eVJvcGUgPSBPYmplY3QubmV3CmNsYXNzIDw8IEVtcHR5Um9wZQogICAgaW5jbHVkZSBFbnVt
ZXJhYmxlCiAgICBkZWYgbGVuZ3RoCiAgICAgICAgMAogICAgZW5kCiAgICBkZWYgZGVwdGgKICAg
ICAgICAwCiAgICBlbmQKICAgIGRlZiA8PChvdGhlcikKICAgICAgICBvdGhlcgogICAgZW5kCiAg
ICBkZWYgPj4ob3RoZXIpCiAgICAgICAgb3RoZXIKICAgIGVuZAogICAgZGVmIHNsaWNlKHN0YXJ0
LCBsZW4pCiAgICAgICAgc2VsZgogICAgZW5kCiAgICBkZWYgZWFjaAogICAgZW5kCiAgICBkZWYg
dG9fcwogICAgICAgICIiCiAgICBlbmQKZW5kCgpjbGFzcyBBcnJheVJvcGUKICAgIGluY2x1ZGUg
RW51bWVyYWJsZQogICAgZGVmIHNlbGYubmV3KCphcmdzKQogICAgICAgIGlmIGFyZ3MuZW1wdHk/
CiAgICAgICAgICAgIEVtcHR5Um9wZQogICAgICAgIGVsc2UKICAgICAgICAgICAgc3VwZXIKICAg
ICAgICBlbmQKICAgIGVuZAogICAgZGVmIGluaXRpYWxpemUoZGF0YSkKICAgICAgICBAZGF0YSA9
IGRhdGEKICAgIGVuZAogICAgZGVmIGxlbmd0aAogICAgICAgIEBkYXRhLmxlbmd0aAogICAgZW5k
CiAgICBkZWYgZGVwdGgKICAgICAgICAwCiAgICBlbmQKICAgIGRlZiA8PChvdGhlcikKICAgICAg
ICBiYWxhbmNlID0gb3RoZXIuZGVwdGgKICAgICAgICBpZiBiYWxhbmNlPjEKICAgICAgICAgICAg
bGVmdCA9IG90aGVyLmxlZnQKICAgICAgICAgICAgcmlnaHQgPSBvdGhlci5yaWdodAogICAgICAg
ICAgICBpZiBsZWZ0LmRlcHRoPnJpZ2h0LmRlcHRoCiAgICAgICAgICAgICAgICAjIHJvdGF0ZSBv
dGhlciB0byByaWdodCBiZWZvcmUgcm90YXRpbmcgc2VsZitvdGhlciB0byBsZWZ0CiAgICAgICAg
ICAgICAgICAoc2VsZiA8PCBsZWZ0LmxlZnQpIDw8IChsZWZ0LnJpZ2h0IDw8IHJpZ2h0KQogICAg
ICAgICAgICBlbHNlCiAgICAgICAgICAgICAgICAjIHJvdGF0ZSBzZWxmK290aGVyIHRvIGxlZnQK
ICAgICAgICAgICAgICAgIChzZWxmIDw8IGxlZnQpIDw8IHJpZ2h0CiAgICAgICAgICAgIGVuZAog
ICAgICAgIGVsc2lmIG90aGVyLmxlbmd0aD09MAogICAgICAgICAgICAjIG5vdGhpbmcgdG8gYXBw
ZW5kLCBzZWxmIHdpbGwgZG8KICAgICAgICAgICAgc2VsZgogICAgICAgIGVsc2UKICAgICAgICAg
ICAgUm9wZS5uZXcoc2VsZiwgb3RoZXIpCiAgICAgICAgZW5kCiAgICBlbmQKICAgIGRlZiBzbGlj
ZShzdGFydCwgbGVuKQogICAgICAgIHJldHVybiBzZWxmIGlmIHN0YXJ0Lnplcm8/IGFuZCBsZW49
PUBkYXRhLmxlbmd0aAogICAgICAgICMgZGVwZW5kIG9uIHJ1YnkncyBDT1cgbWVjaGFuaXNtIHRv
IGp1c3QgcmVmZXJlbmNlIHRoZSBzbGljZSBkYXRhCiAgICAgICAgc2VsZi5jbGFzcy5uZXcoQGRh
dGEuc2xpY2Uoc3RhcnQsIGxlbikpCiAgICBlbmQKICAgIGRlZiBhdChzdGFydCkKICAgICAgICBA
ZGF0YS5hdChzdGFydCkKICAgIGVuZAogICAgZGVmIGVhY2gKICAgICAgICBAZGF0YS5lYWNoIHsg
fHh8IHlpZWxkIHggfQogICAgZW5kCiAgICBkZWYgdG9fcyhzPSIiKQogICAgICAgIHMuY29uY2F0
KEBkYXRhLnRvX3MpCiAgICBlbmQKZW5kCgpjbGFzcyBTdHJpbmdSb3BlIDwgQXJyYXlSb3BlCiAg
ICAjIHVuY29tbWVudCB0aGlzIHRvIGFsbG93IHNob3J0IHN0cmluZ3MgdG8gY29uY2F0IGZsYXQK
ICAgICNTSE9SVCA9IDY0CiAgICAjZGVmIDw8KG90aGVyKQogICAgIyAgICBpZiBvdGhlci5sZW5n
dGg9PTAKICAgICMgICAgICAgIHNlbGYKICAgICMgICAgZWxzaWYgQGRhdGEubGVuZ3RoK290aGVy
Lmxlbmd0aDw9U0hPUlQKICAgICMgICAgICAgICMganVzdCBtZXJnZSB0aGUgc3RyaW5ncyBpZiB0
aGUgdG90YWwgbGVuZ3RoIGlzIHNob3J0CiAgICAjICAgICAgICBzZWxmLmNsYXNzLm5ldyhAZGF0
YStvdGhlci50b19zKQogICAgIyAgICBlbHNlCiAgICAjICAgICAgICBzdXBlcgogICAgIyAgICBl
bmQKICAgICNlbmQKICAgIGRlZiBhdChzdGFydCkKICAgICAgICBAZGF0YVtzdGFydF0KICAgIGVu
ZAogICAgZGVmIGVhY2gKICAgICAgICBAZGF0YS5lYWNoX2NoYXIgeyB8eHwgeWllbGQgeCB9CiAg
ICBlbmQKZW5kCgpjbGFzcyBJT1JvcGUgPCBBcnJheVJvcGUKICAgIGluY2x1ZGUgRW51bWVyYWJs
ZQogICAgZGVmIGluaXRpYWxpemUoaW8sIHN0YXJ0PTAsIGxlbmd0aD0oaW8uc2VlaygwLElPOjpT
RUVLX0VORCk7aW8ucG9zLXN0YXJ0KSkKICAgICAgICBAaW8gPSBpbwogICAgICAgIEBzdGFydCA9
IHN0YXJ0CiAgICAgICAgQGxlbmd0aCA9IGxlbmd0aAogICAgZW5kCiAgICBkZWYgbGVuZ3RoCiAg
ICAgICAgQGxlbmd0aAogICAgZW5kCiAgICBkZWYgc2xpY2Uoc3RhcnQsIGxlbikKICAgICAgICBy
ZXR1cm4gc2VsZiBpZiBzdGFydC56ZXJvPyBhbmQgbGVuPT1AbGVuZ3RoCiAgICAgICAgIyBqdXN0
IHJlZmVyZW5jZSBhIGRpZmZlcmVudCBwYXJ0IG9mIHRoZSBJTwogICAgICAgIHNlbGYuY2xhc3Mu
bmV3KEBpbywgQHN0YXJ0K3N0YXJ0LCBsZW4pCiAgICBlbmQKICAgIGRlZiBhdChzdGFydCkKICAg
ICAgICBAaW8ucG9zID0gQHN0YXJ0K3N0YXJ0CiAgICAgICAgQGlvLmdldGMKICAgIGVuZAogICAg
ZGVmIGVhY2gKICAgICAgICBAaW8ucG9zID0gQHN0YXJ0CiAgICAgICAgQGxlbmd0aC50aW1lcyB7
IHlpZWxkIEBpby5nZXRjIH0KICAgIGVuZAogICAgZGVmIHRvX3Mocz0iIikKICAgICAgICBAaW8u
cG9zID0gQHN0YXJ0CiAgICAgICAgcy5jb25jYXQoQGlvLnJlYWQoQGxlbmd0aCkpCiAgICBlbmQK
ZW5kCgoKcmVxdWlyZSAnYmVuY2htYXJrJwoKI1RoaXMgY29kZSBtYWtlIGEgU3RyaW5nL1JvcGUg
b2YgIENIVU5LUyBjaHVua3Mgb2YgdGV4dAojZWFjaCBjaHVuayBpcyBTSVpFIGJ5dGVzIGxvbmcu
ICBFYWNoIGNodW5rIHN0YXJ0cyB3aXRoCiNhbiA4IGJ5dGUgbnVtYmVyLiAgSW5pdGlhbGx5IHRo
ZSBjaHVua3MgYXJlIHNodWZmbGVkIHRoZQojcXNvcnQgbWV0aG9kIHNvcnRzIHRoZW0gaW50byBh
c2NlbmRpbmcgb3JkZXIuCiMKI3Bhc3MgdGhlIG5hbWUgb2YgdGhlIGNsYXNzIHRvIHVzZSBhcyBh
IHBhcmFtZXRlcgojcnVieSAtciByb3BlLnJiIHRoaXNfZmlsZSBSb3BlCgpwdXRzICdwcmVwYXJp
bmcgZGF0YS4uLicKVGV4dENsYXNzID0gT2JqZWN0LmNvbnN0X2dldChBUkdWLnNoaWZ0IHx8IDpT
dHJpbmcpCgpkZWYgcXNvcnQodGV4dCkKIHJldHVybiBUZXh0Q2xhc3MubmV3IGlmIHRleHQubGVu
Z3RoID09IDAKIHBpdm90ID0gdGV4dC5zbGljZSgwLDgpLnRvX3MudG9faQogbGVzcyA9IFRleHRD
bGFzcy5uZXcKIG1vcmUgPSBUZXh0Q2xhc3MubmV3CiBvZmZzZXQgPSA4K1NJWkUKIHdoaWxlIChv
ZmZzZXQgPCB0ZXh0Lmxlbmd0aCkKICAgaSA9IHRleHQuc2xpY2Uob2Zmc2V0LDgpLnRvX3MudG9f
aQogICBpZiBpIDwgcGl2b3QKICAgICAgIGxlc3MgPDw9IHRleHQuc2xpY2Uob2Zmc2V0LDgrU0la
RSkKICAgZWxzZQogICAgICAgbW9yZSA8PD0gdGV4dC5zbGljZShvZmZzZXQsOCtTSVpFKQogICBl
bmQKICAgb2Zmc2V0ID0gb2Zmc2V0ICsgOCtTSVpFCiBlbmQKIHByaW50ICIqIgogcmV0dXJuIHFz
b3J0KGxlc3MpIDw8IHRleHQuc2xpY2UoMCw4K1NJWkUpIDw8IHFzb3J0KG1vcmUpCmVuZAoKc3Jh
bmQoMTIzNDU2Nzg5KQpTSVpFICA9IDUxMiAqIDEwMjQKQ0hVTktTID0gMTI4CkNIQVJTID0gJXdb
UiBPIFAgRV0KYnVsa19zdHJpbmcgPSBBcnJheS5uZXcoU0laRSkgeyBDSEFSU1tyYW5kKDQpXSB9
LmpvaW4KYnVsa190ZXh0ID0gVGV4dENsYXNzLm5ldyhidWxrX3N0cmluZykKYnVpbGQgPSBbXQpk
YXRhID0gbmlsCjgudGltZXMgewpwdXRzICdCdWlsZGluZyBUZXh0Li4uJwpHQy5zdGFydApidWls
ZCA8PCBCZW5jaG1hcmsubWVhc3VyZSBkbwogZGF0YSA9IFRleHRDbGFzcy5uZXcKICgwLi5DSFVO
S1MpLnNvcnRfYnkgeyByYW5kIH0uZWFjaCBkbyB8bnwKICAgZGF0YSA8PD0gVGV4dENsYXNzLm5l
dyhzcHJpbnRmKCIlMDhpIixuKSkgPDwgYnVsa190ZXh0CiBlbmQKIGRhdGEubm9ybWFsaXplICBp
ZiBkYXRhLnJlc3BvbmRfdG8/IDpub3JtYWxpemUKZW5kCn0KYnVpbGQgPSBidWlsZC5pbmplY3Qg
eyB8bWluLGN1cnwgY3VyLnRvdGFsPG1pbi50b3RhbCA/IGN1ciA6IG1pbiB9CgoKIyBzZWxmLWNo
ZWNrIHRoYXQgdGhlIGluZGljZXMgYWRkIHVwCnN1bSA9IDAKMC5zdGVwKGRhdGEubGVuZ3RoLTEs
IDgrU0laRSkgeyB8b2Zmc2V0fAogICAgc3VtICs9IGRhdGEuc2xpY2Uob2Zmc2V0LDgpLnRvX3Mu
dG9faQogICAgYnVsa19kYXRhID0gZGF0YS5zbGljZShvZmZzZXQrOCxTSVpFKS50b19zCiAgICBy
YWlzZSgiI3tidWxrX2RhdGFbMCwxMF19Li4uLiAhPSAje2J1bGtfc3RyaW5nWzAsMTBdfS4uLi4i
KSBpZiBidWxrX2RhdGEhPWJ1bGtfc3RyaW5nCn0KZXhwZWN0ZWRfc3VtID0gKENIVU5LUyooQ0hV
TktTKzEpKS8yCnJhaXNlKCIje3N1bX0hPSN7ZXhwZWN0ZWRfc3VtfSIpIGlmIHN1bSE9ZXhwZWN0
ZWRfc3VtCgpzb3J0ID0gW10Kc29ydGVkX2RhdGEgPSBuaWwKOC50aW1lcyB7CkdDLnN0YXJ0CnNv
cnQgPDwgQmVuY2htYXJrLm1lYXN1cmUgZG8KIHB1dHMgIlNvcnRpbmcgVGV4dC4uLiIKIHNvcnRl
ZF9kYXRhID0gcXNvcnQoZGF0YSkKIHB1dHMiXG4iCmVuZAp9CnNvcnQgPSBzb3J0LmluamVjdCB7
IHxtaW4sY3VyfCBjdXIudG90YWw8bWluLnRvdGFsID8gY3VyIDogbWluIH0KCnB1dHMgIkJ1aWxk
OiAje2J1aWxkfVNvcnQ6ICN7c29ydH0iCgojIHNlbGYtY2hlY2sgdGhhdCB0aGUgaW5kaWNlcyBh
cmUgc29ydGVkCmkgPSAwCjAuc3RlcChzb3J0ZWRfZGF0YS5sZW5ndGgtMSwgOCtTSVpFKSB7IHxv
ZmZzZXR8CiAgICBkYXRhaSA9IHNvcnRlZF9kYXRhLnNsaWNlKG9mZnNldCw4KS50b19zCiAgICBy
YWlzZSgiI3tkYXRhaX0hPSN7aX0iKSBpZiBkYXRhaS50b19pIT1pCiAgICBidWxrX2RhdGEgPSBz
b3J0ZWRfZGF0YS5zbGljZShvZmZzZXQrOCxTSVpFKS50b19zCiAgICByYWlzZSgiI3tidWxrX2Rh
dGFbMCwxMF19Li4uLiAhPSAje2J1bGtfc3RyaW5nWzAsMTBdfS4uLi4iKSBpZiBidWxrX2RhdGEh
PWJ1bGtfc3RyaW5nCiAgICBpICs9IDEKfQoKCg==
------=_Part_17535_22666276.1188802265486--
 
E

Eric Mahurin

My implementation is attached along with a modified test. I made a
few changes to what was asked, but this also gave more flexibility.

Forgot about the results. Here's what I found on my machine:

String
Build: 0.120000 0.080000 0.200000 ( 0.206264)
Sort: 1.680000 0.420000 2.100000 ( 2.112934)
VmPeak: 1192112 kB
VmSize: 493708 kB

StringRope
Build: 0.010000 0.000000 0.010000 ( 0.014414)
Sort: 0.150000 0.020000 0.170000 ( 0.176734)
VmPeak: 38940 kB
VmSize: 37920 kB

so, 20X faster for build and 12.4X faster for sort. Memory looks to
be 10-30X smaller. I ran the build and sort 8 times during the
benchmark and took the min. The memory kept increasing for the String
runs for each of these 8 iterations which doesn't make sense.
 
M

Mauricio Fernandez

--eJnRUKwClWJh1Khz
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

by John Miller

This week's task is to implement the Rope data structure as a Ruby class. This
topic comes out of the ICFP programming competition
(http://www.icfpcontest.com/) which had competitors manipulating a 7.5 million
character string this year.

I happened to have implemented ropes in OCaml recently, so I generated a Ruby
extension using rocaml to see how well it would perform.

Without further ado, here are the results I'm getting for SIZE = 512 * 1024,
CHUNKS = 512:

$ time ruby -r himadri_choudhury.rb bm.rb Rope
Build: 0.130000 0.000000 0.130000 ( 0.129476)
Sort: 10.340000 0.050000 10.390000 ( 10.648223)

$ time ruby -rocaml_rope bm.rb OCaml::Rope
Build: 0.020000 0.000000 0.020000 ( 0.018946)
Sort: 0.100000 0.000000 0.100000 ( 0.108499)

$ ruby eric_mahurin.rb StringRope
[...]
Build: 0.060000 0.000000 0.060000 ( 0.057299)
Sort: 0.870000 0.000000 0.870000 ( 0.896493)

For SIZE = 1024, CHUNKS = 16384:

$ ruby eric_mahurin.rb StringRope
[...]
Build: 3.470000 0.040000 3.510000 ( 3.588875)
Sort: 89.110000 0.700000 89.810000 ( 92.179962)

$ time ruby -rocaml_rope bm.rb OCaml::Rope
[...]
Build: 0.360000 0.000000 0.360000 ( 0.378352)
Sort: 3.940000 0.040000 3.980000 ( 4.079140)

At that point the pure Ruby rope is taking over 6 times more memory than
the OCaml one. I ascribe this to iv_tbls being very heavy and to memory
fragmentation.

I benchmarked Himadri's implementation first and was surprised by the
exceedingly large speed difference --- I expected one, not two orders of
magnitude for this code, as there's enough Ruby code in common in qsort to
mask the speed gains in the rope operations. However, Eric's solution proved
that it was just due to a slow Ruby implementation.

Here's the interface definition (extconf.rb):


EXT_NAME = "ocaml_rope"
OCAML_PACKAGES = %w[]
CAML_LIBS = %w[]
CAML_OBJS = %w[]
CAML_FLAGS = ""
CAML_INCLUDES = []

require 'rocaml'

Interface.generate("ocaml_rope") do |iface|
def_class("Rope", :under => "OCaml") do |c|
rope = c.abstract_type

fun "empty", UNIT => rope, :as => "new_empty"
fun "of_string", STRING => rope, :as => "new_from_string"

method "sub", [rope, INT, INT] => rope, :as => "slice"
method "concat", [rope, rope] => rope
method "length", rope => INT
method "get", [rope, INT] => INT
method "to_string", rope => STRING, :as => "to_s"
end
end

require 'rocaml_extconf'


As you can see, OCaml::Rope is purely functional, and the interface differs a
bit from that expected by bm.rb (a modified version that works with immutable
ropes is attached), so I adapted it with the following ocaml_rope.rb, which
also loads the extension:


module OCaml # Rope will be placed in this module
end

require "ocaml_rope.so"

module OCaml
class Rope
def self.new(str = "")
case str
when String; new_from_string str
when Rope; str
when ""; new_empty
else new_from_string(str.to_str) rescue new_from_string(str.to_s)
end
end

def prepend(rope)
rope.append(self)
end

alias_method :append, :concat
alias_method :<<, :append
end
end


The OCaml code is attached, in case anybody wants to look at it.
Incidentally, it weighs a bit under 220 lines, which is also the amount taken
by Himadri's and Eric's solutions. Unlike them, rope.ml features O(1)
concatenation for small elements; this accounts for a large part of the code
and the complexity of some patterns. O(1) concatenation doesn't really affect
performance in the use case exerted by bm.rb anyway.

--
Mauricio Fernandez - http://eigenclass.org

--eJnRUKwClWJh1Khz
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="bm.rb"

require 'benchmark'

#This code make a String/Rope of CHUNKS chunks of text
#each chunck is SIZE bytes long. Each chunk starts with
#an 8 byte number. Initially the chunks are shuffled the
#qsort method sorts them into ascending order.
#
#pass the name of the class to use as a parameter
#ruby -r rope.rb this_file Rope

puts 'preparing data...'
TextClass = (ARGV.shift || "String").split(/::/).inject(Object){|s,x| s.const_get(x)}

def qsort(text)
return TextClass.new if text.length == 0
pivot = text.slice(0,8).to_s.to_i
less = TextClass.new
more = TextClass.new
offset = 8+SIZE
while (offset < text.length)
i = text.slice(offset,8).to_s.to_i
if i < pivot
less <<= text.slice(offset,8+SIZE)
else
more <<= text.slice(offset,8+SIZE)
end
offset = offset + 8+SIZE
end
#print "*"
return qsort(less) << text.slice(0,8+SIZE) << qsort(more)
end

SIZE = 1 * 1024
CHUNKS = 32768
CHARS = %w[R O P E]
data = TextClass.new
bulk_string =
TextClass.new(Array.new(SIZE) { CHARS[rand(4)] }.join)
puts 'Building Text...'
build = Benchmark.measure do
(0..CHUNKS).sort_by { rand }.each do |n|
data = data << TextClass.new(sprintf("%08i",n)) << bulk_string
end
data = data.normalize if data.respond_to? :normalize
end
GC.start
sort = Benchmark.measure do
puts "Sorting Text..."
qsort(data)
puts"\nEND"
end

puts "Build: #{build}Sort: #{sort}"


--eJnRUKwClWJh1Khz
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="rope.ml"

type t =
Empty
(* left, left size, right, right size, height *)
| Concat of t * int * t * int * int
| Leaf of string

type forest_element = { mutable c : t; mutable len : int }

let str_append = (^)
let empty_str = ""
let string_of_string_list l = String.concat "" l

let max_height = 48

let leaf_size = 256

exception Out_of_bounds

let empty = Empty

(* by construction, there cannot be Empty or Leaf "" leaves *)
let is_empty = function Empty -> true | _ -> false

let height = function
Empty | Leaf _ -> 0
| Concat(_,_,_,_,h) -> h

let rec length = function
Empty -> 0
| Leaf s -> String.length s
| Concat(_,cl,_,cr,_) -> cl + cr

let make_concat l r =
let hl = height l and hr = height r in
let cl = length l and cr = length r in
Concat(l, cl, r, cr, if hl >= hr then hl + 1 else hr + 1)

let min_len =
let fib_tbl = Array.make max_height 0 in
let rec fib n = match fib_tbl.(n) with
0 ->
let last = fib (n - 1) and prev = fib (n - 2) in
let r = last + prev in
let r = if r > last then r else last in (* check overflow *)
fib_tbl.(n) <- r; r
| n -> n
in
fib_tbl.(0) <- leaf_size + 1; fib_tbl.(1) <- 3 * leaf_size / 2 + 1;
Array.init max_height (fun i -> if i = 0 then 1 else fib (i - 1))

let max_length = min_len.(Array.length min_len - 1)

let concat_fast l r = match l with
Empty -> r
| Leaf _ | Concat(_,_,_,_,_) ->
match r with
Empty -> l
| Leaf _ | Concat(_,_,_,_,_) -> make_concat l r

(* based on Hans-J. Boehm's *)
let add_forest forest rope len =
let i = ref 0 in
let sum = ref empty in
while len > min_len.(!i+1) do
if forest.(!i).c <> Empty then begin
sum := concat_fast forest.(!i).c !sum;
forest.(!i).c <- Empty
end;
incr i
done;
sum := concat_fast !sum rope;
let sum_len = ref (length !sum) in
while !sum_len >= min_len.(!i) do
if forest.(!i).c <> Empty then begin
sum := concat_fast forest.(!i).c !sum;
sum_len := !sum_len + forest.(!i).len;
forest.(!i).c <- Empty;
end;
incr i
done;
decr i;
forest.(!i).c <- !sum;
forest.(!i).len <- !sum_len

let concat_forest forest =
Array.fold_left (fun s x -> concat_fast x.c s) Empty forest

let rec balance_insert rope len forest = match rope with
Empty -> ()
| Leaf _ -> add_forest forest rope len
| Concat(l,cl,r,cr,h) when h >= max_height || len < min_len.(h) ->
balance_insert l cl forest;
balance_insert r cr forest
| x -> add_forest forest x len (* function or balanced *)

let balance r =
match r with
Empty -> Empty
| Leaf _ -> r
| _ ->
let forest = Array.init max_height (fun _ -> {c = Empty; len = 0}) in
balance_insert r (length r) forest;
concat_forest forest

let bal_if_needed l r =
let r = make_concat l r in
if height r < max_height then r else balance r

let concat_str l = function
Empty | Concat(_,_,_,_,_) -> invalid_arg "concat_str"
| Leaf rs as r ->
let lenr = String.length rs in
match l with
| Empty -> r
| Leaf ls ->
let slen = lenr + String.length ls in
if slen <= leaf_size then Leaf (str_append ls rs)
else make_concat l r (* height = 1 *)
| Concat(ll, cll, Leaf lrs, clr, h) ->
let slen = clr + lenr in
if clr + lenr <= leaf_size then
Concat(ll, cll, Leaf (str_append lrs rs), slen, h)
else
bal_if_needed l r
| _ -> bal_if_needed l r

let append_char c r = concat_str r (Leaf (String.make 1 c))

let concat l = function
Empty -> l
| Leaf _ as r -> concat_str l r
| Concat(Leaf rls,rlc,rr,rc,h) as r ->
(match l with
Empty -> r
| Concat(_,_,_,_,_) -> bal_if_needed l r
| Leaf ls ->
let slen = rlc + String.length ls in
if slen <= leaf_size then
Concat(Leaf(str_append ls rls), slen, rr, rc, h)
else
bal_if_needed l r)
| r -> (match l with Empty -> r | _ -> bal_if_needed l r)

let prepend_char c r = concat (Leaf (String.make 1 c)) r

let rec get i = function
Empty -> raise Out_of_bounds
| Leaf s ->
if i >= 0 && i < String.length s then String.unsafe_get s i
else raise Out_of_bounds
| Concat (l, cl, r, cr, _) ->
if i < cl then get i l
else get (i - cl) r

let of_string = function
s when String.length s = 0 -> Empty
| s ->
let min (x:int) (y:int) = if x <= y then x else y in
let rec loop r s len i =
if i < len then (* len - i > 0, thus Leaf "" can't happen *)
loop (concat r (Leaf (String.sub s i (min (len - i) leaf_size))))
s len (i + leaf_size)
else
r
in loop Empty s (String.length s) 0

let rec sub start len = function
Empty -> if start <> 0 || len <> 0 then raise Out_of_bounds else Empty
| Leaf s ->
if len > 0 then (* Leaf "" cannot happen *)
(try Leaf (String.sub s start len) with _ -> raise Out_of_bounds)
else if len < 0 || start < 0 || start > String.length s then
raise Out_of_bounds
else Empty
| Concat(l,cl,r,cr,_) ->
if start < 0 || len < 0 || start + len > cl + cr then raise Out_of_bounds;
let left =
if start = 0 then
if len >= cl then
l
else sub 0 len l
else if start > cl then Empty
else if start + len >= cl then
sub start (cl - start) l
else sub start len l in
let right =
if start <= cl then
let upto = start + len in
if upto = cl + cr then r
else if upto < cl then Empty
else sub 0 (upto - cl) r
else sub (start - cl) len r
in
concat left right

let to_string r =
let rec strings l = function
Empty -> l
| Leaf s -> s :: l
| Concat(left,_,right,_,_) -> strings (strings l right) left
in
string_of_string_list (strings [] r)

let insert start rope r =
concat (concat (sub 0 start r) rope) (sub start (length r - start) r)

let remove start len r =
concat (sub 0 start r) (sub (start + len) (length r - start - len) r)

let () =
let r name v = Callback.register ("Rope." ^ name) v in
r "empty" (fun () -> empty);
r "of_string" of_string;
r "sub" (fun r n m -> sub n m r);
r "concat" concat;
r "length" length;
r "get" (fun r i -> get i r);
r "to_string" to_string

--eJnRUKwClWJh1Khz--
 
G

Gustav Munkby

------=_Part_17604_4118040.1188812565770
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

by John Miller

This week's task is to implement the Rope data structure as a Ruby class. This
topic comes out of the ICFP programming competition
(http://www.icfpcontest.com/) which had competitors manipulating a 7.5 million
character string this year.

My solution deviates slightly from the problem specification.
The most important difference is that instead of implementing
a binary tree to store the strings, they are all stored in an
array instead.

The class itself is quite long, since I wanted to implement
many of the methods of the built-in String class. Some of the
methods will require significant work to actually implement.
Most notably the Regexp-related methods, since there is no
way to instruct the Regexp matcher to ask for more characters
once it has reached the end of a string.

Actually, all String methods are implemented, by implementing
method missing and delegating to the composed string if the
string class can handle the method. After delegating to the
string class, we convert any String result to a new rope and
we also make sure to replace our content by the string we
delegated to, to make sure that destructive methods works as
expected.

In fact, we replace the content of our rope as soon as to_s
is called. since the reason for lazy concatenation is to
avoid the concatenation cost, we can just as well keep the
concatenated string when we already had to pay the price.

The benchmark results are:

# String
Build: 0.170000 0.150000 0.320000 ( 0.324341)
Sort: 3.470000 3.630000 7.100000 ( 7.126741)

# ArrayRope
Build: 0.010000 0.010000 0.020000 ( 0.009744)
Sort: 0.130000 0.000000 0.130000 ( 0.138330)

# ArrayRope (with dup)
Build: 0.240000 0.160000 0.400000 ( 0.402201)
Sort: 0.160000 0.000000 0.160000 ( 0.163022)

For the unprotected case, sorting was ~52 times faster,
and building was ~33 times faster.

However, since the string class is not immutable, there is a
slight problem with this approach. The strings could added
to the rope could be modified "under the hood". We can easily
protect against that by making copies of the strings when we
add them to the rope. Since the built-in String shares the
actual data between the two instances (until they change),
this is not so much of a memory issue as one could expect.

By adding dup (in initialize/append/prepend) we end up with
the following times, which trades of some of our speed/memory
for a bit of safety. (This is actually the submitted solution)

Compared with the string approach, building is now (for obvious
reasons) slower than the String, but only about 25%.
Sorting is still much faster than the String case, namely ~44
times as fast.

!g

------=_Part_17604_4118040.1188812565770
Content-Type: application/octet-stream; name=array_rope.rb
Content-Transfer-Encoding: base64
X-Attachment-Id: f_f64rore4
Content-Disposition: attachment; filename="array_rope.rb"

Y2xhc3MgQXJyYXkKICBkZWYgdXBwZXJfYm91bmQodmFsdWUpCiAgICBsbywgaGkgPSAwLCBzaXpl
LTEKICAgIHdoaWxlIGxvIDw9IGhpCiAgCSAgbWlkID0gKGxvICsgaGkpIC8gMgogIAkgIGlmIHZh
bHVlID49IGF0KG1pZCkKCSAgICAgIGxvID0gbWlkICsgMQogIAkgIGVsc2UKCSAgICAgIGhpID0g
bWlkIC0gMQogIAkgIGVuZAogIAllbmQKICAgIGxvCiAgZW5kCmVuZAoKY2xhc3MgQXJyYXlSb3Bl
CiAgYXR0cl9yZWFkZXIgOnNlZ21lbnRzCiAgZGVmIGluaXRpYWxpemUoKnNlZ21lbnRzKQogICAg
cmFpc2Ugc2VnbWVudHMuaW5zcGVjdCBpZiBzZWdtZW50cy5hbnk/IHsgfHN8IHMubmlsPyB9CiAg
ICBAc2VnbWVudHMgPSBzZWdtZW50cy5tYXAgeyB8c3wgcy5kdXAgfQogICAgY29tcHV0ZV9vZmZz
ZXRzCiAgZW5kCgogIGRlZiBjb21wdXRlX29mZnNldHMKICAgIGwgPSAwCiAgICBAb2Zmc2V0cyA9
IEBzZWdtZW50cy5tYXAgeyB8c3wgbCArPSBzLmxlbmd0aCB9CiAgICBAb2Zmc2V0cy51bnNoaWZ0
IDAgICAgCiAgZW5kCgogIGRlZiBsZW5ndGgKICAgIEBvZmZzZXRzLmxhc3QKICBlbmQKICBhbGlh
cyA6c2l6ZSA6bGVuZ3RoCiAgZGVmIGVtcHR5PwogICAgQHNlZ21lbnRzLmVtcHR5PwogIGVuZAoK
ICBkZWYgdG9fcwogICAgY2FzZSBAc2VnbWVudHMuc2l6ZQogICAgd2hlbiAwOyAiIgogICAgd2hl
biAxOyBAc2VnbWVudHMuZmlyc3QuZHVwCiAgICBlbHNlCiAgICAgICMgV2hlbiBjb252ZXJ0aW5n
IHRvIGEgc3RyaW5nLCB3ZSBtdXN0IGRvIHRoZSBleHBlbnNpdmUKICAgICAgIyBjb25jYXRlbmF0
aW9uLCBzbyBsZXRzIGhpamFjayB0aGF0IGluZm9ybWF0aW9uIGFuZCB1c2UgdGhlCiAgICAgICMg
bmV3IHN0cmluZyBpbnRlcm5hbGx5IGFzIHdlbGwuCiAgICAgIHMgPSAoQHNlZ21lbnRzICogIiIp
CiAgICAgIEBzZWdtZW50cywgQG9mZnNldHMgPSBbc10sIFswLHMubGVuZ3RoXQogICAgICBzLmR1
cAogICAgZW5kCiAgZW5kCiAgCiAgZGVmIG1ldGhvZF9taXNzaW5nKG0sICphcmdzLCAmYmxvY2sp
CiAgICAjIE1ha2Ugc3VyZSB3ZSBoYW5kbGUgcmVtYWluaW5nIHN0cmluZyBtZXRob2RzIGFjY29y
ZGluZ2x5CiAgICBpZiAiIi5yZXNwb25kX3RvPyhtKQogICAgICBzID0gdG9fcwogICAgICByID0g
cy5fX3NlbmRfXyhtLCAqYXJncywgJmJsb2NrKQoKICAgICAgIyBHdWVzcyB0aGF0IHRoaXMgaXMg
dGhlIHJpZ2h0IHRoaW5nIHRvIGRvCiAgICAgIGlmIHIuZXF1YWw/IHMKICAgICAgICByID0gc2Vs
ZiAKICAgICAgZWxzaWYgci5pc19hPyBTdHJpbmcKICAgICAgICByID0gQXJyYXlSb3BlLm5ldyBy
CiAgICAgIGVuZAoKICAgICAgIyBUaGUgbWV0aG9kIG1pZ2h0IGNoYW5nZSB0aGUgY29udGVudHMg
b2YgdGhlIHN0cmluZwogICAgICByZXBsYWNlIHMKCiAgICAgIHIKICAgIGVsc2UKICAgICAgc3Vw
ZXIobSwgKmFyZ3MsICZibG9jaykKICAgIGVuZAogIGVuZAogIAogIGRlZiByZXNwb25kc190bz8o
bSkKICAgICMgTWFrZSBzdXJlIHdlIGhhbmRsZSByZW1haW5pbmcgc3RyaW5nIG1ldGhvZHMgYWNj
b3JkaW5nbHkKICAgIHN1cGVyIHx8ICIiLnJlc3BvbmRzX3RvPyhtKQogIGVuZAogIAogIGRlZiBh
cHBlbmQoKnNlZ21lbnRzKQogICAgc2VnbWVudHMuZWFjaCBkbyB8c3wKICAgICAgaWYgcy5pc19h
PyBBcnJheVJvcGUKICAgICAgICBhcHBlbmQgKnMuc2VnbWVudHMKICAgICAgZWxzaWYgIXMuZW1w
dHk/CiAgICAgICAgQHNlZ21lbnRzIDw8IHMuZHVwCiAgICAgICAgQG9mZnNldHMgPDwgQG9mZnNl
dHMubGFzdCArIHMubGVuZ3RoCiAgICAgIGVuZAogICAgZW5kCiAgICBzZWxmCiAgZW5kCiAgYWxp
YXMgOjw8IDphcHBlbmQKICBkZWYgcHJlcGVuZCgqc2VnbWVudHMpCiAgICBzZWdtZW50cy5lYWNo
IGRvIHxzfAogICAgICBpZiBzLmlzX2E/IEFycmF5Um9wZQogICAgICAgIHByZXBlbmQgKnMuc2Vn
bWVudHMKICAgICAgZWxzaWYgIXMuZW1wdHk/CiAgICAgICAgQHNlZ21lbnRzLnVuc2hpZnQgcy5k
dXAKICAgICAgICBjb21wdXRlX29mZnNldHMKICAgICAgZW5kCiAgICBlbmQKICAgIHNlbGYKICBl
bmQKICAKICBkZWYgc2xpY2Uoc3RhcnQsbGVuZ3RoPW5pbCkKICAgIHJldHVybiB0b19zLnNsaWNl
KHN0YXJ0LCBsZW5ndGh8fDApIGlmIHN0YXJ0LmlzX2E/IFJlZ2V4cAogICAgcmV0dXJuIGluZGV4
KHN0YXJ0KSBpZiBzdGFydC5pc19hPyBTdHJpbmcKICAgIGlmICFsZW5ndGggJiYgc3RhcnQuaXNf
YT8oUmFuZ2UpCiAgICAgIGxlbmd0aCA9IHN0YXJ0Lmxhc3QgLSBzdGFydC5maXJzdCAtIChleGNs
dWRlX2VuZD8gPyAxIDogMCkKICAgICAgc3RhcnQgPSBzdGFydC5maXJzdAogICAgZW5kCiAgICBz
dGFydCA9IHNlbGYubGVuZ3RoICsgc3RhcnQgaWYgc3RhcnQgPCAwCiAgICBpZiBzdGFydCA8IDAg
fHwgc3RhcnQgPiBzZWxmLmxlbmd0aAogICAgICBuaWwKICAgIGVsc2lmICFsZW5ndGgKICAgICAg
QHNlZ21lbnRzLmRldGVjdCB7IHxzfCAoc3RhcnQgLT0gcy5sZW5ndGgpIDwgMCB9LnNsaWNlKHN0
YXJ0KQogICAgZWxzZQogICAgICBpbmRleCA9IEBvZmZzZXRzLnVwcGVyX2JvdW5kKHN0YXJ0KSAt
IDEKICAgICAgcm9wZSA9IEFycmF5Um9wZS5uZXcoCiAgICAgICAgQHNlZ21lbnRzLmF0KGluZGV4
KS5zbGljZShzdGFydCAtIEBvZmZzZXRzLmF0KGluZGV4KSwgbGVuZ3RoKSkKICAgICAgcmVtYWlu
ID0gbGVuZ3RoIC0gcm9wZS5sZW5ndGgKICAgICAgd2hpbGUgcmVtYWluID4gMCAmJiAoaW5kZXgg
Kz0gMSkgPCBAc2VnbWVudHMubGVuZ3RoCiAgICAgICAgcm9wZSA8PCBAc2VnbWVudHMuYXQoaW5k
ZXgpLnNsaWNlKDAsIHJlbWFpbikKICAgICAgICByZW1haW4gPSBsZW5ndGggLSByb3BlLmxlbmd0
aAogICAgICBlbmQKICAgICAgcm9wZQogICAgZW5kCiAgZW5kCiAgYWxpYXMgOltdIDpzbGljZQog
IAogIGRlZiBkdXA7IEFycmF5Um9wZS5uZXcoKkBzZWdtZW50cyk7IGVuZAogIGFsaWFzIDpjbG9u
ZSA6ZHVwCiAgCiAgZGVmICsob3RoZXIpOyBkdXAgPDwgb3RoZXI7IGVuZAoKICBpbmNsdWRlIEVu
dW1lcmFibGUKICBkZWYgZWFjaCAmYmxvY2sKICAgIGxhc3QgPSBuaWwKICAgIEBzZWdtZW50cy5l
YWNoIGRvIHxzfAogICAgICBzLmVhY2ggZG8gfGx8CiAgICAgICAgaWYgbFstMV0gPT0gP1xuCiAg
ICAgICAgICB5aWVsZChsYXN0ID8gIiN7bGFzdH0je2x9IiA6IGwpCiAgICAgICAgICBsYXN0ID0g
bmlsCiAgICAgICAgZWxzZQogICAgICAgICAgbGFzdCA9ICIje2xhc3R9I3tsfSIKICAgICAgICBl
bmQKICAgICAgZW5kCiAgICBlbmQKICAgIHlpZWxkIGxhc3QgaWYgbGFzdAogICAgc2VsZgogIGVu
ZAogIGRlZiBlYWNoX2J5dGUgJmJsb2NrCiAgICBAc2VnbWVudHMuZWFjaCB7IHxzfCBzLmVhY2hf
Ynl0ZSAmYmxvY2sgfQogICAgc2VsZgogIGVuZAoKICBpbmNsdWRlIENvbXBhcmFibGUKICAld1s8
PT4gY2FzZWNtcF0uZWFjaCBkbyB8bXwKICAgIG1vZHVsZV9ldmFsIDw8LUVPTQogICAgICBkZWYg
I3ttfShvdGhlcikKICAgICAgICByZXN1bHQgPSAwCiAgICAgICAgQHNlZ21lbnRzLnppcChAb2Zm
c2V0cykgZG8gfHNlZ21lbnQsb2Zmc2V0fAogICAgICAgICAgcmVzdWx0ID0gb3RoZXIuc2xpY2Uo
b2Zmc2V0LCBzZWdtZW50Lmxlbmd0aCkuI3ttfShzZWdtZW50KQogICAgICAgICAgYnJlYWsgaWYg
cmVzdWx0ICE9IDAKICAgICAgICBlbmQKICAgICAgICAtcmVzdWx0CiAgICAgIGVuZAogICAgRU9N
CiAgZW5kCiAgZGVmID09KG90aGVyKQogICAgQG9mZnNldHMubGFzdCA9PSBvdGhlci5sZW5ndGgg
JiYgKHNlbGYgPD0+IG90aGVyKSA9PSAwCiAgZW5kCiAgYWxpYXMgOmVxbD8gOj09CiAgCiAgZGVm
IGhhc2g7IEBzZWdtZW50cy5pbmplY3QoMCkgeyB8aCwgc3wgaCArIHMuaGFzaCB9OyBlbmQKCiAg
ZGVmIHJlcGxhY2Ugc3RyCiAgICBAc2VnbWVudHMsIEBvZmZzZXRzID0gW10sIFswXQogICAgYXBw
ZW5kIHN0cgogIGVuZAoKICAld1tkb3duY2FzZSB1cGNhc2UgY2FwaXRhbGl6ZSBzd2FwY2FzZV0u
ZWFjaCBkbyB8bXwKICAgIG1vZHVsZV9ldmFsIDw8LUVPTQogICAgICBkZWYgI3ttfSEKICAgICAg
ICBAc2VnbWVudHMuaW5qZWN0KG5pbCkgZG8gfHIsc3wKICAgICAgICAgIHMuI3sgbSB9ISA/IHNl
bGYgOiByCiAgICAgICAgZW5kCiAgICAgIGVuZAogICAgRU9NCiAgZW5kCiAgCiAgZGVmIHJzdHJp
cCEKICAgIHdoaWxlIEBzZWdtZW50cy5sYXN0LnJzdHJpcCEKICAgICAgaWYgQHNlZ21lbnRzLmxh
c3QuZW1wdHk/CiAgICAgICAgQHNlZ21lbnRzLnBvcAogICAgICAgIEBvZmZzZXRzLnBvcAogICAg
ICBlbHNlCiAgICAgICAgQG9mZnNldHNbLTFdID0gQG9mZnNldHNbLTJdICsgQHNlZ21lbnRzWy0x
XS5sZW5ndGgKICAgICAgICByZXR1cm4gc2VsZgogICAgICBlbmQKICAgIGVuZAogIGVuZAoKICBk
ZWYgbHN0cmlwIQogICAgd2hpbGUgQHNlZ21lbnRzLmZpcnN0LmxzdHJpcCEKICAgICAgaWYgQHNl
Z21lbnRzLmZpcnN0LmVtcHR5PwogICAgICAgIEBzZWdtZW50cy5zaGlmdAogICAgICBlbHNlCiAg
ICAgICAgY29tcHV0ZV9vZmZzZXRzCiAgICAgICAgcmV0dXJuIHNlbGYKICAgICAgZW5kCiAgICBl
bmQKICBlbmQKCiAgZGVmIHN0cmlwIQogICAgcnN0cmlwISA/IGxzdHJpcCEgfHwgc2VsZiA6IGxz
dHJpcCEKICBlbmQKCiAgZGVmIGNob3AhCiAgICB1bmxlc3MgQHNlZ21lbnRzLmVtcHR5PwogICAg
ICBpZiBAc2VnbWVudHMubGFzdC5jaG9wIQogICAgICAgIGlmIEBzZWdtZW50cy5sYXN0LmVtcHR5
PwogICAgICAgICAgQHNlZ21lbnRzLnBvcAogICAgICAgICAgQG9mZnNldHMucG9wCiAgICAgICAg
ZWxzZQogICAgICAgICAgQG9mZnNldHNbLTFdIC09IDEKICAgICAgICBlbmQKICAgICAgICByZXR1
cm4gc2VsZgogICAgICBlbmQKICAgIGVuZAogIGVuZAogIAogIGRlZiBjaG9tcCEoc2VwPSQvKQog
ICAgdW5sZXNzIEBzZWdtZW50cy5lbXB0eT8KICAgICAgaWYgQHNlZ21lbnRzLmxhc3QuY2hvbXAh
KHNlcCkKICAgICAgICBpZiBAc2VnbWVudHMubGFzdC5lbXB0eT8KICAgICAgICAgIEBzZWdtZW50
cy5wb3AKICAgICAgICAgIEBvZmZzZXRzLnBvcAogICAgICAgIGVsc2UKICAgICAgICAgIEBvZmZz
ZXRzWy0xXSAtPSAxCiAgICAgICAgZW5kCiAgICAgICAgcmV0dXJuIHNlbGYKICAgICAgZW5kCiAg
ICBlbmQKICBlbmQKICAKICBkZWYgbmV4dCEKICAgIChAc2VnbWVudHMubGVuZ3RoLTEpLmRvd250
bygwKSBkbyB8aXwKICAgICAgcyA9IEBzZWdtZW50cy5hdChpKQogICAgICBuID0gcy5uZXh0CiAg
ICAgIHIgPSBuLmxlbmd0aCA+IHMubGVuZ3RoCiAgICAgIHMucmVwbGFjZSBuCiAgICAgIHMuc2xp
Y2UhKC0xKSBpZiByICYmIGkgPiAwCiAgICAgIGJyZWFrIHVubGVzcyByCiAgICBlbmQKICAgIHNl
bGYKICBlbmQKCiAgJXdbZG93bmNhc2UgdXBjYXNlIGNhcGl0YWxpemUgc3dhcGNhc2UgbHN0cmlw
IHJzdHJpcCBzdHJpcCBjaG9wIGNob21wIG5leHRdLmVhY2ggZG8gfG18CiAgICBtb2R1bGVfZXZh
bCA8PC1FT00KICAgICAgZGVmICN7bX1zdHJpcAogICAgICAgIGQgPSBkdXAKICAgICAgICBkLiN7
bX1zdHJpcCEKICAgICAgICBkCiAgICAgIGVuZAogICAgRU9NCiAgZW5kCmVuZAo=
------=_Part_17604_4118040.1188812565770--
 
M

Mauricio Fernandez

I happened to have implemented ropes in OCaml recently, so I generated a Ruby
extension using rocaml to see how well it would perform.
[...]
For SIZE = 1024, CHUNKS = 16384:

$ ruby eric_mahurin.rb StringRope
[...]
Build: 3.470000 0.040000 3.510000 ( 3.588875)
Sort: 89.110000 0.700000 89.810000 ( 92.179962)

$ time ruby -rocaml_rope bm.rb OCaml::Rope
[...]
Build: 0.360000 0.000000 0.360000 ( 0.378352)
Sort: 3.940000 0.040000 3.980000 ( 4.079140)

Also for laughs, playing with the GC parameters and with a qsort implemented
in OCaml:

$ OCAMLRUNPARAM=s=512k ruby -rocaml_rope bm.rb OCaml::Rope
[...]
Build: 0.290000 0.000000 0.290000 ( 0.290908)
Sort: 3.410000 0.040000 3.450000 ( 3.547792)
Sort': 0.830000 0.000000 0.830000 ( 0.845180)

Yielding the expected >100x gain over Ruby.

The interface part:

method "qsort", [rope, INT] => rope
method "qsort2", [rope, INT] => rope


I implemented the qsort function both in functional and imperative style, for
the sake of those who prefer something that reads similarly to the Ruby
version. The two variants are equally fast.


let (<<) = concat (* OCaml allows you to define new operators *)
let to_i = int_of_string
let to_s = to_string

let rec qsort' size = function
Empty -> Empty
| rope ->
let pivot = to_i (to_s (sub 0 8 rope)) in
let len = 8 + size in
let less = ref Empty in
let more = ref Empty in
let off = ref len in
while !off < length rope do
let slice = sub !off len rope in
if to_i (to_s (sub !off 8 rope)) < pivot then
less := !less << slice
else
more := !more << slice;
off := !off + len
done;
qsort' size !less << sub 0 len rope << qsort' size !more


let rec qsort size = function
Empty -> Empty
| rope ->
let rec loop r pivot off len max less more =
if off < max then begin
if to_i (to_s (sub off 8 r)) < pivot then
loop r pivot (off+len) len max (less << (sub off len r)) more
else
loop r pivot (off+len) len max less (more << (sub off len r))
end else (less, more) in

let pivot = to_i (to_s (sub 0 8 rope)) in
let len = 8 + size in
let less, more = loop rope pivot len len (length rope) Empty Empty in
qsort size less << sub 0 len rope << qsort size more
 
A

Ari Brown

Ok, So I'm still trying to figure out what stores the characters in a
Rope. A string or an array?

Judging from the fact that you have ArrayRope, I'm thinking it might
be an Array.

# ArrayRope
Build: 0.010000 0.010000 0.020000 ( 0.009744)
Sort: 0.130000 0.000000 0.130000 ( 0.138330)

# ArrayRope (with dup)
Build: 0.240000 0.160000 0.400000 ( 0.402201)
Sort: 0.160000 0.000000 0.160000 ( 0.163022)

Help!
Ari
--------------------------------------------|
If you're not living on the edge,
then you're just wasting space.
 
G

Gustav Munkby

Ok, So I'm still trying to figure out what stores the characters in a
Rope. A string or an array?

Judging from the fact that you have ArrayRope, I'm thinking it might
be an Array.

The actual characters are stored in strings. The rope is an array of strings,
stored in the @segments instance variable.

!g
 
E

Eugene Kalenkovich

Here is my solution. As a side note, it was probably not the best idea to
stick to the benchmarking code in the quiz, as it forced not the best design
decisions (I'd rather not have default constructor for Rope, and for sure -
no normalizing 'in place'). But - what's done is done.
For slice variants I've decided to have two argument ones in #slice, and one
arg - in #[]

Results:
String

Build: 0.188000 0.032000 0.220000 ( 0.219000)
Sort: 1.578000 0.843000 2.421000 ( 2.422000)

Rope

Build: 0.031000 0.000000 0.031000 ( 0.031000)
Sort: 0.203000 0.000000 0.203000 ( 0.234000)

Code:
-------------------------------------------------------------------------------------------------------------------------
class NilClass
def length; 0; end
end

class String
def shift
return nil if empty?
res=self[0]
self[0]=""
res
end
end

class Rope
attr_reader :length, :left, :right

def initialize(left=nil,right=nil)
@left=left if left
@right=right if right
@length=left.length+right.length
end

def append(what)
len=what.length
if (len>0)
@left=self.dup if @right.length>0
@right=what
@length+=len
end
self
end

alias << append

def prepend(what)
len=what.length
if (len>0)
@right=self.dup if @left.length>0
@left=what
@length+=len
end
self
end

def to_s
@[email protected]_s
end

def [](i)
return i.match(self.to_s)[0] if i.kind_of? Regexp
if i.kind_of? Range
pos,last=i.first,i.last
pos=@length+pos if pos<0
last=@length+last if last<0
return nil if pos<0 || last<0
return slice(pos,last-pos+1)
end
i=@length+i if i<0
return nil if i<0 || i>@length-1
[email protected]
i<llen ? @left : @right[i-llen]
end

def []=(i,val)
#fixnum only
i=@length+i if i<0
""=0 if i<0 || i>@length-1
@length+=val.length-1
[email protected]
i<llen ? @left=val : @right[i-llen]=val
end

def slice(pos,len)
return pos.match(self.to_s)[len] if pos.kind_of? Regexp
pos=@length+pos if pos<0
return nil if pos<0 || len<0 || pos>@length-1
[email protected]
return @left.slice(pos,len) if pos+len<=llen
return @right.slice(pos-llen, len) if pos>=llen
Rope.new(@left.slice(pos,len),@right.slice(0,len+pos-llen))
end

def shift
return nil if @length==0
@length-=1
[email protected]>0 ? @left.shift : @right.shift
@left=nil if @left.length==0
@right=nil if @right.length==0
res
end

def normalize
r=Rebalancer.new(@length)
self.traverse { |str| r.append(str) }
@left,@right=r.get_ropes
end

def traverse(&blck)
@left.kind_of?(String) ? yield(@left) : @left.traverse(&blck) if @left
@right.kind_of?(String) ? yield(@right) : @right.traverse(&blck) if
@right
end

end

class Rebalancer
def initialize len
@limits=[1,2]
@slots=[]
n=2
@limits<<n=@limits[-2]+@limits[-1] while n<len
end

def append str
@slots[0]=@slots[0] ? Rope.new(@slots[0],str) : str
i=0
while @slots.length>@limits
@slots[i+1]=@slots[i+1] ? Rope.new(@slots[i+1],@slots) : @slots
@slots=nil
i+=1
end
end

def get_ropes
@slots.compact!
(@slots.length-1).times { |i|
@slots[i+1]=@slots[i+1] ? Rope.new(@slots[i+1],@slots) : @slots
@slots=nil
i+=1
}
[@slots[-1].left,@slots[-1].right]
end
end
 
A

ara.t.howard

CPU(user+sys,sec) mem(peak,MB)
------------------- ------------
build sort total build sort class
----- ---- ----- ----- ---- -----
0.10 1.70 1.80 287 1327 String
0.01 0.27 0.28 22 153 Porth::Rope
0.02 0.83 0.85 22 34 Choudhury::Rope *
0.02 0.06 0.08 22 29 Mahurin::StringRope +
0.00 0.08 0.08 22 30 Mahurin::DenormalStringRope +
0.02 0.14 0.16 22 29 Mahurin::MutableStringRope
0.07 0.10 0.17 151 151 Munkby::ArrayRope
0.02 0.73 0.75 22 655 Kalenkovich::Rope *

CPU : minimum from 40 iterations
mem : peak over the 40 iterations


enough of this nonsense already! where is the gem! ;)


a @ http://drawohara.com/
 
C

Carl Porth

Eric, thanks for taking the time to run benchmarks for everyone's
solutions. It clears things up when they're run all on the same
machine.

Carl
 
E

Eugene Kalenkovich

"Eric Mahurin" wrote in message
CPU(user+sys,sec) mem(peak,MB)
------------------- ------------
build sort total build sort class
----- ---- ----- ----- ---- -----
0.10 1.70 1.80 287 1327 String
0.01 0.27 0.28 22 153 Porth::Rope
0.02 0.83 0.85 22 34 Choudhury::Rope *
0.02 0.06 0.08 22 29 Mahurin::StringRope +
0.00 0.08 0.08 22 30 Mahurin::DenormalStringRope +
0.02 0.14 0.16 22 29 Mahurin::MutableStringRope
0.07 0.10 0.17 151 151 Munkby::ArrayRope
0.02 0.73 0.75 22 655 Kalenkovich::Rope *

CPU : minimum from 40 iterations
mem : peak over the 40 iterations

* : self-checking failed
+ : immutable benchmark needed

Eric, thanks a lot for your test, it helped me to find a bug in my solution.
The only thing I do not understand - you've changed the game rules on flight
by making "data = data.normalize" instead of "data.normalize". I'm wondering
how could you get sorting time > 0 for my solution (as it played by old
rules and would give you nothing for such assignment)? :)

Anyway, I'm posting a fixed solution (for the bug and for your test) in the
next message, and I'd appreciate if you can re-run it

-- EK
 
E

Eugene Kalenkovich

Fixes for previously posted solution:
- empty node guard in slice
- normalize returning self for Eric's test
------------------------------------------------------------------------------

class NilClass
def length; 0; end
end

class String
def shift
return nil if empty?
res=self[0]
self[0]=""
res
end
end

class Rope
attr_reader :length, :left, :right

def initialize(left=nil,right=nil)
@left=left if left
@right=right if right
@length=left.length+right.length
end

def append(what)
len=what.length
if (len>0)
@left=self.dup if @right.length>0
@right=what
@length+=len
end
self
end

alias << append

def prepend(what)
len=what.length
if (len>0)
@right=self.dup if @left.length>0
@left=what
@length+=len
end
self
end

def to_s
@left.to_s + @right.to_s
end

def [](i)
return i.match(self.to_s)[0] if i.kind_of? Regexp
if i.kind_of? Range
pos,last=i.first,i.last
pos = @length+pos if pos<0
last = @length+last if last<0
return nil if pos<0 || last<0
return slice(pos,last-pos+1)
end
i = @length+i if i<0
return nil if i<0 || i>@length-1
llen = @left.length
i<llen ? @left : @right[i-llen]
end

def []=(i,val)
#fixnum only
i = @length+i if i<0
""=0 if i<0 || i> @length-1
@length+=val.length-1
llen = @left.length
i<llen ? @left=val : @right[i-llen]=val
end

def slice(pos,len)
return pos.match(self.to_s)[len] if pos.kind_of? Regexp
pos = @length+pos if pos<0
return nil if pos<0 || len<0 || pos>@length-1
llen = @left.length
return @left.slice(pos,len) if pos+len<=llen || ! @right
return @right.slice(pos-llen, len) if pos>=llen
Rope.new(@left.slice(pos,llen-pos),@right.slice(0,len+pos-llen))
end

def shift
return nil if @length==0
@length-=1
res = @left.length>0 ? @left.shift : @right.shift
@left=nil if @left.length==0
@right=nil if @right.length==0
res
end

def normalize
r=Rebalancer.new(@length)
self.traverse { |str| r.append(str) }
@left, @right=r.get_ropes
self
end

def traverse(&blck)
@left.kind_of?(String) ? yield( @left) : @left.traverse(&blck) if @left
@right.kind_of?(String) ? yield( @right) : @right.traverse(&blck) if
@right
end

end

class Rebalancer
def initialize len
@limits=[1,2]
@slots=[]
n=2
@limits<< n = @limits[-2] + @limits[-1] while n<len
end

def append str
@slots[0] = @slots[0] ? Rope.new( @slots[0],str) : str
i=0
while @slots.length>@limits
@slots[i+1] = @slots[i+1] ? Rope.new( @slots[i+1],@slots) :
@slots
@slots = nil
i+=1
end
end

def get_ropes
@slots.compact!
(@slots.length-1).times { |i|
@slots[i+1]=@slots[i+1] ? Rope.new(@slots[i+1],@slots) : @slots
@slots=nil
i+=1
}
[@slots[-1].left,@slots[-1].right]
end
end
 
G

Gustav Munkby

CPU(user+sys,sec) mem(peak,MB)
------------------- ------------
build sort total build sort class
----- ---- ----- ----- ---- -----
0.10 1.70 1.80 287 1327 String
0.01 0.27 0.28 22 153 Porth::Rope
0.02 0.83 0.85 22 34 Choudhury::Rope *
0.02 0.06 0.08 22 29 Mahurin::StringRope +
0.00 0.08 0.08 22 30 Mahurin::DenormalStringRope +
0.02 0.14 0.16 22 29 Mahurin::MutableStringRope
0.07 0.10 0.17 151 151 Munkby::ArrayRope
0.02 0.73 0.75 22 655 Kalenkovich::Rope *

Thanks for running the tests! I do think it is a shame, however,
that the table does not include the notion of safety that I mentioned
in my submission. If any of these classes should every go into a
gem as ara mentioned, the user of that gem at least ought to be
able to choose the safe version.

Here is a very simple scenario where such a problem is introduced.
One could probably do worse changes to the source strings, but this
was the first example I came up with.

ss = %w[test append]
r1, r2 = ss.map {|x| Mahurin::StringRope.new(x) }
r1 <<= r2
ss.first << "ing"
r1.length # => 10
r1.to_s.length # => 13

I have only tested Mahurin::StringRope, but would like to know for
which other classes I need to be extra careful when adding strings
to the rope. I did find, however, that by removing my "safety dups",
the combined running time (on my machine) is about the same for
my solution as for Mahurin::StringRope.

!g
 
E

Eric Mahurin

CPU(user+sys,sec) mem(peak,MB)
------------------- ------------
build sort total build sort class
----- ---- ----- ----- ---- -----
0.10 1.70 1.80 287 1327 String
0.01 0.27 0.28 22 153 Porth::Rope
0.02 0.83 0.85 22 34 Choudhury::Rope *
0.02 0.06 0.08 22 29 Mahurin::StringRope +
0.00 0.08 0.08 22 30 Mahurin::DenormalStringRope +
0.02 0.14 0.16 22 29 Mahurin::MutableStringRope
0.07 0.10 0.17 151 151 Munkby::ArrayRope
0.02 0.73 0.75 22 655 Kalenkovich::Rope *

Thanks for running the tests! I do think it is a shame, however,
that the table does not include the notion of safety that I mentioned
in my submission. If any of these classes should every go into a
gem as ara mentioned, the user of that gem at least ought to be
able to choose the safe version.

Here is a very simple scenario where such a problem is introduced.
One could probably do worse changes to the source strings, but this
was the first example I came up with.

ss = %w[test append]
r1, r2 = ss.map {|x| Mahurin::StringRope.new(x) }
r1 <<= r2
ss.first << "ing"
r1.length # => 10
r1.to_s.length # => 13

I have only tested Mahurin::StringRope, but would like to know for
which other classes I need to be extra careful when adding strings
to the rope. I did find, however, that by removing my "safety dups",
the combined running time (on my machine) is about the same for
my solution as for Mahurin::StringRope.

My classes that do the work are are immutable (there is a wrapper
class that makes a mutable rope). I assumed that original data you
give (strings) won't change. The caller should know whether the input
data will change or not and dup if necessary.

Some of the methods in your ArrayRope do modify the underlying
strings. Because of that, you really need to dup the input. But, you
could easily fix those few methods to dup when you want to make a
change. I made a version of your code that doesn't do the dups up
front and benchmarked it.

I assume most of the other mutable rope solutions have similar
problems and need to implement a proper COW (copy-on-write) solution.

At first I thought your solution was linear to do a slice, but then I
noticed that you do a binary search. I applaud your solution. I
think many C++ STL deque implementations use this same scheme (two
level array). Maybe that is why a rope isn't officially in STL -
because deque is good enough.

Here are the new results along with your non-dup solution and
Kalenkovich's fixes:

CPU(user+sys,sec) mem(peak,MB)
------------------- ------------
build sort total build sort class
----- ---- ----- ----- ---- -----
0.10 1.70 1.80 287 1327 String
0.01 0.27 0.28 22 153 Porth::Rope
0.02 0.83 0.85 22 34 Choudhury::Rope *
0.02 0.06 0.08 22 29 Mahurin::StringRope +
0.00 0.08 0.08 22 30 Mahurin::DenormalStringRope +
0.02 0.14 0.16 22 29 Mahurin::MutableStringRope
0.07 0.10 0.17 151 151 Munkby::ArrayRope
0.00 0.09 0.09 22 22 Munkby::ArrayRope (no dup)
0.01 0.12 0.13 22 22 Kalenkovich::Rope (fixed)

CPU : minimum from 40 iterations
mem : peak over the 40 iterations

* : self-checking failed
+ : immutable benchmark needed


As Eugene Kalenkovich mentioned my benchmark test shouldn't have
excepted self to be returned from normalize for mutable ropes. I
fixed this in my test.
 
E

Eugene Kalenkovich

0.02 0.06 0.08 22 29 Mahurin::StringRope +
0.00 0.08 0.08 22 30 Mahurin::DenormalStringRope +
0.02 0.14 0.16 22 29 Mahurin::MutableStringRope
0.07 0.10 0.17 151 151 Munkby::ArrayRope
0.00 0.09 0.09 22 22 Munkby::ArrayRope (no dup)
0.01 0.12 0.13 22 22 Kalenkovich::Rope (fixed)

Eric, may I ask you to test one more thing: your #slice does not have any
input checks. To make a fair comparison, could you please comment first 3
lines of my #slice, making it:

def slice(pos,len)
#1 return pos.match(self.to_s)[len] if pos.kind_of? Regexp
#2 pos = @length+pos if pos<0
#3 return nil if pos<0 || len<0 || pos>@length-1
llen = @left.length
return @left.slice(pos,len) if pos+len<=llen || ! @right
return @right.slice(pos-llen, len) if pos>=llen
Rope.new(@left.slice(pos,llen-pos),@right.slice(0,len+pos-llen))
end

This code still satisfies your test, and I believe it will beat performance
of yours (BTW, lines 2,3 show problems in your code)

Thank you,
-- EK
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,981
Messages
2,570,188
Members
46,732
Latest member
ArronPalin

Latest Threads

Top