Rope Data Structures

J

John Miller

Greetings All,

I have been plugging away a a parser for ruby Endo's DNA from this years
International Functional Programming Completion.
(http://www.icfpcontest.org/) Most teems who have put up there reviews
have talked about having to switch programming languages to something
faster. Most did this before trying to change algorithms. Many of the
successful teems ended up using the 'rope' from the C++ Standard
Template Library.

please see:

http://en.wikipedia.org/wiki/Rope_(computer_science)
http://www.sgi.com/tech/stl/ropeimpl.html
http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol25/issue12/spe986.pdf

I tried writing a rope library in ruby and got an order of magnitude
better performance when compared to a String implementation (6.5 to 48.5
iterations per second -- to successfully solve the problem I estimated I
would need ~2000 iterations per second)

I learned several things from this attempt:
* Avoid creating edge cases they are bad for business and my poor rope
implementation had way too many of them.
* Working with long strings is a slow process (endo's dna is a string
~7.5M chars long)
* For long running programs calling GC.start at points where the number
of living object is at a minimum will save time and Hard Drive actuator
motors.
* The folks who wrote the ruby-prof gem deserve an award for making an
excelent tool.

One thing that got me thinking however was that even my poorly
implemented pure Ruby version of ropes gave me an order of magnitude
extra speed. It made me wonder: How hard would it be to implement ropes
as a compiled binary library for ruby, and how much would using such a
library (even a pure ruby version) speed up rails, which does a lot of
string concatenation?
 
J

James Edward Gray II

Most teems who have put up there reviews
have talked about having to switch programming languages to something
faster. Most did this before trying to change algorithms.

We used Ruby and didn't switch, but the code was slow.
Many of the
successful teems ended up using the 'rope' from the C++ Standard
Template Library.

please see:

http://en.wikipedia.org/wiki/Rope_(computer_science)
http://www.sgi.com/tech/stl/ropeimpl.html
http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol25/
issue12/spe986.pdf

I tried writing a rope library in ruby and got an order of magnitude
better performance when compared to a String implementation (6.5 to
48.5
iterations per second -- to successfully solve the problem I
estimated I
would need ~2000 iterations per second)

I debated for some time about making this task, building a rope for
Ruby, a Ruby Quiz. Can I ask how long it took you? Also, just as a
rough metric, how many lines of code did you write?

James Edward Gray II
 
J

John Miller

James said:
We used Ruby and didn't switch, but the code was slow.


I debated for some time about making this task, building a rope for
Ruby, a Ruby Quiz. Can I ask how long it took you? Also, just as a
rough metric, how many lines of code did you write?

James Edward Gray II

It took about a week of causal programming to get a String version
running with no RNA output. To write writing a Rope class was another
three days of sloppy programming. Replacing String with Rope took about
2 hours. I then spent more time then I can recall fixing edge cases in
the Rope class. There are still a few loose ends, and the code has
become so full of edge case checks that it really isn't as fast as it
should be.

Looking only at the first 10,000 iterations, the fast majority of
instructions were copying huge swaths of the original dna. I decided
that a node should be able to represent a 'slice' of another node
without copying when these slices start having to span concatenations
life gets messy. When I added a pop command that could make the left
branch unused things got even more messy. When I wanted a << operator
that would add characters directly to the end of 'short' node things
became atrocious. And when finally I required the Environment DNA from
matchreplace(where my code spent 94.2% of it's time) not create new
copies of the Ropes it was holding when it was perpended to the dna the
code pretty much buckled under its own weight. At the moment there is a
bug in the normalization code that is introducing errors. Without being
able to renormalizes the data structure on th rope gets too deep and
everything grinds to a halt.

Lesson: Keep you data structures simple and elegant.

My rope implementation was initially ~120 (source file) lines. Now that
it is doing all this stuff it is 366. 44 of that is for a kmp_search
algorithm.

I would love to see Ropes as a Ruby quiz.

John Miller
 
B

benjohn

Just a musing on this, but if we want a fast and scalable rope structure
in Ruby (and speed and scalability are the reasons for using a rope, I
believe?), wouldn't it be better (though not quite so much fun) to
simply wrap a c / c++ rope library, rather than implement it from
scratch in Ruby?

Perhaps it'd even be possible to get Ruby's string to switch internal
implementation using some huristics about its size.

Cheers,
Benjohn
 
P

Peña, Botp

T24gQmVoYWxmIE9mIEpvaG4gTWlsbGVyOg0KIyBJIHRyaWVkIHdyaXRpbmcgYSByb3BlIGxpYnJh
cnkgaW4gcnVieSBhbmQgZ290IGFuIG9yZGVyIG9mIG1hZ25pdHVkZQ0KIyBiZXR0ZXIgcGVyZm9y
bWFuY2Ugd2hlbiBjb21wYXJlZCB0byBhIFN0cmluZyBpbXBsZW1lbnRhdGlvbiAoNi41IHRvIDQ4
LjUNCg0Kd293LCB0aGF0J3MgcXVpdGUgYSBoZWxwZnVsIGJvb3N0LiBzcGVjaWFsbHkgc2luY2Ug
YSBsYXJnZSBwZXJjZW50YWdlIG9mIHRoZSB3ZWIgYW5kIGRhdGFiYXNlIHRpbWUgaXMgc3BlbnQg
b24gc3RyaW5nIG1hbmlwdWxhdGlvbi4gcG9zdCB5b3VyIHdvcmsgYXMgYSBnZW0sIHBscy4NCg0K
a2luZCByZWdhcmRzIC1ib3RwDQo=
 
J

James Edward Gray II

It took about a week of causal programming to get a String version
running with no RNA output.

Thanks for all of the information.
I would love to see Ropes as a Ruby quiz.

Contact me off-list if you are interested in helping me put it together.

James Edward Gray II
 
J

John Miller

Just a musing on this, but if we want a fast and scalable rope structure
in Ruby (and speed and scalability are the reasons for using a rope, I
believe?), wouldn't it be better (though not quite so much fun) to
simply wrap a c / c++ rope library, rather than implement it from
scratch in Ruby?

Perhaps it'd even be possible to get Ruby's string to switch internal
implementation using some huristics about its size.

Cheers,
Benjohn

I agree that for something like this a binary distribution is best, but
they still cause problems in cross platform environments. (eg. JRuby) I
think the best solutions are 1.) Have Multiple Gem distros (java,
mswin32, source) and include a pure ruby version. Or 2.) integrate this
into the interpreter core as another built in class.


On Behalf Of John Miller:
# I tried writing a rope library in ruby and got an order of magnitude
# better performance when compared to a String implementation (6.5 to
48.5

wow, that's quite a helpful boost. specially since a large percentage of
the web and database time is spent on string manipulation. post your
work as a gem, pls.

kind regards -botp

The ICFP task was written in such a way as to make maximum use of Ropes.
I think that processing eRB template would also see pretty good gains.
However, one must be careful to use Ropes only when string concatenation
and slicing (Ropes do these in O(1) ) are much more common then
accessing the string content (O(log(n)). An application where a string
is build once and the accessed repeatedly will have poorer performance
if it is converted to a rope. On the other hand a string that is
repeatedly manipulated before finally being printed a single time will
see performance gains proportional to the average length of the
String/Rope.

Databases do spend lots of time with strings, but there are many more
accesses to strings then changes. Beyond that most of the strings that
databases deal with are "short."

I will see what I can do about getting out some sample code.

John Miller
 
D

Daniel Berger

John said:
Greetings All,

I have been plugging away a a parser for ruby Endo's DNA from this years
International Functional Programming Completion.
(http://www.icfpcontest.org/) Most teems who have put up there reviews
have talked about having to switch programming languages to something
faster. Most did this before trying to change algorithms. Many of the
successful teems ended up using the 'rope' from the C++ Standard
Template Library.

please see:

http://en.wikipedia.org/wiki/Rope_(computer_science)
http://www.sgi.com/tech/stl/ropeimpl.html
http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol25/issue12/spe986.pdf

I tried writing a rope library in ruby and got an order of magnitude
better performance when compared to a String implementation (6.5 to 48.5
iterations per second -- to successfully solve the problem I estimated I
would need ~2000 iterations per second)

Did you ever publish this?

Regards,

Dan
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,982
Messages
2,570,189
Members
46,735
Latest member
HikmatRamazanov

Latest Threads

Top