[QUIZ] FasterGenerator (#66)


Ruby Quiz

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone
on Ruby Talk follow the discussion.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Ruby includes a useful Generator library for switching internal iterators to
external iterators. It is used like this:

require 'generator'

# Generator from an Enumerable object
g = Generator.new(['A', 'B', 'C', 'Z'])

while g.next?
  puts g.next
end
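
To make the internal/external distinction concrete, here is a minimal sketch. It assumes Ruby 1.9 or later, where the built-in Enumerator class (which postdates this thread and is not the generator library under discussion) offers the same pull-style next. Walking two collections in lockstep is awkward with each alone, but simple once the caller decides when each element is pulled:

```ruby
# Assumption: Ruby 1.9+; Enumerator stands in for Generator here.
letters = %w[A B C].each   # each without a block returns an Enumerator
numbers = [1, 2, 3].each

pairs = []
# Kernel#loop ends quietly when next raises StopIteration.
loop { pairs << [letters.next, numbers.next] }

p pairs  # [["A", 1], ["B", 2], ["C", 3]]
```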

I've heard complaints in the past that this library is slow. One reason is that
it was implemented with continuations, which have performance issues in the
current version of Ruby. "Was" is the keyword there though, because I've just
learned that Generator was recently re-implemented. I learned some good tricks
reading the new version, so let's try fixing Generator ourselves. (No peeking
at the new version!)

This week's Ruby Quiz is to write FasterGenerator, your own re-implementation of
Generator with an eye towards working faster. (This is a small library. My
version is 54 lines.) It is possible to go even faster than the new
implementation, with certain trade-offs:

### Construction ###

Rehearsal -----------------------------------------------------------
Current Generator 0.340000 0.480000 0.820000 ( 0.828985)
Old callcc Generator 0.590000 0.840000 1.430000 ( 1.439255)
James's FasterGenerator 0.210000 1.040000 1.250000 ( 1.252359)
-------------------------------------------------- total: 3.500000sec

user system total real
Current Generator 0.750000 0.880000 1.630000 ( 1.630639)
Old callcc Generator 0.190000 1.170000 1.360000 ( 1.375868)
James's FasterGenerator 0.210000 1.230000 1.440000 ( 1.433152)

### next() ###

Rehearsal -----------------------------------------------------------
Current Generator 16.280000 0.100000 16.380000 ( 16.434828)
Old callcc Generator 9.260000 33.490000 42.750000 ( 42.997528)
James's FasterGenerator 0.030000 0.000000 0.030000 ( 0.038645)
------------------------------------------------- total: 59.160000sec

user system total real
Current Generator 15.940000 0.140000 16.080000 ( 16.425068)
Old callcc Generator 6.390000 30.160000 36.550000 ( 36.676838)
James's FasterGenerator 0.030000 0.000000 0.030000 ( 0.036880)

If you want to see the class documentation, you can find it here:

http://www.ruby-doc.org/stdlib/libdoc/generator/rdoc/classes/Generator.html

If you want to make sure your implementation is correct, you can use these tests
straight out of the current implementation:

require 'test/unit'

class TC_Generator < Test::Unit::TestCase
  def test_block1
    g = Generator.new { |g|
      # no yield's
    }

    assert_equal(0, g.pos)
    assert_raises(EOFError) { g.current }
  end

  def test_block2
    g = Generator.new { |g|
      for i in 'A'..'C'
        g.yield i
      end

      g.yield 'Z'
    }

    assert_equal(0, g.pos)
    assert_equal('A', g.current)

    assert_equal(true, g.next?)
    assert_equal(0, g.pos)
    assert_equal('A', g.current)
    assert_equal(0, g.pos)
    assert_equal('A', g.next)

    assert_equal(1, g.pos)
    assert_equal(true, g.next?)
    assert_equal(1, g.pos)
    assert_equal('B', g.current)
    assert_equal(1, g.pos)
    assert_equal('B', g.next)

    assert_equal(g, g.rewind)

    assert_equal(0, g.pos)
    assert_equal('A', g.current)

    assert_equal(true, g.next?)
    assert_equal(0, g.pos)
    assert_equal('A', g.current)
    assert_equal(0, g.pos)
    assert_equal('A', g.next)

    assert_equal(1, g.pos)
    assert_equal(true, g.next?)
    assert_equal(1, g.pos)
    assert_equal('B', g.current)
    assert_equal(1, g.pos)
    assert_equal('B', g.next)

    assert_equal(2, g.pos)
    assert_equal(true, g.next?)
    assert_equal(2, g.pos)
    assert_equal('C', g.current)
    assert_equal(2, g.pos)
    assert_equal('C', g.next)

    assert_equal(3, g.pos)
    assert_equal(true, g.next?)
    assert_equal(3, g.pos)
    assert_equal('Z', g.current)
    assert_equal(3, g.pos)
    assert_equal('Z', g.next)

    assert_equal(4, g.pos)
    assert_equal(false, g.next?)
    assert_raises(EOFError) { g.next }
  end

  def test_each
    a = [5, 6, 7, 8, 9]

    g = Generator.new(a)

    i = 0

    g.each { |x|
      # Compare against the element at the current index, not the whole array.
      assert_equal(a[i], x)

      i += 1

      break if i == 3
    }

    assert_equal(3, i)

    i = 0

    g.each { |x|
      assert_equal(a[i], x)

      i += 1
    }

    assert_equal(5, i)
  end
end

The Generator library also includes a SyncEnumerator class, but it is written to
use Generator and will work fine with a new version, as long as it is
API-compatible.
 

Pit Capitain

Ruby said:
Ruby includes a useful Generator library for switching internal iterators to
external iterators. It is used like this:
...
If you want to make sure your implementation is correct, you can use these tests
straight out of the current implementation:
...

James, thanks for the new quiz. It would be interesting to add a
testcase with an endless internal iterator. I don't know the new
implementation of Generator, so I can't say whether an endless iterator
should be supported.

Regards,
Pit
 

James Edward Gray II

It would be interesting to add a testcase with an endless internal
iterator. I don't know the new implementation of Generator, so I
can't say whether an endless iterator should be supported.

That's an interesting point we will certainly talk more about as we
start to see some solutions. Generator *can* handle endless
iterators. Quiz solvers can decide whether or not to limit
themselves with that additional requirement...

James Edward Gray II
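
As an illustration of what supporting an endless iterator involves, here is a minimal sketch. It is not the standard library's implementation: it assumes Ruby 1.9 or later for Fiber (which did not exist when this thread was written), the class name LazySketchGenerator is hypothetical, and each is omitted for brevity. The source is only run far enough to produce the one value being asked for, so a block that never finishes is not a problem:

```ruby
# Hypothetical sketch; assumes Ruby 1.9+ for Fiber.
class LazySketchGenerator
  attr_reader :pos

  def initialize(enum = nil, &block)
    # Either wrap an Enumerable's each or use the caller's block directly.
    @source = enum ? proc { |g| enum.each { |x| g.yield(x) } } : block
    rewind
  end

  # Called from inside the source; suspends it until another value is needed.
  def yield(value)
    Fiber.yield([value]) # wrapped so that nil and false are valid elements
    self
  end

  def next?
    fill
    !@buffer.empty?
  end

  def end?
    !next?
  end

  def current
    raise EOFError, "no more elements available" unless next?
    @buffer.first
  end

  def next
    raise EOFError, "no more elements available" unless next?
    @pos += 1
    @buffer.shift
  end

  def rewind
    @pos = 0
    @buffer = []
    @done = false
    @fiber = Fiber.new { @source.call(self); nil }
    self
  end

  private

  # Runs the source just far enough to produce one more value, if any.
  def fill
    return unless @buffer.empty? && !@done
    result = @fiber.resume
    if result
      @buffer << result.first
    else
      @done = true
    end
  end
end

# An endless source: only the three requested values are ever computed.
naturals = LazySketchGenerator.new { |g| n = 0; loop { g.yield(n += 1) } }
first_three = [naturals.next, naturals.next, naturals.next]
p first_three  # [1, 2, 3]
```

An implementation that copies everything into an array at construction time would never return from new for this source, which is the trade-off being discussed.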
 

Matthew Moss

Not being familiar with all the various Ruby packages and libs, I
first want to thank ya for posting a good set of test cases that I can
review, but was wondering if you also might post the code used to do
the timing?
 

James Edward Gray II

Not being familiar with all the various Ruby packages and libs, I
first want to thank ya for posting a good set of test cases that I can
review, but was wondering if you also might post the code used to do
the timing?

I will, yes, on Sunday. :)

I pulled down copies of the Generator library, before and after the
change. I then modified the class names so they could all peacefully
coexist, loaded them, and ran a trivial benchmark:

#!/usr/local/bin/ruby -w

require "benchmark"

require "current_generator"
require "callcc_generator"
require "faster_generator"

tests = 1000
enum = (1..10000).to_a

puts
puts "### Construction ###"
puts

Benchmark.bmbm do |x|
  x.report("Current Generator") do
    tests.times { CurrentGenerator.new(enum) }
  end
  x.report("Old callcc Generator") do
    tests.times { CallCCGenerator.new(enum) }
  end
  x.report("James's FasterGenerator") do
    tests.times { FasterGenerator.new(enum) }
  end
end

puts
puts "### next() ###"
puts

Benchmark.bmbm do |x|
  x.report("Current Generator") do
    generator = CurrentGenerator.new(enum)
    tests.times { generator.next until generator.end? }
  end
  x.report("Old callcc Generator") do
    generator = CallCCGenerator.new(enum)
    tests.times { generator.next until generator.end? }
  end
  x.report("James's FasterGenerator") do
    generator = FasterGenerator.new(enum)
    tests.times { generator.next until generator.end? }
  end
end

__END__

I'll post the modified libraries to go with that after the spoiler
period. Obviously, you could go get them yourself, before then. I
strongly recommend writing a solution first though.

James Edward Gray II
 

James Edward Gray II

I pulled down copies of the Generator library, before and after the
change. I then modified the class names so they could all
peacefully coexist, loaded them, and ran a trivial benchmark:

In answering Matthew's question, I found a small mistake in the
benchmarks posted with the quiz (the timings for constructing my
FasterGenerator were wrong). Here are the corrected numbers:

### Construction ###

Rehearsal -----------------------------------------------------------
Current Generator 0.320000 0.320000 0.640000 ( 0.642583)
Old callcc Generator 0.610000 0.870000 1.480000 ( 1.480780)
James's FasterGenerator 0.000000 0.000000 0.000000 ( 0.003751)
-------------------------------------------------- total: 2.120000sec

user system total real
Current Generator 0.740000 0.720000 1.460000 ( 1.464659)
Old callcc Generator 0.220000 1.500000 1.720000 ( 1.714859)
James's FasterGenerator 0.010000 0.000000 0.010000 ( 0.003258)

### next() ###

Rehearsal -----------------------------------------------------------
Current Generator 16.610000 0.130000 16.740000 ( 17.032537)
Old callcc Generator 8.070000 32.740000 40.810000 ( 41.072265)
James's FasterGenerator 0.030000 0.000000 0.030000 ( 0.037034)
------------------------------------------------- total: 57.580000sec

user system total real
Current Generator 16.630000 0.120000 16.750000 ( 16.878429)
Old callcc Generator 7.440000 32.720000 40.160000 ( 40.336902)
James's FasterGenerator 0.040000 0.000000 0.040000 ( 0.035432)

James Edward Gray II
 

Matthew Moss

Not being familiar with all the various Ruby packages and libs, I
I will, yes, on Sunday. :)

But you just posted it. :)

Sorry, guess I wasn't clear. I wasn't looking to see your
implementation of Generator, I wanted to see your benchmarking code.
 

James Edward Gray II

I wasn't looking to see your implementation of Generator, I wanted to
see your benchmarking code.

It's not my implementation I am trying to hide, it's the current
Generator code the benchmark uses as a baseline. I realize it's
readily available, but I think solving this one is more fun if you
don't peek. :)

James Edward Gray II
 

Matthew Moss

It's not my implementation I am trying to hide, it's the current
Generator code the benchmark uses as a baseline. I realize it's
readily available, but I think solving this one is more fun if you
don't peek. :)

Okay.... ummm.... sure.... I won't peek. No, you can't make me.

(In case I still wasn't clear, I wasn't asking to peek at ANY
implementation of Generator, but rather wanted exactly what you
posted, which has nothing to do with Generator but everything to do
with benchmark, i.e., the thing I wanted to know about.)
 

Dave Lee

Here are the corrected numbers:

### Construction ###

Rehearsal -----------------------------------------------------------
Current Generator 0.320000 0.320000 0.640000 ( 0.642583)
Old callcc Generator 0.610000 0.870000 1.480000 ( 1.480780)
James's FasterGenerator 0.000000 0.000000 0.000000 ( 0.003751)
-------------------------------------------------- total: 2.120000sec

user system total real
Current Generator 0.740000 0.720000 1.460000 ( 1.464659)
Old callcc Generator 0.220000 1.500000 1.720000 ( 1.714859)
James's FasterGenerator 0.010000 0.000000 0.010000 ( 0.003258)

### next() ###

Rehearsal -----------------------------------------------------------
Current Generator 16.610000 0.130000 16.740000 ( 17.032537)
Old callcc Generator 8.070000 32.740000 40.810000 ( 41.072265)
James's FasterGenerator 0.030000 0.000000 0.030000 ( 0.037034)
------------------------------------------------- total: 57.580000sec

user system total real
Current Generator 16.630000 0.120000 16.750000 ( 16.878429)
Old callcc Generator 7.440000 32.720000 40.160000 ( 40.336902)
James's FasterGenerator 0.040000 0.000000 0.040000 ( 0.035432)

Off topic a bit, but what os/hardware are you using to get these numbers?

Dave
 

James Edward Gray II

Off topic a bit, but what os/hardware are you using to get these
numbers?

A dual-processor G5 at 2.0 GHz (each processor), running Mac OS X
(10.4.4). The box has 2 GB of RAM.

James Edward Gray II
 

Jacob Fugal

This week's Ruby Quiz is to write FasterGenerator, your own
re-implementation of Generator with an eye towards working faster.

Here's the benchmark results from my own implementation:

galadriel:~/ruby/qotw/66$ ruby benchmark.rb

### Construction ###

Rehearsal -------------------------------------------------------------
Old callcc Generator 7.380000 1.000000 8.380000 ( 8.726668)
lukfugl's FasterGenerator 0.020000 0.000000 0.020000 ( 0.070048)
---------------------------------------------------- total: 8.400000sec

user system total real
Old callcc Generator 8.580000 0.960000 9.540000 ( 9.765350)
lukfugl's FasterGenerator 0.020000 0.000000 0.020000 ( 0.020035)

### next() ###

Rehearsal -------------------------------------------------------------
Old callcc Generator 10.750000 17.010000 27.760000 ( 28.587567)
lukfugl's FasterGenerator 0.680000 0.000000 0.680000 ( 0.744570)
--------------------------------------------------- total: 28.440000sec

user system total real
Old callcc Generator 11.490000 17.390000 28.880000 ( 29.853396)
lukfugl's FasterGenerator 0.650000 0.000000 0.650000 ( 0.694442)

My machine is obviously painfully slower than James', and it looks
like my implementation isn't quite as quick as his, but it was fun
figuring out a few tricks. Now I'm gonna go look at the new code and
see if my implementation beats it too!

(I'll post my solution after the spoiler period expires tomorrow morning.)

Jacob Fugal
 

Christoffer Lernö

Newbie jumping in here.

I will, yes, on Sunday. :)

I pulled down copies of the Generator library, before and after the
change. I then modified the class names so they could all
peacefully coexist, loaded them, and ran a trivial benchmark:

...snip..

Benchmark.bmbm do |x|
  x.report("Current Generator") do
    generator = CurrentGenerator.new(enum)
    tests.times { generator.next until generator.end? }
  end

I have a small problem with this. Won't this mean the loop will only
run 1 time?

The first time it correctly loops, but the next time "generator.end?"
will be true, and the following 999 runs will consequently not
execute generator.next at all, right?

If we change the line to

tests.times do
  generator.rewind
  generator.next until generator.end?
end

wouldn't that give us the desired test behaviour?


/Christoffer
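
The effect is easy to check with a stand-in. The sketch below uses a hypothetical array-backed CountingGenerator (not any of the benchmarked classes) that simply counts how often next runs, with smaller numbers than the real benchmark:

```ruby
# Hypothetical stand-in that counts calls to next.
class CountingGenerator
  attr_reader :calls

  def initialize(items)
    @items = items
    @index = 0
    @calls = 0
  end

  def end?
    @index >= @items.length
  end

  def next
    raise EOFError, "no more elements available" if end?
    @calls += 1
    value = @items[@index]
    @index += 1
    value
  end

  def rewind
    @index = 0
    self
  end
end

tests = 5
enum  = (1..10).to_a

# Original loop: only the first pass does any work.
without_rewind = CountingGenerator.new(enum)
tests.times { without_rewind.next until without_rewind.end? }

# Fixed loop: every pass walks the whole collection.
with_rewind = CountingGenerator.new(enum)
tests.times do
  with_rewind.rewind
  with_rewind.next until with_rewind.end?
end

puts without_rewind.calls  # 10
puts with_rewind.calls     # 50
```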
 

Christoffer Lernö

I've heard complaints in the past that this library is slow. One reason is that
it was implemented with continuations, which have performance issues in the
current version of Ruby. "Was" is the keyword there though, because I've just
learned that Generator was recently re-implemented. I learned some good tricks
reading the new version, so let's try fixing Generator ourselves. (No peeking
at the new version!)

This week's Ruby Quiz is to write FasterGenerator, your own re-implementation of
Generator with an eye towards working faster. (This is a small library. My
version is 54 lines.) It is possible to go even faster than the new
implementation, with certain trade-offs:

With the same trade-offs as James' version(?) (i.e. no infinite block
iterators), and hoping that I don't have any bugs in the code,
I get something like this on a 1.67 GHz G4 (after changing the tests
somewhat):

### next() ###

Rehearsal -----------------------------------------------------
Current Generator 20.360000 14.330000 34.690000 ( 51.788014)
My Generator 0.080000 0.010000 0.090000 ( 0.116704)
------------------------------------------- total: 34.780000sec

user system total real
Current Generator 23.200000 14.610000 37.810000 ( 53.145604)
My Generator 0.080000 0.010000 0.090000 ( 0.129965)

(and this code)

8<----

tests = 10
enum = (1..1000).to_a

puts
puts "### next() ###"
puts

Benchmark.bmbm do |x|
  x.report("Current Generator") do
    generator = Generator.new(enum)
    tests.times {
      generator.rewind
      generator.next until generator.end?
    }
  end
  x.report("My Generator") do
    generator = MyGenerator.new(enum)
    tests.times {
      generator.rewind
      generator.next until generator.end?
    }
  end
end

8<-----

I originally wanted to use the original values for tests and enum,
but I got bored waiting.


/Christoffer
 

H.Yamamoto

Hello. Please add this to the quiz's rules.

Generator should suspend the current calculation and
resume it when next() is called. (I use @queue as an
array, but this is just for insurance; normally
@queue should not contain multiple values.)

//////////////////////////////////////////

require "test/unit"
require "generator"

class TestGenerator < Test::Unit::TestCase

  class C
    def value=(x)
      @value = x
    end
    def each
      loop do
        yield @value
      end
    end
  end

  def test_realtime
    c = C.new
    g = Generator.new(c)
    3.times do |i|
      c.value = i
      assert_equal(i, g.next())
    end
  end

end

//////////////////////////////////////////

Python supports generators natively, and they
behave like HEAD's Generator class.

def generator():
    global value
    while True:
        yield value

g = generator()
for i in xrange(3):
    value = i
    print g.next()

////////////////////////////////////////////

And this one is a Java version.


abstract class CoRoutine
{
private final Thread _thread;

private final Object _lock = new Object();

private java.util.LinkedList _list = new java.util.LinkedList();

private boolean _terminated = false;

public CoRoutine()
{
final Object first_lock = new Object();

_thread = new Thread()
{
public void run()
{
synchronized(_lock)
{
synchronized (first_lock)
{
first_lock.notify();
}

try
{
_lock.wait();
}
catch (InterruptedException e)
{
throw new RuntimeException(e); // ???
}

try
{
CoRoutine.this.run();
}
finally
{
_terminated = true;

_lock.notify();
}
}
}
};

_thread.setDaemon(true);

synchronized(first_lock)
{
_thread.start();

try
{
first_lock.wait();
}
catch (InterruptedException e)
{
throw new RuntimeException(e); // ???
}
}
}

protected abstract void run();

protected final void yield(Object value)
{
synchronized(_lock)
{
_list.add(value);

_lock.notify();

try
{
_lock.wait();
}
catch (InterruptedException e)
{
throw new RuntimeException(e); // ???
}
}
}

private void ready() // must be called in synchronized(_lock)
{
if (!_terminated && _list.isEmpty())
{
_lock.notify();

try
{
_lock.wait();
}
catch (InterruptedException e)
{
throw new RuntimeException(e); // ???
}
}
}

public boolean hasNext()
{
synchronized(_lock)
{
ready();

return !_list.isEmpty();
}
}

public Object next()
{
synchronized(_lock)
{
ready();

if (_list.isEmpty())
{
throw new RuntimeException("EOF");
}

return _list.removeFirst();
}
}
}

//////////////////////////////////////////////////
// Main

class Test extends CoRoutine
{
public int value;

protected void run()
{
while (true)
{
yield(value);
}
}
}

class Main
{
public static void main(String[] args)
{
Test t = new Test();

for (int i = 0; i < 3; ++i)
{
t.value = i;

System.out.println(t.next());
}
}
}
 

Jacob Fugal

x = 1
g = Generator.new do |g|
  5.times {|i| g.yield [x,i].max}
end

until g.end?
  x = 10 if g.current > 2
  puts g.next
end

This outputs 1, 1, 2, 3, 10.

The next question is, why is the API for this class so crappy? I would
have expected the output to be 1, 1, 2, 10, 10. But creating the
generator automatically advances to the first yield, and "next" advances
to the next one while returning the previous one. This is just wrong.

Actually, there's no way (I can think of) to get the output you
expect, and it has nothing to do with an initial advance or anything.
My implementation doesn't advance to the yields until required, yet
has the same output. This is because of these two lines:

x = 10 if g.current > 2
puts g.next

g.current returns the current value. g.current is not greater than two
until g.current == 3. g.next then returns the same as g.current, while
also advancing the internal counter. The behavior you're describing
would require changing the value of g.current by assigning to x. Can't
be done.

Jacob Fugal
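
The trace can be reproduced without the generator library. The sketch below assumes Ruby 1.9 or later and uses the built-in Enumerator as a stand-in, with peek playing the role of current; it is not the Generator class itself. For contrast it also runs an eager variant that, like an array-backed implementation, computes every value up front while x is still 1:

```ruby
# Assumption: Ruby 1.9+; Enumerator#peek stands in for Generator#current.
x = 1
lazy = Enumerator.new do |y|
  5.times { |i| y.yield([x, i].max) }
end

lazy_out = []
loop do
  # peek raises StopIteration at the end, which ends the loop.
  x = 10 if lazy.peek > 2
  lazy_out << lazy.next
end

# Eager variant: all five values are computed before the loop starts.
x = 1
eager = []
5.times { |i| eager << [x, i].max }

eager_out = []
index = 0
while index < eager.length
  x = 10 if eager[index] > 2   # too late to affect any stored value
  eager_out << eager[index]
  index += 1
end

p lazy_out   # [1, 1, 2, 3, 10]
p eager_out  # [1, 1, 2, 3, 4]
```

In the lazy run the value 3 has already been produced by the time x changes, so only the final element sees the new x; in the eager run no element does.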
 

James Edward Gray II

I have a small problem with this. Won't this mean the loop will only
run 1 time?

Yes, you are right. That was silly of me.

I like your fix. Adding rewind() fixes things up.

Thank you.

James Edward Gray II
 

Christoffer Lernö

Mine does, of course, pull all the items in at once and has all the
problems discussed with that approach. Attached is my
implementation and the altered versions I have been using in
benchmarks.

James Edward Gray II

Mine looks like a clone of James':

class MyGenerator
  attr_reader :index
  def initialize(enum = nil)
    if enum then
      @array = enum.to_a
    else
      @array = Array.new
      yield self
    end
    @index = 0
  end
  def current
    raise EOFError unless next?
    @array[@index]
  end
  def next
    value = current
    @index += 1
    return value
  end
  def next?
    return @index < @array.length
  end
  def rewind
    @index = 0
    self
  end
  def each(&block)
    @array.each(&block)
  end
  def yield(value)
    @array << value
  end
  def pos
    return @index
  end
  def end?
    return !next?
  end
end


/Christoffer
 

Luke Blanshard

Jacob said:
x = 1
g = Generator.new do |g|
  5.times {|i| g.yield [x,i].max}
end

until g.end?
  x = 10 if g.current > 2
  puts g.next
end

This outputs 1, 1, 2, 3, 10.

The next question is, why is the API for this class so crappy? I would
have expected the output to be 1, 1, 2, 10, 10. But creating the
generator automatically advances to the first yield, and "next" advances
to the next one while returning the previous one. This is just wrong.

Actually, there's no way (I can think of) to get the output you
expect, and it has nothing to do with an initial advance or anything.
My implementation doesn't advance to the yields until required, yet
has the same output. This is because of these two lines:

x = 10 if g.current > 2
puts g.next

g.current returns the current value. g.current is not greater than two
until g.current == 3. g.next then returns the same as g.current, while
also advancing the internal counter. The behavior you're describing
would require changing the value of g.current by assigning to x. Can't
be done.

You are correct. My bad.

My point is that the API is pointlessly difficult. I think, actually,
that the C# iterator API is the cleanest I've seen: there is one
operation that attempts to advance to the next position, returning false
if it has reached the end, and another that returns the current value,
throwing an exception if it's still before the first advance.

And my other point is that, while you're going down the road of
coroutines, you might as well go the whole way.

Luke Blanshard
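
For comparison, the two-operation shape described above might look like this in Ruby. This is only a sketch: the class name Cursor and the method names move_next and current are hypothetical, and raising once the cursor has moved past the end is an assumption added here, since the post only specifies the behaviour before the first advance:

```ruby
# Hypothetical sketch of a "move, then read" iterator API.
class Cursor
  def initialize(items)
    @items = items.to_a
    @index = -1 # positioned before the first element
  end

  # Attempts to advance; returns false once the end has been reached.
  def move_next
    @index += 1 if @index < @items.length
    @index < @items.length
  end

  # Returns the element at the cursor; raises if there is none.
  def current
    unless @index.between?(0, @items.length - 1)
      raise IndexError, "cursor is not positioned on an element"
    end
    @items[@index]
  end
end

cursor = Cursor.new(%w[A B C])
seen = []
seen << cursor.current while cursor.move_next
p seen  # ["A", "B", "C"]
```

With this shape the loop condition and the advance are a single call, so there is no separate next?/end? check to keep in step with next.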
 
