Iterator objects and lazy evaluation


Yuh-Ruey Chen

Two questions:

1) Are there equivalents for iteration/enumeration functions like map
that return iterator/enumeration objects (in a Python sense)?

An example:

def read_files(files)
  files.each {|file|
    # blah
  }
end

read_files(['file1', 'file2', 'file3'].map_iter {|x| open(x)})
# map_iter would return an iterator object immediately instead of
# opening every file and storing them into an array

I know that I could do this instead:

['file1', 'file2', 'file3'].each {|x|
  open(x) {
    # blah
  }
}

but sometimes it's inconvenient to do that if there's already a
function that accepts an Enumerable such as read_files above.

2) Is there some lazy evaluation library that can recalculate a lazy
expression when its values change? Something with the following semantics
(or something like it):

x = lazy {a + b}
a = 1
b = 2
p x # 3 (calculates a + b)
p x # 3 (memoized value)
a = 3
p x # 5 (recalculates a + b)
a = [1,2]
b = [3,4]
p x # [1,2,3,4] (recalculates a + b)
p x # [1,2,3,4] (memoized value)
a[1] = '.'
p x # [1,'.',3,4] (recalculates a + b)

I know there's a library called lazy.rb, but it's not exactly what I'm
looking for as it doesn't implement this functionality.

Thanks.
 

Mike Gold

Yuh-Ruey Chen said:
I know that I could do this instead:

['file1', 'file2', 'file3'].each {|x|
open(x) {
# blah
}
}

but sometimes it's inconvenient to do that if there's already a
function that accepts an Enumerable such as read_files above.

some_file_operation = lambda { |file|
  puts "doing stuff with #{file}"
}

%w(file1 file2 file3).each(&some_file_operation)
# =>
# doing stuff with file1
# doing stuff with file2
# doing stuff with file3

def some_file_operation2(file)
  puts "doing stuff with #{file}"
end

%w(file1 file2 file3).each(&method(:some_file_operation2))
# =>
# doing stuff with file1
# doing stuff with file2
# doing stuff with file3

In Ruby, a.x calls the method named 'x', whereas Python requires
parens a.x(). As a consequence, the syntax for getting the method in
Ruby is a.method(:x).
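
For instance, with a made-up top-level method, the Method object can be
called directly or turned back into a block:

def shout(s)
  puts s.upcase
end

m = method(:shout)   # grab the Method object instead of calling the method
m.call("hello")      # HELLO
%w(a b).each(&m)     # &m converts the Method back into a block, as above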

2) Is there some lazy evaluation library that can recalculate a lazy
expression when its values change? Something with the following semantics
(or something like it):

x = lazy {a + b}
a = 1
b = 2
p x # 3 (calculates a + b)
p x # 3 (memoized value)
a = 3
p x # 5 (recalculates a + b)
a = [1,2]
b = [3,4]
p x # [1,2,3,4] (recalculates a + b)
p x # [1,2,3,4] (memoized value)
a[1] = '.'
p x # [1,'.',3,4] (recalculates a + b)

I know there's a library called lazy.rb, but it's not exactly what I'm
looking for as it doesn't implement this functionality.

If you google for ruby memoization, you'll find several options
available. However none can use the syntax in your example. Ruby
determines at compile time that "p x" is a local variable lookup.
You'll have to use "p lazy.x" or some such.
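
To sketch what that could look like (LazyValue is a made-up name, not an
existing library):

class LazyValue
  def initialize(&blk)
    @blk = blk
  end

  def value
    @value = @blk.call unless defined?(@value)   # naive memoization, never recomputed
    @value
  end
end

a, b = 1, 2                   # the locals must exist before the block mentions them
x = LazyValue.new { a + b }
p x.value   # 3 (calculated on first access)
p x.value   # 3 (memoized)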
 

Robert Klemme

Two questions:

1) Are there equivalents for iteration/enumeration functions like map
that return iterator/enumeration objects (in a Python sense)?

An example:

def read_files(files)
  files.each {|file|
    # blah
  }
end

read_files(['file1', 'file2', 'file3'].map_iter {|x| open(x)})
# map_iter would return an iterator object immediately instead of
# opening every file and storing them into an array

There is one issue with this design: the IO object is not closed properly.
I know that I could do this instead:

['file1', 'file2', 'file3'].each {|x|
  open(x) {
    # blah
  }
}

but sometimes it's inconvenient to do that if there's already a
function that accepts an Enumerable such as read_files above.

So read_files expects an enumeration of opened IO objects, I guess. And
you want to make sure that files are only opened on demand, correct?
You could do something like this

module Enumerable
  FileIter = Struct.new :enum do
    include Enumerable

    def each(&b)
      enum.each do |file_name|
        File.open(file_name, &b)
      end
      self
    end
  end

  def file_iter
    FileIter.new self
  end
end

Robert@babelfish2 ~
$ echo 1 > a

Robert@babelfish2 ~
$ echo 2 > b

irb(main):016:0> def read_files(files)
irb(main):017:1> files.each {|io| p io.read}
irb(main):018:1> end
=> nil
irb(main):019:0> read_files %w{a b}.file_iter
"1\n"
"2\n"
=> #<struct Enumerable::FileIter enum=["a", "b"]>
irb(main):020:0>

Note that there is ARGF which basically does a similar thing: it opens
and closes all files named in ARGV and presents them as a single IO object.
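
For example, a script that uses ARGF needs only this to read every file
named on its command line in sequence:

# ruby cat.rb file1 file2 file3
ARGF.each_line do |line|
  puts line
end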

Your approach feels a bit off the usual Ruby way, and I suspect that
you are trying to force foreign patterns into the language.

I'd also say there is a design issue with your version of read_files:
basically it iterates over a collection of file handles and does
something to each one of them. Given the elegance and brevity of
Ruby's iteration idiom, the function read_files would better be named
read_file and process a single IO only.
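
Something like this, as a sketch of that shape (not your actual code):

def read_file(io)
  # blah -- work on a single IO
end

['file1', 'file2', 'file3'].each do |name|
  File.open(name) { |io| read_file(io) }
end
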
2) Is there some lazy evaluation library that can recalculate a lazy
expression when its values change? Something with the following semantics
(or something like it):

x = lazy {a + b}
a = 1
b = 2
p x # 3 (calculates a + b)

This cannot work IMHO since local variables a and b are defined *after*
the block.
p x # 3 (memoized value)
a = 3
p x # 5 (recalculates a + b)
a = [1,2]
b = [3,4]
p x # [1,2,3,4] (recalculates a + b)
p x # [1,2,3,4] (memoized value)
a[1] = '.'
p x # [1,'.',3,4] (recalculates a + b)

I know there's a library called lazy.rb, but it's not exactly what I'm
looking for as it doesn't implement this functionality.

IMHO you had better explicitly provide arguments to whatever you use to
lazily calculate values. One way is memoize; another option is to
use a Hash for this if there is just one argument:

irb(main):023:0> square = Hash.new {|h,k| puts "called"; h[k] = k * k}
=> {}
irb(main):024:0> square[1000000]
called
=> 1000000000000
irb(main):025:0> square[1000000]
=> 1000000000000
irb(main):026:0> square[1000000]
=> 1000000000000
irb(main):027:0>

If you have more arguments, you can do

def lazy(&b)
  cache = {}
  lambda {|*a| cache[a] ||= b[*a]}
end

irb(main):006:0> plus2 = lazy {|a,b| puts "called #{a} + #{b}"; a+b}
=> #<Proc:0x7ff8c2d8@(irb):3>
irb(main):007:0> plus2[1,2]
called 1 + 2
=> 3
irb(main):008:0> plus2[1,2]
=> 3
irb(main):009:0> plus2[2,1]
called 2 + 1
=> 3
irb(main):010:0> plus2[2,1]
=> 3
irb(main):011:0>

Kind regards

robert
 

Brian Candler

Yuh-Ruey Chen said:
1) Are there equivalents for iteration/enumeration functions like map
that return iterator/enumeration objects (in a Python sense)?

You can generate Enumerator objects, and in 1.9 (and I think 1.8.7), an
Enumerable method which hasn't been given a block returns an
Enumerator.

irb(main):001:0> a = (1..10).each
=> #<Enumerator:0x83b687c>
irb(main):002:0> a.next
=> 1
irb(main):003:0> a.next
=> 2
irb(main):004:0> a.next
=> 3

However I don't think you can use these for chaining. It might be nice
if you could do something like

(1..10).to_enum.map2 { |x| x * 2 }.map2 { |x| x + 1 }.map2 { |x| puts x }

which wouldn't create any intermediate array. I guess you'd need each
element to call 'next' on the previous one, and you'd need something at
the end to trigger the whole process off.
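
A rough sketch of that pull idea in 1.9 (map2 is hypothetical; each stage
is just an Enumerator that calls next on the stage before it, much as is
worked out later in this thread):

class Enumerator
  def map2(&blk)
    src = self
    Enumerator.new do |y|
      loop { y << blk[src.next] }   # Kernel#loop rescues StopIteration for us
    end
  end
end

e = (1..10).to_enum.map2 { |x| x * 2 }.map2 { |x| x + 1 }
e.next   # => 3
e.next   # => 5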
 

David A. Black

Hi --

You can generate Enumerator objects, and in 1.9 (and I think 1.8.7), an
Enumerable method which hasn't been given a block returns an
Enumerator.

irb(main):001:0> a = (1..10).each
=> #<Enumerator:0x83b687c>
irb(main):002:0> a.next
=> 1
irb(main):003:0> a.next
=> 2
irb(main):004:0> a.next
=> 3

However I don't think you can use these for chaining. It might be nice
if you could do something like

(1..10).to_enum.map2 { |x| x * 2 }.map2 { |x| x + 1 }.map2 { |x| puts x }

which wouldn't create any intermediate array. I guess you'd need each
element to call 'next' on the previous one, and you'd need something at
the end to trigger the whole process off.

It used to be that you could associate a block with an enumerator
lazily:
e = [1,2,3].enum_for(:map, &lambda {|x| x * 10 })
=> #<Enumerable::Enumerator: ...>
e.next
=> 10

and some chaining was possible. But that's gone now. Enumerators are
still able to remember regular arguments, though, so you can do:
e = [1,2,3,4,5].each_cons(2)
=> #<Enumerator: ...>
e.select {|a,b| b > 3 }
=> [[3, 4], [4, 5]]

(The contrast here is between a relatively old 1.9 version and 1.9.1
preview 1.)


David

--
Rails training from David A. Black and Ruby Power and Light:
Intro to Ruby on Rails January 12-15 Fort Lauderdale, FL
Advancing with Rails January 19-22 Fort Lauderdale, FL *
* Co-taught with Patrick Ewing!
See http://www.rubypal.com for details and updates!
 

Brian Candler

David said:
It used to be that you could associate a block with an enumerator
lazily:
e = [1,2,3].enum_for(:map, &lambda {|x| x * 10 })
=> #<Enumerable::Enumerator: ...>
e.next
=> 10

and some chaining was possible. But that's gone now.

I see. I was wondering about turning the whole thing backwards: pull
instead of push. I did some doodling, and after refactoring here's what
it boiled down to (ruby1.9.1-preview1):

class Enumerator
  def nmap(&blk)
    Enumerator.new do |y|
      begin
        y << blk[self.next] while true
      rescue StopIteration
      end
    end
  end

  def nselect(&blk)
    Enumerator.new do |y|
      begin
        while true
          val = self.next
          y << val if blk[val]
        end
      rescue StopIteration
      end
    end
  end
end

# Two variations of the same theme
(1..10).each.nselect { |i| i%3 == 2 }.nmap { |i| i+100 }.each { |i| puts i }
puts (1..10).each.nselect { |i| i%3 == 2 }.nmap { |i| i+100 }.to_a

Now, the intention was as follows: the final method in the chain (each
or to_a) repeatedly calls the 'next' method on the item before it, which
calls 'next' on the item before it, and so on.

It seems to work. I guess Enumerators must use Fibers internally to
pause the enumerator block as required?
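
That is my understanding of how 1.9 does it: #next runs the underlying block
on a Fiber and suspends it at each yield. A stripped-down illustration of the
mechanism (not the actual Enumerator implementation):

fib = Fiber.new do
  a, b = 1, 1
  loop do
    Fiber.yield a     # suspend here and hand a value back to the caller
    a, b = b, a + b
  end
end

fib.resume   # => 1
fib.resume   # => 1
fib.resume   # => 2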
 

Yuh-Ruey Chen

You can generate Enumerator objects, and in 1.9 (and I think 1.8.7), an
Enumerable method which hasn't been given a block returns an
Enumerator.

irb(main):001:0> a = (1..10).each
=> #<Enumerator:0x83b687c>
irb(main):002:0> a.next
=> 1
irb(main):003:0> a.next
=> 2
irb(main):004:0> a.next
=> 3

However I don't think you can use these for chaining. It might be nice
if you could do something like

(1..10).to_enum.map2 { |x| x * 2 }.map2 { |x| x + 1 }.map2 { |x| puts x }

which wouldn't create any intermediate array. I guess you'd need each
element to call 'next' on the previous one, and you'd need something at
the end to trigger the whole process off.

Yes, this is exactly what I'm looking for - the ability to chain
iterators like this without creating intermediate arrays, hopefully
using less memory.
 

Yuh-Ruey Chen

Two questions:
1) Are there equivalents for iteration/enumeration functions like map
that return iterator/enumeration objects (in a Python sense)?
An example:
def read_files(files)
  files.each {|file|
    # blah
  }
end

read_files(['file1', 'file2', 'file3'].map_iter {|x| open(x)})
# map_iter would return an iterator object immediately instead of
# opening every file and storing them into an array

There is one issue with this design: the IO object is not closed properly.
I know that I could do this instead:
['file1', 'file2', 'file3'].each {|x|
  open(x) {
    # blah
  }
}
but sometimes it's inconvenient to do that if there's already a
function that accepts an Enumerable such as read_files above.

So read_files expects an enumeration of opened IO objects, I guess. And
you want to make sure that files are only opened on demand, correct?

That was really just a random example that I thought up quickly to
explain my question. Others have already posted better examples.
Your approach feels a bit off the usual Ruby way, and I suspect that
you are trying to force foreign patterns into the language.

Per my previous reply, I'm trying to find a way to chain iterators
without nesting blocks (so they can be passed freely into other
functions expecting enumerables) and without intermediate arrays.
2) Is there some lazy evaluation library that can recalculate a lazy
expression when its values change? Something with the following semantics
(or something like it):
x = lazy {a + b}
a = 1
b = 2
p x # 3 (calculates a + b)

This cannot work IMHO since local variables a and b are defined *after*
the block.
p x # 3 (memoized value)
a = 3
p x # 5 (recalculates a + b)
a = [1,2]
b = [3,4]
p x # [1,2,3,4] (recalculates a + b)
p x # [1,2,3,4] (memoized value)
a[1] = '.'
p x # [1,'.',3,4] (recalculates a + b)
I know there's a library called lazy.rb, but it's not exactly what I'm
looking for as it doesn't implement this functionality.

IMHO you had better explicitly provide arguments to whatever you use to
lazily calculate values. One way is memoize; another option is to
use a Hash for this if there is just one argument:

irb(main):023:0> square = Hash.new {|h,k| puts "called"; h[k] = k * k}
=> {}
irb(main):024:0> square[1000000]
called
=> 1000000000000
irb(main):025:0> square[1000000]
=> 1000000000000
irb(main):026:0> square[1000000]
=> 1000000000000
irb(main):027:0>

If you have more arguments, you can do

def lazy(&b)
  cache = {}
  lambda {|*a| cache[a] ||= b[*a]}
end

irb(main):006:0> plus2 = lazy {|a,b| puts "called #{a} + #{b}"; a+b}
=> #<Proc:0x7ff8c2d8@(irb):3>
irb(main):007:0> plus2[1,2]
called 1 + 2
=> 3
irb(main):008:0> plus2[1,2]
=> 3
irb(main):009:0> plus2[2,1]
called 2 + 1
=> 3
irb(main):010:0> plus2[2,1]
=> 3
irb(main):011:0>

Kind regards

robert

Hmm, what I'm looking for is not just a simple memoization technique.
Suppose I have a function that computes the union of two arrays
whenever either array changes. Using a perhaps more plausible syntax:

attr_accessor :array_a

def foo
  lazy_union = lazy_expr { lazy_depends(:array_a) +
                           lazy_depends(:array_b) }
  @array_a = [1,2]
  @array_b = [3,4]
  p lazy_union # [1,2,3,4]
  @array_a[1] = 5
  p lazy_union # [1,5,3,4]
end

This is a case your memoization technique doesn't address. Yet I'm
pretty sure this is a common use case, so I was thinking that there
should be some library out there that provides this functionality.
 

Robert Klemme

On Nov 1, 6:36 am, Robert Klemme <[email protected]> wrote:
Per my previous reply, I'm trying to find a way to chain iterators
without nesting blocks (so they can be passed freely into other
functions expecting enumerables) and without intermediate arrays.

I see you have been referred to Enumerator and to_enum already.
Hmm, what I'm looking for is not just a simple memoization technique.
Suppose I have a function that computes the union of two arrays
whenever either array changes. Using a perhaps more plausible syntax:

attr_accessor :array_a

def foo
  lazy_union = lazy_expr { lazy_depends(:array_a) +
                           lazy_depends(:array_b) }
  @array_a = [1,2]
  @array_b = [3,4]
  p lazy_union # [1,2,3,4]
  @array_a[1] = 5
  p lazy_union # [1,5,3,4]
end

This is a case your memoization technique doesn't address. Yet I'm
pretty sure this is a common use case, so I was thinking that there
should be some library out there that provides this functionality.

Well, for this I'd rather create a custom class. In your case of Array
you could do something like this

# ALT: require 'md5'

class MemoFunc
  def initialize(*args, &b)
    @args = args
    @hsh = nil
    @fun = b
    @result = self
  end

  def call
    h = arg_hash
    if @result == self || h != @hsh
      @result = @fun[*@args]
      @hsh = h
    end
    @result
  end

  def invalidate
    @result = self
  end

  def rearg(*a)
    @args = a
    self
  end

  private

  def arg_hash
    @args.hash
    # ALT: MD5.digest(Marshal.dump(@args))
  end
end

irb(main):024:0> a = [1,2]
=> [1, 2]
irb(main):025:0> b = [3,4]
=> [3, 4]
irb(main):026:0> f = MemoFunc.new(a,b) {|x,y| puts "eval"; x | y}
=> #<MemoFunc:0x7ffa63cc @hsh=nil, @args=[[1, 2], [3, 4]],
@result=#<MemoFunc:0x7ffa63cc ...>, @fun=#<Proc:0x7ffa646c@(irb):26>>
irb(main):027:0> f.call
eval
=> [1, 2, 3, 4]
irb(main):028:0> f.call
=> [1, 2, 3, 4]
irb(main):029:0> a[1] = 5
=> 5
irb(main):030:0> f.call
eval
=> [1, 5, 3, 4]
irb(main):031:0> f.call
=> [1, 5, 3, 4]
irb(main):032:0> f.call
=> [1, 5, 3, 4]
irb(main):033:0>


The difficult part is getting detection of changes to the arguments right
(meaning safe and fast). There is no general solution that fits all
situations well. My implementation with Array#hash is too weak for a
general solution. I am thinking that creating an MD5 digest of the
marshalled argument Array is more robust, but you may pay a performance
penalty, which might be OK depending on the complexity of the operation.
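
For the record, that MD5 variant of arg_hash would look roughly like this
(written with digest/md5 rather than the older md5 library; it still only
detects changes that survive Marshal.dump):

require 'digest/md5'

# inside MemoFunc
def arg_hash
  Digest::MD5.hexdigest(Marshal.dump(@args))
end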

Kind regards

robert
 

Brian Candler

Yuh-Ruey Chen said:
Suppose I have a function that computes the union of two arrays
whenever either array changes. Using a perhaps more plausible syntax:

attr_accessor :array_a

def foo
  lazy_union = lazy_expr { lazy_depends(:array_a) +
                           lazy_depends(:array_b) }
  @array_a = [1,2]
  @array_b = [3,4]
  p lazy_union # [1,2,3,4]
  @array_a[1] = 5
  p lazy_union # [1,5,3,4]
end

There is an Observable pattern (observer.rb); however it doesn't
automatically notice when an instance variable has been changed, and I
don't know of a hook which detects that.
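
For completeness, that stdlib pattern looks roughly like this; the catch is
that the object has to call changed and notify_observers itself, so silent
mutation of an instance variable goes unnoticed:

require 'observer'

class Source
  include Observable

  def array_a=(value)
    @array_a = value
    changed                              # flag the object as changed...
    notify_observers(:array_a, value)    # ...and push the update to observers
  end
end

watcher = Object.new
def watcher.update(name, value)
  puts "#{name} changed to #{value.inspect}"
end

s = Source.new
s.add_observer(watcher)
s.array_a = [1, 2]   # prints: array_a changed to [1, 2]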

In any case, it's possible for the array to remain unchanged (i.e. it
contains references to the same objects) whilst those objects still
mutate:

@array_a = [1, "foo"]
...
@array_a[1] << "bar"

Actually, your "lazy_union" could be fine with this, since the union
object would also contain the mutated object, but a more general case
like lazy_concat which depends on the *values* of the objects would have
serious problems.

So, do you want your lazy dependency system to notice this? What about
objects held an arbitrary number of levels deep?

I'm not sure that Ruby really lends itself well to this sort of pattern,
because of the mutable nature of its objects. Perhaps you want to look at
Erlang instead.
 

Brian Candler

Yuh-Ruey Chen said:
Yes, this is exactly what I'm looking for - the ability to chain
iterators like this without creating intermediate arrays, hopefully
using less memory.

I've researched this a bit more, and it seems that ruby 1.9 *does*
support this model in a very smart way. Unfortunately the built-in
Enumerable methods like #map, #select, #reject don't, but these are easy
enough to synthesise. I started a new thread on that topic at
http://www.ruby-forum.com/topic/169807

Anyway, if you want the following pattern

generator --> filter --> filter ... --> consumer

you implement it as follows.

(1) The generator is any object which implements 'each' and yields the
members of the (possibly infinite) collection.

class Fib
  include Enumerable

  def initialize(a=1, b=1)
    @a, @b = a, b
  end

  def each
    a, b = @a, @b
    yield a
    while true
      yield b
      a, b = b, a+b
    end
  end
end

(by including Enumerable, you gain a whole load of useful methods which
in turn call your 'each' method)
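
For instance, even though Fib#each never returns, the mixed-in Enumerable
methods that stop early still work:

Fib.new.first(6)               # => [1, 1, 2, 3, 5, 8]
Fib.new.find { |i| i > 100 }   # => 144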

(2) The consumer is an object which calls 'each' with a block that does
something with the data yielded to it:

obj.each { |i| puts i }

So far, all so ruby-1.8.

(3) The clever bit is the filter, and this only works in ruby-1.9. You
create an Enumerator object with a block which processes the data.

Enumerator.new do |y|
  obj.each do |thing|     # the filter INPUT
    ...
    y << otherthing       # the filter OUTPUT
  end
end

The really clever bit is the "y << otherthing" line. By cunning use of
Fibers (I believe), execution of the filter stops at this point,
allowing the next filter along the chain to make use of the value as its
input. Once all the filters along the chain have finished using this
value, this filter carries on into the next iteration of its 'each'
loop, to take the next input from upstream.

So here's a complete example with an infinite source, two filters, and a
consumer. Notice that only the consumer decides when to terminate;
before this point, everything is a potentially infinite series.

module Enumerable
  def lmap(&blk)
    Enumerator.new do |y|
      each do |e|
        y << blk[e]
      end
    end
  end

  def lselect(&blk)
    Enumerator.new do |y|
      each do |e|
        y << e if blk[e]
      end
    end
  end
end

class Fib
  include Enumerable

  def initialize(a=1, b=1)
    @a, @b = a, b
  end

  def each
    a, b = @a, @b
    yield a
    while true
      yield b
      a, b = b, a+b
    end
  end
end

Fib.new.lselect { |i| i % 2 == 0 }.lmap { |i| "<#{i}>" }.
  each_with_index { |i,cnt| puts i; break if cnt >= 20 }

$ ruby19 fib.rb
<2>
<8>
<34>
<144>
<610>
<2584>
<10946>
<46368>
<196418>
<832040>
<3524578>
<14930352>
<63245986>
<267914296>
<1134903170>
<4807526976>
<20365011074>
<86267571272>
<365435296162>
<1548008755920>
<6557470319842>
 

Brian Candler

And here's a wrapper for this pattern:

class Enumerator
  def filter(&blk)
    self.class.new do |y|
      each do |*input|
        blk.call(y, *input)
      end
    end
  end
end

a = (1..1_000_000_000).to_enum
a.filter { |out,inp| out << inp if inp % 2 == 0 }.
  filter { |out,inp| out << inp+100 }.
  with_index.each { |inp,c| puts inp; break if c > 10 }

$ ruby19 -v
ruby 1.9.1 (2008-10-28 revision 19983) [i686-linux]
$ ruby19 filter.rb
102
104
106
108
110
112
114
116
118
120
122
124
$
 

Brian Candler

Actually, it turns out you don't even need ruby-1.9 for this. The
following runs fine in ruby-1.8.6:

module Enumerable
  class Gen
    include Enumerable

    def initialize(&body)
      @body = body
    end

    def each(&receiver)
      @body.call(receiver)
    end
  end

  def filter(&blk)
    Gen.new do |y|
      each do |*input|
        blk.call(y, *input)
      end
    end
  end
end

(1..10).
  filter { |y,e| puts "Sending #{e.inspect}"; y[e]; puts "Sent" }.
  filter { |y,e| y[e] if e % 2 == 0; sleep 0.5 }.
  filter { |y,e| y[e + 100]; sleep 0.5 }.
  each { |e| puts e }

The puts and sleep calls are in there to show this is proper
'horizontal' evaluation, with no intermediate arrays. But if you want an
array of the finished product, just replace the last line with "to_a"

I wouldn't be surprised if this pattern exists in a library somewhere,
but I don't know where.