need help on generator...


Craig Ringer

Maybe ''.join(infile) is a better way to express this functionality?
Avoids 2.4 dependency and should be faster as well as more concise.

Thanks - for some reason I hadn't clicked to that. Obvious in hindsight,
but I just completely missed it.
It's certainly worth writing a patch and seeing what the python-dev crowd
thinks of it; it might make it into 2.5.

I'll certainly look into doing so. I'm not dumb enough to say "Sure,
I'll do that" before looking into the code involved and thinking more
about what issues could pop up. Still, it's been added to my
increasingly frightening TODO list.
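
For readers without the original code in front of them, a minimal sketch of
the two spellings being compared (infile is assumed to be an open file
object; the generator-based original is a guess, since it isn't quoted):

# 2.4-only spelling: a generator expression fed to join
contents = ''.join(line for line in infile)

# suggested spelling: the file object is already iterable
contents = ''.join(infile)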
Under what conditions, exactly, would you want such an exception?

When read() or other methods suffering from the same issue are called
after next() without an intervening seek(). It'd mean another state flag
for the file to keep track of - but I doubt it'd make any detectable
difference in performance given that there's disk I/O involved.
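
To make the problem concrete, a small sketch (added; the filename is made
up) of the surprising mix being discussed -- in CPython 2.x, file iteration
fills a hidden read-ahead buffer, so a read() after next() resumes from the
buffered position rather than just after the last line yielded:

f = open('example.txt')  # hypothetical file
first = f.next()         # fills the read-ahead buffer behind the scenes
rest = f.read()          # resumes past the buffered data, silently
                         # skipping lines that next() never yielded
f.seek(0)                # an intervening seek() resets the iteration state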

I'd be happier to change the behaviour so that a warning isn't
necessary, though, and I suspect it can be done without introducing
backward compatibility issues. Well, so long as nobody is relying on the
undefined file position after using an iterator - and I'm not dumb
enough to say nobody would ever do that.

I've really got myself into hot water now though - I'm going to have to
read over the `file' source code before impulsively saying anything
REALLY stupid.
 

Terry Reedy

If I understand correctly,
Almost...

a "generator" produce something over which you can
iterate with the help of an "iterator".

To be exact, the producer is a generator function, a function whose body
contains 'yield'. In CPython, the difference after executing the def is
that a generator function has a particular flag set. People sometimes
shorten 'generator function' to 'generator' as you did, but calling both a
factory and its products by the same name is confusing. (For instance, try
calling an automobile factory an automobile).
>>> def genf(): yield 1
...
>>> genf
<function genf at 0x008873B8>

The result of calling a generator function is a generator, which is one but
only one type of iterator.
>>> dir(genf())
[<stuff inherited from object>, '__iter__', 'gi_frame', 'gi_running',
'next']

The .__iter__ and .next methods make this an iterator. The two data
attributes are for internal use.
Can you iterate (in the strict sense
of an "iterator") over something not generated by a "generator"?

Of course. Again, a generator is one specific type of iterator, where an
iterator is anything with the appropriate .__iter__ and .next methods.
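
A minimal hand-rolled example (mine, not Terry's) of an iterator that
involves no generator at all:

class Countdown(object):
    # Any object with __iter__ and next is an iterator in Python 2.
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        return self
    def next(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1

print list(Countdown(3))  # prints [3, 2, 1]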

Terry J. Reedy
 

Francis Girard

Hi,

First,

My deepest thanks to Craig Ringer, Alex Martelli, Nick Coghlan and Terry Reedy
for having generously answered on the "need help on generator" thread. I'm
compiling the answers to sketch myself a global picture of iterators,
generators, iterator-generators and laziness in Python.

In the meantime, I couldn't resist testing the new Python laziness features
on a classical FP problem, i.e. the "Hamming" problem.

The name of the game is to produce the sequence of integers satisfying the
following rules:

(i) The list is in ascending order, without duplicates
(ii) The list begins with the number 1
(iii) If the list contains the number x, then it also contains the numbers
2*x, 3*x, and 5*x
(iv) The list contains no other numbers.

The FP algorithm is quite elegant. Simply suppose that the infinite sequence
has already been produced, then merge the three sequences (2*x, 3*x, 5*x for
each x in the sequence we supposed already produced); this is O(n) complexity
for the first n numbers.

I simply love those algorithms that run after their tails.
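
To see why this is well-founded, here is a hand-unrolling of the first few
steps (a worked example added for illustration, not part of the original
post). Assume the sequence starts 1, 2, 3, 4, 5, 6, 8, ...; then

  2*hamming = 2, 4, 6, 8, 10, 12, ...
  3*hamming = 3, 6, 9, 12, 15, ...
  5*hamming = 5, 10, 15, 20, ...

and merging the three (dropping duplicates) gives 2, 3, 4, 5, 6, 8, ... --
prepending the initial 1 reproduces exactly the sequence we assumed.
Producing the n-th element only ever consumes elements that occur earlier in
the sequence, which is why the self-reference never outruns itself.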

In Haskell, the algorithm translates as:

-- BEGIN SNAP
-- hamming.hs

-- Merges two infinite lists
merge :: (Ord a) => [a] -> [a] -> [a]
merge (x:xs) (y:ys)
  | x == y    = x : merge xs ys
  | x < y     = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

-- Lazily produce the hamming sequence
hamming :: [Integer]
hamming
  = 1 : merge (map (2*) hamming) (merge (map (3*) hamming) (map (5*) hamming))
-- END SNAP


In Python, I came up with this implementation:

-- BEGIN SNAP
import sys
from itertools import imap

## Merges two infinite lists
def imerge(xs, ys):
    x = xs.next()
    y = ys.next()
    while True:
        if x == y:
            yield x
            x = xs.next()
            y = ys.next()
        elif x < y:
            yield x
            x = xs.next()
        else: # if y < x:
            yield y
            y = ys.next()

## Lazily produce the hamming sequence
def hamming():
    yield 1 ## Initialize the machine
    for n in imerge(imap(lambda h: 2*h, hamming()),
                    imerge(imap(lambda h: 3*h, hamming()),
                           imap(lambda h: 5*h, hamming()))):
        yield n
    print "Falling out -- We should never get here !!"

for n in hamming():
    sys.stderr.write("%s " % str(n)) ## stderr for unbuffered output
-- END SNAP


My goal is not to compare Haskell with Python on a classical FP problem, which
would be genuine stupidity.

Nevertheless, while the Haskell version prints the Hamming sequence for as
long as I can stand it, and with very little memory consumption, the Python
version only prints:
1 2 3 4 5 6 8 9 10 12 15 16 18 20 24 25 27 30 32 36 40 45 48 50 54 60 64 72 75
80 81 90 96 100 108 120 125 128 135 144 150 160 162 180 192 200 216 225 240
243 250 256 270 288 300 320 324 360 375 384 400 405 432 450 480 486 500 512
540 576 600 625 640 648 675 720 729 750 768 800 810 864 900 960 972 1000 1024
1080 1125 1152 1200 1215 1250 1280 1296 1350 1440 1458 1500 1536 1600 1620
1728 1800 1875 1920 1944 2000 2025 2048 2160 2187 2250 2304 2400 2430 2500
2560 2592 2700 2880 2916 3000 3072 3125 3200 3240 3375 3456 3600 3645 3750
3840 3888 4000 4050 4096 4320 4374 4500 4608 4800 4860 5000 5120 5184 5400
5625 5760 5832 6000 6075 6144 6250 6400 6480 6561 6750 6912 7200 7290 7500
7680 7776 8000 8100 8192 8640 8748 9000 9216 9375 9600 9720 10000 10125 10240
10368 10800 10935 11250 11520 11664 12000 12150 12288 12500 12800 12960 13122
13500 13824 14400 14580 15000 15360 15552 15625 16000 16200 16384 16875 17280
17496 18000 18225 18432 18750 19200 19440 19683 20000 20250 20480 20736 21600
21870 22500 23040 23328 24000 24300 24576 25000 25600 25920 26244 27000 27648
28125 28800 29160 30000 30375 30720 31104 31250 32000 32400 32768 32805 33750
34560 34992 36000 36450 36864 37500 38400 38880 39366 40000 40500 40960 41472
43200 43740 45000 46080 46656 46875 48000 48600 49152 50000 50625 51200 51840
52488 54000 54675 55296 56250 57600
Processus arrêté ["process killed"]

After 57600, my machine begins swapping like crazy and I have to kill the
Python process.

I don't think I should get this kind of behaviour, even using recursion, since
I'm only using lazy constructs all the time. At least, I would expect the
program to produce many more results before surrendering.

What's going on ?

Thank you

Francis Girard
FRANCE
 

Jeff Epler

Your formulation in Python is recursive (hamming calls hamming()) and I
think that's why your program gives up fairly early.

Instead, use itertools.tee() [new in Python 2.4, or search the internet
for an implementation that works in 2.3] to give a copy of the output
sequence to each "multiply by N" function as well as one to be the
return value.

Here's my implementation, which matched your list early on but
effortlessly reached larger values. One large value it printed was
6412351813189632 (a Python long) which indeed has only the distinct
prime factors 2 and 3. (2**43 * 3**6)
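
(A quick check of that claim, added here -- it is easy to confirm
interactively:

assert 2**43 * 3**6 == 6412351813189632  # confirms Jeff's factorization

so the value is indeed a product of 2s and 3s only.)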

Jeff

from itertools import tee
import sys

def imerge(xs, ys):
    x = xs.next()
    y = ys.next()
    while True:
        if x == y:
            yield x
            x = xs.next()
            y = ys.next()
        elif x < y:
            yield x
            x = xs.next()
        else:
            yield y
            y = ys.next()

def hamming():
    def _hamming(j, k):
        yield 1
        hamming = generators[j]
        for i in hamming:
            yield i * k
    generators = []
    generator = imerge(imerge(_hamming(0, 2), _hamming(1, 3)), _hamming(2, 5))
    generators[:] = tee(generator, 4)
    return generators[3]

for i in hamming():
    print i,
    sys.stdout.flush()
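
What makes this work is that tee(generator, 4) hands out four independent
cursors over a single underlying stream, internally buffering only the gap
between the slowest and the fastest consumer. A small sketch of that sharing
(mine, not Jeff's):

from itertools import tee

def naturals():
    n = 0
    while True:
        yield n
        n += 1

a, b = tee(naturals(), 2)
print a.next(), a.next(), a.next()  # 0 1 2 -- 'a' runs ahead of the source
print b.next()                      # 0     -- 'b' replays the shared buffer
print b.next(), b.next(), b.next()  # 1 2 3 -- caught up, 'b' pulls new values

Since _hamming's recursive reads go through tee'd cursors of one stream
rather than fresh hamming() calls, each value is computed exactly once.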

 

Tim Peters

[Francis Girard]
...
In the meantime, I couldn't resist testing the new Python laziness features
on a classical FP problem, i.e. the "Hamming" problem. ....
Nevertheless, while the Haskell version prints the Hamming sequence for as
long as I can stand it, and with very little memory consumption, the Python
version only prints: ....
I don't think I should get this kind of behaviour, even using recursion, since
I'm only using lazy constructs all the time. At least, I would expect the
program to produce many more results before surrendering.

What's going on ?

You can find an explanation in Lib/test/test_generators.py, which uses
this problem as an example; you can also find one efficient way to
write it there.
 

Francis Girard

Ok, I think that the bottom line is this:

For all the algorithms that run after their tail in an FP way, like the
Hamming problem or the Fibonacci sequence (but unlike the Sieve of Eratosthenes
-- there's a subtle difference), i.e. all those algorithms that typically
rely upon recursion to get at the beginning of the generated elements in order
to generate the next one ...

... so for this family of algorithms, we have, somehow, to abandon recursion,
and to explicitly maintain one or more lists of what has been generated.

One solution for this is suggested in "test_generators.py". But I think that
this solution is evil, as it looks like recursion is used when it is not, and it
depends heavily on how the m235 function (for example) is invoked.
Furthermore, it is _NOT_ memory efficient as claimed: it leaks! It
internally maintains a list of generated results that grows forever, even
though it is useless to keep results that have been "consumed". Just for a gross
estimate, on my system, the percentage of memory usage for this program grows
rapidly, reaching 21.6 % after 5 minutes. CPU usage varies between 31%-36%.
Ugly and inefficient.

The solution of Jeff Epler is far more elegant. The "itertools.tee" function
does just what we want. It internally maintains a list of what has been
generated and deletes the "consumed" elements as the algorithm unrolls. To
follow with my gross estimate, memory usage grows from 1.2% to 1.9% after 5
minutes (probably only affected by the growing size of the huge integers). CPU
usage varies between 27%-29%.
Beautiful and efficient.

You might think that we shouldn't be that fussy about memory usage when
generating 100-digit numbers, but we're talking about a whole family of
useful FP algorithms, and the best way to implement them should be
established.

For this family of algorithms, itertools.tee is the way to go.

I think that the semi-tutorial in "test_generators.py" should be updated to
use "tee". Or, at least, a severe warning comment should be written.

Thank you,

Francis Girard
FRANCE

On Sunday 23 January 2005 at 23:27, Jeff Epler wrote:
Your formulation in Python is recursive (hamming calls hamming()) and I
think that's why your program gives up fairly early.

Instead, use itertools.tee() [new in Python 2.4, or search the internet
for an implementation that works in 2.3] to give a copy of the output
sequence to each "multiply by N" function as well as one to be the
return value.

Here's my implementation, which matched your list early on but
effortlessly reached larger values. One large value it printed was
6412351813189632 (a Python long) which indeed has only the distinct
prime factors 2 and 3. (2**43 * 3**6)

Jeff

from itertools import tee
import sys

def imerge(xs, ys):
    x = xs.next()
    y = ys.next()
    while True:
        if x == y:
            yield x
            x = xs.next()
            y = ys.next()
        elif x < y:
            yield x
            x = xs.next()
        else:
            yield y
            y = ys.next()

def hamming():
    def _hamming(j, k):
        yield 1
        hamming = generators[j]
        for i in hamming:
            yield i * k
    generators = []
    generator = imerge(imerge(_hamming(0, 2), _hamming(1, 3)), _hamming(2, 5))
    generators[:] = tee(generator, 4)
    return generators[3]

for i in hamming():
    print i,
    sys.stdout.flush()
 

Francis Girard

The following implementation is even more telling, as it makes self-evident
and almost mechanical how to translate algorithms that run after their tail
from recursion to "tee" usage:

*** BEGIN SNAP
from itertools import tee, imap
import sys

def imerge(xs, ys):
    x = xs.next()
    y = ys.next()
    while True:
        if x == y:
            yield x
            x = xs.next()
            y = ys.next()
        elif x < y:
            yield x
            x = xs.next()
        else:
            yield y
            y = ys.next()


def hamming():
    def _hamming():
        yield 1
        hamming2 = hammingGenerators[0]
        hamming3 = hammingGenerators[1]
        hamming5 = hammingGenerators[2]
        for n in imerge(imap(lambda h: 2*h, iter(hamming2)),
                        imerge(imap(lambda h: 3*h, iter(hamming3)),
                               imap(lambda h: 5*h, iter(hamming5)))):
            yield n
    hammingGenerators = tee(_hamming(), 4)
    return hammingGenerators[3]

for i in hamming():
    print i,
    sys.stdout.flush()
*** END SNAP

Here's an implementation of the Fibonacci sequence that uses "tee":

*** BEGIN SNAP
from itertools import tee
import sys

def fib():
    def _fib():
        yield 1
        yield 1
        curGen = fibGenerators[0]
        curAheadGen = fibGenerators[1]
        curAheadGen.next()
        while True:
            yield curGen.next() + curAheadGen.next()
    fibGenerators = tee(_fib(), 3)
    return fibGenerators[2]

for n in fib():
    print n,
    sys.stdout.flush()
*** END SNAP
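
As a quick usage check (added; not in the original post), the first terms
come out as expected:

>>> f = fib()
>>> [f.next() for _ in range(8)]
[1, 1, 2, 3, 5, 8, 13, 21]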

Francis Girard
FRANCE


On Monday 24 January 2005 at 14:09, Francis Girard wrote:
Ok, I think that the bottom line is this :

For all the algorithms that run after their tail in an FP way, like the
Hamming problem or the Fibonacci sequence (but unlike the Sieve of
Eratosthenes -- there's a subtle difference), i.e. all those algorithms that
typically rely upon recursion to get at the beginning of the generated
elements in order to generate the next one ...

... so for this family of algorithms, we have, somehow, to abandon
recursion, and to explicitly maintain one or more lists of what has been
generated.

One solution for this is suggested in "test_generators.py". But I think
that this solution is evil, as it looks like recursion is used when it is not,
and it depends heavily on how the m235 function (for example) is invoked.
Furthermore, it is _NOT_ memory efficient as claimed: it leaks! It
internally maintains a list of generated results that grows forever, even
though it is useless to keep results that have been "consumed". Just for a
gross estimate, on my system, the percentage of memory usage for this program
grows rapidly, reaching 21.6 % after 5 minutes. CPU usage varies between
31%-36%. Ugly and inefficient.

The solution of Jeff Epler is far more elegant. The "itertools.tee"
function does just what we want. It internally maintains a list of what has
been generated and deletes the "consumed" elements as the algorithm unrolls.
To follow with my gross estimate, memory usage grows from 1.2% to 1.9% after
5 minutes (probably only affected by the growing size of the huge integers).
CPU usage varies between 27%-29%.
Beautiful and efficient.

You might think that we shouldn't be that fussy about memory usage when
generating 100-digit numbers, but we're talking about a whole family of
useful FP algorithms, and the best way to implement them should be
established.

For this family of algorithms, itertools.tee is the way to go.

I think that the semi-tutorial in "test_generators.py" should be updated to
use "tee". Or, at least, a severe warning comment should be written.

Thank you,

Francis Girard
FRANCE

On Sunday 23 January 2005 at 23:27, Jeff Epler wrote:
Your formulation in Python is recursive (hamming calls hamming()) and I
think that's why your program gives up fairly early.

Instead, use itertools.tee() [new in Python 2.4, or search the internet
for an implementation that works in 2.3] to give a copy of the output
sequence to each "multiply by N" function as well as one to be the
return value.

Here's my implementation, which matched your list early on but
effortlessly reached larger values. One large value it printed was
6412351813189632 (a Python long) which indeed has only the distinct
prime factors 2 and 3. (2**43 * 3**6)

Jeff

from itertools import tee
import sys

def imerge(xs, ys):
    x = xs.next()
    y = ys.next()
    while True:
        if x == y:
            yield x
            x = xs.next()
            y = ys.next()
        elif x < y:
            yield x
            x = xs.next()
        else:
            yield y
            y = ys.next()

def hamming():
    def _hamming(j, k):
        yield 1
        hamming = generators[j]
        for i in hamming:
            yield i * k
    generators = []
    generator = imerge(imerge(_hamming(0, 2), _hamming(1, 3)), _hamming(2, 5))
    generators[:] = tee(generator, 4)
    return generators[3]

for i in hamming():
    print i,
    sys.stdout.flush()
 

Tim Peters

[Francis Girard]
For all the algorithms that run after their tail in an FP way, like the
Hamming problem or the Fibonacci sequence (but unlike the Sieve of Eratosthenes
-- there's a subtle difference), i.e. all those algorithms that typically
rely upon recursion to get at the beginning of the generated elements in order
to generate the next one ...

... so for this family of algorithms, we have, somehow, to abandon recursion,
and to explicitly maintain one or more lists of what has been generated.

One solution for this is suggested in "test_generators.py". But I think that
this solution is evil, as it looks like recursion is used when it is not, and it
depends heavily on how the m235 function (for example) is invoked.

Well, yes -- "Heh. Here's one way to get a shared list, complete with
an excruciating namespace renaming trick" was intended to warn you in
advance that it wasn't pretty.
Furthermore, it is _NOT_ memory efficient as claimed: it leaks!

Yes. But there are two solutions to the problem in that file, and the
second one is in fact extremely memory-efficient compared to the first
one. "Efficiency" here was intended in a relative sense.
It internally maintains a list of generated results that grows forever, even
though it is useless to keep results that have been "consumed". Just for a gross
estimate, on my system, the percentage of memory usage for this program grows
rapidly, reaching 21.6 % after 5 minutes. CPU usage varies between 31%-36%.
Ugly and inefficient.

Try the first solution in the file for a better understanding of what
"inefficient" means.
The solution of Jeff Epler is far more elegant. The "itertools.tee" function
does just what we want. It internally maintains a list of what has been
generated and deletes the "consumed" elements as the algorithm unrolls. To
follow with my gross estimate, memory usage grows from 1.2% to 1.9% after 5
minutes (probably only affected by the growing size of the huge integers). CPU
usage varies between 27%-29%.
Beautiful and efficient.

Yes, it is better. tee() didn't exist when generators (and
test_generators.py) were written, so of course nothing in the test
file uses it.
You might think that we shouldn't be that fussy about memory usage when
generating 100-digit numbers, but we're talking about a whole family of
useful FP algorithms, and the best way to implement them should be
established.

Possibly -- there really aren't many Pythonistas who care about this.
For this family of algorithms, itertools.tee is the way to go.

I think that the semi-tutorial in "test_generators.py" should be updated to
use "tee". Or, at least, a severe warning comment should be written.

Please submit a patch. The purpose of that file is to test
generators, so you should add a third way of doing it, not replace the
two ways already there. It should also contain prose explaining why
the third way is better (just as there's prose now explaining why the
second way is better than the first).
 

Francis Girard

Ok I'll submit the patch with the prose pretty soon.
Thank you
Francis Girard
FRANCE

On Tuesday 25 January 2005 at 04:29, Tim Peters wrote:
[Francis Girard]
For all the algorithms that run after their tail in an FP way, like the
Hamming problem or the Fibonacci sequence (but unlike the Sieve of
Eratosthenes -- there's a subtle difference), i.e. all those algorithms
that typically rely upon recursion to get at the beginning of the generated
elements in order to generate the next one ...

... so for this family of algorithms, we have, somehow, to abandon
recursion, and to explicitly maintain one or more lists of what has been
generated.

One solution for this is suggested in "test_generators.py". But I think
that this solution is evil, as it looks like recursion is used when it is
not, and it depends heavily on how the m235 function (for example) is
invoked.

Well, yes -- "Heh. Here's one way to get a shared list, complete with
an excruciating namespace renaming trick" was intended to warn you in
advance that it wasn't pretty.
Furthermore, it is _NOT_ memory efficient as claimed: it leaks!

Yes. But there are two solutions to the problem in that file, and the
second one is in fact extremely memory-efficient compared to the first
one. "Efficiency" here was intended in a relative sense.
It internally maintains a list of generated results that grows forever,
even though it is useless to keep results that have been "consumed". Just
for a gross estimate, on my system, the percentage of memory usage for this
program grows rapidly, reaching 21.6 % after 5 minutes. CPU usage varies
between 31%-36%. Ugly and inefficient.

Try the first solution in the file for a better understanding of what
"inefficient" means.
The solution of Jeff Epler is far more elegant. The "itertools.tee"
function does just what we want. It internally maintains a list of what
has been generated and deletes the "consumed" elements as the algorithm
unrolls. To follow with my gross estimate, memory usage grows from 1.2%
to 1.9% after 5 minutes (probably only affected by the growing size of the
huge integers). CPU usage varies between 27%-29%.
Beautiful and efficient.

Yes, it is better. tee() didn't exist when generators (and
test_generators.py) were written, so of course nothing in the test
file uses it.
You might think that we shouldn't be that fussy about memory usage when
generating 100-digit numbers, but we're talking about a whole family of
useful FP algorithms, and the best way to implement them should be
established.

Possibly -- there really aren't many Pythonistas who care about this.
For this family of algorithms, itertools.tee is the way to go.

I think that the semi-tutorial in "test_generators.py" should be updated
to use "tee". Or, at least, a severe warning comment should be written.

Please submit a patch. The purpose of that file is to test
generators, so you should add a third way of doing it, not replace the
two ways already there. It should also contain prose explaining why
the third way is better (just as there's prose now explaining why the
second way is better than the first).
 

Michael Spencer

Francis said:
The following implementation is even more telling, as it makes self-evident
and almost mechanical how to translate algorithms that run after their tail
from recursion to "tee" usage:

Thanks, Francis and Jeff for raising a fascinating topic. I've enjoyed trying
to get my head around both the algorithm and your non-recursive implementation.

Here's a version of your implementation that uses a helper class to make the
algorithm itself prettier.

from itertools import tee, imap

def hamming():
    def _hamming():
        yield 1
        for n in imerge(2 * hamming, imerge(3 * hamming, 5 * hamming)):
            yield n

    hamming = Tee(_hamming())
    return iter(hamming)


class Tee(object):
    """Provides an independent iterator (using tee) on every iteration request.
    Also implements lazy iterator arithmetic."""
    def __init__(self, iterator):
        self.iter = tee(iterator, 1)[0]
    def __iter__(self):
        return self.iter.__copy__()
    def __mul__(self, number):
        return imap(lambda x: x * number, self.__iter__())
    __rmul__ = __mul__  # added: lets 2 * hamming work, not just hamming * 2

def imerge(xs, ys):
    x = xs.next()
    y = ys.next()
    while True:
        if x == y:
            yield x
            x = xs.next()
            y = ys.next()
        elif x < y:
            yield x
            x = xs.next()
        else: # if y < x:
            yield y
            y = ys.next()
>>> hg = hamming()
>>> for i in xrange(9001):
....     n = hg.next()
....     if i % 1000 == 0: print i, n
....
0 1
1000 51840000
2000 8100000000
3000 279936000000
4000 4707158941350
5000 50960793600000
6000 409600000000000
7000 2638827906662400
8000 14332723200000000
9000 68024448000000000


Regards

Michael
 

Nick Craig-Wood

Francis Girard said:
def hamming():
    def _hamming():
        yield 1
        hamming2 = hammingGenerators[0]
        hamming3 = hammingGenerators[1]
        hamming5 = hammingGenerators[2]
        for n in imerge(imap(lambda h: 2*h, iter(hamming2)),
                        imerge(imap(lambda h: 3*h, iter(hamming3)),
                               imap(lambda h: 5*h, iter(hamming5)))):
            yield n
    hammingGenerators = tee(_hamming(), 4)
    return hammingGenerators[3]

If you are after readability, you might prefer this...

def hamming():
    def _hamming():
        yield 1
        for n in imerge(imap(lambda h: 2*h, iter(hamming2)),
                        imerge(imap(lambda h: 3*h, iter(hamming3)),
                               imap(lambda h: 5*h, iter(hamming5)))):
            yield n
    hamming2, hamming3, hamming5, result = tee(_hamming(), 4)
    return result

PS interesting thread - never heard of Hamming sequences before!
 

Bengt Richter

Francis Girard said:
def hamming():
    def _hamming():
        yield 1
        hamming2 = hammingGenerators[0]
        hamming3 = hammingGenerators[1]
        hamming5 = hammingGenerators[2]
        for n in imerge(imap(lambda h: 2*h, iter(hamming2)),
                        imerge(imap(lambda h: 3*h, iter(hamming3)),
                               imap(lambda h: 5*h, iter(hamming5)))):
            yield n
    hammingGenerators = tee(_hamming(), 4)
    return hammingGenerators[3]

If you are after readability, you might prefer this...

def hamming():
    def _hamming():
        yield 1
        for n in imerge(imap(lambda h: 2*h, iter(hamming2)),
                        imerge(imap(lambda h: 3*h, iter(hamming3)),
                               imap(lambda h: 5*h, iter(hamming5)))):
            yield n
    hamming2, hamming3, hamming5, result = tee(_hamming(), 4)
    return result

PS interesting thread - never heard of Hamming sequences before!

Are the long words really that helpful?

def hamming():
    def _hamming():
        yield 1
        for n in imerge(imap(lambda h: 2*h, iter(hg2)),
                        imerge(imap(lambda h: 3*h, iter(hg3)),
                               imap(lambda h: 5*h, iter(hg5)))):
            yield n
    hg2, hg3, hg5, result = tee(_hamming(), 4) # four hamming generators
    return result

Regards,
Bengt Richter
 

Nick Craig-Wood

Francis Girard said:
The following implementation is even more speaking as it makes self-evident
and almost mechanical how to translate algorithms that run after their tail
from recursion to "tee" usage :

*** BEGIN SNAP
from itertools import tee, imap
import sys

def imerge(xs, ys):
    x = xs.next()
    y = ys.next()
    while True:
        if x == y:
            yield x
            x = xs.next()
            y = ys.next()
        elif x < y:
            yield x
            x = xs.next()
        else:
            yield y
            y = ys.next()

Thinking about this some more leads me to believe a general purpose
imerge taking any number of arguments will look neater, eg

def imerge(*generators):
    values = [ g.next() for g in generators ]
    while True:
        x = min(values)
        yield x
        for i in range(len(values)):
            if values[i] == x:
                values[i] = generators[i].next()
def hamming():
    def _hamming():
        yield 1
        hamming2 = hammingGenerators[0]
        hamming3 = hammingGenerators[1]
        hamming5 = hammingGenerators[2]
        for n in imerge(imap(lambda h: 2*h, iter(hamming2)),
                        imerge(imap(lambda h: 3*h, iter(hamming3)),
                               imap(lambda h: 5*h, iter(hamming5)))):
            yield n
    hammingGenerators = tee(_hamming(), 4)
    return hammingGenerators[3]

This means that this can be further simplified thus,

def hamming():
    def _hamming():
        yield 1
        for n in imerge( imap(lambda h: 2*h, hamming2),
                         imap(lambda h: 3*h, hamming3),
                         imap(lambda h: 5*h, hamming5) ):
            yield n
    hamming2, hamming3, hamming5, result = tee(_hamming(), 4)
    return result

(Note: the iter(...) calls seemed not to be doing anything useful, so I
removed them.)
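
The removal is indeed safe: tee() already returns iterator objects, and
calling iter() on an iterator simply hands back the same object, e.g.

>>> from itertools import tee
>>> t = tee([1, 2, 3], 1)[0]
>>> iter(t) is t
True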
 

Steven Bethard

Nick said:
Thinking about this some more leads me to believe a general purpose
imerge taking any number of arguments will look neater, eg

def imerge(*generators):
    values = [ g.next() for g in generators ]
    while True:
        x = min(values)
        yield x
        for i in range(len(values)):
            if values[i] == x:
                values[i] = generators[i].next()


This kinda looks like it dies after the first generator is exhausted,
but I'm not certain. An alternate version that doesn't search for 'i':

py> def imerge(*iterables):
....     iters = [iter(i) for i in iterables]
....     values = [i.next() for i in iters]
....     while iters:
....         x, i = min((val, i) for i, val in enumerate(values))
....         yield x
....         try:
....             values[i] = iters[i].next()
....         except StopIteration:
....             del iters[i]
....             del values[i]
....
py> list(imerge([1, 4, 7], [2, 5, 8], [3, 6, 9]))
[1, 2, 3, 4, 5, 6, 7, 8, 9]
py> list(imerge([3, 6, 9], [1, 4, 7], [2, 5, 8]))
[1, 2, 3, 4, 5, 6, 7, 8, 9]
py> list(imerge([1, 4, 7], [3, 6, 9], [2, 5, 8]))
[1, 2, 3, 4, 5, 6, 7, 8, 9]
This means that this can be further simplified thus,

def hamming():
    def _hamming():
        yield 1
        for n in imerge( imap(lambda h: 2*h, hamming2),
                         imap(lambda h: 3*h, hamming3),
                         imap(lambda h: 5*h, hamming5) ):
            yield n
    hamming2, hamming3, hamming5, result = tee(_hamming(), 4)
    return result

Nice.

Steve
 

Nick Craig-Wood

Steven Bethard said:
Nick said:
Thinking about this some more leads me to believe a general purpose
imerge taking any number of arguments will look neater, eg

def imerge(*generators):
    values = [ g.next() for g in generators ]
    while True:
        x = min(values)
        yield x
        for i in range(len(values)):
            if values[i] == x:
                values[i] = generators[i].next()


This kinda looks like it dies after the first generator is exhausted,
but I'm not certain.


Yes it will stop iterating then (rather like zip() on lists of unequal
size). Not sure what the specification should be! It works for the
hamming problem though.
>>> list(imerge(iter([1, 2]), iter([1, 2, 3]), iter([1, 2, 3, 4])))
[1, 2]
An alternate version that doesn't search for 'i':

py> def imerge(*iterables):
...     iters = [iter(i) for i in iterables]
...     values = [i.next() for i in iters]
...     while iters:
...         x, i = min((val, i) for i, val in enumerate(values))
...         yield x
...         try:
...             values[i] = iters[i].next()
...         except StopIteration:
...             del iters[i]
...             del values[i]
...
py> list(imerge([1, 4, 7], [2, 5, 8], [3, 6, 9]))
[1, 2, 3, 4, 5, 6, 7, 8, 9]
py> list(imerge([3, 6, 9], [1, 4, 7], [2, 5, 8]))
[1, 2, 3, 4, 5, 6, 7, 8, 9]
py> list(imerge([1, 4, 7], [3, 6, 9], [2, 5, 8]))
[1, 2, 3, 4, 5, 6, 7, 8, 9]


This isn't quite right...

>>> list(imerge([1, 2, 3], [1, 2, 3], [1, 2, 3]))
[1, 1, 1, 2, 2, 2, 3, 3, 3]

This should produce
[1, 2, 3]

So I'm afraid the searching *is* necessary - you've got to find all
the generators with the min value and move them on.
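
For what it's worth, here is one way to do that search efficiently -- a
sketch of mine using heapq (a different technique from anything posted so
far): keep (value, iterator) pairs on a heap, and when popping the minimum,
also advance every other iterator currently sitting on the same value.

import heapq

def imerge(*iterables):
    # Merge ascending iterables, dropping duplicates; runs until all
    # inputs are exhausted (unlike the zip()-style version above).
    heap = []
    for it in map(iter, iterables):
        try:
            heapq.heappush(heap, (it.next(), it))
        except StopIteration:
            pass
    while heap:
        val, it = heapq.heappop(heap)
        # Advance every other iterator that is tied at the minimum.
        while heap and heap[0][0] == val:
            _, dup = heapq.heappop(heap)
            try:
                heapq.heappush(heap, (dup.next(), dup))
            except StopIteration:
                pass
        yield val
        try:
            heapq.heappush(heap, (it.next(), it))
        except StopIteration:
            pass

With this, list(imerge([1, 2, 3], [1, 2, 3], [1, 2, 3])) gives [1, 2, 3],
and exhausted inputs drop out instead of ending the whole merge.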
 

Steven Bethard

Nick said:
Steven Bethard said:
Nick said:
Thinking about this some more leads me to believe a general purpose
imerge taking any number of arguments will look neater, eg

def imerge(*generators):
    values = [ g.next() for g in generators ]
    while True:
        x = min(values)
        yield x
        for i in range(len(values)):
            if values[i] == x:
                values[i] = generators[i].next()


This kinda looks like it dies after the first generator is exhausted,
but I'm not certain.



Yes it will stop iterating then (rather like zip() on lists of unequal
size). Not sure what the specification should be! It works for the
hamming problem though.


Actually, it stops iterating on lists of equal size too:

py> def imerge(*iterators):
....     iterators = [iter(i) for i in iterators]
....     values = [i.next() for i in iterators]
....     while True:
....         x = min(values)
....         yield x
....         for i, val in enumerate(values):
....             if val == x:
....                 values[i] = iterators[i].next()
....
py> list(imerge([1, 4, 7], [2, 5, 8], [3, 6, 9]))
[1, 2, 3, 4, 5, 6, 7]

Note that 8 and 9 are not in the list.

Steve
 

Michael Spencer

Nick said:
Steven Bethard said:
Nick said:
Thinking about this some more leads me to believe a general purpose
imerge taking any number of arguments will look neater, eg

def imerge(*generators):
    values = [ g.next() for g in generators ]
    while True:
        x = min(values)
        yield x
        for i in range(len(values)):
            if values[i] == x:
                values[i] = generators[i].next()


This kinda looks like it dies after the first generator is exhausted,
but I'm not certain.



Yes it will stop iterating then (rather like zip() on lists of unequal
size). Not sure what the specification should be! It works for the
hamming problem though.

>>> list(imerge(iter([1, 2]), iter([1, 2, 3]), iter([1, 2, 3, 4])))
[1, 2]

An alternate version that doesn't search for 'i':

py> def imerge(*iterables):
...     iters = [iter(i) for i in iterables]
...     values = [i.next() for i in iters]
...     while iters:
...         x, i = min((val, i) for i, val in enumerate(values))
...         yield x
...         try:
...             values[i] = iters[i].next()
...         except StopIteration:
...             del iters[i]
...             del values[i]
...
py> list(imerge([1, 4, 7], [2, 5, 8], [3, 6, 9]))
[1, 2, 3, 4, 5, 6, 7, 8, 9]
py> list(imerge([3, 6, 9], [1, 4, 7], [2, 5, 8]))
[1, 2, 3, 4, 5, 6, 7, 8, 9]
py> list(imerge([1, 4, 7], [3, 6, 9], [2, 5, 8]))
[1, 2, 3, 4, 5, 6, 7, 8, 9]



This isn't quite right...

>>> list(imerge([1, 2, 3], [1, 2, 3], [1, 2, 3]))
[1, 1, 1, 2, 2, 2, 3, 3, 3]

This should produce
[1, 2, 3]

So I'm afraid the searching *is* necessary - you've got to find all
the generators with the min value and move them on.

Here's a dict-based implementation: cute, but slow, at least for a small number
of iterators:

>>> def imerge(*iterables):
...     cache = {}
...     iterators = map(iter, iterables)
...     number = len(iterables)
...     exhausted = 0
...     while 1:
...         for it in iterators:
...             try:
...                 cache.setdefault(it.next(), []).append(it)
...             except StopIteration:
...                 exhausted += 1
...                 if exhausted == number:
...                     raise StopIteration
...         val = min(cache)
...         iterators = cache.pop(val)
...         yield val
...
>>> list(imerge([1, 2, 3, 6], [1, 2, 3, 7], [1, 2, 3, 4, 5]))
[1, 2, 3, 4, 5, 6, 7]

Michael
 

Jeff Shannon

Bengt said:
Are the long words really that helpful?

def hamming():
    def _hamming():
        yield 1
        for n in imerge(imap(lambda h: 2*h, iter(hg2)),
                        imerge(imap(lambda h: 3*h, iter(hg3)),
                               imap(lambda h: 5*h, iter(hg5)))):
            yield n
    hg2, hg3, hg5, result = tee(_hamming(), 4) # four hamming generators
    return result

Well, judging by the fact that shortening the identifiers made it so
that you felt the need to add a comment indicating what they were
identifying, I'd say that yes, the long words *are* helpful. ;)
Comments are good, but self-documenting code is even better.

Jeff Shannon
Technician/Programmer
Credit International
 

Bengt Richter

Well, judging by the fact that shortening the identifiers made it so
that you felt the need to add a comment indicating what they were
identifying, I'd say that yes, the long words *are* helpful. ;)
Comments are good, but self-documenting code is even better.

The comment was a conscious factoring decision, in terms of documentation ;-)

Regards,
Bengt Richter
 

Francis Girard

On Tuesday 25 January 2005 at 09:01, Michael Spencer wrote:
Thanks, Francis and Jeff for raising a fascinating topic. I've enjoyed
trying to get my head around both the algorithm and your non-recursive
implementation.

Yes, it's been fun.
Here's a version of your implementation that uses a helper class to make
the algorithm itself prettier.

from itertools import tee, imap

def hamming():
    def _hamming():
        yield 1
        for n in imerge(2 * hamming, imerge(3 * hamming, 5 * hamming)):
            yield n

    hamming = Tee(_hamming())
    return iter(hamming)


class Tee(object):
    """Provides an independent iterator (using tee) on every iteration
    request. Also implements lazy iterator arithmetic."""
    def __init__(self, iterator):
        self.iter = tee(iterator, 1)[0]
    def __iter__(self):
        return self.iter.__copy__()
    def __mul__(self, number):
        return imap(lambda x: x * number, self.__iter__())
    __rmul__ = __mul__  # added: lets 2 * hamming work, not just hamming * 2

def imerge(xs, ys):
    x = xs.next()
    y = ys.next()
    while True:
        if x == y:
            yield x
            x = xs.next()
            y = ys.next()
        elif x < y:
            yield x
            x = xs.next()
        else: # if y < x:
            yield y
            y = ys.next()

>>> hg = hamming()
>>> for i in xrange(9001):
...     n = hg.next()
...     if i % 1000 == 0: print i, n
...
0 1
1000 51840000
2000 8100000000
3000 279936000000
4000 4707158941350
5000 50960793600000
6000 409600000000000
7000 2638827906662400
8000 14332723200000000
9000 68024448000000000

Interesting idea.
 
