Ok, I think that the bottom line is this :
For all the algorithms that run after their tail in an FP way, like the
Hamming problem, or the Fibonacci sequence, (but unlike Sieve of
Eratosthene -- there's a subtle difference), i.e. all those algorithms that
typically rely upon recursion to get the beginning of the generated
elements in order to generate the next one ...
... so for this family of algorithms, we have, somehow, to abandon
recursion, and to explicitely maintain one or many lists of what had been
generated.
One solution for this is suggested in "test_generators.py". But I think
that this solution is evil as it looks like recursion is used but it is not
and it depends heavily on how the m235 function (for example) is invoked.
Furthermore, it is _NOT_ memory efficient as pretended : it leaks ! It
internally maintains a lists of generated results that grows forever while
it is useless to maintain results that had been "consumed". Just for a
gross estimate, on my system, percentage of memory usage for this program
grows rapidly, reaching 21.6 % after 5 minutes. CPU usage varies between
31%-36%. Ugly and inefficient.
The solution of Jeff Epler is far more elegant. The "itertools.tee"
function does just what we want. It internally maintain a list of what had
been generated and deletes the "consumed" elements as the algo unrolls. To
follow with my gross estimate, memory usage grows from 1.2% to 1.9% after 5
minutes (probably only affected by the growing size of Huge Integer). CPU
usage varies between 27%-29%.
Beautiful and effecient.
You might think that we shouldn't be that fussy about memory usage on
generating 100 digits numbers but we're talking about a whole family of
useful FP algorithms ; and the best way to implement them should be
established.
For this family of algorithms, itertools.tee is the way to go.
I think that the semi-tutorial in "test_generators.py" should be updated to
use "tee". Or, at least, a severe warning comment should be written.
Thank you,
Francis Girard
FRANCE
Le dimanche 23 Janvier 2005 23:27, Jeff Epler a écrit :
Your formulation in Python is recursive (hamming calls hamming()) and I
think that's why your program gives up fairly early.
Instead, use itertools.tee() [new in Python 2.4, or search the internet
for an implementation that works in 2.3] to give a copy of the output
sequence to each "multiply by N" function as well as one to be the
return value.
Here's my implementation, which matched your list early on but
effortlessly reached larger values. One large value it printed was
6412351813189632 (a Python long) which indeed has only the distinct
prime factors 2 and 3. (2**43 * 3**6)
Jeff
from itertools import tee
import sys
def imerge(xs, ys):
x = xs.next()
y = ys.next()
while True:
if x == y:
yield x
x = xs.next()
y = ys.next()
elif x < y:
yield x
x = xs.next()
else:
yield y
y = ys.next()
def hamming():
def _hamming(j, k):
yield 1
hamming = generators[j]
for i in hamming:
yield i * k
generators = []
generator = imerge(imerge(_hamming(0, 2), _hamming(1, 3)),
_hamming(2, 5)) generators[:] = tee(generator, 4)
return generators[3]
for i in hamming():
print i,
sys.stdout.flush()