Making the case for repeat

pataphor

This probably has a snowball's chance in hell of ending up in builtins or
even in some module, but such things should not prevent one from trying
to present the arguments for what one thinks is right. Otherwise one
would end up with consequentialism, and that way lies madness and
hyperreality.

So here is my proposal for a once-and-for-all reconciliation of various
functions in itertools that cannot stand on their own and keep a
straight face. Because of backwards-compatibility issues we cannot
remove them, but we can boldly jump forward and include the right
repeat in the builtin namespace, which I think would be the best thing.
Alternatively -- the second-best solution -- would be to give this
function its own namespace where it can supersede the old incongruities
in itertools. Combiniter or combinator?

P.

from itertools import count
from functools import partial

def repeat(iterable, cycles=None, times=1):
    # Yield each element `times` times; replay the whole sequence
    # `cycles` times in total, or forever if cycles is None.
    L = []
    for x in iterable:
        for i in xrange(times):
            yield x
        L.append(x)
    counter = count if cycles is None else partial(xrange, cycles - 1)
    for _ in counter():
        for x in L:
            for i in xrange(times):
                yield x

def test():
    # making the case for repeat
    from itertools import islice, cycle
    times = 2
    n = 3
    cycles = 2
    L = range(n)
    # look ma, no islice!
    print list(repeat(L, cycles))
    print list(repeat(L, cycles, times))
    # repeat without extra args works like itertools.cycle:
    print list(islice(repeat(L), len(L) * cycles))
    print list(islice(cycle(L), len(L) * cycles))
    # enclosing a single item in a list emulates
    # itertools.repeat functionality:
    print list(repeat(['Mr. Anderson'], cycles))

if __name__ == '__main__':
    test()
 

Steven D'Aprano

pataphor said:
This probably has a snowball's chance in hell of ending up in builtins or
even in some module, but such things should not prevent one from trying
to present the arguments for what one thinks is right. Otherwise one would
end up with consequentialism, and that way lies madness and hyperreality.

It would be cruel of me to say "Too late", so I shall merely ask, what on
earth are you going on about?

pataphor said:
So here is my proposal for a once-and-for-all reconciliation
of various functions in itertools that cannot stand on their own and
keep a straight face. Because of backwards-compatibility issues we
cannot remove them, but we can boldly jump forward and include the right
repeat in the builtin namespace, which I think would be the best thing.


What is "the right repeat"? What's wrong with the old one? If you're
going to make a proposal, you have to actually *make the proposal* and
not just say "Here, have some code, now put it in the builtins because
the rest of itertools is teh suxor!!!". That rarely goes down well.

(I don't know if that's exactly what you're trying to say, but it seems
that way to me.)

I've run your test code, and I don't know what I'm supposed to be
impressed by.
 

pataphor

Steven said:
I've run your test code, and I don't know what I'm supposed to be
impressed by.

Thank you for trying out the code. That you're unimpressed is actually a
huge encouragement: code should just run the way people expect, without
unnecessary surprises.

P.
 

Gabriel Genellina

pataphor said:
So here is my proposal for a once-and-for-all reconciliation
of various functions in itertools that cannot stand on their own and
keep a straight face. Because of backwards-compatibility issues we
cannot remove them, but we can boldly jump forward and include the right
repeat in the builtin namespace, which I think would be the best thing.
Alternatively -- the second-best solution -- would be to give this
function its own namespace where it can supersede the old incongruities
in itertools. Combiniter or combinator?

Ok, you're proposing a "bidimensional" repeat. I prefer to keep things
simple, and I'd implement it in two steps. First, something similar to
your repeat_each function in another post:

py> thing = ['1','2','3','4']
py> chain.from_iterable(repeat(elem, 3) for elem in thing)
<itertools.chain object at 0x00BECB90>
py> list(_)
['1', '1', '1', '2', '2', '2', '3', '3', '3', '4', '4', '4']

Note that this doesn't require any additional storage. Second step would
be to build a bidimensional repeat:

py> one = chain.from_iterable(repeat(elem, 3) for elem in thing)
py> two = chain.from_iterable(tee(one, 2))
py> list(two)
['1', '1', '1', '2', '2', '2', '3', '3', '3', '4', '4', '4',
 '1', '1', '1', '2', '2', '2', '3', '3', '3', '4', '4', '4']

Short and simple, but this one requires space for one complete run (3*4
items in the example).
Another variant that only requires space for freezing the original
iterable (4 items in the example) is:
py> thing = ['1','2','3','4']
py> items = list(thing)  # ok, silly in this case, but not for a generic iterable
py> list(chain.from_iterable(
...     chain.from_iterable(repeat(elem, 3) for elem in items)
...     for rownumber in range(2)))
['1', '1', '1', '2', '2', '2', '3', '3', '3', '4', '4', '4',
 '1', '1', '1', '2', '2', '2', '3', '3', '3', '4', '4', '4']

All of them run at full speed, using the optimized itertools machinery
written in C.
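
For reference, the two steps can be folded into a single factory function
(a sketch; the name repeat2d and its signature are invented here, following
the third variant above):

from itertools import chain, repeat

def repeat2d(iterable, times=1, cycles=1):
    # Freeze the input once, repeat each element `times` times,
    # then replay the whole run `cycles` times.
    items = list(iterable)
    return chain.from_iterable(
        chain.from_iterable(repeat(elem, times) for elem in items)
        for _ in range(cycles))

list(repeat2d('1234', times=3, cycles=2)) then reproduces the output above.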
 

pataphor

Gabriel said:
Ok, you're proposing a "bidimensional" repeat. I prefer to keep things
simple, and I'd implement it in two steps.

But what is simple? I am currently working on a universal feature
creeper that could replace itertools.cycle, itertools.repeat,
itertools.chain and reversed, and also helps to severely cut down on
itertools.islice usage. All within virtually the same parameter
footprint as the last function I posted. The problem is that posting
*this* function would kill my earlier repeat for sure. And it already
had a problem with parameters < 0 (hint: that last bug has now become
a feature in the unpostable repeat implementation).

Gabriel said:
Note that this doesn't require any additional storage. Second step would
be to build a bidimensional repeat:

Thanks for reminding me, but the storage savings only work for a
'single cycle' function call. But I guess one could special-case
for that.

Gabriel said:
py> one = chain.from_iterable(repeat(elem, 3) for elem in thing)
py> two = chain.from_iterable(tee(one, 2))
py> list(two)
['1', '1', '1', '2', '2', '2', '3', '3', '3', '4', '4', '4',
 '1', '1', '1', '2', '2', '2', '3', '3', '3', '4', '4', '4']

Short and simple, but this one requires space for one complete run (3*4
items in the example).

Really? I count 4 nested functions and a generator expression. I
guess it's a tradeoff between feature creep and function-nesting
creep.

P.
 

Gabriel Genellina

pataphor said:
But what is simple? I am currently working on a universal feature
creeper that could replace itertools.cycle, itertools.repeat,
itertools.chain and reversed, and also helps to severely cut down on
itertools.islice usage. All within virtually the same parameter
footprint as the last function I posted. The problem is that posting
*this* function would kill my earlier repeat for sure. And it already
had a problem with parameters < 0 (hint: that last bug has now become
a feature in the unpostable repeat implementation).

Plans to conquer the world, second desk, over there.
 

Steven D'Aprano

Gabriel said:
Plans to conquer the world, second desk, over there.


He'll have to go to the end of the queue though.


Why is it that it's (almost) always newbies with about five minutes'
worth of experience with Python that come up with these grandiose plans
to replace entire modules with a single function?
 

Carl Banks

Gabriel said:
Ok, you're proposing a "bidimensional" repeat. I prefer to keep things
simple, and I'd implement it in two steps.


That brings up a good software-engineering question: what is
better, to have one function with lots of functionality, or many
functions, each with a single piece of functionality?

Before anyone follows up with the obvious answer, hear me out.

Small functions that do one thing well are almost always a good thing,
but there is a significant benefit to cramming a lot of functionality
into one function: it forces you to hit all corners of a problem.
That is, at least for problems where it makes sense to hit all the
corners of the input space.

The subprocess module is the best example of this. Pretty much any
combination of arguments to subprocess.Popen makes sense. If I want
to spawn a process with pipes to standard input and standard error,
but not standard output, that allows me to specify a custom
environment, that uses no buffering, and that goes through the
shell... I can! Not so with the mish-mash of other calls.
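
For instance, all of those options fit in one call (a minimal sketch; the
command and the environment variable are invented):

import os
from subprocess import Popen, PIPE

# stdin and stderr piped, stdout inherited; custom environment,
# unbuffered, and run through the shell -- all in a single call.
p = Popen('grep needle', shell=True, stdin=PIPE, stderr=PIPE,
          env=dict(os.environ, DEBUG='1'), bufsize=0)
out, err = p.communicate('hay\nneedle\n')  # out is None: stdout was not piped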

By having a single function do all of that, it enables all those
combinations, something that wouldn't have happened with lots of small
functions. (I suddenly wonder what other languages have such easy
versatility in spawning subprocesses.)

So... there is something to be said for pataphor's unified repeat
function.

Having said that, the nature of iterators is that they are easily
recombinable. This is very much unlike the different options with
subprocesses. It's not like you can make an extra call just to tack
on some behavior to a simple subprocess-spawning function, but with
iterators you can. In fact, iterators are (almost) orthogonally
recombinable; the whole input space of a problem can be spanned simply
by combining simple iterators.

So, I will ultimately have to agree with Gabriel: itertools is best
kept simple, and complex iterator behavior like you suggest is best
achieved by combining simple iterators.


Carl Banks
 

Terry Reedy

Tim said:
Well, many great innovations in history have come from people who did not
have enough experience to know that what they were doing was impossible...

Along with many of the great follies and frauds ;-)
 

Steven D'Aprano

Tim said:
Well, many great innovations in history have come from people who did
not have enough experience to know that what they were doing was
impossible...

So the old saw goes. Like most old saws, it's a load of codswallop. If
you scratch beneath the surface, you soon discover that the people who
supposedly "didn't know it was impossible" actually had a great deal of
experience of what was and wasn't possible in the subject at hand. They
might have been outsiders, but they were *knowledgeable* outsiders.

e.g. the Wright Brothers weren't lone inventors working at a time when
everyone knew powered flight was impossible, they were experienced
engineers and glider-pilots who paid a lot of attention to research done
by their many competitors.
 

bearophileHUGS

pataphor said:
The problem is that posting *this* function would kill my earlier
repeat for sure. And it already had a problem with parameters < 0
(hint: that last bug has now become a feature in the unpostable
repeat implementation).

Be bold, kill your previous ideas, and post the Unpostable :)
Despite people here complaining a bit, it can't hurt to post some
lines of safe code here :)
But I agree with Steven D'Aprano: adding some explanatory words to
your proposals helps.

Bye,
bearophile
 

Raymond Hettinger

[pataphor]
So here is my proposal for a once-and-for-all reconciliation
of various functions in itertools that cannot stand on their own and
keep a straight face.

Interesting phraseology ;-) Enticing and yet fallacious in its
presumption of known and accepted usability problems. FWIW, when I
designed the module, I started by researching constructs that had
proven success in functional languages and then adapted them to the
needs of Python applications. That being said, I'm always open to
hearing new ideas.

After reading this thread a couple times, I have a few thoughts
to offer.

1. The Pythonic Way(tm) is to avoid combining too much functionality
in a single function, preferring to split when possible. That is why
ifilter() and ifilterfalse() are separate functions.

(FWIW, the principle is considered pythonic because it was articulated
by Guido and has been widely applied throughout the language.)
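
To make that concrete, the split looks like this (a minimal illustration
using the Python 2 itertools names):

from itertools import ifilter, ifilterfalse

is_even = lambda n: n % 2 == 0
list(ifilter(is_even, range(10)))       # [0, 2, 4, 6, 8]
list(ifilterfalse(is_even, range(10)))  # [1, 3, 5, 7, 9]

Two small functions, rather than one ifilter() with an "invert" flag.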

There is a natural inclination to do the opposite. We factor code
to eliminate redundancy, but that is not always a good idea with
an API. The goal for code factoring is to minimize redundancy.
The goal for API design is having simple parts that are easily
learned and can be readily combined (i.e. the notion of an
iterator algebra).

It is not progress to mush the parts together in a single function
requiring multiple parameters.

2. I question the utility of combining repeat() and cycle(),
because I've not previously seen the two used together.

OTOH, there may be some utility in producing a fixed number of cycles
(see the ncycles() recipe in the docs). Though, if I thought this need
arose very often (it has never been requested), the straightforward
solution would be to add a "times" argument to cycle(), patterned
after repeat()'s use of a "times" argument.

3. Looking at the sample code provided in your post, I would suggest
rewriting it as a factory function using the existing tools as
components. That way, the result of the function will still run
at C speed and not be slowed by corner cases or unused parameters.
(see the ncycles recipe for an example of how to do this).
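
For reference, the ncycles() recipe from the itertools documentation is a
factory function in exactly this style:

from itertools import chain, repeat

def ncycles(iterable, n):
    "Returns the sequence elements n times."
    # tuple() freezes the input once; chain and repeat then run at C speed.
    return chain.from_iterable(repeat(tuple(iterable), n))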

4. The suggested combined function seems to emphasize truncated
streams (i.e. a fixed number of repetitions or cycles). This is
at odds with the notion of a toolset designed to allow lazy
infinite iterators to be fed to consumer functions that truncate
on the shortest iterable. For example, the toolset supports:

izip(mydata, count(), repeat(datetime.now()))

in preference to:

izip(mydata, islice(count(), len(mydata)), repeat(datetime.now(), times=len(mydata)))

To gain a better appreciation for this style (and for the current
design of itertools), look at Hughes' classic paper "Why Functional
Programming Matters":

http://www.math.chalmers.se/~rjmh/Papers/whyfp.pdf


Raymond
 

Steven D'Aprano

Raymond said:
We factor code
to eliminate redundancy, but that is not always a good idea with an API.
The goal for code factoring is to minimize redundancy. The goal for API
design is having simple parts that are easily learned and can be readily
combined (i.e. the notion of an iterator algebra).

Wonderfully said! That has articulated something which I only recently
came to appreciate, but couldn't put into words.
 

jkn

[...]
As originally defined by Martin Fowler, re-factoring always means the
external behaviour is unchanged <URL:http://refactoring.com/>.

So, there's no such thing as a re-factoring that changes the API.
Anything that changes an external attribute of the code is a different
kind of transformation, not a re-factoring.

... and Steven was not calling the two things by the same name. He,
and Raymond, were distinguishing between refactoring and API design.
That was their point, I think.


Jon N
 

Steven D'Aprano

As originally defined by Martin Fowler, re-factoring always means the
external behaviour is unchanged <URL:http://refactoring.com/>.

So, there's no such thing as a re-factoring that changes the API.
Anything that changes an external attribute of the code is a different
kind of transformation, not a re-factoring.

Possibly a *factoring*, without the "re-", just like Raymond said.

Also, keep in mind that when creating a new API, you have no existing API
to re-factor.
 

jkn

Steven said:
Possibly a *factoring*, without the "re-", just like Raymond said.

Also, keep in mind that when creating a new API, you have no existing API
to re-factor.

Exactly.

I think this has come up before, but I can't remember the answers; any
pointers to examples of very well-designed APIs, and maybe some
background on how the design was achieved?

Thanks
jon N
 

alex23

Steven said:
e.g. the Wright Brothers weren't lone inventors working at a time when
everyone knew powered flight was impossible, they were experienced
engineers and glider-pilots who paid a lot of attention to research done
by their many competitors.

Be careful, the idea that human knowledge is a process of incremental
improvements rather than pulled complete and inviolate out of the
minds of a handful of individuals is pretty unpopular in this day and
age.

It's a lot harder to assert the existence of "intellectual property"
that way, and there's just no money in *that* ;)

cynically y'rs,
- alex23
 

Boris Borcic

Raymond said:
There is a natural inclination to do the opposite. We factor code
to eliminate redundancy, but that is not always a good idea with
an API. The goal for code factoring is to minimize redundancy.
The goal for API design is having simple parts that are easily
learned and can be readily combined (i.e. the notion of an
iterator algebra).

This reminds me of an early programming experience that left me with a
fascination. At a time when code had to fit in a couple dozen kilobytes, I
once had to make significant room in what was already very tight and terse code.
Code factoring *did* provide the room in the end, but the fascinating part came
before.

There was strictly no redundancy apparent at first, and finding a usable one
involved contemplating code execution paths for hours, until some degree of
similarity became apparent between two code-path families. And then the
fascinating part was to progressively mutate both *away* from minimality of
code, in order to enhance the similarity until it could be factored out.

I was impressed, in various ways. First, that the effort could be characterized
quite mechanically, and in a sense stupidly, as finding a shortest equivalent
program, while the subjective feeling was that the task exerted perceptive
intelligence to the utmost. Second, by the notion that a global constraint of
code minimization could map, more locally, onto a constraint that drew code to
expand. Third, that the process resulted in bottom-up construction of what's
usually constructed top-down, mimicking the willful design of the latter case,
e.g. an API extension, as we might call it nowadays.

Cheers, BB
 

Steven D'Aprano

Boris said:
This reminds me of an early programming experience that left me with a
fascination. At a time when code had to fit in a couple dozen kilobytes,
I once had to make significant room in what was already very tight and
terse code. Code factoring *did* provide the room in the end, but the
fascinating part came before.

There was strictly no redundancy apparent at first, and finding a usable
one involved contemplating code execution paths for hours, until some
degree of similarity became apparent between two code-path families. And
then the fascinating part was to progressively mutate both *away* from
minimality of code, in order to enhance the similarity until it could be
factored out.

I was impressed, in various ways. First, that the effort could be
characterized quite mechanically, and in a sense stupidly, as finding a
shortest equivalent program, while the subjective feeling was that the
task exerted perceptive intelligence to the utmost. Second, by the notion
that a global constraint of code minimization could map, more locally, onto
a constraint that drew code to expand. Third, that the process resulted in
bottom-up construction of what's usually constructed top-down, mimicking
the willful design of the latter case, e.g. an API extension, as we might
call it nowadays.

This is much the same as what happens in maximisation problems: the value gets
trapped in a local maximum, and the only way to reach a greater global
maximum is to go downhill for a while.

I believe that hill-climbing algorithms allow some downhill movement for
just that reason. Genetic algorithms allow "mutations" -- and of course
real evolution of actual genes also involves mutation.
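
To illustrate, here is a toy sketch of hill climbing with occasional downhill
moves (simulated annealing; every name and parameter value below is invented):

import math, random

def anneal(f, x, step=1.0, temperature=10.0, cooling=0.95, iterations=1000):
    # Maximise f. Uphill moves are always accepted; downhill moves are
    # accepted with a probability that shrinks as the temperature cools,
    # which lets the search escape local maxima.
    best = x
    for _ in range(iterations):
        candidate = x + random.uniform(-step, step)
        delta = f(candidate) - f(x)
        if delta > 0 or random.random() < math.exp(delta / temperature):
            x = candidate
            if f(x) > f(best):
                best = x
        temperature *= cooling
    return best

print anneal(lambda x: -(x - 3) ** 2, 0.0)  # converges near x = 3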
 

K4NTICO

Steven said:
This is much the same as what happens in maximisation problems: the value gets
trapped in a local maximum, and the only way to reach a greater global
maximum is to go downhill for a while.

I believe that hill-climbing algorithms allow some downhill movement for
just that reason. Genetic algorithms allow "mutations" -- and of course
real evolution of actual genes also involves mutation.

Indeed, exactly. But it wasn't quite the same to think through it first
hand; as I said, the subjective feeling was that "perceptive
intelligence got exercised to the utmost".

To illustrate, it is very much the memory of that experience - the
perceptive training - that made me notice, for another high point,
what I think to be a common factor worthy of capture between the
sociology of Archimedes' Eureka and that of Einstein's E=mc^2. Let's
just describe Archimedes' case in the right manner: there is a trading
port city, and thus citizens who are experts both in floating ships on
the seas and in weighing goods on the scales in the markets. And then
comes Archimedes, who says: "Hey, experts, I show you that these two
areas of your expertise - which of course you think have nothing to do
with each other, except for what you know so well and clearly - are
in fact two faces of a single coin that you ignore."

And thus a codeful question: "What does F(Syracuse) hear if F(Eureka)
is the = in E=mc^2?"

And a more serious one: what happens to the potential for similar
discoveries when society specializes expertise to the point that
there isn't any longer any community of "simultaneous experts"?

Cheers, BB
 
