Pre-PEP: reverse iteration methods

C

Christoph Becker-Freyseng

I like the idea of having some iterator.backwards, backwards(iterator) etc.

However especially reading Andrew Dalke's postings and replies (and
having thought about using internally len() and so on --- but then
thinking of iterator maybe having *no* __len__)
I now think that reverse iteration is not well defined when __len__
can't be defined (and sometimes even if it is). This is not the case for
list-like objects as they
have a predictable fixed length (even if the __len__-method isn't
actually implemented).
Let me give an example:

class NaturalNumbers:
def __init__(self):
self.n= 0
def __iter__(self):
return self
def next(self):
self.n+=1
return self.n


This is surely ordered but defining the reverse-order is difficult.
Maybe it should always return infty then.

Or think about a "Tape-Iterator". An object that reads data from e.g. a
file forwards/next and backwards/iter_backwards.
What would you expect to get if tape is at position x and you say
tape.backwards()?
Should it be the last byte of the file or maybe that on position x-1 ?
This shows that we can get into trouble when we have already iterated
"in the right direction" and then start to do "backwards".

While reverse-iteration is a sound idea we need a clear definition of
what is meant!


I thought of the following (I give multiple possibilies named a,b,c, ...
finally only one may be used):
1. If iterator has fixed/determined length (so it could be converted
into a regular list):
a) iterator.iter_backwards() creates an independent reverse-iterator:
1.1)
l2= list(iterator)
l1= list(iterator.iter_backwards())
l2.reverse()
l1 and l2 have to be equal
1.2)
l1= list(iterator.iter_backwards())
l2= list(iterator)
l2.reverse()
l1 and l2 have to be equal
(next cases as examples for simplicity)
1.3)
it= range(4)
it.next()
l1= list(it.iter_backwards())
l2= list(it)
-> l1 == [3,2,1]
l2 == [1,2,3]
1.4)
it= range(4)
itBack= it.iter_backwards()
l1= [itBack.next(), itBack.next(), itBack.next()]
l2= [it.next(), it.next(), it.next()]
-> l1= [3,2,1]
l2= [0,1,2]
b) iterator.iter_backwards() creates a dependent reverse-iterator:
1.1) 1.2) like in a)
(next cases as examples for simplicity)
1.3)
it= range(4)
it.next()
l1= list(it.iter_backwards())
l2= list(it)
-> l1 == [0,]
l2 == [0,1,2,3]
1.4)
it= range(4)
itBack= it.iter_backwards()
l1= [itBack.next(), itBack.next(), itBack.next()]
l2= [it.next(), it.next(), it.next()]
-> l1= [3,2,1]
l2= [1,2,3]
2. Infinite/non-determined iterators:
They should implement reverse-iteration in a way that
iterator.next()
iterator.iter_backwards().next()
is a nop
and
iterator.iter_backwards().next()
iterator.next()
is a nop, too.
Or if there are more backward than forward-steps should raise some
StopBackwardIterator



I'm sure there are a lot more and probably better definitions, so I hope
we will find a useful one.


Christoph Becker-Freyseng
 
C

Christos TZOTZIOY Georgiou

Abstract
========

This proposal is to extend the API of several sequence types
to include methods for iterating over the sequence in reverse.

-0

I believe this should be a function in itertools.

For indexable sequences, a little modification of itertools.islice would
do the trick, since it can't at the moment; it's designed for forward
iteration.

Helpful error messages:ValueError: Step must be one or larger for islice().

For non-indexable sequences (eg a file object), things get hairier, but
given enough memory, it's not hard.
 
B

Bob Gailer

Here is a discussion draft of a potential PEP.
The ideas grew out of the discussion on pep-284.
Comments are invited. Dart throwing is optional.

The discussion so far has focused on how to modify the operand of for x in.

There is another viewpoint: consider that we are modifying the behavior of
for x in

In APL we often modify the behavior of a primitive operator by adding
something to the operator itself. For example
+/A means apply + along the last last dimension of the array A. (e.g. sum
the elements in each row of the matrix A)
To sum along the columns change the expression to =+/[1]A where the index
modifies the / operator.

Could this open the door to some fundamentally new Python syntax that
would support the modification of keyword behavior? e.g.:

for x in[-1] list:
for[-1] x in list:
for x in[-1] list:for x in[-1] list:
for.reverse x in list:
for x out list:
rof x in list:

Let's brainstorm?

Bob Gailer
(e-mail address removed)
303 442 2625
 
I

Irmen de Jong

Raymond said:
Add a method called iter_backwards() to sequence objects that can benefit
from it. The above examples then simplify to::

I'm not too fond of the name "iter_backwards". I think it is too long
(with respect to the already existing "iter" built-in function).

It's just my feeling towards the name, but to me "iter_backwards" sounds
more like an action that is performed on the object it is called on
(a real method so to speak, not a factory method):
"seq.iter_backwards()" reads to me at first glance: 'iterate over the
seq in backward direction'. Not: 'return a reverse iterator object'.
('iter' reads like the verb/action 'iterate' instead of a noun).

Instead, "backwards_iter", "back_iter" or preferrably just "riter"
reads more natural to me:
"seq.back_iter()" reads to me at first glance: 'get a backwards iteration
object for seq'. ('iter' reads like the noun 'iterator', not a verb).


No language syntax changes are needed.

Making it a method, indeed. But then there is this difference
between *forward* iteration and *backward* iteration, right?

iter(seq) --> get a (forward) iterator for seq
seq.riter() --> get a (backward) iterator for seq

How is this easily explained to new people? Why not keep it
orthogonal:

iter(seq) and riter(seq)
~or~
seq.iter() and seq.riter()

For what it's worth (probably nothing, but hey), in C++ STL
we have seq.begin() that returns a forward iterator, and
seq.rbegin() that returns a backward iterator.

* Should tuples be included?
Don't really care. Are there any reasons why they shouldn't?
* Should file objects be included?
Don't care. I've never had to read a file in reverse.
* Should enumerate() be included?
I think it should.


--Irmen de Jong
 
R

Raymond Hettinger

[Raymond]
[Christos TZOTZIOY Georgiou]
-0

I believe this should be a function in itertools.

For indexable sequences, a little modification of itertools.islice would
do the trick, since it can't at the moment; it's designed for forward
iteration.

Helpful error messages:
ValueError: Step must be one or larger for islice().

For non-indexable sequences (eg a file object), things get hairier, but
given enough memory, it's not hard.

Here's a quick and dirty implementation of your idea (kept separate
from islice for clarity). On the plus side, it is somewhat clean and
easy to use. On the minus side:
* It's slower than custom iterator methods
* It behaves badly with infinite iterators as inputs
* For non-indexables, it silently transitions into a
non-lazy, memory intensive mode with a long
delay on the first call. In my opinion, this defeats
the purpose of using iterators in the first place.

def ireverse(obj):
assert not hasattr(obj, 'keys')
try:
for i in xrange(len(obj)-1, -1, -1):
yield obj
except (AttributeError, TypeError):
x = list(obj) # XXX fails with infinite iterators
x.reverse()
for elem in x:
yield elem

for i in ireverse(xrange(5)):
print i

for c in ireverse('abcde'):
print c

for elem in ireverse(['atom', 'beta', 'curry']):
print elem

for line in ireverse(file('/autoexec.bat')):
print line.rstrip()

for x in reverse(itertools.count()):
# Never gets here and Python dies with a MemoryError
# from the infinite loop
pass


Raymond Hettinger
 
P

Peter Otten

Raymond said:
* It behaves badly with infinite iterators as inputs
[...]

def ireverse(obj):
assert not hasattr(obj, 'keys')

# let inifinite iterators identify themselves
assert not hasattr(obj, 'infinite')
try:
for i in xrange(len(obj)-1, -1, -1):
yield obj
except (AttributeError, TypeError):
x = list(obj) # XXX fails with infinite iterators
x.reverse()
for elem in x:
yield elem


As infinite iterators are rare, it would not be too cumbersome to let them
identify themselves as such.

Peter
 
A

Andrew Dalke

Stephen Horne:
To me, the reason to use a method (or property) is simply that most
types cannot be efficiently 'backwardised'.

Which is why my function looks for an '__riter__' under
the covers, so that an object can provide a specialized
implementation, if possible.

If it can't, I then proposed an implementation which will
iterate in reverse through lists and list-like interfaces (things
indexed by 0, 1, ... len(obj)-1). There is a problem though
in that my attempt at a 'generic' reverse implementation
will give strange errors in dictionary-like interfaces.
For instance, iterators in
general would have to be buffered into a list an then the resulting
list iterated backwards. That could be useful, but the overhead is
sufficient that I think explicitly writing 'list (x).backward ()'
rather than 'x.backward ()' would be a good thing.

Those objects which implement a 'reverse_iterate' /
'backward' / whatever are identical to those which would
implement a __riter__ accessed under the covers.

My question is, what to do about objects which *don't*
provide a special method. Should there still be a standard
way to do a reverse iteration through them?

The only two I can think of are:
- assume
for i in range(len(obj)-1, -1, -1): yield obj
works as a backup. It doesn't quite work because of
dictionaries.

- assume that storing all elements in a temporary
list and doing a reverse on the temp list is okay.
As you point out, it is the most generic but doesn't
work if memory is a problem. (There's also a
concern about proper destruction order if you want
to delete elements from a list starting from the end.)

- (There's a third, which is O(n**2). At every request,
iterate through the list to find the last element.)

If there is an acceptable solution (which needn't be
perfect, just practicable) then it's a good reason to
have it be a builtin over a special method.

I still point out that there are many other ways of
doing iteration and I don't see what makes reverse
iteration so important to get its own method.

Andrew
(e-mail address removed)
 
A

Andrew Dalke

Me:
The only two I can think of are:
for i in range(len(obj)-1, -1, -1): yield obj


Peter Otten proposed a nice variation; assume negative
indicies work. Then there's no need to assume len
works.

Andrew
(e-mail address removed)
 
C

Christos TZOTZIOY Georgiou

# let inifinite iterators identify themselves
assert not hasattr(obj, 'infinite')

As infinite iterators are rare, it would not be too cumbersome to let them
identify themselves as such.

This is an interesting idea.
 
D

David Eppstein

# let inifinite iterators identify themselves
It would be quite cumbersome to have to identify simple generators as
infinite or finite manually, and impossible (halting problem) to do so
reliably automatically.
 
L

Lulu of the Lotus-Eaters

|> >As infinite iterators are rare, it would not be too cumbersome to
|> >let them identify themselves as such.

|It would be quite cumbersome to have to identify simple generators as
|infinite or finite manually, and impossible (halting problem) to do so
|reliably automatically.

Humans cannot solve the halting problem either, FWIW. Just because a
programmer THINKS an iterator is finite, it don't mean it is!

Actually, to my mind, infinite iterators are the RULE not the exception.
Or if not actually infinite, iterators tend to be *indefinitely* large.
That is, I use an iterator because I want to deal with something at
runtime whose size is unbounded at programming time (e.g. an external
file, or data from a socket).

Overall, this makes me -1 on the whole idea of reverse iteration.

Moreover, even iterators that are known to be finite and even of
reasonable size will often be SLOW. For example, I might naturally
write a simple generator to get data from a socket. Each time a block
of data comes in, I take some action based on it. I want those actions
to happen as soon as the relevant chunk of data is available, even
though the socket is probably much slower than the CPU. If my GUI needs
to wait for ALL the data in order to play the event in reverse, that
might cause very uncomfortable pauses in the interface. On the other
hand, if the widget updates every second (even for a couple minutes
total), that seems reasonable enough.

Yours, Lulu...
 
D

David Eppstein

Lulu of the Lotus-Eaters said:
|It would be quite cumbersome to have to identify simple generators as
|infinite or finite manually, and impossible (halting problem) to do so
|reliably automatically.

Humans cannot solve the halting problem either, FWIW. Just because a
programmer THINKS an iterator is finite, it don't mean it is!

Well, true, but if you are a programmer and don't know whether your
program is always going to halt, you probably shouldn't have written it
that way.
 
L

Lulu of the Lotus-Eaters

|> Humans cannot solve the halting problem either, FWIW. Just because a
|> programmer THINKS an iterator is finite, it don't mean it is!

|Well, true, but if you are a programmer and don't know whether your
|program is always going to halt, you probably shouldn't have written it
|that way.

Sure, I generally agree. But creating a special __finite__ attribute
(or whatever) introduces a new source of potentially rather bad errors.
Doing a reverse iteration on an iterator that turns out to be infinite
(or even just VERY big) will presumably either freeze the system or
cause a memory overrun.

Moreover, I can see a class of cases where it's not obvious about
halting. A generator that depends on some runtime data might halt for
all the "normal" cases, but wind up running infinitely long for special
(untested) cases. Some attractors, for example, with the right
parameters (or wrong, depending on how you think of it) are strange, if
you know what I mean :).

Yours, Lulu...

--
mertz@ | The specter of free information is haunting the `Net! All the
gnosis | powers of IP- and crypto-tyranny have entered into an unholy
..cx | alliance...ideas have nothing to lose but their chains. Unite
| against "intellectual property" and anti-privacy regimes!
-------------------------------------------------------------------------
 
A

Andrew Dalke

Stephen Horne:
I disagree with this. IMO something that modifies a value in place
should be named using a verb (such as reverse, or sort, or update...).

You are correct. And I can't even blame it on lack of sleep. ;)

Andrew
(e-mail address removed)
 
S

Stephen Horne

Second, I'm not quite ready to drop my property idea ;-)

An advantage is that the object returned by the property can sensibly
support more than just iteration - e.g. slicing (as touched on in the
PEP284 thread). So for instance...
x = range(10)
list(x.backward)
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

I'm not clear on the details of what you're proposing. What is
type(x.backward)?

Some made-up proxy class that understands that certain operations will
work as if the list was reversed.

For integer lists, I'd use much the same thing but on the
set-of-all-integers object I proposed in the earlier thread. So I
could write...

for i in integers.backwards [:10] :
print i

....to do the same as your...

for i in xrange (10).iter_backwards () :
print i

....though I'm also having this idea about using the 'int' type object
to implement these again. It makes a certain sense for type names to
also act as abstract sets of possible values. For typechecking
purposes, for instance, it might be nice to be able to write...

if x in float :
do stuff with float value

Of course that's a long way from compelling given that we already have
the isinstance function, but what is attracting me is that these two
ideas...

1. Provide objects (or extend the type name objects) to provide
the abstraction of a set of values of that type, implementing
whatever operations are practical and useful for that
abstraction.

2. Provide properties of objects (including that abstract set of
integers) which act as proxies for the object, but providing
an modified view of that object (including a backward view of a
sequence object, but possibly other things too).

....seem to provide, either alone or in combination, natural solutions
to several problems. For instance, you don't just get the ability to
loop through a range of integers backwards...

for i in integers.ibackwards [:50] :

But equally the ability to iterate through a sequence backwards...

for i in x.ibackwards [:50] :

And basically you are doing the same thing in each case - its just
that the first example slices an abstract sequence as opposed to one
that is physically held in memory.
Please show what a pure python version would look like
(taking UserList as an example).

I didn't use UserList, but I did post example code in the "Comment on
PEP-0322: Reverse Iteration Methods" thread. I wasn't so keen in that
post, but I seem to like the idea again now.
The point is that enumerate will starting numbering from zero
for the last item:

Yes - I wasn't thinking straight, sorry about that. Just to put a
thought in based on those two ideas above, though, how about...
x=["a", "b", "c"]
print x.enumerated [(0, "a"), (1, "b"), (2, "c")]
print x.backwards.enumerated [(0, "c"), (1, "b"), (2, "c")]
print x.enumerated.backwards
[(2, "c"), (1, "b"), (0, "c")]

'enumerated' need not create the list or iterator immediately - again,
it returns a proxy which implements the modified behaviour. Having a
proxy to a proxy gives IMO quite a neat solution.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,164
Messages
2,570,898
Members
47,439
Latest member
shasuze

Latest Threads

Top