Resetting an iterator

Jan

Wouldn't it be easy for Python to implement generating functions so
that the iterators they return are equipped with a __reset__() method?

Here is the context of this question.

Python documentation defines an "iterator" as an object ITERATOR
having methods __next__() and __iter__() such that the call
ITERATOR.__iter__() returns the object itself, and once a call
ITERATOR.__next__() raises StopIteration every such subsequent call
does the same.

Python iterators model "generating Turing machines", i.e.
deterministic Turing machines which take no input, have a
separation symbol # in the output alphabet, and have a one-way-infinite
output tape on which the output head prints and moves only to the
right. The machine may have a halting state. A word is said to be
"generated by M" if it appears on the output tape delimited by
separation symbols # (independently of whether M halts or computes
forever). Generating Turing machines provide a characterization of
recursively enumerable languages: a language is generated by a
generating Turing machine iff it is accepted by a Turing machine.

Turing machines can take as input and run other Turing Machines --
similarly Python functions can take other functions (including
iterators) as parameters and call them. HOWEVER, this symmetry breaks
down: a Turing machine which takes a generating Turing machine M as
input can run M for a number of steps and then RESET M and run it
again, while iterators as currently defined in Python do not have a
reset method. (I realize that instead of resetting an old iterator, one
can make a new iterator, but it is not as elegant.)

(Contrary to Python's philosophy that there should be one-- and
preferably only one --obvious way to do it) there are several ways to
define iterators:
 
Jan

OOPS, I have pressed some keys and the message went out before it was
finished.
Here is the last fragment:

So, one can define iterators by defining a class whose objects have
methods __iter__ and __next__ -- with this approach it is easy to add
some __reset__ method. But the easiest way to define iterators is by
constructing a generating function (with yield expressions); this
function returns iterators, although without __reset__. Another way to
define an iterator is to define a callable and use
iter(CALLABLE, SENTINEL) -- the resulting iterator will not have
__reset__. I do not know how Python is implemented, but I believe that
in the last two cases Python could produce improved iterators with
__reset__ at almost no additional cost.
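
For readers following along, here is a minimal sketch of the three ways of building an iterator described above. The names Countdown, countdown_gen, make_reader and the reset() method are made up for illustration; reset() is not part of the iterator protocol.

```python
class Countdown:
    """Class-based iterator; reset() is a hypothetical extra, not part of the protocol."""

    def __init__(self, start):
        self.start = start
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    def reset(self):
        # Possible here only because the initial state was stored explicitly.
        self.current = self.start


def countdown_gen(start):
    """Generator function: the iterator it returns has no reset method."""
    while start > 0:
        yield start
        start -= 1


def make_reader(values):
    """Return a zero-argument callable for use with iter(callable, sentinel)."""
    pending = list(values)
    return lambda: pending.pop(0) if pending else None


it = Countdown(3)
first_pass = list(it)
it.reset()
second_pass = list(it)

gen_pass = list(countdown_gen(3))
sentinel_pass = list(iter(make_reader([3, 2, 1]), None))
```

Note that calling reset() after exhaustion makes the object yield values again, which departs from the rule that an iterator keeps raising StopIteration once it has done so.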

Jan
 
Jan

Iterators can also be produced by iter(ITERABLE), which
could manufacture them with a __reset__.

Jan
 
Terry Reedy

Jan said:
Wouldn't it be easy for Python to implement generating functions so
that the iterators they return are equipped with a __reset__() method?

No. Such a method would have to poke around in the internals of the
__next__ function in implementation specific ways. The values used to
initialize that function might have changed, so 'reset' would have to be
carefully defined.

def squares():
    start = int(input("enter starting int:"))
    stop = int(input("enter stopping int:"))
    for i in range(start, stop):
        yield i*i

What does 'reset' mean here?
Here is the context of this question.

Python documentation defines an "iterator" as an object ITERATOR
having methods __next__() and __iter__() such that the call
ITERATOR.__iter__() returns the object itself,

This is so that 'iter(iterator) is iterator', so that functions can take
either an iterable or an iterator as an argument and proceed without
checking which it got.

and once a call ITERATOR.__next__() raises StopIteration every
such subsequent call does the same.

After returning objects for some number of calls, which might be unbounded.

The protocol is basically one method with defined behavior. It is
intentionally minimal so it can be used as the universal within-Python
object stream protocol. Having multiple ways of getting iterators is in line
with this purpose.

Terry Jan Reedy
 
Steven D'Aprano

Wouldn't it be easy for Python to implement generating functions so that
the iterators they return are equipped with a __reset__() method?

No.

import os

def gen():
    for name in os.listdir('.'):
        yield open(name).read()
        os.remove(name)


How would you "easily" reset this generator so that it returns the same
values each time?

That's an extreme example, but as a general rule, generators/iterators
are one-shot: having consumed a value, you can't get it back again
without re-creating it from scratch, or possibly not even then. Here's a
less destructive example:

import time

def gen():
    for i in range(10):
        yield time.time()
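
A harmless stand-in for the file-deleting generator, as a sketch: the generator below consumes a shared list as it runs, so even a brand-new generator over the same source cannot reproduce the first run. The names drain and jobs are invented for this illustration.

```python
def drain(queue):
    """Yield items while removing them from the shared list (a destructive source)."""
    while queue:
        yield queue.pop(0)


jobs = ["a", "b", "c"]
first_run = list(drain(jobs))   # consumes the list as a side effect
second_run = list(drain(jobs))  # a fresh generator finds nothing left
```

Here first_run holds the three items and second_run is empty, which suggests that no generic reset could restore values whose source has been changed by the iteration itself.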
 
norseman

Terry said:
No. Such a method would have to poke around in the internals of the
__next__ function in implementation specific ways. The values used to
initialize that function might have changed, so 'reset' would have to be
carefully defined.

def squares():
    start = int(input("enter starting int:"))
    stop = int(input("enter stopping int:"))
    for i in range(start, stop):
        yield i*i

What does 'reset' mean here?

I don't understand. Why would one use a reset here in the first place?
One simply re-runs this. The output could be collected into a list,
which one might want to reset, re-list.
bad example?

This is so that 'iter(iterator) is iterator',

You would do well to clarify the previous line.
To me it reads the same as "a face is a face" or "function(x) is x":
six of one, half a dozen of the other. Gobbledygook, double talk. So what?
Not trying to pick a fight - just pointing out the need for some
clarity. I had a math teacher who proclaimed that any function(x) that only
returned x was (hmm, can't print that here). Something like useless.


so that functions can take
either an iterable or an iterator as an argument and proceed without
checking which it got.
OK.

if it can't be restarted (serial line input) then don't, else do.
serial line, ethernet, etc are basically pipes. But a list itself can be
re-wound, be it a file on disk, something in memory, on tape -- whatever

may have to reach back past the 'pipe' to the 'iterator' there to
restart but being able to restart (elegantly) restartables is the point.
After returning objects for some number of calls, which might be unbounded.

The protocol is basically one method with defined behavior. It is
intentionally minimal so it can be used as the universal within-Python
object stream protocol. Having multiple ways of getting iterators is in line
with this purpose.

I think clarity is also needed here. Different disciplines will
interpret the above differently. Stream means river, channel, pipe,
singular direction to me. Once the water flows past there is no hope of
getting those molecules back in same order. A list of things in a
container (variable or file) is not a stream and can be rewound. The
whole concept of random access hinges on this.
To me this boils down to two distinct concepts.
one being stream as in here it comes there it goes, never to return.
The stream is not rewindable. The container you put it in might be.
one being sequential reading of finite number of randomly accessible
things. This being inherently rewindable.
Testing which is simple enough and can set the ability to rewind.
Having it 'built-in' will reduce the problems generated by another
'do-it-yourself' design by person that may or may not have thought
things out. The old - "I took the carburettor off the Olds but it
doesn't work on my Hugo. Can you help me?" would be avoided.
Really - rewind is better if it is builtin and performs where it should.
The docs should explain what will or will not happen and why.
Preferably in plain language. :)
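
The two concepts can be sketched with standard-library pieces: an in-memory file object can be rewound with seek(0), while a generator over the same lines cannot. The helper one_way and the sample text are invented for this illustration.

```python
import io

# A container with random access: it can be rewound with seek(0).
buffer = io.StringIO("line 1\nline 2\n")
first_read = buffer.readlines()
buffer.seek(0)
second_read = buffer.readlines()


def one_way(lines):
    """A one-way stream: each line is handed out once and then gone."""
    for line in lines:
        yield line


stream = one_way(first_read)
stream_first = list(stream)
stream_second = list(stream)        # empty: the stream has already flowed past
has_seek = hasattr(stream, "seek")  # generators offer no rewind operation
```

In this sketch the rewinding lives on the container (the file object), not on the iterator protocol.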



In short: are you telling us the reset() can't do in background the
exact same thing that the docs tell the users to do? It's a lot simpler
to move file descriptors and pointers from the inside.

I often get so close to things I forget to look up too.
 
Terry Reedy

I will clarify by starting over with current definitions.

Ob is an iterator iff next(ob) either returns an object or raises
StopIteration and continues to raise StopIteration on subsequent calls.

Ob is an iterable iff iter(ob) returns an iterator.

It is intentional that the protocol definitions be minimal, so that they
can be used as widely as possible.

As a convenience, the definition of iterators is given a slight
complication. They are defined as a subcategory of iterables, with the
requirement that iter(iterator) be that same iterator. This means
that iterators need the following boilerplate:
    def __iter__(self): return self
The extra burden is slight since most iterators are based on builtins or
generator functions or expressions, which add the boilerplate
automatically. The convenience is that one may write

def f(iterable_or_iterator):
    it = iter(iterable_or_iterator)
    ...

instead of

def f(iterable_or_iterator):
    if is_iterable(iterable_or_iterator):
        it = iter(iterable_or_iterator)
    else:
        it = iterable_or_iterator

In particular, the internal function that implements for loops can do
the former.

In other words, a small bit of boilerplate added to iterators, mostly
automatically, saves boilerplate in the use of iterators and iterables.
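
A small sketch of that convenience, with first_two as an invented helper: because iter() returns an iterator unchanged, the same function body works for a list and for an iterator over it.

```python
def first_two(iterable_or_iterator):
    """Return the first two items without checking which kind of argument arrived."""
    it = iter(iterable_or_iterator)
    return [next(it), next(it)]


numbers = [10, 20, 30]
number_iter = iter(numbers)

same_object = iter(number_iter) is number_iter  # iterators return themselves
new_object = iter(numbers) is not numbers       # a list hands out a separate iterator

from_list = first_two(numbers)      # the list itself is left untouched
from_iter = first_two(number_iter)  # the iterator is advanced by two items
remaining = list(number_iter)
```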

When the protocols were defined, there was discussion about whether or
not to require 'continue to raise StopIteration'. For instance, an
iterator that returns objects derived from external input might not have
any new external input now but expect to get some in the future. It was
decided that such iterators should either wait and block the thread or
return a 'Not now' indicator such as None. StopIteration should
consistently mean 'Done, over and out' so for loops, for instance, would
know to exit.
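
As a sketch of 'Done, over and out' with a generator (two_items and is_exhausted are names made up here): once exhausted, every further next() call raises StopIteration, and a for loop over the same object simply does nothing.

```python
def two_items():
    yield "a"
    yield "b"


def is_exhausted(iterator):
    """Report whether next() raises StopIteration (this consumes an item if one is left)."""
    try:
        next(iterator)
    except StopIteration:
        return True
    return False


it = two_items()
consumed = list(it)

# Repeated calls after exhaustion keep raising StopIteration.
checks = [is_exhausted(it) for _ in range(3)]

loop_ran = False
for _ in it:
    loop_ran = True
```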

The OP proposes that StopIteration should instead mean 'Done until
reset', without defining 'reset'. Some comments:
* This would complicate the protocol.
* There are real use cases, and reiterability is a real issue. But ...
* Depending on the meaning, resetting may or may not be possible.
* When it is possible, it can potentially be done today with a .send()
method.
* Many use cases are easier with a new iterator. For instance

for i in iterable: block1()
for i in iterable: block2()

is easier to write than

it = iter(iterable)
for i in it: block1()
it.reset()
for i in it: block2()

with little relative time saving in the second case, for practical
problems, to compensate for the extra boilerplate.
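
One possible reading of the .send() remark, as a sketch only: a generator can treat a sent value as a rewind command. The name resettable_range and the "reset" command string are invented here, and the rewind works only while the generator is still running, not after it has raised StopIteration.

```python
def resettable_range(start, stop):
    """Yield start..stop-1; sending the string "reset" rewinds to start."""
    i = start
    while i < stop:
        command = yield i
        if command == "reset":
            i = start
            # send() returns whatever is yielded next, so yield an
            # acknowledgement here and restart on the following next().
            yield None
        else:
            i += 1


gen = resettable_range(0, 3)
before = [next(gen), next(gen)]
ack = gen.send("reset")
after = list(gen)
```

With these definitions, before is [0, 1], ack is None and after is [0, 1, 2].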

Terry Jan Reedy
 
norseman

Terry said:
I will clarify by starting over with current definitions.

Ob is an iterator iff next(ob) either returns an object or raises
StopIteration and continues to raise StopIteration on subsequent calls.

Ob is an iterable iff iter(ob) returns an iterator.

It is intentional that the protocol definitions be minimal, so that they
can be used as widely as possible.

As a convenience, the definition of iterators is given a slight
complication. They are defined as a subcategory of iterables, with the
requirement that iter(iterator) be that same iterator. This means that
iterators need the following boilerplate:
    def __iter__(self): return self
The extra burden is slight since most iterators are based on builtins or
generator functions or expressions, which add the boilerplate
automatically. The convenience is that one may write

def f(iterable_or_iterator):
    it = iter(iterable_or_iterator)
    ...

instead of

def f(iterable_or_iterator):
    if is_iterable(iterable_or_iterator):
        it = iter(iterable_or_iterator)
    else:
        it = iterable_or_iterator

In particular, the internal function that implements for loops can do
the former.

In other words, a small bit of boilerplate added to iterators, mostly
automatically, saves boilerplate in the use of iterators and iterables.

When the protocols were defined, there was discussion about whether or
not to require 'continue to raise StopIteration'. For instance, an
iterator that returns objects derived from external input might not have
any new external input now but expect to get some in the future. It was
decided that such iterators should either wait and block the thread or
return a 'Not now' indicator such as None. StopIteration should
consistently mean 'Done, over and out' so for loops, for instance, would
know to exit.

The OP proposes that StopIteration should instead mean 'Done until
reset', without defining 'reset'.

Done unless you put the data pointer back to offset zero.

Some comments:
* This would complicate the protocol.
* There are real use cases, and reiterability is a real issue. But ...
* Depending on the meaning, resetting may or may not be possible.
* When it is possible, it can potentially be done today with a .send()
method.
* Many use cases are easier with a new iterator. For instance

for i in iterable: block1()
for i in iterable: block2()

is easier to write than

it = iter(iterable)
for i in it: block1()
it.reset()
for i in it: block2()

with little relative time saving in the second case, for practical
problems, to compensate for the extra boilerplate.


while testing:
    for i in it:
        code
    it.reset()
 
J. Cliff Dyer

Wouldn't it be easy for Python to implement generating functions so
that the iterators they return are equipped with a __reset__() method?

Here is the context of this question.

Python documentation defines an "iterator" as an object ITERATOR
having methods __next__() and __iter__() such that the call
ITERATOR.__iter__() returns the object itself, and once a call
ITERATOR.__next__() raises StopIteration every such subsequent call
does the same.

You don't need a reset method. There is no hard and fast rule that
__iter__ must return the object itself. It just needs to return an
iterator. For example:
>>> l = [1,2,3]
>>> l.__iter__()
<listiterator object at 0x7fd0da315850>
>>> l is l.__iter__()
False

Just create a class with an __iter__ method that returns a reset
iterator object.


class X(object):
    def __init__(self, max=3):
        self.counter = 0
        self.max = max
    def __iter__(self):
        return self
    def next(self):
        if self.counter < self.max:
            self.counter += 1
            return self.counter
        else:
            raise StopIteration

class Y(object):
    def __iter__(self):
        return X()

In this setup, X has the problem you are trying to avoid, but Y behaves
as a resettable iterable.
>>> y = Y()
>>> for c in y:
...     print c
...
1
2
3
>>> for c in y:
...     print c
...
1
2
3
>>> for c in y:
...     if c < 3:
...         print c
...
1
2
>>> for c in y:
...     print c
...
1
2
3

Cheers,
Cliff
 
Jan

You don't need a reset method.  There is no hard and fast rule that
__iter__ must return the object itself.  It just needs to return an
iterator.  

I disagree.
If ITERATOR is a true iterator, ITERATOR.__iter__() must return
ITERATOR.
If ITERABLE is an iterable (but not necessarily an iterator),
ITERABLE.__iter__() must return an iterator.
For example:
>>> l = [1,2,3]
>>> l.__iter__()
<listiterator object at 0x7fd0da315850>
>>> l is l.__iter__()
False

[1,2,3] is an iterable but not an iterator, so this False result is
expected.
Compare this with the following.
>>> ii = iter([1,2,3])  # ii is an iterator.
>>> next(ii)
1
>>> jj = ii.__iter__()  # call __iter__ method on an iterator
>>> ii is jj
True
>>> next(jj)
2

Just create a class with an __iter__ method that returns a reset
iterator object.

class X(object):
    def __init__(self, max=3):
        self.counter = 0
        self.max = max
    def __iter__(self):
        return self
    def next(self):
        if self.counter < self.max:
            self.counter += 1
            return self.counter
        else:
            raise StopIteration

class Y(object):
    def __iter__(self):
        return X()

In this setup, X has the problem you are trying to avoid, but Y behaves
as a resettable iterable.

This does not work.

With this, y is not an iterator, and not even an iterable.
for c in y:

This produces an error because by definition of for-loops
it is executed the same way as:

temp_iterator = iter(y)  # temp_iterator is y
while True:
    try:
        print(next(temp_iterator))  # temp_iterator does not support __next__()
    except StopIteration:
        break

Jan
 
J. Cliff Dyer

I disagree.
If ITERATOR is a true iterator, ITERATOR.__iter__() must return
ITERATOR.
If ITERABLE is an iterable (but not necessarily an iterator),
ITERABLE.__iter__() must return an iterator.

You are correct: It is an iterable, not an iterator. However, that's
not a disagreement with me. It may not be an iterator (and I probably
should have said so) but it works, and it solves the OP's problem.
For example:
>>> l = [1,2,3]
>>> l.__iter__()
<listiterator object at 0x7fd0da315850>
>>> l is l.__iter__()
False

[1,2,3] is an iterable but not an iterator, so this False result is
expected.
Compare this with the following.
>>> ii = iter([1,2,3])  # ii is an iterator.
>>> next(ii)
1
>>> jj = ii.__iter__()  # call __iter__ method on an iterator
>>> ii is jj
True
>>> next(jj)
2

Just create a class with an __iter__ method that returns a reset
iterator object.

class X(object):
    def __init__(self, max=3):
        self.counter = 0
        self.max = max
    def __iter__(self):
        return self
    def next(self):
        if self.counter < self.max:
            self.counter += 1
            return self.counter
        else:
            raise StopIteration

class Y(object):
    def __iter__(self):
        return X()

In this setup, X has the problem you are trying to avoid, but Y behaves
as a resettable iterable.

This does not work.

With this, y is not an iterator, and not even an iterable.
for c in y:

This produces an error because by definition of for-loops
it is executed the same way as:

temp_iterator = iter(y)  # temp_iterator is y
while True:
    try:
        print(next(temp_iterator))  # temp_iterator does not support __next__()
    except StopIteration:
        break

Did you try running my code? I did. It works on my computer. What
error message did you get?

Cheers,
Cliff
 
J. Clifford Dyer

This produces an error because by definition of for-loops
it is executed the same way as:

temp_iterator = iter(y)  # temp_iterator is y
while True:
    try:
        print(next(temp_iterator))  # temp_iterator does not support __next__()
    except StopIteration:
        break

I think this is where you missed my point.

iter(y) actually returns an instance of class X, which does support
iteration. And it returns a new X each time, thus resetting the
iterator.

That exact setup might or might not support your use case. I don't
know, because you haven't described it. However, whatever you need done
to X to get it back in shape to reiterate over can be done in
Y.__iter__().

Honestly, do you care if it's an iterator or an iterable, so long as
python can handle the job?
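
A sketch of the same idea in Python 3 terms, where the iterator method is spelled __next__ rather than next: an iterable whose __iter__() builds a fresh generator on every call, so each for loop starts over. Reiterable and its factory argument are hypothetical names for this illustration, and the approach only helps when re-running the factory is itself repeatable.

```python
class Reiterable:
    """Iterable wrapper: every iter() call asks the factory for a fresh iterator."""

    def __init__(self, factory, *args):
        self._factory = factory
        self._args = args

    def __iter__(self):
        return self._factory(*self._args)


def squares(start, stop):
    for i in range(start, stop):
        yield i * i


sq = Reiterable(squares, 1, 4)
first_pass = list(sq)
second_pass = list(sq)             # starts again from the beginning
distinct = iter(sq) is not iter(sq)  # each call yields a separate iterator
```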
 
