writable iterators?


Neal Becker

AFAICT, the Python iterator concept only supports readable iterators, not
writable ones. Is this true?

For example:

for e in sequence:
    do something that reads e
    e = blah # will do nothing

I believe this is not a limitation on the for loop, but a limitation on the
Python iterator concept. Is this correct?
 

Steven D'Aprano


Have you tried it? "e = blah" certainly does not "do nothing", regardless
of whether you are in a for loop or not. It binds the name e to the value
blah.
>>> seq = [1, 2]
>>> for e in seq:
...     print(e)
...     e = 42
...     print(e)
...
1
42
2
42


I *guess* that what you mean by "writable iterators" is that rebinding e
should change seq in place, i.e. you would expect that seq should now
equal [42, 42]. Is that what you mean? It's not clear.

Fortunately, that's not how it works, and far from being a "limitation",
it would be *disastrous* if iterables worked that way. I can't imagine
how many bugs would occur from people reassigning to the loop variable,
forgetting that it had a side-effect of also reassigning to the iterable.
Fortunately, Python is not that badly designed.

If you want to change the source iterable, you have to explicitly do so.
Whether you can or not depends on the source:

* iterators are lazy sequences, and cannot be changed because there's
nothing to change (they don't store their values anywhere, but calculate
them one by one on demand and then immediately forget that value);

* immutable sequences, like tuples, are immutable and cannot be changed
because that's what immutable means;

* mutable sequences like lists can be changed. The standard idiom for
that is to use enumerate:

for i, e in enumerate(seq):
    seq[i] = e + 42
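A quick Python 3 check of the three cases in the list above (a minimal sketch; `seq`, `tup` and `squares` are illustrative names, not from the thread):

```python
# Mutable sequence: write back through the index supplied by enumerate().
seq = [1, 2]
for i, e in enumerate(seq):
    seq[i] = e + 42
print(seq)  # [43, 44]

# Immutable sequence: item assignment is rejected outright.
tup = (1, 2)
try:
    tup[0] = 43
except TypeError as exc:
    print("tuple:", exc)

# Iterator (here a generator expression): values are computed on demand
# and not stored, so there is no slot to assign to and no __setitem__.
squares = (n * n for n in range(3))
print(hasattr(squares, "__setitem__"))  # False
print(list(squares))  # [0, 1, 4]
print(list(squares))  # [] (already exhausted)
```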
 

Mel

Steven said:
I *guess* that what you mean by "writable iterators" is that rebinding e
should change seq in place, i.e. you would expect that seq should now
equal [42, 42]. Is that what you mean? It's not clear.

Fortunately, that's not how it works, and far from being a "limitation",
it would be *disastrous* if iterables worked that way. I can't imagine
how many bugs would occur from people reassigning to the loop variable,
forgetting that it had a side-effect of also reassigning to the iterable.
Fortunately, Python is not that badly designed.

And for an iterator like

def things():
    yield 1
    yield 11
    yield 4
    yield 9

I don't know what it could even mean.

Mel.
 

Neal Becker

Steven said:
Have you tried it? "e = blah" certainly does not "do nothing", regardless
of whether you are in a for loop or not. It binds the name e to the value
blah.

Yes, I understand that e = blah just rebinds e. I did not mean this as an
example of working code. I meant to say, does Python have any idiom that allows
iteration over a sequence such that the elements can be assigned?

....
* iterators are lazy sequences, and cannot be changed because there's
nothing to change (they don't store their values anywhere, but calculate
them one by one on demand and then immediately forget that value);

* immutable sequences, like tuples, are immutable and cannot be changed
because that's what immutable means;

* mutable sequences like lists can be changed. The standard idiom for
that is to use enumerate:

for i, e in enumerate(seq):
    seq[i] = e + 42

AFAIK, the above is the only python idiom that allows iteration over a sequence
such that you can write to the sequence. And THAT is the problem. In many
cases, indexing is much less efficient than iteration.
 

Thomas 'PointedEars' Lahn

Mel said:
Steven said:
I *guess* that what you mean by "writable iterators" is that rebinding e
should change seq in place, i.e. you would expect that seq should now
equal [42, 42]. Is that what you mean? It's not clear.

Fortunately, that's not how it works, and far from being a "limitation",
it would be *disastrous* if iterables worked that way. I can't imagine
how many bugs would occur from people reassigning to the loop variable,
forgetting that it had a side-effect of also reassigning to the iterable.
Fortunately, Python is not that badly designed.

And for an iterator like

def things():
    yield 1
    yield 11
    yield 4
    yield 9

I don't know what it could even mean.

<http://docs.python.org/reference/simple_stmts.html#the-yield-statement>

You could have tried to debug.

Please trim your quotes to the relevant minimum.
 

MRAB

Yes, I understand that e = blah just rebinds e. I did not mean this as an
example of working code. I meant to say, does Python have any idiom that allows
iteration over a sequence such that the elements can be assigned?
[snip]
Python has references to objects, but not references to references.
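To illustrate MRAB's point with a small Python 3 sketch: the loop variable is only another reference to each element, so rebinding it changes nothing in the list, while mutating a mutable element through it is visible. The `Box` class is a hypothetical stand-in for the missing "reference to a reference", not anything from the thread.

```python
rows = [[1], [2]]

for row in rows:
    row = [0]          # rebinds the name only; rows is unchanged
print(rows)            # [[1], [2]]

for row in rows:
    row.append(99)     # mutates the shared list object, visible in rows
print(rows)            # [[1, 99], [2, 99]]


class Box:
    """Hypothetical one-slot container emulating a reference to a reference."""

    def __init__(self, value):
        self.value = value


boxes = [Box(1), Box(2)]
for b in boxes:
    b.value += 42      # writes through the box instead of rebinding b
print([b.value for b in boxes])  # [43, 44]
```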
 

Steven D'Aprano

Yes, I understand that e = blah just rebinds e. I did not mean this as an
example of working code. I meant to say, does Python have any idiom that
allows iteration over a sequence such that the elements can be assigned?

Yes. I already gave one:

for i, e in enumerate(seq):
    seq[i] = e + 42


If you look at code written before the enumerate built-in, you will often
find code like this:

for i in range(len(seq)):
    e = seq[i]
    seq[i] = e + 42


Sometimes you'll find code that does this:

i = 0
while i < len(seq):
    e = seq[i]
    seq[i] = e + 42
    i += 1

but don't do that, it's slow.

Or you can do this:

seq[:] = [e+42 for e in seq]

There are others.
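A minimal Python 3 sketch comparing the idioms above on throwaway lists (the function names are hypothetical). The last lines show why `seq[:] = ...` differs from a plain `seq = ...`: slice assignment mutates the existing list, so every other reference to it sees the change, whereas rebinding the name builds a new list and leaves the old one alone.

```python
def via_enumerate(seq):
    for i, e in enumerate(seq):
        seq[i] = e + 42


def via_range(seq):
    for i in range(len(seq)):
        e = seq[i]
        seq[i] = e + 42


def via_slice(seq):
    # Slice assignment replaces the contents of the *same* list object.
    seq[:] = [e + 42 for e in seq]


for fn in (via_enumerate, via_range, via_slice):
    data = [1, 2, 3]
    alias = data               # a second reference to the same list
    fn(data)
    print(fn.__name__, data, alias is data)

# Rebinding instead of slice assignment: alias keeps the old values.
data = [1, 2, 3]
alias = data
data = [e + 42 for e in data]
print(alias)  # [1, 2, 3]
```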

[...]
AFAIK, the above is the only python idiom that allows iteration over a
sequence such that you can write to the sequence. And THAT is the problem.
In many cases, indexing is much less efficient than iteration.

Are you aware that iteration is frequently based on indexing?

In the cases that it isn't, that's because the iterator generates values
lazily, without ever storing them. You *can't* write to them because
there's nowhere to write to! If you want to store the values so they are
writable, then you have to use indexing.
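One concrete instance of iteration being built on indexing: if a class defines `__getitem__` but no `__iter__`, `iter()` (and therefore a for loop) falls back to calling `__getitem__` with 0, 1, 2, ... until `IndexError` is raised. The `Squares` class below is a hypothetical example that records which indices were requested.

```python
class Squares:
    """Hypothetical sequence that defines __getitem__ but no __iter__."""

    def __init__(self, n):
        self.n = n
        self.requested = []    # indices asked for during iteration

    def __getitem__(self, index):
        self.requested.append(index)
        if index >= self.n:
            raise IndexError(index)   # signals the end of the sequence
        return index * index


sq = Squares(4)
print(list(sq))        # [0, 1, 4, 9]
print(sq.requested)    # [0, 1, 2, 3, 4]
```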

What makes you think that this is a problem in practice? Can you give an
example of some task you can't solve because you (allegedly) can't write to
a sequence?
 

Steven D'Aprano

Mel said:
Steven said:
I *guess* that what you mean by "writable iterators" is that rebinding e
should change seq in place, i.e. you would expect that seq should now
equal [42, 42]. Is that what you mean? It's not clear.

Fortunately, that's not how it works, and far from being a "limitation",
it would be *disastrous* if iterables worked that way. I can't imagine
how many bugs would occur from people reassigning to the loop variable,
forgetting that it had a side-effect of also reassigning to the
iterable. Fortunately, Python is not that badly designed.

And for an iterator like

def things():
    yield 1
    yield 11
    yield 4
    yield 9

I don't know what it could even mean.

<http://docs.python.org/reference/simple_stmts.html#the-yield-statement>

You could have tried to debug.

I think you have missed the point of Mel's comment. He knows what the yield
statement does. He doesn't know what it would mean to "write to" an
iterator like things().

Neither do I.
 

Thomas 'PointedEars' Lahn

[Sorry for over-quoting, I am not sure how to trim this properly]
Mel said:
Steven D'Aprano wrote:
I *guess* that what you mean by "writable iterators" is that rebinding
e should change seq in place, i.e. you would expect that seq should now
equal [42, 42]. Is that what you mean? It's not clear.

Fortunately, that's not how it works, and far from being a
"limitation", it would be *disastrous* if iterables worked that way. I
can't imagine how many bugs would occur from people reassigning to the
loop variable, forgetting that it had a side-effect of also reassigning
to the iterable. Fortunately, Python is not that badly designed.

And for an iterator like

def things():
    yield 1
    yield 11
    yield 4
    yield 9

I don't know what it could even mean.

<http://docs.python.org/reference/simple_stmts.html#the-yield-statement>

You could have tried to debug.

I think you have missed the point of Mel's comment. He knows what the
yield statement does. He doesn't know what it would mean to "write to" an
iterator like things().

Neither do I.

AIUI the OP is referring to write accesses to the iteration variable
(for want of a better term), not being aware what iterators are.
 

Ian Kelly

Fortunately, that's not how it works, and far from being a "limitation",
it would be *disastrous* if iterables worked that way. I can't imagine
how many bugs would occur from people reassigning to the loop variable,
forgetting that it had a side-effect of also reassigning to the iterable.
Fortunately, Python is not that badly designed.

The example syntax is a non-starter, but there's nothing wrong with
the basic idea. The STL of C++ uses output iterators and a quick
Google search doesn't turn up any "harmful"-style rants about those.

Of course, there are a couple of major differences between C++
iterators and Python iterators. First, C++ iterators have an explicit
dereference step, which keeps the iterator variable separate from the
value that it accesses and also provides a possible target for
assignment. You could say that next(iterator) is the corresponding
dereference step in Python, but it is not accessible in a for loop and
it does not provide an assignment target in any case.

Second, C++ iterators separate out the dereference step from the
iterator advancement step. In Python, both next(iterator) and
generator.send() are expected to advance the iterator, which would be
problematic for creating an iterator that does both input and output.
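A small sketch of Ian's point about `send()`: a generator receives the sent value only where it resumes, and the same call returns the next item, so there is no way to "write" the current item without also advancing. `echo_positions` is a hypothetical name.

```python
def echo_positions(values):
    """Yield each value and record whatever the caller sends back for it."""
    written = {}
    for pos, v in enumerate(values):
        written[pos] = yield v     # paused here until next() or send()
    return written


gen = echo_positions(["a", "b", "c"])
first = next(gen)          # 'a': starts the generator
second = gen.send("A")     # stores "A" for position 0 *and* returns 'b'
third = gen.send("B")      # stores "B" for position 1 and returns 'c'
try:
    gen.send("C")          # the write for the last item ends the iteration
except StopIteration as stop:
    written = stop.value   # the generator's return value
print(first, second, third, written)
```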

I don't think that output iterators would be a "disaster" in Python,
but I also don't see a clean way to add them to the existing iterator
protocol.

If you want to change the source iterable, you have to explicitly do so.
Whether you can or not depends on the source:

* iterators are lazy sequences, and cannot be changed because there's
nothing to change (they don't store their values anywhere, but calculate
them one by one on demand and then immediately forget that value);

No, an iterator is an object that allows traversal over a collection
in a manner independent of the implementation of that collection. In
many instances, especially in Python and similar languages, the
"collection" is abstracted to an operation over another collection, or
even to the results of a serial computation where there is no actual
"collection" in memory.

Iterators are not lazy sequences, because they do not behave like
sequences. You can't index them, you can't reiterate them, you can't
get their length (and before you point out that there are ways of
doing each of these things -- yes, but none of those ways use
sequence-like syntax). For true lazy sequences, consider the concept
of streams and promises in the functional languages.

In any case, the desired behavior of an output iterator on a source
iterator is clear enough to me. If the source iterator is also an
output iterator, then it propagates the write to it. If the source
iterator is not an output iterator, then it raises a TypeError.
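One possible reading of the behaviour described above, sketched in Python 3. `ListWriter`, `FilterWriter` and their `set()` method are hypothetical and not part of Python's iterator protocol: `set()` writes to the item most recently returned, a filtering wrapper forwards the write to its source, and a source without `set()` makes it raise `TypeError`.

```python
class ListWriter:
    """Hypothetical writable iterator: __next__ reads, set() writes to
    the position most recently returned."""

    def __init__(self, lst):
        self._lst = lst
        self._pos = -1

    def __iter__(self):
        return self

    def __next__(self):
        self._pos += 1
        if self._pos >= len(self._lst):
            raise StopIteration
        return self._lst[self._pos]

    def set(self, value):
        self._lst[self._pos] = value


class FilterWriter:
    """Hypothetical filtering iterator that forwards writes to its source."""

    def __init__(self, pred, source):
        self._pred = pred
        self._source = iter(source)

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            item = next(self._source)   # StopIteration ends the loop
            if self._pred(item):
                return item

    def set(self, value):
        setter = getattr(self._source, "set", None)
        if setter is None:
            raise TypeError("source iterator is not an output iterator")
        setter(value)


data = [1, -2, 3, -4]
evens = FilterWriter(lambda x: x % 2 == 0, ListWriter(data))
for x in evens:
    evens.set(x * 10)      # propagated to the underlying list
print(data)                # [1, -20, 3, -40]

readonly = FilterWriter(lambda x: True, iter([1, 2]))
next(readonly)
try:
    readonly.set(0)        # a plain list iterator has no set()
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```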
* mutable sequences like lists can be changed. The standard idiom for
that is to use enumerate:

for i, e in enumerate(seq):
    seq[i] = e + 42


Unless the underlying collection is a dict, in which case I need to do:

for k, v in d.items():
    d[k] = v + 42

Or a file:

for line in f:
    # I'm not even sure whether this actually works.
    f.seek(-len(line))
    f.write(line.upper())

As I said above, iterators are supposed to provide
implementation-independent traversal over a collection. For writing,
enumerate fails in this regard.
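For comparison, a Python 3 sketch of write-back for the two non-list cases. Assigning to existing dict keys while iterating over `items()` is allowed because the dict does not change size. For the file, the snippet above is uncertain as written: `f.seek(-len(line))` uses the default `whence` of 0 (the start of the file), so it cannot step back over the line just read. The version below assumes an ASCII file opened in binary mode and remembers each line's offset with `tell()` instead; `add_to_values` and `upper_in_place` are hypothetical names.

```python
import os
import tempfile


def add_to_values(d, amount):
    # Reassigning existing keys during iteration is fine; adding or
    # removing keys here would raise RuntimeError.
    for k, v in d.items():
        d[k] = v + amount


def upper_in_place(path):
    """Upper-case an ASCII file line by line, overwriting each line."""
    with open(path, "r+b") as f:      # binary mode: exact byte offsets
        while True:
            pos = f.tell()            # where the next line starts
            line = f.readline()
            if not line:              # b"" means end of file
                break
            f.seek(pos)               # back to the start of that line
            f.write(line.upper())     # same length for ASCII, nothing shifts


d = {"a": 1, "b": 2}
add_to_values(d, 42)
print(d)  # {'a': 43, 'b': 44}

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"hello\nworld\n")
upper_in_place(path)
with open(path, "rb") as f:
    contents = f.read()
os.remove(path)
print(contents)  # b'HELLO\nWORLD\n'
```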
 

Neal Becker


While Python may not have output iterators, interestingly NumPy has just added
this capability. It is part of nditer, so it may suggest a syntax.

There have been a number of responses to my question that suggest using indexing
(maybe with enumerate). Once again, this is not suitable for many data
structures. C++ and the STL teach that iteration is often far more efficient than
indexing. Think of a linked list. Even for a dense multidimensional array, index
calculations are much slower than iteration.
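To make the linked-list argument concrete, here is a sketch with a hypothetical `LinkedList` class (not a standard library type). Index access has to walk from the head, so write-back through indices costs O(n) per element, whereas a generator that yields the nodes themselves lets the caller write through each node in one O(n) pass. (The standard library's `collections.deque` is documented as having O(n) indexed access in the middle, for a similar reason.)

```python
class Node:
    __slots__ = ("value", "next")

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


class LinkedList:
    """Hypothetical singly linked list, used only to compare the styles."""

    def __init__(self, values):
        self.head = None
        self.size = 0
        for v in reversed(values):
            self.head = Node(v, self.head)
            self.size += 1

    def __len__(self):
        return self.size

    def _node_at(self, i):
        # O(i): every index lookup walks from the head.
        if not 0 <= i < self.size:
            raise IndexError(i)
        node = self.head
        for _ in range(i):
            node = node.next
        return node

    def __getitem__(self, i):
        return self._node_at(i).value

    def __setitem__(self, i, value):
        self._node_at(i).value = value

    def nodes(self):
        # O(1) per step: yields each node so the caller can write through it.
        node = self.head
        while node is not None:
            yield node
            node = node.next

    def values(self):
        return [n.value for n in self.nodes()]


lst = LinkedList([1, 2, 3])

# Index-based write-back: O(n**2) overall on a linked list.
for i in range(len(lst)):
    lst[i] = lst[i] + 42

# Cursor-style write-back: a single O(n) pass.
for node in lst.nodes():
    node.value += 42

print(lst.values())  # [85, 86, 87]
```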

I believe the lack of output iterators is a deficiency in the Python iterator
concept.
 

Chris Torek

(I apologize for the length of this article -- if I had more time,
I could write something shorter...)

AFAICT, the python iterator concept only supports readable iterators,
not write.
Is this true?

for example:

for e in sequence:
    do something that reads e
    e = blah # will do nothing

I believe this is not a limitation on the for loop, but a limitation on the
python iterator concept. Is this correct?

Yes.

Having read through the subsequent discussion, I think in some ways
you have run into some of the same issues that I did in my originally
somewhat-vague thoughts on exceptions, in that your example is too
close to "real Python code" and led a number of followers (including
me, originally) astray. :)

It might be better expressed as, say:

for i in IndirectIter(sequence):
    current_value = i.get()
    result = compute(current_value)
    i.set(result)

which is clearly rather klunky, and also does not fit super-well
into existing iter protocols, but could be implemented for lists
and dictionaries for instance; see below.

A "more direct" syntax (which I admit is pretty klunky, this is
kind of off the top of my head):

for item in sequence with newvalue:
    newvalue = compute(item)

This leaves unresolved the issue of "what if you don't set the
variable newvalue", but perhaps the for loop could internally
bind both "item" *and* "newvalue" at the top of each iteration,
so that this is essentially:

for item in sequence with newvalue:
    newvalue = item # automatically inserted for you
    ... user code; if it doesn't set newvalue the .set()
    (or whatever equivalent) will re-save the original value ...

Or -- and I think this is actually a better idea -- perhaps it
could "pre-bind" newvalue = None and the automatic iter.set()
invocation would leave "None" undisturbed. In which case, the
internal implementation could even use .set() only, rather than
having to call iter.next(), as the primary protocol, with iter.set()
changing the current value and then doing, in essence, "return
iter.next()". Of course this is just a micro-optimization that
might only apply to CPython in the first place; I am getting way
ahead of myself here. :)

(To expand, what I am thinking at the moment is that if one had
this syntax, one would change the iter protocol. An iterator object
would still provide "__iter__" and "next" callables always. If it
also provides a "set" callable -- or "setitem" or something like
that; the name is clearly flexible at this point -- then this would
make it a "writeable iterator" that one could use with the new
syntax. The protocol would become:

for <var1> in <container> [with <var2>]:
    <code>

which if the "with" is present would mean: "call <container>.__iter__
to get an iterable as usual, with the usual check that iter.__iter__
is also a callable. Then, though, check the iterable for the *new*
callable as well. If not present, you get an error. If present,
call iter.next() initially and bind <var2> to None. At the bottom
of the loop, to step the loop, call the iter's iter.set() with
var2; bind its return value to var1, and re-bind var2 to None again.
Both iter.next() and iter.set() can raise StopIteration to terminate
the loop.)

This idea needs more thought applied, of course.
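A rough emulation of the first proposed protocol, following the steps in the parenthetical above. Since the `with` clause is not real syntax, `for_with` plays the role of the loop, and the body is a function whose return value stands for `<var2>`. `SettableListIter`, its `set()` method and `for_with` are all hypothetical. As suggested above, `set(None)` leaves the current item untouched, which also means `None` itself cannot be written.

```python
class SettableListIter:
    """Hypothetical iterator with the proposed extra set() method: set(v)
    stores v at the current position (unless v is None), then returns the
    next item, raising StopIteration at the end."""

    def __init__(self, lst):
        self._lst = lst
        self._pos = -1

    def __iter__(self):
        return self

    def __next__(self):
        self._pos += 1
        if self._pos >= len(self._lst):
            raise StopIteration
        return self._lst[self._pos]

    def set(self, value):
        if value is not None:
            self._lst[self._pos] = value
        return next(self)


def for_with(container, body):
    """Emulates `for var1 in container with var2: body`; body(var1)
    returns whatever the loop body would have bound to var2 (or None)."""
    iterator = iter(container)
    if not callable(getattr(iterator, "set", None)):
        raise TypeError("iterator has no set(); cannot use 'with'")
    try:
        var1 = next(iterator)
    except StopIteration:
        return
    while True:
        var2 = body(var1)              # var2 starts as None on each pass
        try:
            var1 = iterator.set(var2)  # write back, then step the loop
        except StopIteration:
            return


data = [1, 2, 3, 4]
# Negate the even items; returning None leaves the odd items as they are.
for_with(SettableListIter(data), lambda item: -item if item % 2 == 0 else None)
print(data)  # [1, -2, 3, -4]
```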

Another possible syntax:

for item in container with key:

which translates roughly to "bind both key and item to the value
for lists, but bind key to the key and value for the value for
dictionary-ish items". Then instead of:

for elem in sequence:
    ...
    elem = newvalue

the OP would write, e.g.:

for elem in sequence with index:
    ...
    sequence[index] = newvalue

which of course calls the usual container.__setitem__. In this
case the "new protocol" is to have iterators define a function
that returns not just the next value in the sequence, but also
an appropriate "key" argument to __setitem__. For lists, this
is just the index; for dictionaries, it is the key; for other
containers, it is whatever they use for their keys.

I actually think I like this second syntax more, as it leaves the
container-modifying step explicitly spelled out in user code. It
would also eliminate much of the need for enumerate().

---- example IndirectIter below ----

class IndirectIterError(TypeError):
    pass

class _IInner(object):
    def __init__(self, outer, iterlist):
        self.outer = outer
        self.iterlist = iterlist
        self.index = -1

    def __iter__(self):
        return self

    def __next__(self):
        self.index += 1
        if self.index >= len(self.iterlist):
            raise StopIteration
        return self

    next = __next__  # Python 2 name for the same method

    def get(self):
        return self.outer._get(self.index, self.iterlist)

    def set(self, newvalue):
        return self.outer._set(self.index, self.iterlist, newvalue)

class IndirectIter(object):
    def __init__(self, sequence):
        if isinstance(sequence, dict):
            self._iter = self._dict_iter
            self._get = self._dict_get
            self._set = self._dict_set
        elif isinstance(sequence, list):
            self._iter = self._list_iter
            self._get = self._list_get
            self._set = self._list_set
        else:
            raise IndirectIterError(
                "don't know how to IndirectIter over %s" % type(sequence))
        self._seq = sequence

    def __str__(self):
        return '%s(%s)' % (self.__class__.__name__, self._seq)

    def __iter__(self):
        return self._iter()

    def _dict_iter(self):
        # snapshot the keys as a list so they can be looked up by index
        return _IInner(self, list(self._seq.keys()))

    def _dict_get(self, index, keys):
        return self._seq[keys[index]]

    def _dict_set(self, index, keys, newvalue):
        self._seq[keys[index]] = newvalue

    def _list_iter(self):
        return _IInner(self, self._seq)

    def _list_get(self, index, _):
        return self._seq[index]

    def _list_set(self, index, _, newvalue):
        self._seq[index] = newvalue

if __name__ == '__main__':
    d = {'one': 1, 'two': 2, 'three': 3}
    l = [9, 8, 7]
    print('modify dict %r' % d)
    for i in IndirectIter(d):
        i.set(-i.get())
    print('result: %r' % d)
    print()
    print('modify list %r' % l)
    for i in IndirectIter(l):
        i.set(-i.get())
    print('result: %r' % l)
 

Chris Torek

Another possible syntax:

for item in container with key:

which translates roughly to "bind both key and item to the value
for lists, but bind key to the key and value for the value for
dictionary-ish items". Then ... the OP would write, e.g.:

for elem in sequence with index:
    ...
    sequence[index] = newvalue

which of course calls the usual container.__setitem__. In this
case the "new protocol" is to have iterators define a function
that returns not just the next value in the sequence, but also
an appropriate "key" argument to __setitem__. For lists, this
is just the index; for dictionaries, it is the key; for other
containers, it is whatever they use for their keys.

I note I seem to have switched halfway through thinking about
this from "value" to "index" for lists, and not written that. :)

Here's a sample of a simple generator that does the trick for
list, buffer, and dict:

def indexed_seq(seq):
    """
    produce a pair
        <key_or_index> <value>
    such that seq[key_or_index] is <value> initially; you can
    write on seq[key_or_index] to set a new value while this
    operates. Note that we don't allow tuple and string here
    since they are not writeable.
    """
    # Python 3 has no buffer type; bytearray is the writable byte sequence.
    if isinstance(seq, (list, bytearray)):
        for i, v in enumerate(seq):
            yield i, v
    elif isinstance(seq, dict):
        for k in seq:
            yield k, seq[k]
    else:
        raise TypeError("don't know how to index %s" % type(seq))

which shows that there is no need for a new syntax. (Turning the
above into an iterator, and handling container classes that have
an __iter__ callable that produces an iterator that defines an
appropriate index-and-value-getter, is left as an exercise. :) )
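One way to attempt the exercise in Python 3, sketched with the `collections.abc` mutable ABCs instead of a fixed list of concrete types. `indexed_items` is a hypothetical name; `list`, `bytearray` and `dict` are registered with these ABCs, but a user-defined container only qualifies if it inherits from or registers with them. Being a generator, it raises `TypeError` on the first iteration rather than at call time.

```python
from collections.abc import MutableMapping, MutableSequence


def indexed_items(container):
    """Yield (key_or_index, value) pairs such that
    container[key_or_index] = new_value writes back in place."""
    if isinstance(container, MutableMapping):
        # Reassigning existing keys while iterating is allowed;
        # adding or removing keys is not.
        yield from container.items()
    elif isinstance(container, MutableSequence):
        yield from enumerate(container)
    else:
        raise TypeError("don't know how to index %s" % type(container))


nums = [1, 2, 3]
for i, v in indexed_items(nums):
    nums[i] = v * 10

buf = bytearray(b"abc")
for i, v in indexed_items(buf):
    buf[i] = v - 32            # ASCII lowercase to uppercase

ages = {"ann": 30, "bob": 40}
for k, v in indexed_items(ages):
    ages[k] = v + 1

tuple_error = None
try:
    for _ in indexed_items((1, 2)):
        pass
except TypeError as exc:
    tuple_error = str(exc)

print(nums, buf, ages, tuple_error)
```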
 

Neal Becker


Here is what numpy nditer does:

for item in np.nditer(u, [], ['readwrite'], order='C'):
    item[...] = 10

Notice that the slice syntax is used to 'dereference' the iterator. This seems
like reasonably pythonic syntax, to my eye.
 
