Tuples and immutability

Gregory Ewing · Mar 8, 2014

Duncan said:
Is there any reason why tuples need to throw an exception on assigning to
the element if the old value and new value are the same object?

It would make introspection misleading, because tuples
would have a __setitem__ method event though they don't
actually support item assignment.

Also, it would solve the problem for tuples in particular,
but not for any other immutable type -- they would all
have to implement the same behaviour independently to
enjoy the benefit.

Here's another idea: If the __iadd__ method returns the
same object, *and* the LHS doesn't have a __setitem__
method, then do nothing instead of raising an exception.

Peter said:
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "<stdin>", line 4, in __setitem__
TypeError: 257 is not 257

I'm not sure "help" is the right word here

I don't think that's a problem, because the use case
being addressed is where the object performs in-place
modification and always returns itself. Any object that
doesn't return itself is not modifying in-place, even
if the returned object happens to be equal to the
original one.

Ian Kelly · Mar 8, 2014

Here's another idea: If the __iadd__ method returns the
same object, *and* the LHS doesn't have a __setitem__
method, then do nothing instead of raising an exception.

Maybe it doesn't have a __setitem__ because the object that was
retrieved is computed rather than stored, and the result of the
__iadd__ will simply be discarded. Somewhat contrived example:

class LessThanFilter:

def __init__(self, the_list):
self._the_list = the_list

def __getitem__(self, bound):
return [x for x in self._the_list if x < bound]

filter = LessThanFilter([10, 20, 30, 40, 50])
filter[25] += [15, 17, 23]

Should that last line not raise an exception? The __iadd__ call will
return the same object, and the LHS doesn't have a __setitem__ method.

I don't think that's a problem, because the use case
being addressed is where the object performs in-place
modification and always returns itself. Any object that
doesn't return itself is not modifying in-place, even
if the returned object happens to be equal to the
original one.

I already mentioned this earlier in the thread, but a balanced binary
tree might implement += as node insertion and then return a different
object if the balancing causes the root node to change.

Marko Rauhamaa · Mar 8, 2014

Ian Kelly said:
I already mentioned this earlier in the thread, but a balanced binary
tree might implement += as node insertion and then return a different
object if the balancing causes the root node to change.

True.

Speaking of which, are there plans to add a balanced tree to the
"batteries" of Python? Timers, cache aging and the like need it. I'm
using my own AVL tree implementation, but I'm wondering why Python
still doesn't have one.

In fact, since asyncio has timers but Python doesn't have balanced
trees, I'm led to wonder how good the asyncio implementation can be.

Note that Java "batteries" include TreeMap.

Marko

Ian Kelly · Mar 8, 2014

Speaking of which, are there plans to add a balanced tree to the
"batteries" of Python? Timers, cache aging and the like need it. I'm
using my own AVL tree implementation, but I'm wondering why Python
still doesn't have one.

None currently that I'm aware of. If you want to propose adding one,
I suggest reading:

http://docs.python.org/devguide/stdlibchanges.html

In fact, since asyncio has timers but Python doesn't have balanced
trees, I'm led to wonder how good the asyncio implementation can be.

Peeking at the code, it appears to use a heapq-based priority queue.
Why would a balanced binary tree be better?

Marko Rauhamaa · Mar 8, 2014

Ian Kelly said:
Peeking at the code, it appears to use a heapq-based priority queue.
Why would a balanced binary tree be better?

AFAIK, a heap queue doesn't allow for the deletion of a random element
forcing you to leave the canceled timers in the queue to be deleted
later.

In a very typical scenario, networking entities start timers very
frequently (depending on the load, maybe at 100..1000 Hz) but cancel
virtually every one of them, leading to some wakeup churn and extra
memory load. I don't know if the churn is better or worse than the tree
balancing overhead.

Imagine a web server that received HTTP connections. You might want to
specify a 10-minute idle timeout for the connections. In the heapq timer
implementation, your connection objects are kept in memory for 10
minutes even if they are closed gracefully because the canceled timer
maintains a reference to the object.

Of course, it may be that the heapq implementation sets the callback to
None leaving only the minimal timer object lingering and waiting to come
out of the digestive tract.

Marko

Dan Stromberg · Mar 8, 2014

True.

Speaking of which, are there plans to add a balanced tree to the
"batteries" of Python? Timers, cache aging and the like need it. I'm
using my own AVL tree implementation, but I'm wondering why Python
still doesn't have one.

I think it'd probably be a good idea to add one or more balanced
binary trees to the standard library. But I suspect it's been tried
before, and didn't happen. It might be good to add an _un_balanced
tree too, since they do quite well with random keys.

Here's a performance comparison I did of a bunch of tree types in Python:
http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/2014-01/

Marko Rauhamaa · Mar 8, 2014

Mark Lawrence said:
I believe that there are advantages to leaving specialist data
structures on pypi or other sites, plus it means Python in a Nutshell
can still fit in your pocket and not a 40 ton articulated lorry,
unlike the Java equivalent.

An ordered map is a foundational data structure as opposed to, say, a
priority queue, let alone something like urllib2.

If I had to choose between a hash table and AVL (or RB) tree in the
standard library, it would definitely have to be the latter. It is more
generally usable, has fewer corner cases and probably has an equal
performance even in hash tables' sweet spot.

Marko

Roy Smith · Mar 8, 2014

Marko Rauhamaa said:
If I had to choose between a hash table and AVL (or RB) tree in the
standard library, it would definitely have to be the latter. It is more
generally usable, has fewer corner cases and probably has an equal
performance even in hash tables' sweet spot.

The C++ folks made that decision, and people spent the next 10 years
complaining, "Why is there no hash table in STL?"

Gregory Ewing · Mar 9, 2014

Ian said:
class LessThanFilter:

def __init__(self, the_list):
self._the_list = the_list

def __getitem__(self, bound):
return [x for x in self._the_list if x < bound]

filter = LessThanFilter([10, 20, 30, 40, 50])
filter[25] += [15, 17, 23]

Should that last line not raise an exception?

In this case it will fail to catch what is probably an error,
but you can't expect the language to find all your bugs for
you. If you wrote the same bug this way:

filter[25].extend([15, 17, 23])

it wouldn't be caught either.

What's happening is that we're trying to use the syntax
a += b to mean two different things:

1) Shorthand for a = a + b

2) A way of expressing an in-place modification, such
as a.extend(b)

Case (2) is not really an assignment at all, so arguably
it shouldn't require the LHS to support assignment.

Gregory Ewing · Mar 9, 2014

Ian said:
I already mentioned this earlier in the thread, but a balanced binary
tree might implement += as node insertion and then return a different
object if the balancing causes the root node to change.

That would be a really bad way to design a binary tree
implementation. What if there is another reference to
the tree somewhere? It's still going to be referring to
the old root object, and will have an incoherent view
of the data -- partly old and partly new.

If you're going to have a mutable tree, it needs to be
encapsulated in a stable top-level object.

Dan Stromberg · Mar 9, 2014

If I had to choose between a hash table and AVL (or RB) tree in the
standard library, it would definitely have to be the latter. It is more
generally usable, has fewer corner cases and probably has an equal
performance even in hash tables' sweet spot.

Actually, in the performance comparison I mentioned previously, I
compared Python dict's to a bunch of different balanced trees and one
unbalanced tree. The dictionary was much faster, though granted, it
was the only one in C.

That URL again:
http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/2014-01/

Ian Kelly · Mar 9, 2014

Ian said:
Ian said:

class LessThanFilter:

def __init__(self, the_list):
self._the_list = the_list

def __getitem__(self, bound):
return [x for x in self._the_list if x < bound]

filter = LessThanFilter([10, 20, 30, 40, 50])
filter[25] += [15, 17, 23]

Should that last line not raise an exception?

Click to expand...

In this case it will fail to catch what is probably an error,
but you can't expect the language to find all your bugs for
you. If you wrote the same bug this way:

filter[25].extend([15, 17, 23])

it wouldn't be caught either.

What's happening is that we're trying to use the syntax
a += b to mean two different things:

1) Shorthand for a = a + b

2) A way of expressing an in-place modification, such
as a.extend(b)

Case (2) is not really an assignment at all, so arguably
it shouldn't require the LHS to support assignment.

In my view the second one is wrong. a += b should be understood as
being equivalent to a = a + b, but with the *possible* and by no means
guaranteed optimization that the operation may be performed in-place.

In fact, if you read the documentation for lists, you may notice that
while they clearly cover the + operator and the extend method, they do
not explicitly document the list class's += operator. So although I'm
not entirely sure whether it is intentional or not, and I would be
quite surprised if some implementation were actually to differ on this
point, the language does *not* from what I can see guarantee that the
+= operator on lists is equivalent to calling .extend.

That having been said, code that uses += and relies on the operation
to be performed in-place should be considered buggy. If you need the
operation to be performed in-place, then use in-place methods like
list.extend. If you need the operation not to be performed in-place,
then use a = a + b. If you're ambivalent on the in-place issue and
just want to write polymorphic code, that's when you should consider
using +=.

Ian Kelly · Mar 9, 2014

That would be a really bad way to design a binary tree
implementation. What if there is another reference to
the tree somewhere? It's still going to be referring to
the old root object, and will have an incoherent view
of the data -- partly old and partly new.

If you're going to have a mutable tree, it needs to be
encapsulated in a stable top-level object.

Well, as I parenthetically noted the first time I brought it up,
"whether this is good design is tangential; it's a possible design".
The language shouldn't be written such that only those designs deemed
"good" by some external committee can be implemented. What you
dismiss as "really bad" may be exactly what somebody else needs, and
maybe they intend that there won't be other references.

Marko Rauhamaa · Mar 9, 2014

Roy Smith said:
The C++ folks made that decision, and people spent the next 10 years
complaining, "Why is there no hash table in STL?"

Well, luckily, nobody should need to choose. Even STL has an
unordered_map now.

Marko

Marko Rauhamaa · Mar 9, 2014

Dan Stromberg said:
Actually, in the performance comparison I mentioned previously, I
compared Python dict's to a bunch of different balanced trees and one
unbalanced tree. The dictionary was much faster, though granted, it
was the only one in C.

Yes, that is one major detail. There must be many more.

Marko

Marko Rauhamaa · Mar 9, 2014

Ian Kelly said:
In my view the second one is wrong. a += b should be understood as
being equivalent to a = a + b, but with the *possible* and by no means
guaranteed optimization that the operation may be performed in-place.

Some call it an optimization, others call it a side effect.

Anyway, that's how it has been explicitly defined in the language
specification so that's the reality whether one likes it or not.

Marko

Joshua Landau · Mar 9, 2014

That'd require figuring out whether or not the variable was actually
mutated, and that's pretty hard to work out.

It does not. First, the "warning" is actually an attachment to the
exception so is only shown if the exception is uncaught. This should
basically never happen in working code. The warning exists only to
remove likely misunderstanding in these odd cases.

Even if "x = (1,); x[0] += 1" warned "addition was inplace; possible
mutation occurred" or whatever phrasing you wish, this would only
cause a quick check of the results.

Chris Angelico · Mar 9, 2014

That'd require figuring out whether or not the variable was actually
mutated, and that's pretty hard to work out.

Click to expand...

It does not. First, the "warning" is actually an attachment to the
exception so is only shown if the exception is uncaught. This should
basically never happen in working code. The warning exists only to
remove likely misunderstanding in these odd cases.

Even if "x = (1,); x[0] += 1" warned "addition was inplace; possible
mutation occurred" or whatever phrasing you wish, this would only
cause a quick check of the results.

I think I see what you're saying here. But ignore "top-level"; this
should just be a part of the exception message, no matter what.
Otherwise things that recreate the REPL (like IDLE) or that isolate
two sections of the code (like a web server framework) wouldn't get
the benefit, because the exception's not caught at the true top-level.

x=1,
x[0]+=0

Click to expand...

Click to expand...

Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
x[0]+=0
TypeError: 'tuple' object does not support item assignment

What you're saying is that this should notice that it's doing an
augmented assignment and give some more text. This can be done; all
you need to do is catch the error and reraise it with more info:
x[0]+=0
except TypeError as e:
if 'does not support item assignment' in e.args[0]:
raise TypeError(e.args[0]+"\nAugmented assignment used;
mutation may have occurred.") from None
raise

Traceback (most recent call last):
File "<pyshell#16>", line 5, in <module>
raise TypeError(e.args[0]+"\nAugmented assignment used; mutation
may have occurred.") from None
TypeError: 'tuple' object does not support item assignment
Augmented assignment used; mutation may have occurred.

Now you can look at writing an import hook that does an AST transform,
locating every instance of item assignment and wrapping it like that.
It's certainly possible. I'm not sure how much benefit you'd get, but
it could be done.

ChrisA

Joshua Landau · Mar 9, 2014

I think I see what you're saying here. But ignore "top-level"; this
should just be a part of the exception message, no matter what.

I don't think I was clear, but yes. That.

What you're saying is that this should notice that it's doing an
augmented assignment and give some more text. This can be done; all
you need to do is catch the error and reraise it with more info:
[...]
Now you can look at writing an import hook that does an AST transform,
locating every instance of item assignment and wrapping it like that.
It's certainly possible. I'm not sure how much benefit you'd get, but
it could be done.

I would probably implement it closer to home. Inside
tuple.__getitem__, there would be something like

if context_is_augmented_assignment():
raise TypeError(message+warning)
else:
raise TypeError(message)

which would have much lower technical costs. It does depend on how
costly "context_is_augmented_assignment" is, though. A speed argument
is totally valid, but I'd hope it's possible to work around.

Chris Angelico · Mar 9, 2014

I would probably implement it closer to home. Inside
tuple.__getitem__, there would be something like

if context_is_augmented_assignment():
raise TypeError(message+warning)
else:
raise TypeError(message)

which would have much lower technical costs. It does depend on how
costly "context_is_augmented_assignment" is, though. A speed argument
is totally valid, but I'd hope it's possible to work around.

Yeah, I'm not sure there's an easy way to tell the tuple that it's an
augmented assignment. You might be able to look at the backtrace, but
that's tricky.

In theory, you could catch TypeError after any augmented assignment,
and check several things:
1) The target object does not have __setitem__
2) The object being manipulated does have __iadd__ (or corresponding for others)
3) The error is that item assignment is not possible

and if all three are the case, then add a tag to the message. But this
is best done by catching the exception. Otherwise you'd be limiting
this to tuples and lists; not to mention that this is really only an
issue of error reporting, and correct code shouldn't be tripping this
at all. So put a check like that at the place where you display the
error, if you can. The AST transform that I described would also work.

But I really think it's not worth the effort. If you get this error,
understand that it may have consequences.

ChrisA

cmath, __float__ and __complex__	1	Feb 15, 2007
Issues with `codecs.register` and `codecs.CodecInfo` objects	2	Jul 6, 2012
Searching and manipulating lists of tuples	7	Jun 12, 2006
Lists and Tuples and Much More	15	Apr 12, 2007
Unsupported operand type(s) for +: 'float' and 'tuple'	6	Jun 10, 2011
Using tuples correctly?	3	Oct 11, 2004
__getattr__ and functions that don't exist	6	May 25, 2006
Lost in __setstate__() in C++ and swig	1	Sep 11, 2007

Tuples and immutability

Gregory Ewing

Ian Kelly

Marko Rauhamaa

Ian Kelly

Marko Rauhamaa

Dan Stromberg

Marko Rauhamaa

Roy Smith

Gregory Ewing

Gregory Ewing

Dan Stromberg

Ian Kelly

Ian Kelly

Marko Rauhamaa

Marko Rauhamaa

Marko Rauhamaa

Joshua Landau

Chris Angelico

Joshua Landau

Chris Angelico

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads