Tuples and immutability

C

Chris Angelico

Better to have tried and failed though than to have simply accepted
what the compiler was doing with no verification at all.

Maybe. But I've learned now that one guy who used to do assembly
language programming on an 8086 is unlikely to discover something that
the many authors of a C compiler haven't noticed. Yes, it's possible
there'll be something specific to my code, like if I'm doing a
strcpy-like operation that isn't *actually* strcpy (the function will
be optimized heavily, but a C-level loop might not be recognized), but
it's more likely the compiler knows better than I do.

That, by the way, was before I realized that *interpreter* writers are
more expert than I am, too, and therefore that I can trust a
heavily-optimized high level language to run my code faster than I
could write equivalent C.

ChrisA
 
G

Gregory Ewing

Ian said:
It's technically "possible" for this augmented assignment to be
performed in place:

x = 12
x += 4

But it's not done in-place, because ints are meant to be immutable.

Which means it's *not* possible, because doing so
would violate the documented properties of the int
type.
In any case, this means that whether the operation is actually
performed in-place is an implementation detail

The implementation could theoretically perform this
optimisation if there are no other references to the
object. But this will be completely invisible, because
to even find out whether it's the same object, you need
to keep another reference to the original object,
preventing the optimisation from being performed.

As far as observable effects are concerned, it's
quite clear: mutable objects can *always* be updated
in-place, and immutable objects can *never* be.
 
G

Gregory Ewing

Ian said:
If the in-place behavior of += is held to be part of the interface,
then we must accept that += is not polymorphic across mutable and
immutable types,

That's quite correct, it's not. As I said, it's one
notation doing double duty.

Usually there isn't any confusion, because you know
whether any particular instance of it is intended to
operate on a mutable or immutable type.

If that's not the case -- if you're writing a function
intended to operate on a variety of types, some
mutable and some not -- then using in-place operators
would not be appropriate.
If you want in-place concatenation, the
obvious way to do it is by calling extend.

It might be the obvious way for that particular operation on
that particular type. But what about all the others?
What's the obvious way to spell in-place set intersection,
for example? (Quickly -- no peeking at the docs!)

The in-place operators provide a standardised spelling
for in-place versions of all the binary operations.
That's a useful thing to have, I think.
 
M

Marko Rauhamaa

Steven D'Aprano said:
The problem with having a choice is that it opens up the possibility of
making the wrong one :)

[...]

In my experience, the average developer has an amazing talent for
pessimising code when they think they are optimising it.

Java is the straitjacket that prevents the "average developer" from
shooting himself in the foot.

Python should let skilled professionals do their work. Thankfully, for
the most part, it does.


Marko
 
I

Ian Kelly

As far as observable effects are concerned, it's
quite clear: mutable objects can *always* be updated
in-place, and immutable objects can *never* be.

Hm. Consider the circle-ellipse problem. Briefly, a circle is-an
ellipse, so in an inheritance hierarchy it is natural to make Circle a
subclass of Ellipse. Now suppose the Ellipse has a stretch method
that mutates the ellipse by changing the length of one of its axes
while preserving the other. To avoid violating LSP, the Circle class
must support all the methods of its ancestor. However it cannot,
because the stretch method would invalidate the invariant of the
Circle class that both of its axes must always be equal.

There are a number of possible solutions. One possibility would be to
copy the Circle as an Ellipse and return the new object instead of
mutating it. Then you have the situation where, given a mutable
object x that satisfies isinstance(x, Ellipse), the stretch method
*may* be able to update the object in-place, or it *may* not.

I can't think of a reasonable example that would replace the stretch
method here with an augmented assignment, but then it is rather late.
It might be the obvious way for that particular operation on
that particular type. But what about all the others?
What's the obvious way to spell in-place set intersection,
for example? (Quickly -- no peeking at the docs!)

You mean set.intersection_update? The in-place set methods are not
hard to remember, because they all end in _update.
 
S

Steven D'Aprano

Hm. Consider the circle-ellipse problem.

Oh, not that old chestnut! The circle-ellipse problem demonstrates one
limitation of OOP (specifically "subtype polymorphism"), as well as a
general difficulty with hierarchical taxonomies. Mammals have hair --
what do you make of hairless[1] dolphins? Circle/Ellipse is not a good
place to start from in order to critic augmented assignment in Python.
You're starting from a place where inheritance itself is problematic, so
naturally you're going to find problems.

Briefly, a circle is-an ellipse, so in an inheritance hierarchy it is
natural to make Circle a subclass of Ellipse.

Natural and wrong. It's only natural if you don't think through the
consequences. As you go on to say:
Now suppose the Ellipse has a stretch method that
mutates the ellipse by changing the length of one of its axes while
preserving the other. To avoid violating LSP, the Circle class must
support all the methods of its ancestor. However it cannot, because the
stretch method would invalidate the invariant of the Circle class that
both of its axes must always be equal.

Right. So *Circles cannot be Ellipses*, not without violating the Liskov
Substitution Principle. If I think that they are, I haven't thought it
through. Nor can Ellipses be Circles. That's the problem of the Circle/
Ellipse relationship.

(Aside: the LSP is not a Law Of Physics that cannot be touched. There are
other OOP models that don't require LSP.)

Even in the case of immutable shapes, one might not wish to inherit
Circle from Ellipsis. Ellipses have five degrees of freedom:

2 x position
size (scale)
orientation
shape

while circles only have three:

2 x position
size

Orientation and shape are meaningless for circles! So they should not
inherit from a class where they are meaningful: given the LSP, a subclass
cannot be more restrictive than a superclass.

There are a number of possible solutions. One possibility would be to
copy the Circle as an Ellipse and return the new object instead of
mutating it. Then you have the situation where, given a mutable object
x that satisfies isinstance(x, Ellipse), the stretch method *may* be
able to update the object in-place, or it *may* not.

That is a really lousy design. Of course we are free to create classes
with ugly, inconsistent, stupid or unworkable APIs if we like. Python
won't stop us:

class MyList(list):
def append(self, obj):
if len(self) % 17 == 3:
return self + [obj]
super(MyList, self).append(obj)


I can't think of a reasonable example that would replace the stretch
method here with an augmented assignment, but then it is rather late.

Um, yes? Nobody suggests that a method of type X has to be the most
obvious way for *any operation* on *any type*. What's your point?

But what about all the others? What's the obvious way

I would expect it to be &=, let's find out if I'm right:

py> a = set("abcde")
py> b = a # save a reference to it
py> c = set("cdefg")
py> a &= c
py> a, b
({'c', 'd', 'e'}, {'c', 'd', 'e'})
py> a is b
True


The only reason I'd need to look at the docs is because I always forget
whether & is intersection and | is union, or the other way around. But
having remembered which is which, going from & to &= was easy.

You mean set.intersection_update? The in-place set methods are not hard
to remember, because they all end in _update.

And hard to spell.




[1] Technically they aren't *entirely* hairless. They may have a few
hairs around the blowhole, and baby dolphins are born with whiskers which
they soon lose. But from a functional perspective, they are hairless.
 
G

Gregory Ewing

Steven said:
I would expect it to be &=,

That's my point -- once you know the binary operator for
an operation, the corresponding in-place operator is
obvious. But there's no established standard set of
method names for in-place operations -- each type
does its own thing.

For that particular type, maybe, but I wouldn't trust
that rule to extend to other types.
 
I

Ian Kelly

That is a really lousy design. Of course we are free to create classes
with ugly, inconsistent, stupid or unworkable APIs if we like. Python
won't stop us:

That's true but irrelevant to my point, which was to counter the
assertion that mutable types can always be assumed to be able to
perform operations in-place.
 
A

alex23

Python should let skilled professionals do their work. Thankfully, for
the most part, it does.

Skilled professionals don't solely rely on the standard library, either.
If you know you need a balanced tree, you'll also know where to find an
implementation of one.
 
R

Rick Johnson

x += y is meant to be equivalent, except possibly in-place and more
efficient, than x = x + y.

In an ideal world, the speed of these two codes should be the same, of course i'm "assuming" that most competent language designers would optimise the slower version.

"""But Rick, Python is an interpreted language and does not benefit from a compile stage."""

Even if the bytecode can't be optimized on the current run, it CAN be optimized by updating the .pyo file for future runs without degrading current (or future) runtime performance.
 
C

Chris Angelico

In an ideal world, the speed of these two codes should be the same, of course i'm "assuming" that most competent language designers would optimise the slower version.

Except that they don't mean the same thing, so it's not an optimization target.

ChrisA
 
T

Terry Reedy

The manual actually says "An augmented assignment expression like x += 1
can be rewritten as x = x + 1 to achieve a similar, but not exactly
equal effect. In the augmented version, x is only evaluated once. Also,
when possible, the actual operation is performed in-place, meaning that
rather than creating a new object and assigning that to the target, the
old object is modified instead.

In an ideal world, the speed of these two codes should be the same,

Nope, 'similar' is not 'equivalent'. Evaluating x twice instead of once
and possibly allocating a new object versus not take extra time. In a
statement like "x.y.z[3*n+m] += 1", calculating the target dominates the
time to increment, so this form should be nearly twice as fast.
 
S

Steven D'Aprano

The manual actually says "An augmented assignment expression like x += 1
can be rewritten as x = x + 1 to achieve a similar, but not exactly
equal effect. In the augmented version, x is only evaluated once. Also,
when possible, the actual operation is performed in-place, meaning that
rather than creating a new object and assigning that to the target, the
old object is modified instead.

In an ideal world, the speed of these two codes should be the same,

Nope, 'similar' is not 'equivalent'. Evaluating x twice instead of once
and possibly allocating a new object versus not take extra time. In a
statement like "x.y.z[3*n+m] += 1", calculating the target dominates the
time to increment, so this form should be nearly twice as fast.

Excellent point Terry!

I always forget that the target of an augmented assignment may not be a
simple name like "x" but can be an arbitrary complex reference, anything
that is a legal assignment target. Because += is documented as only
evaluating the expression once it can behave quite differently to the
`spam = spam + 1` case. Evaluating the right hand side may have side-
effects that change what the left hand side evaluates to. This is not the
case with the augmented assignment.
 
E

Ethan Furman

The manual actually says "An augmented assignment expression like x += 1 can be rewritten as x = x + 1 to achieve a
similar, but not exactly equal effect. In the augmented version, x is only evaluated once. Also, when possible, the
actual operation is performed in-place, meaning that rather than creating a new object and assigning that to the target,
the old object is modified instead.

.... and reassigned to the target. :) (If it doesn't say that last part, it should.)
 
A

Antoon Pardon

Op 11-03-14 00:24, Roy Smith schreef:
The problem with having a choice is that it opens up the possibility of
making the wrong one :)

This is just a standard defense for the status quo. Introducing the decimal
module also added a choice.
As this discussion has shown, figuring out whether a hash table or a
tree is better for a given problem is non-trivial. My guess is that if
you gave 1000 typical developers both data structures and let them pick
freely, the number of cases where it really mattered and the developer
picked the right one would be approximately equal to the number of cases
where they picked the wrong one.

You are only illustrating one part. How about all those cases now where the
wrong choice is more or less forced on the developer for lack of the alternative?
 
A

Antoon Pardon

Op 12-03-14 07:28, Steven D'Aprano schreef:
Nope, 'similar' is not 'equivalent'. Evaluating x twice instead of once
and possibly allocating a new object versus not take extra time. In a
statement like "x.y.z[3*n+m] += 1", calculating the target dominates the
time to increment, so this form should be nearly twice as fast.
Excellent point Terry!

I always forget that the target of an augmented assignment may not be a
simple name like "x" but can be an arbitrary complex reference, anything
that is a legal assignment target. Because += is documented as only
evaluating the expression once it can behave quite differently to the
`spam = spam + 1` case. Evaluating the right hand side may have side-
effects that change what the left hand side evaluates to. This is not the
case with the augmented assignment.

The documentation is wrong at that point as the following code illustrates.

| import sys
| write = sys.stdout.write
|
| class logdict:
| def __init__(self):
| self.lst = {}
|
| def __setitem__(self, key, value):
| write('[%s] <= %s\n' % (key, value))
| self.lst[key] = value
|
| def __getitem__(self, key):
| value = self.lst[key]
| write('[%s] => %s\n' % (key, value))
| return value
|
| tab = logdict()
| tab['key'] = 'value'
| tab['key'] += ' with extra tail'
| write('====\n')
| tab = logdict()
| tab['key'] = 'value'
| tab['key'] = tab['key'] + ' with extra tail'

If you run this code, you get the following result:

| [key] <= value
| [key] => value
| [key] <= value with extra tail
| ====
| [key] <= value
| [key] => value
| [key] <= value with extra tail

As you can see there is no difference here in the evaluations done
between using

| tab['key'] += ' with extra tail'

or

| tab['key'] = tab['key'] + ' with extra tail'
 
I

Ian Kelly

Nope, 'similar' is not 'equivalent'. Evaluating x twice instead of once
and possibly allocating a new object versus not take extra time. In a
statement like "x.y.z[3*n+m] += 1", calculating the target dominates the
time to increment, so this form should be nearly twice as fast.

Excellent point Terry!

I always forget that the target of an augmented assignment may not be a
simple name like "x" but can be an arbitrary complex reference, anything
that is a legal assignment target. Because += is documented as only
evaluating the expression once it can behave quite differently to the
`spam = spam + 1` case. Evaluating the right hand side may have side-
effects that change what the left hand side evaluates to. This is not the
case with the augmented assignment.

Of course one could also do:

z = x.y.z
i = 3*n+m
z = z + 1

which reduces the duplicated work down to storing and loading a couple
of locals, and also prevents side-effects from affecting the LHS
evaluation.
 
I

Ian Kelly

The documentation is wrong at that point as the following code illustrates.

Either way it still has to do a getitem and a setitem, but if you have
a more nested structure then the extra getitems are not repeated. For
example, using your logdict class:
tab = logdict()
tab[1] = logdict()
tab[1][2] = logdict()
[1] => said:
tab[1][2][3] = ['value']
[1] => <__main__.logdict object at 0x02A2E430>
[2] => said:
tab[1][2][3] += [' with extra tail']
[1] => <__main__.logdict object at 0x02A2E430>
[2] => <__main__.logdict object at 0x02A2EB10>
[3] => ['value']
[3] <= ['value', ' with extra tail']

versus:
[1] => <__main__.logdict object at 0x02A2E430>
[2] => said:
tab[1][2][3] = tab[1][2][3] + [' with extra tail']
[1] => <__main__.logdict object at 0x02A2E430>
[2] => <__main__.logdict object at 0x02A2EB10>
[3] => ['value']
[1] => <__main__.logdict object at 0x02A2E430>
[2] => <__main__.logdict object at 0x02A2EB10>
[3] <= ['value', ' with extra tail']

As you can see the += version does two fewer getitem calls in this case.
 
O

Oscar Benjamin

The manual actually says "An augmented assignment expression like x += 1 can
be rewritten as x = x + 1 to achieve a similar, but not exactly equal
effect. In the augmented version, x is only evaluated once. Also, when
possible, the actual operation is performed in-place, meaning that rather
than creating a new object and assigning that to the target, the old object
is modified instead.


In an ideal world, the speed of these two codes should be the same,


Nope, 'similar' is not 'equivalent'. Evaluating x twice instead of once and
possibly allocating a new object versus not take extra time. In a statement
like "x.y.z[3*n+m] += 1", calculating the target dominates the time to
increment, so this form should be nearly twice as fast.

An example where the result is semantically different:
from numpy import array
a = array([1, 2, 3], dtype=int)
a array([1, 2, 3])
a + 1.1 array([ 2.1, 3.1, 4.1])
a += 1.1
a
array([2, 3, 4])

The reason is that numpy arrays are both mutable and have statically
determined elementary data type:
dtype('int64')


Oscar
 
A

Antoon Pardon

Op 12-03-14 10:51, Ian Kelly schreef:
Either way it still has to do a getitem and a setitem, but if you have
a more nested structure then the extra getitems are not repeated. For
example, using your logdict class:

Sure, but the documentation doesn't say for sufficiently nested structures
some evaluations are not repeated.

Take the following example.

| tab['key'] = [1]
| tab['key'] += [2]

Now let us rewrite that second statment in two different ways.

| tab['key'] = tab['key'] + [2]

or

| tmp = tab['key']
| tmp += [2]

Now which of these two rewritings is closer to the augmented concatenation?
A natural reading of the documentation would conclude the second one, because
in that case we only need to evaluate tab['key'] once as righthand sided.
However it turns out the augmented concantenation is closer to the first
rewriting here, evaluating tab['key'] twice once a lefthand sided and once
as right hand sided, which IMO will surprise people that rely on the documentation.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,077
Messages
2,570,567
Members
47,203
Latest member
EmmaSwank1

Latest Threads

Top