Tuples and immutability

Gregory Ewing

Ian said:
In my view the second one is wrong. a += b should be understood as
being equivalent to a = a + b, but with the *possible* and by no means
guaranteed optimization that the operation may be performed in-place.

This interpretation is at odds with the Language Reference,
section 6.2.1, Augmented Assignment Statements:

"An augmented assignment expression like x += 1 can be rewritten as
x = x + 1 to achieve a similar, but not exactly equal effect... when
possible, the actual operation is performed in-place, meaning that
rather than creating a new object and assigning that to the target,
the old object is modified instead."

Note that it says "when possible", not "if the implementation
feels like it".
In fact, if you read the documentation for lists, you may notice that
while it clearly covers the + operator and the extend method, it does
not explicitly document the list class's += operator.

The "when possible" clause, together with the fact that lists
are mutable, implies that it *will* be done in-place. There
is no need to document all the in-place operators explicitly
for every type.
 
Dan Stromberg

Please elaborate.

A hash table of fixed size is O(1) until you fill it so full that you
start getting enough collisions to make it O(n), as one bucket becomes
a linked list. This is because the hash function is O(1), and looking
up a value in a C array is O(1).

A more flexible hash table will not have a fixed size; instead it will
grow itself as needed. This growth operation tends to be O(n), but
happens vanishingly infrequently as the table grows, so it's still
amortized O(1).
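
To make the growth policy concrete, here is a minimal sketch of a
chained hash table that doubles its capacity when the load factor
passes 2/3 (the class name, threshold, and policy are illustrative
choices, not taken from any particular implementation):

class HashTable:
    """Toy chained hash table; doubles when the load factor exceeds 2/3."""

    def __init__(self, capacity=8):
        self.buckets = [[] for _ in range(capacity)]
        self.size = 0

    def _bucket(self, key):
        # hash() and the modulo are both O(1)
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.size += 1
        # The O(n) rebuild below runs only on every ~n-th insert,
        # so inserts are amortized O(1).
        if self.size * 3 > len(self.buckets) * 2:
            self._grow()

    def _grow(self):
        old, self.buckets = self.buckets, [[] for _ in range(2 * len(self.buckets))]
        self.size = 0
        for bucket in old:
            for key, value in bucket:
                self.put(key, value)

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)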
 
Marko Rauhamaa

Dan Stromberg said:
[...]

it's still amortized O(1).

So we are in perfect agreement.

Hash tables are a useful family of techniques but involve quite a bit of
cost/benefit heuristics. You can only claim O(1) if your hash table is
at least as large as the number of elements.

As for comparing them with balanced trees, I think one of the main
advantages hash tables have is better CPU cache performance. A tree
involves much more jumping around the RAM than a hash table.

Still, trees have many things going for them:

* no need to find a good hashing function

* no need to spend time on calculating the hash value

* no need to find optimal hash table sizes -- no costly resizing

And of course, the main point:

* trees are ordered

Note that most key comparisons tend to be very quick as you don't need
to traverse the whole key to locate the element. Also, it is hard to say
if going O(log n) levels deep in the tree is slower than calculating the
hash value, although, as I said, the latter operation tends to benefit
from cache locality.

Trees are also crucial when you don't have exact matches but are
looking, for example, for prefix or wild-card matches.
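
As a quick illustration of ordered lookup, a sorted list of keys (a
stand-in for a tree here, since both keep keys in order) supports
prefix matching with two binary searches; prefix_matches is a
hypothetical helper, not a standard function:

import bisect

def prefix_matches(sorted_keys, prefix):
    # All keys with the given prefix form a contiguous run in sorted order.
    lo = bisect.bisect_left(sorted_keys, prefix)
    # chr(0x10FFFF) is the largest code point, so this bounds the run.
    hi = bisect.bisect_right(sorted_keys, prefix + chr(0x10FFFF))
    return sorted_keys[lo:hi]

keys = sorted(["car", "card", "care", "cart", "cat", "dog"])
print(prefix_matches(keys, "car"))   # ['car', 'card', 'care', 'cart']

A hash table cannot answer this kind of query without scanning every key.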


Marko
 
Terry Reedy

Gregory Ewing said:
This interpretation is at odds with the Language Reference,
section 6.2.1, Augmented Assignment Statements:

"An augmented assignment expression like x += 1 can be rewritten as
x = x + 1 to achieve a similar, but not exactly equal effect... when
possible, the actual operation is performed in-place, meaning that
rather than creating a new object and assigning that to the target,
the old object is modified instead."

Note that it says "when possible", not "if the implementation
feels like it".


The "when possible" clause, together with the fact that lists
are mutable, implies that it *will* be done in-place. There
is no need to document all the in-place operators explicitly
for every type.

The discussion of .__iop__ in the datamodel chapter was just revised
slightly.
http://bugs.python.org/issue19953
 
Ian Kelly

Gregory Ewing said:
This interpretation is at odds with the Language Reference,
section 6.2.1, Augmented Assignment Statements:

"An augmented assignment expression like x += 1 can be rewritten as
x = x + 1 to achieve a similar, but not exactly equal effect... when
possible, the actual operation is performed in-place, meaning that
rather than creating a new object and assigning that to the target,
the old object is modified instead."

Note that it says "when possible", not "if the implementation
feels like it".

That's quite vague, and not much stronger a guarantee than "maybe".
It's technically "possible" for this augmented assignment to be
performed in place:

x = 12
x += 4

But it's not done in-place, because ints are meant to be immutable.
In any case, this means that whether the operation is actually
performed in-place is an implementation detail -- if not of the Python
implementation then at least of the class -- and not something the
user should take for granted.
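
A quick CPython session makes the difference visible (object ids and
small-int caching are implementation details, but the rebinding is not):

py> x = 12
py> before = id(x)
py> x += 4
py> id(x) == before    # a new int object was bound to x
False
py> x = []
py> before = id(x)
py> x += [4]
py> id(x) == before    # the same list object was mutated
True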
 
Ian Kelly

Dan Stromberg said:
A hash table of fixed size is O(1) until you fill it so full that you
start getting enough collisions to make it O(n), as one bucket becomes
a linked list. This is because the hash function is O(1), and looking
up a value in a C array is O(1).

A more flexible hash table will not have a fixed size; instead it will
grow itself as needed. This growth operation tends to be O(n), but
happens vanishingly infrequently as the table grows, so it's still
amortized O(1).

A hash table can also only ever be O(1) in the average case. Worst
case, everything you put into the hash table has the same hash value,
and so the time to fetch an item is entirely dependent on the
collision resolution scheme -- e.g. O(n) if a linked list or linear
probe is used, or perhaps O(log n) if each bucket is a balanced binary
tree.
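
The degenerate case is easy to reproduce with a deliberately terrible
hash function (a contrived class for illustration only):

class BadHash:
    def __init__(self, n):
        self.n = n
    def __hash__(self):
        return 0                 # every instance collides
    def __eq__(self, other):
        return self.n == other.n

d = {BadHash(i): i for i in range(1000)}
# Every key lands in the same bucket, so each lookup walks the whole
# collision chain: O(n) per access instead of O(1).
print(d[BadHash(999)])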

There are schemes such as cuckoo hashing that allow true O(1) worst
case access, but they have other drawbacks -- inserts with cuckoo
hashing are amortized O(1), and it is even possible for an insert to
fail entirely after spending exponential time trying to find a way to
arrange things.
 
Steven D'Aprano

Ian Kelly said:
That's quite vague, and not much stronger a guarantee than "maybe". It's
technically "possible" for this augmented assignment to be performed in
place:

x = 12
x += 4

But it's not done in-place, because ints are meant to be immutable.

That's incorrect. Ints aren't merely "meant" to be immutable, which
implies that it's optional; they are defined by the language
specification and the reference implementation as immutable. Any
interpreter where ints are mutable *is not Python*.

Ian Kelly said:
In any case, this means that whether the operation is actually performed
in-place is an implementation detail -- if not of the Python
implementation then at least of the class -- and not something the user
should take for granted.

Whether += operates in place or not is part of the interface of the
class, not the implementation.

Would you say that whether list.append operates in place or creates a new
list is an implementation detail? Whether str.upper() creates a new
string or modifies the existing one in place? Mutability versus
immutability is part of the interface, not implementation,
notwithstanding that somebody could create an alternative class with
the opposite behaviour: a MutableStr, or ImmutableList.
 
Steven D'Aprano

Marko Rauhamaa said:
There is no O(1) hash table.

Of course there are.

While it is true that hash tables *in general* are not *always* O(1), the
claim that hash tables are O(1) is *less wrong* than your claim that
there is "no O(1) hash table".

Proof: I create a hash table that accepts unsigned bytes as keys, where
the hash is the value of the byte. So byte 0x42 hashes to 0x42, or
decimal 66. I give the hash table 256 slots, numbered from 0 to 255.
Every hash takes constant time, there are no collisions at all, lookups,
insertions and deletions all take constant time regardless of how full
the table is. The best, worst and average cases are not just O(1) but
also Ω(1).
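
For the sceptical, the whole construction fits in a few lines (a sketch
of the table described above, with hash(b) == b and 256 fixed slots):

table = [None] * 256             # one slot per possible byte key

def insert(key, value):
    table[key] = value           # O(1), no collisions possible

def lookup(key):
    return table[key]            # O(1) in the best, worst and average case

def delete(key):
    table[key] = None            # O(1)

insert(0x42, "spam")
print(lookup(0x42))              # spam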

Since I have proven that there is *at least one* hash table that is O(1)
for best, worst and average case, your claim there are no such hash
tables is completely wrong, and certainly more wrong than the claim that
Python dicts are O(1) (which is only a little bit wrong, but typically
right).

See also "The Relativity Of Wrong":

http://chem.tufts.edu/answersinscience/relativityofwrong.htm
 
Steven D'Aprano

Ian Kelly said:
A hash table can also only ever be O(1) in the average case.

Since this discussion has apparently devolved into people trying to
out-pedant each other, I'm going to dispute that. That will depend on the
distribution of hash collisions. With a sufficiently big table,
sufficiently small set of possible data, a sufficiently good hash
function, or some combination of the above, a hash table may be O(1) even
in the worst case.

E.g. if you hash a number n to itself, e.g. hash(42) == 42, your data
consists of single byte numbers 0 through 255, and your hash table has
256 slots, *there are no collisions* and every lookup is O(1).

There are technical meanings for Big Oh notation, which can be quite
complicated. There's Big O, Little o, Big Ω (omega), Little ω (omega),
Big Θ (theta), although I don't think there's a Little θ. They can refer
to the best case, worst case, average (mean) case, median case, "typical"
case where you're making some guess of what occurs typically, and any
other situation you care about. The algorithmic complexity can apply
consistently to each and every run of the algorithm, or they can be
amortized over many runs. If you're doing a technical analysis of an
algorithm, this pedantic detail is important.

But when people are just talking informally in a hand-wavy manner, they
usually say "O(foo)" when they actually mean "for typical data under
typical conditions, the algorithm runs with a complexity of the order of
foo". And that's perfectly okay, just like it's perfectly okay to
describe the Earth as a sphere even though a pedant will call it a
wrinkly asymmetrical oblate spheroid.
 
Marko Rauhamaa

Steven D'Aprano said:
Proof: I create a hash table that accepts unsigned bytes as keys, [...]

The O(f(n)) notation has no meaning when n is limited.

This thing is not just pedantry. The discussion was how a balanced tree
fares in comparison with hash tables. A simplistic O(1) vs O(log n) was
presented as a proof that balanced trees don't have a chance in practice
or theory. I wasn't so easily convinced.


Marko
 
Ian Kelly

Steven D'Aprano said:
That's incorrect. Ints aren't merely "meant" to be immutable, which
implies that it's optional; they are defined by the language
specification and the reference implementation as immutable. Any
interpreter where ints are mutable *is not Python*.

That's true, but it is beside the point, which is that "when possible"
is not very meaningful.

Steven D'Aprano said:
Whether += operates in place or not is part of the interface of the
class, not the implementation. Would you say that whether list.append
operates in place or creates a new list is an implementation detail?
Whether str.upper() creates a new string or modifies the existing one
in place?

Of course not. list.append is documented as modifying the list.
str.upper is documented as returning a copy of the string.

Steven D'Aprano said:
Mutability versus immutability is part of the interface, not
implementation, notwithstanding that somebody could create an
alternative class with the opposite behaviour: a MutableStr, or
ImmutableList.

If the in-place behavior of += is held to be part of the interface,
then we must accept that += is not polymorphic across mutable and
immutable types, which in my mind largely* defeats the purpose of
having it. After all, there should be one -- and preferably only one
-- obvious way to do it. If you want in-place concatenation, the
obvious way to do it is by calling extend. If you want copy
concatenation, the obvious way to do it is with the + operator. Why
then should immutable sequences, and not just mutable ones, offer the
+= operator at all?

* The one exception I can think of is numpy, where there is no more
obvious way to do in-place addition, and in that context I would
consider the in-place behavior to be part of the interface.
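
The non-polymorphism has a classic observable consequence when two
names share one object (shown here as a CPython session, though the
behaviour is language-defined):

py> a = [1, 2]
py> b = a
py> a += [3]
py> b                  # the list was mutated in place; b sees the change
[1, 2, 3]
py> a = (1, 2)
py> b = a
py> a += (3,)
py> b                  # the tuple was copied; b still names the original
(1, 2)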
 
Steven D'Aprano

Marko Rauhamaa said:
The O(f(n)) notation has no meaning when n is limited.

It has obvious meaning: O(1) means that look-ups take constant time, not
(for example) proportional to the number of keys in the table.

Marko Rauhamaa said:
This thing is not just pedantry.

Yes it is. You're not even being pedantic for the sake of educating
people and helping them learn. If that were the case, I would completely
understand. Rather, it looks to me that you're being obnoxiously pedantic
for the sake of trying to get attention. I think you are trolling. Take
your comment that started this dispute:

[responding to Dan Stromberg]
This is not just a detail: O(1) tends to beat O(log n)
pretty easily for large n.

[to which your entire response was]
There is no O(1) hash table.


As pedantry, it is an utter failure: it's *wrong*, for starters, but you
didn't even make a pretence of trying to justify it. It's not
educational, and it doesn't advance your argument in any way. And just
*two posts later*, you acknowledged without apology or embarrassment that
hash tables actually are O(1). So it seems that you didn't even believe
your claim when you made it. This is why I think you are trolling.

All your comment accomplished was to wrongly contradict a
well-established and often-repeated fact that hash tables are usually
O(1):

https://duckduckgo.com/html/?q=big+oh+hash+tables

which is great if your aim is to trick people into giving you attention
(as I suppose I am giving you attention now) but useless for advancing
the debate or educating people.

It is as if you had declared "Actually the Allies had lost World War
Two", and then tried to justify such a ridiculously false statement on
the basis that while they didn't actually *lose* according to the
normally accepted meaning of the word, some of the Allies may have failed
to accomplish every single one of their objectives in the war.

Marko Rauhamaa said:
The discussion was how a balanced tree
fares in comparison with hash tables. A simplistic O(1) vs O(log n) was
presented as a proof that balanced trees don't have a chance in practice
or theory. I wasn't so easily convinced.

If you had wanted to put the case for balanced trees, you could mention
that in practice they perform comparably to hash tables for reasonable
sizes of data, and may require less memory. You could have made an
attempt to teach people about the difference between O and Ω complexity,
the difference between best case, worst case, average case, and typical
behaviour. You could have mentioned the difference between amortized and
non-amortized behaviour, or gone into detail about the various assumptions
made when doing Big Oh analysis (e.g. that all "simple" operations take
constant time, which is not strictly speaking true).

Any of these things would have helped people understand your position
better. But you did *none* of these things. It seems to me that your
stated position is actually irrelevant to you, what you want is not
better data structures in Python, but for people to respond to your posts.

In other words, I suspect you are trolling.

If I am right, that certainly would explain your apparent inability to
understand the difference between "is" and == operators, your insistence
that object IDs are addresses, and your declaration that object identity
is philosophically untenable.
 
Steven D'Aprano

Ian Kelly said:
That's true, but it is beside the point, which is that "when possible"
is not very meaningful.

It's meaningful. It refers not to ints, but to the infinite number of
possible classes which might include augmented assignment. Some of them
will be immutable, in which case it is not possible for += etc. to be
performed in-place. Some of them will be mutable, but there won't be any
reasonable (or even unreasonable) way to perform += in-place.

But for some mutable classes, it will be possible to perform += in place,
in which case the docs say that they have to do so.
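
Concretely, a class opts in by defining __iadd__; a minimal sketch of a
mutable class doing so (Bag is a hypothetical example, not a stdlib
type):

class Bag:
    def __init__(self, items=()):
        self.items = list(items)

    def __iadd__(self, other):
        self.items.extend(other)   # mutate in place...
        return self                # ...then rebind the name to the same object

    def __add__(self, other):
        return Bag(self.items + list(other))   # copying + for comparison

bag = Bag([1, 2])
alias = bag
bag += [3]
print(alias.items)   # [1, 2, 3] -- += took the in-place path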

Ian Kelly said:
Of course not. list.append is documented as modifying the list.
str.upper is documented as returning a copy of the string.

Right. And += is documented as modifying the list too.

Ian Kelly said:
If the in-place behavior of += is held to be part of the interface, then
we must accept that += is not polymorphic across mutable and immutable
types,

I'm fine with that.

Ian Kelly said:
which in my mind largely* defeats the purpose of having it.

Not to my mind. I think that having augmented assignment is worthwhile
even if it behaves differently and incompatibly for different classes.
After all, so does * (multiplication):

py> x = 24
py> x*x
576

but:

py> x = []
py> x*x
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can't multiply sequence by non-int of type 'list'


I don't think that we ought to throw away * and I don't think we ought to
throw away *= either.

Ian Kelly said:
After all, there should be one -- and preferably only one -- obvious way
to do it. If you want in-place concatenation, the obvious way to do it
is by calling extend. If you want copy concatenation, the obvious way
to do it is with the + operator. Why then should immutable sequences,
and not just mutable ones, offer the += operator at all?

*shrug*

I can see that each individual operation:

list.extend
list +
list +=

makes good sense in isolation, but I can also see that the combination
doesn't quite gel together as smoothly as we might hope.
 
Marko Rauhamaa

Steven D'Aprano said:
If I am right, that certainly would explain your apparent inability to
understand the difference between "is" and == operators, your
insistence that object IDs are addresses, and your declaration that
object identity is philosophically untenable.

You and I certainly have a very hard time communicating. Your
paraphrasing of my positions demonstrates that.


Marko
 
Ned Batchelder

Marko Rauhamaa said:
You and I certainly have a very hard time communicating. Your
paraphrasing of my positions demonstrates that.

Marko, welcome to the community. We enjoy technical discourse, even
debate. It's clear that you bring a lot of knowledge, intelligence, and
passion. But you have been involved in a number of long contentious
threads in the last few weeks.

You are right that you and Steven have had a hard time communicating.
You are part of "you and Steven"; it would be at least polite to
consider that part of the reason for the difficulty has to do with your
style. It can be brief and contrarian, which puts people off. Perhaps
if you tried to understand the gap and bridge it more, people would be
less inclined to think that you were trying to widen the gap.

--Ned.
 
