Ordered dict by default

bearophileHUGS · Feb 5, 2009

Choosing the right data structure is usually a matter of compromises,
and sometimes the best you can do is to change some data structures
and look for the faster running time. To do this it helps to have a
language that allows you to swap data structures in the more
transparent way possible.

It's not just a matter of the specific program, it's also a matter of
the nature of the programming language. Python isn't meant to be high-
performance in running time, it's meant to be easy to use, flexible,
and easy to understand. (Ruby shares almost the same purposes, but it
looks a bit less easy, a bit more flexible, and offers a bit more
freedom, and it used to be a little slower).

Now Ruby dicts are ordered by default:
http://www.igvita.com/2009/02/04/ruby-19-internals-ordered-hash/

Of course Python can grow odicts into its collections module, that can
be used everywhere a normal dict is used. But while this may be a good
solution for faster languages like Java, C# and D, for me it doesn't
sound like the best solution for Python.

That page about Ruby dicts show a higher traversal speed (probably
just because the CPU has to scan less memory, but I am not sure,
because mordern CPUs are very complex) but lower insertion speed (I
think mostly not because the added management of two pointers, but
because the memory used increases, so there are more cache misses. If
this turns out as true, then using the xor trick may halve the extra
memory needed). I have no idea if such different balance of
performance will lead to on average faster or slower Python programs,
this has to be tested. But even if the average performance becomes a
little worse I think making the default Python dict as ordered is a
positive change for Python too, because built-ins are meant to be as
flexible as possible, even if they aren't the fastest ones or the less
memory hungry ones (and keeping dicts ordered decreases the surprises
a newbie finds, makes some code cleaner, and doctests become simpler
to write because the order of items may be more defined).

Once the default dicts are ordered, it can be possible to add an
unordereddict to the collections module to be used by programmers when
max performance or low memory usage is very important

Namespaces and other internal things of Python can keep using the un-
ordereddicts.

I don't know what to do regarding Python sets yet. In mathematics sets
aren't ordered, so they may be kept as they are now.

Bye,
bearophile

Steve Holden · Feb 5, 2009

Once the default dicts are ordered, it can be possible to add an
unordereddict to the collections module to be used by programmers when
max performance or low memory usage is very important

I have no real idea why you think it's desirable to turn this important
feature on its head, but if you've followed the various past threads on
ordered dicts you would know that one of the prime issues every time
this comes up is the inability of the various proponents to agree no ow
dicts should actually be ordered.

Namespaces and other internal things of Python can keep using the un-
ordereddicts.

I don't know what to do regarding Python sets yet. In mathematics sets
aren't ordered, so they may be kept as they are now.

In mathematics mappings aren't ordered either, and a pure dict is pretty
much a mapping. So leave them alone, they are fine as they are!

regards
Steve

Paul Rubin · Feb 5, 2009

Now Ruby dicts are ordered by default:
http://www.igvita.com/2009/02/04/ruby-19-internals-ordered-hash/

Maybe I didn't read that carefully enough, but it looks like "ordered"
means the dict records come out in the same order you inserted them
in. That is if you insert B,A,D,C in that order, you get them out in
that order. I would have thought an ordered dict meant you get A,B,C,D,
which seems a lot more useful.

Red-black trees as described at the bottom of the page are a good way
to get A,B,C,D. They also make it simple to have "persistent" or
"functional" dictionaries, which I've written about a few times before.

Paul Rubin · Feb 5, 2009

Steve Holden said:
In mathematics mappings aren't ordered either, and a pure dict is pretty
much a mapping. So leave them alone, they are fine as they are!

Ehhh, an ordered dict has to support a comparison operation on the
keys, while a Python dict has to support a hashing operation on them.
Neither really captures the idea of a pure mapping, which shouldn't
have to support any operations (even equality) on the keys.

There have been several times when I've wished for persistent ordered
dicts in Python. I started writing a big post about it a while ago
and maybe I'll finish it someday.

Paul Rubin · Feb 5, 2009

Duncan Booth said:
If you want to write doctests then any stable order in the default dict
type would be helpful no matter whether it means that keys are in original
insertion or latest insertion order or sorted.

Just use "sorted" in the test code: [0, 1, 4, 9, 16]

There are other use cases where maintaining insertion order is
important. For example any automated code generator which parses
its previous output and then regenerates it maintaining edits. I ran
into that before with one which just stored the methods in a Python
dict so that every so often you'd find the entire source file had
rearranged itself making a mess of version control.

Still doesn't sound so bad: just include a counter with the value.
Something like (untested):

odict = dict(zip(methodnames, zip(methodbodies, itertools.count())))
...
gen_code(sorted(odict.iteritems(), key=lambda(name,(body,n)): n))

It would certainly be an interesting experiment to mimic the Ruby dict
implementation in Python's dict code and see what effect it has on
performance: the Ruby page claims that while inserts are slower traversal
is much faster. I think it would need some pretty convincing arguments
though to become the default.

In my usage of dicts, traversal is rather rare, but YMMV.

andrew cooke · Feb 5, 2009

so what is happening with pep 372?

http://www.python.org/dev/peps/pep-0372/

Stephen Hansen · Feb 5, 2009

That page about Ruby dicts show a higher traversal speed (probably

just because the CPU has to scan less memory, but I am not sure,
because mordern CPUs are very complex) but lower insertion speed (I
think mostly not because the added management of two pointers, but
because the memory used increases, so there are more cache misses. If
this turns out as true, then using the xor trick may halve the extra
memory needed). I have no idea if such different balance of
performance will lead to on average faster or slower Python programs,
this has to be tested. But even if the average performance becomes a
little worse I think making the default Python dict as ordered is a
positive change for Python too, because built-ins are meant to be as
flexible as possible, even if they aren't the fastest ones or the less
memory hungry ones (and keeping dicts ordered decreases the surprises
a newbie finds, makes some code cleaner, and doctests become simpler
to write because the order of items may be more defined).

Er, doing a cursory review of my code has an overwhelmingly higher incidence
of insertions then traversals with my dictionaries: many dictionaries are
never traversed and when they are it is only once, and often had tens to
thousands of insertions before that traversal happens.

So a slight decrease in insertion speed at the cost of faster traversal seems
like a complete loss to me.

Now, I also do recognize the utility of ordered dictionaries in some cases, but
exactly what you mean by ordered varies. I have two cases where "ordered"
has the keys are in a specific custom order. I have four cases where "ordered"
means maintaining insertion order. For the former I do sorted(dict) on access
when I need to do something order-related. For the latter I have a simple
ordered dictionary implementation that maintains the keys as a separate list
attached to the dictionary.

But interestingly enough to me, in all the cases where I use an ordered dict
in my code, performance is completely irrelevant. Its tens to hundreds of
items usually, and the best optimization in the world wouldn't add more then
an imperceptible fraction of a second, and the memory lost to the extra
list adds up to nothing relevant.

I'm not claiming my use cases are everyones, not by any means. But: this
seems like a complete lose-lose for an idea from my perspective. Sure you
say an unordered dictionary could then be added for my needs, but... I like
my syntactic sugar, thank you

Ordered dictionaries can be useful, its good to see a standard implementation
going into the core, but the default? I hope not!

--Stephen

Stephen Hansen · Feb 5, 2009

Now, I also do recognize the utility of ordered dictionaries in some cases, but

exactly what you mean by ordered varies. I have two cases where "ordered"
has the keys are in a specific custom order. I have four cases where "ordered"
means maintaining insertion order. For the former I do sorted(dict) on access
when I need to do something order-related. For the latter I have a simple
ordered dictionary implementation that maintains the keys as a separate list
attached to the dictionary.

Ooh, as an addendum... I found one case where I want insertion-and-update
order: meaning that its an ordered dictionary that maintains insertion
order, but
an update to a particular item moves that item to the back so an update behaves
like del d[key]; d[key] = value in terms of the key order.

"Ordered Dictionaries" are a bit fuzzy.

--S

MRAB · Feb 5, 2009

Paul said:
Maybe I didn't read that carefully enough, but it looks like "ordered"
means the dict records come out in the same order you inserted them
in. That is if you insert B,A,D,C in that order, you get them out in
that order. I would have thought an ordered dict meant you get A,B,C,D,
which seems a lot more useful.

[snip]
You're confusing "ordered" with "sorted"!

A list is an ordered collection, for example.

Bryan Olson · Feb 5, 2009

MRAB said:
Paul said:

Maybe I didn't read that carefully enough, but it looks like "ordered"
means the dict records come out in the same order you inserted them
in. That is if you insert B,A,D,C in that order, you get them out in
that order. I would have thought an ordered dict meant you get A,B,C,D,
which seems a lot more useful.

Click to expand...

[snip]
You're confusing "ordered" with "sorted"!

He's really not.

http://en.wikipedia.org/wiki/Ordered_set

A list is an ordered collection, for example.

True.

Bryan Olson · Feb 5, 2009

Stephen said:
Ooh, as an addendum... I found one case where I want insertion-and-update
order: meaning that its an ordered dictionary that maintains insertion
order, but
an update to a particular item moves that item to the back so an update behaves
like del d[key]; d[key] = value in terms of the key order.

"Ordered Dictionaries" are a bit fuzzy.

A few bits fuzzy. Is the following True or False if dict is insert-ordered?

dict(a=6, b=7) == dict(b=7, a=6)

bearophileHUGS · Feb 5, 2009

Bryan Olson:

A few bits fuzzy. Is the following True or False if dict is insert-ordered?
dict(a=6, b=7) == dict(b=7, a=6)

In my odict implementation I have disallowed that syntax because if
you want to define a mydict(**kwds) function that allows a syntax
like:

mydict(a=6, b=7)

it takes a dict as argument, such dict is currently not ordered. If
built-in dicts become ordered, then that dict too may become
ordered

Anyway, maybe it's good to ignore the inserting order of items when
you look for odict equality.

Bye,
bearophile

Steve Holden · Feb 5, 2009

Bryan Olson:

In my odict implementation I have disallowed that syntax because if
you want to define a mydict(**kwds) function that allows a syntax
like:

mydict(a=6, b=7)

it takes a dict as argument, such dict is currently not ordered. If
built-in dicts become ordered, then that dict too may become
ordered

Anyway, maybe it's good to ignore the inserting order of items when
you look for odict equality.

But then they wouldn't be equal. As I said right at the start if this
thread:

one of the prime issues every time
this comes up is the inability of the various proponents to agree on how
dicts should actually be ordered.

The remainder of the thread vindicates that comment.

regards
Steve

Terry Reedy · Feb 5, 2009

Paul said:
Maybe I didn't read that carefully enough, but it looks like "ordered"
means the dict records come out in the same order you inserted them
in. That is if you insert B,A,D,C in that order, you get them out in
that order.

This seems to have become the more common meaning of 'ordered dict'.

I would have thought an ordered dict meant you get A,B,C,D,
which seems a lot more useful.

I once thought that too, but this would be a 'sorted dict'.

tjr

Terry Reedy · Feb 5, 2009

andrew said:
so what is happening with pep 372?

http://www.python.org/dev/peps/pep-0372/

There seems to be a number of unanswered questions as to the exact
behavior (see Q and A section). The author needs to promote more
discussion by those interested, including here, and make a decision.

Efficiently building ordered dict	14	Feb 22, 2010
Chanelling Guido - dict subclasses	11	Jan 15, 2014
PEP 372 -- Adding an ordered directory to collections	28	Jun 16, 2008
Ordered dicts	6	Sep 26, 2006
1st Sketch at at ReadOnly dict	0	Jan 20, 2014
Unexpected comparisons in dict lookup	1	Mar 18, 2014
set and dict iteration	42	Aug 16, 2012
issubclass(dict, Mapping)	10	Dec 22, 2010

Ordered dict by default

bearophileHUGS

Steve Holden

Paul Rubin

Paul Rubin

Paul Rubin

andrew cooke

Stephen Hansen

Stephen Hansen

MRAB

Bryan Olson

Bryan Olson

bearophileHUGS

Steve Holden

Terry Reedy

Terry Reedy

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads