Ordered dict by default

B

bearophileHUGS

Choosing the right data structure is usually a matter of compromises,
and sometimes the best you can do is to change some data structures
and look for the faster running time. To do this it helps to have a
language that allows you to swap data structures in the more
transparent way possible.

It's not just a matter of the specific program, it's also a matter of
the nature of the programming language. Python isn't meant to be high-
performance in running time, it's meant to be easy to use, flexible,
and easy to understand. (Ruby shares almost the same purposes, but it
looks a bit less easy, a bit more flexible, and offers a bit more
freedom, and it used to be a little slower).

Now Ruby dicts are ordered by default:
http://www.igvita.com/2009/02/04/ruby-19-internals-ordered-hash/

Of course Python can grow odicts into its collections module, that can
be used everywhere a normal dict is used. But while this may be a good
solution for faster languages like Java, C# and D, for me it doesn't
sound like the best solution for Python.

That page about Ruby dicts show a higher traversal speed (probably
just because the CPU has to scan less memory, but I am not sure,
because mordern CPUs are very complex) but lower insertion speed (I
think mostly not because the added management of two pointers, but
because the memory used increases, so there are more cache misses. If
this turns out as true, then using the xor trick may halve the extra
memory needed). I have no idea if such different balance of
performance will lead to on average faster or slower Python programs,
this has to be tested. But even if the average performance becomes a
little worse I think making the default Python dict as ordered is a
positive change for Python too, because built-ins are meant to be as
flexible as possible, even if they aren't the fastest ones or the less
memory hungry ones (and keeping dicts ordered decreases the surprises
a newbie finds, makes some code cleaner, and doctests become simpler
to write because the order of items may be more defined).

Once the default dicts are ordered, it can be possible to add an
unordereddict to the collections module to be used by programmers when
max performance or low memory usage is very important :)

Namespaces and other internal things of Python can keep using the un-
ordereddicts.

I don't know what to do regarding Python sets yet. In mathematics sets
aren't ordered, so they may be kept as they are now.

Bye,
bearophile
 
S

Steve Holden

Once the default dicts are ordered, it can be possible to add an
unordereddict to the collections module to be used by programmers when
max performance or low memory usage is very important :)
I have no real idea why you think it's desirable to turn this important
feature on its head, but if you've followed the various past threads on
ordered dicts you would know that one of the prime issues every time
this comes up is the inability of the various proponents to agree no ow
dicts should actually be ordered.
Namespaces and other internal things of Python can keep using the un-
ordereddicts.

I don't know what to do regarding Python sets yet. In mathematics sets
aren't ordered, so they may be kept as they are now.
In mathematics mappings aren't ordered either, and a pure dict is pretty
much a mapping. So leave them alone, they are fine as they are!

regards
Steve
 
P

Paul Rubin


Maybe I didn't read that carefully enough, but it looks like "ordered"
means the dict records come out in the same order you inserted them
in. That is if you insert B,A,D,C in that order, you get them out in
that order. I would have thought an ordered dict meant you get A,B,C,D,
which seems a lot more useful.

Red-black trees as described at the bottom of the page are a good way
to get A,B,C,D. They also make it simple to have "persistent" or
"functional" dictionaries, which I've written about a few times before.
 
P

Paul Rubin

Steve Holden said:
In mathematics mappings aren't ordered either, and a pure dict is pretty
much a mapping. So leave them alone, they are fine as they are!

Ehhh, an ordered dict has to support a comparison operation on the
keys, while a Python dict has to support a hashing operation on them.
Neither really captures the idea of a pure mapping, which shouldn't
have to support any operations (even equality) on the keys.

There have been several times when I've wished for persistent ordered
dicts in Python. I started writing a big post about it a while ago
and maybe I'll finish it someday.
 
P

Paul Rubin

Duncan Booth said:
If you want to write doctests then any stable order in the default dict
type would be helpful no matter whether it means that keys are in original
insertion or latest insertion order or sorted.

Just use "sorted" in the test code: [0, 1, 4, 9, 16]
There are other use cases where maintaining insertion order is
important. For example any automated code generator which parses
its previous output and then regenerates it maintaining edits. I ran
into that before with one which just stored the methods in a Python
dict so that every so often you'd find the entire source file had
rearranged itself making a mess of version control.

Still doesn't sound so bad: just include a counter with the value.
Something like (untested):

odict = dict(zip(methodnames, zip(methodbodies, itertools.count())))
...
gen_code(sorted(odict.iteritems(), key=lambda(name,(body,n)): n))
It would certainly be an interesting experiment to mimic the Ruby dict
implementation in Python's dict code and see what effect it has on
performance: the Ruby page claims that while inserts are slower traversal
is much faster. I think it would need some pretty convincing arguments
though to become the default.

In my usage of dicts, traversal is rather rare, but YMMV.
 
S

Stephen Hansen

That page about Ruby dicts show a higher traversal speed (probably
just because the CPU has to scan less memory, but I am not sure,
because mordern CPUs are very complex) but lower insertion speed (I
think mostly not because the added management of two pointers, but
because the memory used increases, so there are more cache misses. If
this turns out as true, then using the xor trick may halve the extra
memory needed). I have no idea if such different balance of
performance will lead to on average faster or slower Python programs,
this has to be tested. But even if the average performance becomes a
little worse I think making the default Python dict as ordered is a
positive change for Python too, because built-ins are meant to be as
flexible as possible, even if they aren't the fastest ones or the less
memory hungry ones (and keeping dicts ordered decreases the surprises
a newbie finds, makes some code cleaner, and doctests become simpler
to write because the order of items may be more defined).

Er, doing a cursory review of my code has an overwhelmingly higher incidence
of insertions then traversals with my dictionaries: many dictionaries are
never traversed and when they are it is only once, and often had tens to
thousands of insertions before that traversal happens.

So a slight decrease in insertion speed at the cost of faster traversal seems
like a complete loss to me.

Now, I also do recognize the utility of ordered dictionaries in some cases, but
exactly what you mean by ordered varies. I have two cases where "ordered"
has the keys are in a specific custom order. I have four cases where "ordered"
means maintaining insertion order. For the former I do sorted(dict) on access
when I need to do something order-related. For the latter I have a simple
ordered dictionary implementation that maintains the keys as a separate list
attached to the dictionary.

But interestingly enough to me, in all the cases where I use an ordered dict
in my code, performance is completely irrelevant. Its tens to hundreds of
items usually, and the best optimization in the world wouldn't add more then
an imperceptible fraction of a second, and the memory lost to the extra
list adds up to nothing relevant.

I'm not claiming my use cases are everyones, not by any means. But: this
seems like a complete lose-lose for an idea from my perspective. Sure you
say an unordered dictionary could then be added for my needs, but... I like
my syntactic sugar, thank you :)

Ordered dictionaries can be useful, its good to see a standard implementation
going into the core, but the default? I hope not!

--Stephen
 
S

Stephen Hansen

Now, I also do recognize the utility of ordered dictionaries in some cases, but
exactly what you mean by ordered varies. I have two cases where "ordered"
has the keys are in a specific custom order. I have four cases where "ordered"
means maintaining insertion order. For the former I do sorted(dict) on access
when I need to do something order-related. For the latter I have a simple
ordered dictionary implementation that maintains the keys as a separate list
attached to the dictionary.

Ooh, as an addendum... I found one case where I want insertion-and-update
order: meaning that its an ordered dictionary that maintains insertion
order, but
an update to a particular item moves that item to the back so an update behaves
like del d[key]; d[key] = value in terms of the key order.

"Ordered Dictionaries" are a bit fuzzy. :)

--S
 
M

MRAB

Paul said:
Maybe I didn't read that carefully enough, but it looks like "ordered"
means the dict records come out in the same order you inserted them
in. That is if you insert B,A,D,C in that order, you get them out in
that order. I would have thought an ordered dict meant you get A,B,C,D,
which seems a lot more useful.
[snip]
You're confusing "ordered" with "sorted"! :)

A list is an ordered collection, for example.
 
B

Bryan Olson

MRAB said:
Paul said:
Maybe I didn't read that carefully enough, but it looks like "ordered"
means the dict records come out in the same order you inserted them
in. That is if you insert B,A,D,C in that order, you get them out in
that order. I would have thought an ordered dict meant you get A,B,C,D,
which seems a lot more useful.
[snip]
You're confusing "ordered" with "sorted"! :)

He's really not.

http://en.wikipedia.org/wiki/Ordered_set
A list is an ordered collection, for example.

True.
 
B

Bryan Olson

Stephen said:
Ooh, as an addendum... I found one case where I want insertion-and-update
order: meaning that its an ordered dictionary that maintains insertion
order, but
an update to a particular item moves that item to the back so an update behaves
like del d[key]; d[key] = value in terms of the key order.

"Ordered Dictionaries" are a bit fuzzy. :)

A few bits fuzzy. Is the following True or False if dict is insert-ordered?

dict(a=6, b=7) == dict(b=7, a=6)
 
B

bearophileHUGS

Bryan Olson:
A few bits fuzzy. Is the following True or False if dict is insert-ordered?
    dict(a=6, b=7) == dict(b=7, a=6)

In my odict implementation I have disallowed that syntax because if
you want to define a mydict(**kwds) function that allows a syntax
like:

mydict(a=6, b=7)

it takes a dict as argument, such dict is currently not ordered. If
built-in dicts become ordered, then that dict too may become
ordered :)

Anyway, maybe it's good to ignore the inserting order of items when
you look for odict equality.

Bye,
bearophile
 
S

Steve Holden

Bryan Olson:

In my odict implementation I have disallowed that syntax because if
you want to define a mydict(**kwds) function that allows a syntax
like:

mydict(a=6, b=7)

it takes a dict as argument, such dict is currently not ordered. If
built-in dicts become ordered, then that dict too may become
ordered :)

Anyway, maybe it's good to ignore the inserting order of items when
you look for odict equality.
But then they wouldn't be equal. As I said right at the start if this
thread:
one of the prime issues every time
this comes up is the inability of the various proponents to agree on how
dicts should actually be ordered.

The remainder of the thread vindicates that comment.

regards
Steve
 
T

Terry Reedy

Paul said:
Maybe I didn't read that carefully enough, but it looks like "ordered"
means the dict records come out in the same order you inserted them
in. That is if you insert B,A,D,C in that order, you get them out in
that order.

This seems to have become the more common meaning of 'ordered dict'.
I would have thought an ordered dict meant you get A,B,C,D,
which seems a lot more useful.

I once thought that too, but this would be a 'sorted dict'.

tjr
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,968
Messages
2,570,149
Members
46,695
Latest member
StanleyDri

Latest Threads

Top