Why are there no ordered dictionaries?

Mike Meyer · Nov 21, 2005

Christoph Zwerschke said:
The 'sorted' function does not help in the case I have indicated,
where "I do not want the keys to be sorted alphabetically, but
according to some criteria which cannot be derived from the keys
themselves."

And how would an ordered dictionary help in this case?

<mike

bonono · Nov 21, 2005

Alex said:
there are at least two behaviour. What I need is a "preferred order".
Say if I have designed a web form(correspond to a database table), I
just want say 3 fields that goes before anything else in the
presentation. The rest I don't care as the DBA may create more fields
later which I don't want to then update my code yet again.

Click to expand...

preferred_fields = ['foo', 'bar', 'baz']

def process_preferred_fields():
global preferred_fields
temp = {}
for i, field in enumerate(preferred_fields):
temp[field] = '%s%s' % (chr(0), chr(i))
preferred_fields = temp
process_preferred_fields()
del process_preferred_fields

def sort_key(akey, preferred_fields=preferred_fields):
return preferred_fields.get(akey, akey)
del preferred_fields

## ...build dictionary d...

# now output d...:
for k in sorted(d, key=sort_key):
print k, d[k]

Season to taste if you want non-preferred fields emitted other than
alphabetically, or if you want to wrap this stuff into a class, etc.
(Note: untested code, so typos &c are quite possible). This assumes
that no 'real' key is a non-string, and no 'real' key starts with
chr(0), but it's quite easy to tweak for slightly different specs (at
worst by defining a custom type designed to always compare less than any
real key, and wrapping the preferred_fields entry in instances of that
custom type... having such instances compare with each other based on
the index within preferred_fields of the key they're wrapping, etc etc).

Thanks. For me, I don't need such complex thing, it is just like :

d = somedict_from_db()
prefer=['f','a',b']

def my_order(d):
for x in prefer:
if x in d: yield x
s = frozenset(prefer)
for x in d:
if x not in s: yield x

Peter Hansen · Nov 21, 2005

Using the same logic, we don't need types other than string in a DBMS
as we can always convert a string field into some other types when it
is needed.

You mean, like SQLite does? (http://www.sqlite.org/datatypes.html)

-Peter

bonono · Nov 21, 2005

Peter said:
You mean, like SQLite does? (http://www.sqlite.org/datatypes.html)

Yup, they are using similar logic.

Bengt Richter · Nov 21, 2005

That depends. Maybe I do not want the keys to be sorted alphabetically,
but according to some criteria which cannot be derived from the keys
themselves.

You mean involving also the values? What's wrong with
sorted(plaindict.items(), key=your_ordering_function) ?
(('a', 97),)
(('c', 99),)
(('b', 98),)
(('d', 100),)
[('a', 97), ('c', 99), ('b', 98), ('d', 100)]

What key function would you like, to generate the value that is actually used
to define the ordering?

>>> sorted(dict((c,ord(c)) for c in 'abcd').items(), key=lambda t:t[0]) [('a', 97), ('b', 98), ('c', 99), ('d', 100)]
>>> sorted(dict((c,ord(c)) for c in 'abcd').items(), key=lambda t:t[1]) [('a', 97), ('b', 98), ('c', 99), ('d', 100)]
>>> sorted(dict((c,ord(c)) for c in 'abcd').items(), key=lambda t:-t[1]) [('d', 100), ('c', 99), ('b', 98), ('a', 97)]
>>> sorted(dict((c,ord(c)) for c in 'abcd').items(), key=lambda t:t[1]&1) [('b', 98), ('d', 100), ('a', 97), ('c', 99)]
>>> sorted(dict((c,ord(c)) for c in 'abcd').items(), key=lambda tt[1]&1,t[1])) [('b', 98), ('d', 100), ('a', 97), ('c', 99)]
>>> sorted(dict((c,ord(c)) for c in 'abcd').items(), key=lambda tt[1]&1,-t[1]))

Click to expand...

Click to expand...

[('d', 100), ('b', 98), ('c', 99), ('a', 97)]
And being able to reverse the end result is handy

>>> sorted(dict((c,ord(c)) for c in 'abcd').items(), key=lambda tt[1]&1,-t[1]), reverse=True)

Click to expand...

Click to expand...

[('a', 97), ('c', 99), ('b', 98), ('d', 100)]

You may need to upgrade your Python though ;-)

Regards,
Bengt Richter

bonono · Nov 21, 2005

Bengt said:
You mean involving also the values? What's wrong with
sorted(plaindict.items(), key=your_ordering_function) ?

Not according to the content of the data, not just the "key". Or in
other words, some other metadata that is not present in the data. A
typical thing, like order of creation. Or some arbitary order. For
example :

I present a data grid/table in a HTML form and the user just drag and
drop and rearrange the columns order.

Of course, you may say, just put another column that represent
this(some reporting programs I have seen do it this way) and that is an
option but not the only option.

Ben Finney · Nov 21, 2005

[sort by] some other metadata that is not present in the data.
[...]
Of course, you may say, just put another column that represent
this(some reporting programs I have seen do it this way) and that is
an option but not the only option.

It's a pretty good option, and very commonly used. It's known as the
"Schwartzian transform", or more descriptively, the "Decorate, Sort,
Undecorate" pattern.

<URL:http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52234>

bonono · Nov 21, 2005

Ben said:
[sort by] some other metadata that is not present in the data.
[...]
Of course, you may say, just put another column that represent
this(some reporting programs I have seen do it this way) and that is
an option but not the only option.

Click to expand...

It's a pretty good option, and very commonly used. It's known as the
"Schwartzian transform", or more descriptively, the "Decorate, Sort,
Undecorate" pattern.

Whether it is a good option is judged by the person implement it as he
is the one seeing the whole thing, and not some snippet(or concept) on
the usenet.

Fredrik Lundh · Nov 21, 2005

Fredrik said:
Fredrik said:

but you can easily generate an index when you need it:

index = dict(d)

name, type = index["pid"]
print name

the index should take less than a microsecond to create, and since it
points to the members of the original dict, it doesn't use much memory
either...

Click to expand...

Using the same logic, we don't need types other than string in a DBMS
as we can always convert a string field into some other types when it
is needed.

No, that's not the same logic. The dict() in my example doesn't convert be-
tween data types; it provides a new way to view an existing data structure.
There's no parsing involved, nor any type guessing. And given the use case,
it's more than fast enough, and doesn't copy any data.

If you think that's the same thing as parsing strings, you've completely missed
the point.

</F>

bonono · Nov 21, 2005

Fredrik said:
Fredrik said:

but you can easily generate an index when you need it:

index = dict(d)

name, type = index["pid"]
print name

the index should take less than a microsecond to create, and since it
points to the members of the original dict, it doesn't use much memory
either...

Click to expand...

Using the same logic, we don't need types other than string in a DBMS
as we can always convert a string field into some other types when it
is needed.

Click to expand...

No, that's not the same logic. The dict() in my example doesn't convert be-
tween data types; it provides a new way to view an existing data structure.
There's no parsing involved, nor any type guessing. And given the use case,
it's more than fast enough, and doesn't copy any data.

If you think that's the same thing as parsing strings, you've completely missed
the point.

Well, forget about the missing/not missing the point.

My point is, there are various of reasons why we need different data
types in an RDBMS, just the same as why we need list, dict. There is
nothing stop me from using a list as dict(just scan it till I find it),
why would I I create a dict(your new view of the same data) ? Coding
convenience, speed or whatever.

If I need the dict feature 90% of the time, and the list feature 10% of
the time. I want an ordered dict. Rather than a list and create this
new view every time and every where I want to use it as a dict.

parsing or not parsing is not the point, and parsing/converting is
still "create a new view" of an existing data structure.

Fredrik Lundh · Nov 21, 2005

If I need the dict feature 90% of the time, and the list feature 10% of
the time.

Wasn't your use case that you wanted to specify form fields in
a given order (LIST), render a default view of the form in that
order (LIST), and, later on, access the field specifiers in an
arbitrary order, based on their key (DICT). Sure looks like it's
the LIST aspect that's important here...

("but assume that I have some other use case" isn't a valid use
case)

I want an ordered dict. Rather than a list and create this new view every
time and every where I want to use it as a dict.

You want an ordered dict because you say you want one, not be-
cause it's the best way to address your use case. That's fine, but
it's not really related to the question asked in the subject line.

parsing or not parsing is not the point, and parsing/converting is
still "create a new view" of an existing data structure.

Copying the entire data structure hardly qualifies as "creating a
new view". dict() doesn't do that; in this use case, it doesn't cost
you anything to use it.

Everything has a cost in Python. Things aren't free just because
they're implemented by some other module.

But when things are free, they're often a good choice.

</F>

bonono · Nov 21, 2005

Fredrik said:
Wasn't your use case that you wanted to specify form fields in
a given order (LIST), render a default view of the form in that
order (LIST), and, later on, access the field specifiers in an
arbitrary order, based on their key (DICT). Sure looks like it's
the LIST aspect that's important here...

Yes. But whether LIST aspect or DICT is important is well, opinion. So
let's leave it there.

You want an ordered dict because you say you want one, not be-
cause it's the best way to address your use case. That's fine, but
it's not really related to the question asked in the subject line.

Again, best way is decided by ME. If I am entering a coding contest
which is organized by YOU, that is a different story. As for related to
the subject line, since when I said my preference or use case has
anything to do with the subject line ? I have said in another post that
I don't think there should be one in the standard library, which is
directly about the subject line.

Copying the entire data structure hardly qualifies as "creating a
new view". dict() doesn't do that; in this use case, it doesn't cost
you anything to use it.

doesn't cost me anything ? That is good news to me.

Antoon Pardon · Nov 21, 2005

Op 2005-11-21 said:
I have started the thread in the first place because I believed it is
pretty unabmiguous what an "ordered dictionary" is and how it should
behave. That's why I asked myself why something that straigthforward has
not been added to the standard lib yet. Maybe I'm wrong; I must admit
that I haven't meditated about it very much.

Well it doesn't seem that obvious, because the two recipes you have
gotten, do something different from what I understand as an ordered
dictionary.

The two recipes order the keys by insertion order.

My idea would have been that some order was defined
on your keys in advance and that when you iterated over
the dictionary, the results would be ordered in sequence
of key order.

Do you have an example for different options of behavior?

Well you have two above. Maybe someone can think of something
else.

Which behaviour are you looking for?

Ben Sizer · Nov 21, 2005

Fredrik said:
No, that's not the same logic. The dict() in my example doesn't convert be-
tween data types; it provides a new way to view an existing data structure.

This is interesting; I would have thought that the tuple is read and a
dictionary created by inserting each pair sequentially. Is this not the
case?

Fredrik Lundh · Nov 21, 2005

Ben said:
This is interesting; I would have thought that the tuple is read and a
dictionary created by inserting each pair sequentially. Is this not the
case?

pointers to the members of each pair, yes. but a pointer copy is a
cheap operation (for the given use case, we're only talking about a
few dozen pairs anyway, at the most).

this is a common fallacy; Python programmers underestimate the
cost of adding extra layers to their code (e.g. by using an ordered
dict data structure that has to incrementally update both a list and
a dictionary), and overestimate the cost of letting the C layer do
bulk operations.

(as an example, on my machine, using Foord's OrderedDict class
on Zwerschke's example, creating the dictionary in the first place
takes 5 times longer than the index approach, and accessing an
item takes 3 times longer. you can in fact recreate the index 6
times before OrderedDict is faster; if you keep the index around,
the OrderedDict approach never wins...)

</F>

Fuzzyman · Nov 21, 2005

Fredrik said:
Ben said:

This is interesting; I would have thought that the tuple is read and a
dictionary created by inserting each pair sequentially. Is this not the
case?

Click to expand...

[snip..]
(as an example, on my machine, using Foord's OrderedDict class
on Zwerschke's example, creating the dictionary in the first place
takes 5 times longer than the index approach, and accessing an
item takes 3 times longer. you can in fact recreate the index 6
times before OrderedDict is faster; if you keep the index around,
the OrderedDict approach never wins...)

So, so long as you want to use the dictionary less than six times -
it's faster to store/access it as a list of tuples. ;-)

Everytime you want to access (or assign to) the data structure as a
dictionary, you have to re-create the index.

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml

Alex Martelli · Nov 21, 2005

d = somedict_from_db()
prefer=['f','a',b']

def my_order(d):
for x in prefer:
if x in d: yield x
s = frozenset(prefer)
for x in d:
if x not in s: yield x

Yes, a much cleaner architecture (if you don't need any sorting on
non-preferred keys of d) than the ponderous general one I proposed. A
'key' approach with this behavior would be:

def my_key(k):
try: return prefer.index(k)
except ValueError: return len(prefer)

Now, 'for x in sorted(d, key=my_key)' should be equivalent to 'for x in
my_order(d)' thanks to the stability of sorting when the 'key' callable
returns equal values.

Neither of these way-simpler approaches is (I suspect) optimal for
speed, in the unlikely event one cares about that. The idea of
preprocessing the 'preferred' list once and for all outside of the
function (which I used heavily in my previous post) might yield some
speed-up, for example:

def my_key_fast(k, _aux=dict((k,i) for i,k in enumerate(prefer),
_l=len(prefer)):
return _aux.get(k, _l)

It's very unlikely that this situation warrants such optimization, of
course, I'm just "thinking aloud" about abstract possibilities.

Alex

Fredrik Lundh · Nov 21, 2005

Fuzzyman said:
[snip..]

(as an example, on my machine, using Foord's OrderedDict class
on Zwerschke's example, creating the dictionary in the first place
takes 5 times longer than the index approach, and accessing an
item takes 3 times longer. you can in fact recreate the index 6
times before OrderedDict is faster; if you keep the index around,
the OrderedDict approach never wins...)

Click to expand...

So, so long as you want to use the dictionary less than six times -
it's faster to store/access it as a list of tuples. ;-)

nope. that's not what I said. I said that you can recreate the index
six times in the time it takes to create a single OrderedDict instance.
if you need to use index more than that, it's not that hard to keep a
reference to it.

Everytime you want to access (or assign to) the data structure as a
dictionary, you have to re-create the index.

the use case we're talking about here (field descriptors) doesn't involve
assigning to the data structure, once it's created.

I'll repeat this one last time: for the use cases presented by Zwerschke
and "bonono", using a list as the master data structure, and creating the
dictionary on demand, is a lot faster than using a ready-made ordered
dict implementation. if you will access things via the dictionary a lot,
you can cache the dictionary somewhere. if not, you can recreate it
several times and still get a net win.

for other use cases, things may be different, but nobody has presented
such a use case yet. as I said earlier, "let's assume we have another use
case" is not a valid use case.

</F>

Christoph Zwerschke · Nov 21, 2005

Mike said:
> And how would an ordered dictionary help in this case?

Maybe there is some confusion between an "orderable" and an "ordered"
dictionary. When I talk about "ordered dictionary", then in the simplest
case I just set up my ordered dictionary with my preferred key order and
it stays like that. This allows me to later iterate through the
dictionary in this preferred order, while being still able to randomly
access data from the dictionary at other places.

-- Christoph

Alex Martelli · Nov 21, 2005

Fredrik Lundh said:
("but assume that I have some other use case" isn't a valid use
case)

+1 QOTW

Alex

PEP 372 -- Adding an ordered directory to collections	28	Jun 16, 2008
Why do only callable objects get a __name__?	10	Nov 18, 2013
Efficient lookup in list of dictionaries	2	Dec 5, 2005
Why are there no static virtual functions?	15	Jan 17, 2007
Why is there no instancemethod builtin?	15	Jun 17, 2005
Python-URL! - weekly Python news and links (Nov 3)	1	Nov 3, 2009
Python-URL! - weekly Python news and links (Nov 24)	2	Nov 24, 2009
Python-URL! - weekly Python news and links (Nov 17)	14	Nov 17, 2008

Why are there no ordered dictionaries?

Mike Meyer

bonono

Peter Hansen

bonono

Bengt Richter

bonono

Ben Finney

bonono

Fredrik Lundh

bonono

Fredrik Lundh

bonono

Antoon Pardon

Ben Sizer

Fredrik Lundh

Fuzzyman

Alex Martelli

Fredrik Lundh

Christoph Zwerschke

Alex Martelli

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads