Why are there no ordered dictionaries?

  • Thread starter Christoph Zwerschke
  • Start date
M

Mike Meyer

Christoph Zwerschke said:
The 'sorted' function does not help in the case I have indicated,
where "I do not want the keys to be sorted alphabetically, but
according to some criteria which cannot be derived from the keys
themselves."

And how would an ordered dictionary help in this case?

<mike
 
B

bonono

Alex said:
there are at least two behaviour. What I need is a "preferred order".
Say if I have designed a web form(correspond to a database table), I
just want say 3 fields that goes before anything else in the
presentation. The rest I don't care as the DBA may create more fields
later which I don't want to then update my code yet again.

preferred_fields = ['foo', 'bar', 'baz']

def process_preferred_fields():
global preferred_fields
temp = {}
for i, field in enumerate(preferred_fields):
temp[field] = '%s%s' % (chr(0), chr(i))
preferred_fields = temp
process_preferred_fields()
del process_preferred_fields

def sort_key(akey, preferred_fields=preferred_fields):
return preferred_fields.get(akey, akey)
del preferred_fields

## ...build dictionary d...

# now output d...:
for k in sorted(d, key=sort_key):
print k, d[k]

Season to taste if you want non-preferred fields emitted other than
alphabetically, or if you want to wrap this stuff into a class, etc.
(Note: untested code, so typos &c are quite possible). This assumes
that no 'real' key is a non-string, and no 'real' key starts with
chr(0), but it's quite easy to tweak for slightly different specs (at
worst by defining a custom type designed to always compare less than any
real key, and wrapping the preferred_fields entry in instances of that
custom type... having such instances compare with each other based on
the index within preferred_fields of the key they're wrapping, etc etc).
Thanks. For me, I don't need such complex thing, it is just like :

d = somedict_from_db()
prefer=['f','a',b']

def my_order(d):
for x in prefer:
if x in d: yield x
s = frozenset(prefer)
for x in d:
if x not in s: yield x
 
B

Bengt Richter

That depends. Maybe I do not want the keys to be sorted alphabetically,
but according to some criteria which cannot be derived from the keys
themselves.
You mean involving also the values? What's wrong with
sorted(plaindict.items(), key=your_ordering_function) ?
(('a', 97),)
(('c', 99),)
(('b', 98),)
(('d', 100),)
[('a', 97), ('c', 99), ('b', 98), ('d', 100)]

What key function would you like, to generate the value that is actually used
to define the ordering?
>>> sorted(dict((c,ord(c)) for c in 'abcd').items(), key=lambda t:t[0]) [('a', 97), ('b', 98), ('c', 99), ('d', 100)]
>>> sorted(dict((c,ord(c)) for c in 'abcd').items(), key=lambda t:t[1]) [('a', 97), ('b', 98), ('c', 99), ('d', 100)]
>>> sorted(dict((c,ord(c)) for c in 'abcd').items(), key=lambda t:-t[1]) [('d', 100), ('c', 99), ('b', 98), ('a', 97)]
>>> sorted(dict((c,ord(c)) for c in 'abcd').items(), key=lambda t:t[1]&1) [('b', 98), ('d', 100), ('a', 97), ('c', 99)]
>>> sorted(dict((c,ord(c)) for c in 'abcd').items(), key=lambda t:(t[1]&1,t[1])) [('b', 98), ('d', 100), ('a', 97), ('c', 99)]
>>> sorted(dict((c,ord(c)) for c in 'abcd').items(), key=lambda t:(t[1]&1,-t[1]))
[('d', 100), ('b', 98), ('c', 99), ('a', 97)]
And being able to reverse the end result is handy
>>> sorted(dict((c,ord(c)) for c in 'abcd').items(), key=lambda t:(t[1]&1,-t[1]), reverse=True)
[('a', 97), ('c', 99), ('b', 98), ('d', 100)]

You may need to upgrade your Python though ;-)

Regards,
Bengt Richter
 
B

bonono

Bengt said:
You mean involving also the values? What's wrong with
sorted(plaindict.items(), key=your_ordering_function) ?
Not according to the content of the data, not just the "key". Or in
other words, some other metadata that is not present in the data. A
typical thing, like order of creation. Or some arbitary order. For
example :

I present a data grid/table in a HTML form and the user just drag and
drop and rearrange the columns order.

Of course, you may say, just put another column that represent
this(some reporting programs I have seen do it this way) and that is an
option but not the only option.
 
B

Ben Finney

[sort by] some other metadata that is not present in the data.
[...]
Of course, you may say, just put another column that represent
this(some reporting programs I have seen do it this way) and that is
an option but not the only option.

It's a pretty good option, and very commonly used. It's known as the
"Schwartzian transform", or more descriptively, the "Decorate, Sort,
Undecorate" pattern.

<URL:http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52234>
 
B

bonono

Ben said:
[sort by] some other metadata that is not present in the data.
[...]
Of course, you may say, just put another column that represent
this(some reporting programs I have seen do it this way) and that is
an option but not the only option.

It's a pretty good option, and very commonly used. It's known as the
"Schwartzian transform", or more descriptively, the "Decorate, Sort,
Undecorate" pattern.
Whether it is a good option is judged by the person implement it as he
is the one seeing the whole thing, and not some snippet(or concept) on
the usenet.
 
F

Fredrik Lundh

Fredrik said:
but you can easily generate an index when you need it:

index = dict(d)

name, type = index["pid"]
print name

the index should take less than a microsecond to create, and since it
points to the members of the original dict, it doesn't use much memory
either...
Using the same logic, we don't need types other than string in a DBMS
as we can always convert a string field into some other types when it
is needed.

No, that's not the same logic. The dict() in my example doesn't convert be-
tween data types; it provides a new way to view an existing data structure.
There's no parsing involved, nor any type guessing. And given the use case,
it's more than fast enough, and doesn't copy any data.

If you think that's the same thing as parsing strings, you've completely missed
the point.

</F>
 
B

bonono

Fredrik said:
Fredrik said:
but you can easily generate an index when you need it:

index = dict(d)

name, type = index["pid"]
print name

the index should take less than a microsecond to create, and since it
points to the members of the original dict, it doesn't use much memory
either...
Using the same logic, we don't need types other than string in a DBMS
as we can always convert a string field into some other types when it
is needed.

No, that's not the same logic. The dict() in my example doesn't convert be-
tween data types; it provides a new way to view an existing data structure.
There's no parsing involved, nor any type guessing. And given the use case,
it's more than fast enough, and doesn't copy any data.

If you think that's the same thing as parsing strings, you've completely missed
the point.

Well, forget about the missing/not missing the point.

My point is, there are various of reasons why we need different data
types in an RDBMS, just the same as why we need list, dict. There is
nothing stop me from using a list as dict(just scan it till I find it),
why would I I create a dict(your new view of the same data) ? Coding
convenience, speed or whatever.

If I need the dict feature 90% of the time, and the list feature 10% of
the time. I want an ordered dict. Rather than a list and create this
new view every time and every where I want to use it as a dict.

parsing or not parsing is not the point, and parsing/converting is
still "create a new view" of an existing data structure.
 
F

Fredrik Lundh

If I need the dict feature 90% of the time, and the list feature 10% of
the time.

Wasn't your use case that you wanted to specify form fields in
a given order (LIST), render a default view of the form in that
order (LIST), and, later on, access the field specifiers in an
arbitrary order, based on their key (DICT). Sure looks like it's
the LIST aspect that's important here...

("but assume that I have some other use case" isn't a valid use
case)
I want an ordered dict. Rather than a list and create this new view every
time and every where I want to use it as a dict.

You want an ordered dict because you say you want one, not be-
cause it's the best way to address your use case. That's fine, but
it's not really related to the question asked in the subject line.
parsing or not parsing is not the point, and parsing/converting is
still "create a new view" of an existing data structure.

Copying the entire data structure hardly qualifies as "creating a
new view". dict() doesn't do that; in this use case, it doesn't cost
you anything to use it.

Everything has a cost in Python. Things aren't free just because
they're implemented by some other module.

But when things are free, they're often a good choice.

</F>
 
B

bonono

Fredrik said:
Wasn't your use case that you wanted to specify form fields in
a given order (LIST), render a default view of the form in that
order (LIST), and, later on, access the field specifiers in an
arbitrary order, based on their key (DICT). Sure looks like it's
the LIST aspect that's important here...
Yes. But whether LIST aspect or DICT is important is well, opinion. So
let's leave it there.
You want an ordered dict because you say you want one, not be-
cause it's the best way to address your use case. That's fine, but
it's not really related to the question asked in the subject line.
Again, best way is decided by ME. If I am entering a coding contest
which is organized by YOU, that is a different story. As for related to
the subject line, since when I said my preference or use case has
anything to do with the subject line ? I have said in another post that
I don't think there should be one in the standard library, which is
directly about the subject line.
Copying the entire data structure hardly qualifies as "creating a
new view". dict() doesn't do that; in this use case, it doesn't cost
you anything to use it.
doesn't cost me anything ? That is good news to me.
 
A

Antoon Pardon

Op 2005-11-21 said:
I have started the thread in the first place because I believed it is
pretty unabmiguous what an "ordered dictionary" is and how it should
behave. That's why I asked myself why something that straigthforward has
not been added to the standard lib yet. Maybe I'm wrong; I must admit
that I haven't meditated about it very much.

Well it doesn't seem that obvious, because the two recipes you have
gotten, do something different from what I understand as an ordered
dictionary.

The two recipes order the keys by insertion order.

My idea would have been that some order was defined
on your keys in advance and that when you iterated over
the dictionary, the results would be ordered in sequence
of key order.
Do you have an example for different options of behavior?

Well you have two above. Maybe someone can think of something
else.

Which behaviour are you looking for?
 
B

Ben Sizer

Fredrik said:
No, that's not the same logic. The dict() in my example doesn't convert be-
tween data types; it provides a new way to view an existing data structure.

This is interesting; I would have thought that the tuple is read and a
dictionary created by inserting each pair sequentially. Is this not the
case?
 
F

Fredrik Lundh

Ben said:
This is interesting; I would have thought that the tuple is read and a
dictionary created by inserting each pair sequentially. Is this not the
case?

pointers to the members of each pair, yes. but a pointer copy is a
cheap operation (for the given use case, we're only talking about a
few dozen pairs anyway, at the most).

this is a common fallacy; Python programmers underestimate the
cost of adding extra layers to their code (e.g. by using an ordered
dict data structure that has to incrementally update both a list and
a dictionary), and overestimate the cost of letting the C layer do
bulk operations.

(as an example, on my machine, using Foord's OrderedDict class
on Zwerschke's example, creating the dictionary in the first place
takes 5 times longer than the index approach, and accessing an
item takes 3 times longer. you can in fact recreate the index 6
times before OrderedDict is faster; if you keep the index around,
the OrderedDict approach never wins...)

</F>
 
F

Fuzzyman

Fredrik said:
Ben said:
This is interesting; I would have thought that the tuple is read and a
dictionary created by inserting each pair sequentially. Is this not the
case?
[snip..]
(as an example, on my machine, using Foord's OrderedDict class
on Zwerschke's example, creating the dictionary in the first place
takes 5 times longer than the index approach, and accessing an
item takes 3 times longer. you can in fact recreate the index 6
times before OrderedDict is faster; if you keep the index around,
the OrderedDict approach never wins...)

So, so long as you want to use the dictionary less than six times -
it's faster to store/access it as a list of tuples. ;-)

Everytime you want to access (or assign to) the data structure as a
dictionary, you have to re-create the index.

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml
 
A

Alex Martelli

d = somedict_from_db()
prefer=['f','a',b']

def my_order(d):
for x in prefer:
if x in d: yield x
s = frozenset(prefer)
for x in d:
if x not in s: yield x

Yes, a much cleaner architecture (if you don't need any sorting on
non-preferred keys of d) than the ponderous general one I proposed. A
'key' approach with this behavior would be:

def my_key(k):
try: return prefer.index(k)
except ValueError: return len(prefer)

Now, 'for x in sorted(d, key=my_key)' should be equivalent to 'for x in
my_order(d)' thanks to the stability of sorting when the 'key' callable
returns equal values.

Neither of these way-simpler approaches is (I suspect) optimal for
speed, in the unlikely event one cares about that. The idea of
preprocessing the 'preferred' list once and for all outside of the
function (which I used heavily in my previous post) might yield some
speed-up, for example:

def my_key_fast(k, _aux=dict((k,i) for i,k in enumerate(prefer),
_l=len(prefer)):
return _aux.get(k, _l)

It's very unlikely that this situation warrants such optimization, of
course, I'm just "thinking aloud" about abstract possibilities.


Alex
 
F

Fredrik Lundh

Fuzzyman said:
[snip..]
(as an example, on my machine, using Foord's OrderedDict class
on Zwerschke's example, creating the dictionary in the first place
takes 5 times longer than the index approach, and accessing an
item takes 3 times longer. you can in fact recreate the index 6
times before OrderedDict is faster; if you keep the index around,
the OrderedDict approach never wins...)

So, so long as you want to use the dictionary less than six times -
it's faster to store/access it as a list of tuples. ;-)

nope. that's not what I said. I said that you can recreate the index
six times in the time it takes to create a single OrderedDict instance.
if you need to use index more than that, it's not that hard to keep a
reference to it.
Everytime you want to access (or assign to) the data structure as a
dictionary, you have to re-create the index.

the use case we're talking about here (field descriptors) doesn't involve
assigning to the data structure, once it's created.

I'll repeat this one last time: for the use cases presented by Zwerschke
and "bonono", using a list as the master data structure, and creating the
dictionary on demand, is a lot faster than using a ready-made ordered
dict implementation. if you will access things via the dictionary a lot,
you can cache the dictionary somewhere. if not, you can recreate it
several times and still get a net win.

for other use cases, things may be different, but nobody has presented
such a use case yet. as I said earlier, "let's assume we have another use
case" is not a valid use case.

</F>
 
C

Christoph Zwerschke

Mike said:
> And how would an ordered dictionary help in this case?

Maybe there is some confusion between an "orderable" and an "ordered"
dictionary. When I talk about "ordered dictionary", then in the simplest
case I just set up my ordered dictionary with my preferred key order and
it stays like that. This allows me to later iterate through the
dictionary in this preferred order, while being still able to randomly
access data from the dictionary at other places.

-- Christoph
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,274
Messages
2,571,366
Members
48,051
Latest member
noblesalt

Latest Threads

Top