Why are there no ordered dictionaries?

  • Thread starter Christoph Zwerschke

Christoph Zwerschke

Fredrik said:
(as an example, on my machine, using Foord's OrderedDict class
on Zwerschke's example, creating the dictionary in the first place
takes 5 times longer than the index approach, and accessing an
item takes 3 times longer. you can in fact recreate the index 6
times before OrderedDict is faster; if you keep the index around,
the OrderedDict approach never wins...)

You're right; I found creating a Larosa/Foord OrderedDict in this
example to be even 8 times slower than an ordinary dict. However, two
things need to be said here: 1) The dictionary in my example was pretty
small (only 3 items), so you are not really measuring the performance of
the ordered dict, but mainly the overhead of using a user-derived class
in comparison with the built-in dict type. And 2) the implementation by
Larosa/Foord is very slow and can be easily improved, for instance like
this:

def __init__(self, init_val=()):
    dict.__init__(self, init_val)
    self.sequence = [x[0] for x in init_val]

With this change, creating the ordered dictionary is considerably faster
and for an average-sized dictionary, the factor of 8 pretty quickly
converges to 1.

But of course, it will always be slower since it is constructed on top
of the built-in dict. In the end, you always have to maintain a
sequence *plus* a dictionary, which will always be slower than a plain
dictionary. The ordered dictionary class just hides the ugliness of
having to maintain a dictionary plus a sequence, so it's more an issue
of convenience in writing and reading programs than a performance issue.
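
To make the "sequence plus dictionary" bookkeeping concrete, here is a minimal sketch. It is not the Larosa/Foord code; the class name SimpleOrderedDict and the choice of methods are assumptions made purely for illustration. It keeps a list of keys next to the built-in dict and touches both on every insert and delete, which is where the extra cost comes from.

```python
class SimpleOrderedDict(dict):
    """Hypothetical sketch: a built-in dict plus a list recording key order."""

    def __init__(self, init_val=()):
        dict.__init__(self)
        self.sequence = []
        for key, value in init_val:
            self[key] = value  # goes through __setitem__ below

    def __setitem__(self, key, value):
        # Only a key that is not yet present extends the sequence.
        if key not in self:
            self.sequence.append(key)
        dict.__setitem__(self, key, value)

    def __delitem__(self, key):
        dict.__delitem__(self, key)  # raises KeyError if the key is missing
        self.sequence.remove(key)

    def __iter__(self):
        return iter(self.sequence)

    def keys(self):
        return list(self.sequence)

    def items(self):
        return [(key, self[key]) for key in self.sequence]
```

Every write does the work of a plain dict and then some, so a pure-Python class of this shape can approach, but not beat, the built-in type.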

It may be different if the ordered dict were implemented directly as
an ordered hash table in C.

-- Christoph
 
Christoph Zwerschke

Alex said:
Note the plural in 'insertion orderS': some people care about the FIRST
time a key was added to a dict, some about the LAST time it was added,
some about the latest time it was 'first inserted' (added and wasn't
already there) as long as it's never been deleted since that occasion --
and these are just a few of the multifarious orders based on the time of
insertions and deletions of keys.

Ok, I'm starting to understand that ambiguity emerges when you delete and
insert items. I didn't think much about this problem because my use
cases usually do not involve insertion or deletion after the ordered
dictionary has been created.

But I think the following rule is "natural" enough to consider it as THE
standard behavior of ordered dictionaries:

"Insertion: If the key exists: Don't change the order. If it does not
exist: Append it to the sequence of keys. Deletion: Remove from the
sequence of keys."

I think this is also the behavior of associative arrays in PHP or Perl
and could be considered as the "ONE unambiguous definition".
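
As a later data point rather than anything claimed in this thread: the rule quoted above is the one the built-in dict follows in Python 3.7 and later, where insertion order is a language guarantee. A small sketch that can be run to check the three cases (the keys and values are arbitrary):

```python
d = {}
d['a'] = 1
d['b'] = 2
d['c'] = 3

d['a'] = 10                # key exists: value replaced, order unchanged
order_after_update = list(d)

del d['b']                 # deletion: key leaves the sequence
d['b'] = 20                # re-insertion: key is appended at the end
order_after_reinsert = list(d)

print(order_after_update)
print(order_after_reinsert)
```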

-- Christoph
 
Aahz

I think you're wrong here. People in the past who have requested or
implemented stuff they called 'ordered dicts' in the past had in mind
drastically different things, based on some combination of insertion
orders, keys, and _values_. So, ambiguity is definitely present in the
phrase 'ordered dictionary', because there are so many different
criteria whereby the 'ordering' could take place.

Note the plural in 'insertion orderS': some people care about the FIRST
time a key was added to a dict, some about the LAST time it was added,
some about the latest time it was 'first inserted' (added and wasn't
already there) as long as it's never been deleted since that occasion --
and these are just a few of the multifarious orders based on the time of
insertions and deletions of keys.

Ayup. In our application, not only do we have ordered dicts, we also
have something called a "sectioned dict", which is a dict-like object
that also looks like a regular class instance with attribute access. The
section part actually has multiple dicts (the sections) which are
layered, so that a dict key in the top layer overrides the value of the
key in lower layers. We traditionally have used it such that the
sections are accessed in MRU orders; last week, we added a new feature
that allows setting section values without changing section order (to
allow setting a default, essentially).
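
The layered lookup described here resembles what collections.ChainMap (standard library, Python 3.3+) offers. A rough sketch follows; the section names and keys are made up, and it does not attempt the attribute access or MRU reordering mentioned above.

```python
from collections import ChainMap

# Two hypothetical sections; earlier mappings override later ones.
defaults = {'color': 'blue', 'size': 'medium'}
user = {'color': 'red'}

sections = ChainMap(user, defaults)

top_color = sections['color']   # found in the top layer
fallback = sections['size']     # falls through to the lower layer

# Setting a default without touching the section order:
# write to the lower layer directly instead of through the ChainMap.
defaults['shape'] = 'round'
shape = sections['shape']
```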
 
Tom Anderson

Ah, but WHAT 'some criteria'? There's the rub! First insertion, last
insertion, last insertion that wasn't subsequently deleted, last
insertion that didn't change the corresponding value, or...???

All the requests for an ordered dictionary that i've seen on this group,
and all the cases where i've needed one myself, want one which behaves like
a list - order of first insertion, with no memory after deletion. Like the
Larosa-Foord ordered dict.

Incidentally, can we call that the "Larosa-Foord ordered mapping"? Then it
sounds like some kind of rocket science discrete mathematics stuff, which
(a) is cool and (b) will make Perl programmers feel even more inadequate
when faced with the towering intellectual might of Python. Them and their
Schwartzian transform. Bah!

tom
 
Kay Schluehr

Fredrik said:
huh? if you want a list, use a list.

d = [('a', {...}), ('b', {....})]

If one wants uniform access to a nested data structure like this, one
usually starts writing a wrapper class. I do not think the requirement
is any deeper than a standard wrapper around such a list (as a
model), but the implementation may differ with respect to the optimal
time complexity of element access. The interface of the wrapper
class of d might resemble that of a dict. While the interface is that
of a dict, the implementation is closer to a nested list. An "ordered
dict" would lower the impedance between a dict and a list.
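
A sketch of what such a wrapper might look like, assuming nothing beyond a list of (key, value) pairs as the model. The class name PairList is invented for illustration, and lookup is deliberately the naive linear scan, which is exactly the time-complexity trade-off mentioned above.

```python
class PairList:
    """Hypothetical wrapper: list of (key, value) pairs, dict-like access."""

    def __init__(self, pairs=()):
        self._pairs = list(pairs)

    def __getitem__(self, key):
        for k, v in self._pairs:       # O(n) scan, unlike a real dict
            if k == key:
                return v
        raise KeyError(key)

    def __setitem__(self, key, value):
        for i, (k, _) in enumerate(self._pairs):
            if k == key:
                self._pairs[i] = (key, value)   # replace in place
                return
        self._pairs.append((key, value))        # new key goes last

    def __iter__(self):
        return iter(self.keys())

    def __len__(self):
        return len(self._pairs)

    def keys(self):
        return [k for k, _ in self._pairs]
```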

Kay
 
Christoph Zwerschke

Fredrik said:
I'll repeat this one last time: for the use cases presented by Zwerschke
and "bonono", using a list as the master data structure, and creating the
dictionary on demand, is a lot faster than using a ready-made ordered
dict implementation. if you will access things via the dictionary a lot,
you can cache the dictionary somewhere. if not, you can recreate it
several times and still get a net win.

You're right in pointing out that the advantage of ordered dictionaries
(unless you use an optimized C implementation) is not a performance gain.

But please see my other reply: If the dictionary has more than 3 items
(say 10 or 20), and an efficient ordered dict is used, it's not really
"a lot" slower. At least if we are talking about a situation where "on
demand" is "always". So, on the other hand, there isn't such a big
performance loss when using ordered dictionaries either.

The advantage of using an ordered dictionary is that you can set up your
ordered dictionary (say, describing your database columns) once, and
then access it in any way you like afterwards: iterate over it
in a guaranteed order or access an item, always referring to the same
object, without needing to care about building and caching auxiliary
objects with different names depending on what you are doing.
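
For comparison, the alternative Fredrik describes (a list as the master structure, a dict derived from it on demand) might look roughly like this. The column data is borrowed from the employee example elsewhere in this thread; the variable names are assumptions. It shows the two differently named auxiliary objects that the paragraph above would like to avoid.

```python
# Master data: an ordered list of (key, description) pairs.
columns = [
    ('pid', ('Employee ID', 'int')),
    ('name', ('Employee name', 'varchar')),
    ('sal', ('Salary', 'float')),
]

# Ordered access: iterate over the list directly.
labels = [label for _key, (label, _type) in columns]

# Keyed access: build (and possibly cache) a dict from the list.
by_name = dict(columns)
salary_type = by_name['sal'][1]
```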

-- Christoph
 
Bengt Richter

Not according to the content of the data, not just the "key". Or in
other words, some other metadata that is not present in the data. A
typical thing, like order of creation. Or some arbitrary order. For
example :

I present a data grid/table in a HTML form and the user just drag and
drop and rearrange the columns order.
^^[1]
[1] implies known info of before and after rearrangement. Where do these
come from, and are the two states expressed as ordered sets of keys generated
and stored somewhere? The point is, to re-order, you need a mapping from
unordered data dict keys to values which the sorted builtin function will
order in the way you want. (BTW, if you use DSU, make sure the data is not
modifying your sort in an undesired way. Passing a key function to sorted
makes it easy to exclude unwanted data from the sort.) If you have data that
determines a new ordering of keys, it has to be accessed somehow, so you just
need to make it accessible to a handy helper that will generate your key
function. E.g., with before and after lists of keys expressing drag-drop
before and after orderings, lambda can do the job of getting you dict items
in the new order, where bef and aft are lists that define the desired
orderings before and after, in the sense of
sort_precedence = bef.index(key_in_bef) and the same for aft.

sorted(thedict.items(), key=lambda t: dict(zip(bef, ((aft.index(k) if k in aft else len(aft) + bef.index(k)) for k in bef)))[t[0]])

Ok, that one-liner grew a bit ;-)
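
Unrolled into a helper, the same idea is easier to read. This is a sketch under the assumptions stated in the text: bef and aft are lists of keys, every key of thedict appears in bef, and keys missing from aft keep their old relative order after the rearranged ones. The function name reorder_items is made up.

```python
def reorder_items(thedict, bef, aft):
    """Return thedict's items sorted by the 'after' ordering.

    Keys listed in aft come first, in aft order; keys only in bef
    follow, in their original bef order.
    """
    rank = {}
    for k in bef:
        if k in aft:
            rank[k] = aft.index(k)
        else:
            rank[k] = len(aft) + bef.index(k)
    return sorted(thedict.items(), key=lambda item: rank[item[0]])


columns = {'a': 1, 'b': 2, 'c': 3}
result = reorder_items(columns, bef=['a', 'b', 'c'], aft=['c', 'a'])
```

Using an explicit if/else rather than the "x and y or z" idiom matters here, because aft.index(k) is 0 for the first key and 0 is falsy.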
Of course, you may say, just put another column that represent
this(some reporting programs I have seen do it this way) and that is an
option but not the only option.
Maybe you could keep the rearranged_keys vector in a per-user cookie, if it's a web app
and amounts to a user personalization?

( posting delayed >12 hrs due to news server prob ;-/ )

Regards,
Bengt Richter
 
Bengt Richter

That's exactly the kind of things I find myself doing too often and what
I was talking about: You are using *two* pretty redundant data
structures, a dictionary and a list/tuple to describe the same thing.
Ok, you can use a trick to automatically create the dictionary from the
tuple, but still it feels somewhat "unnatural" to me. An "ordered
dictionary" would be the more "natural" data structure here.

But, as has been mentioned**n, this is only one example of an ordering one
could make default for an "ordered" dictionary. Suppose you say it should
be ordered by insertion order, so
d = OrderedDict(); d[1]='one'; d[2]='two' =>> list(d) => [1, 2]
ok, now we do d[1]='ein' and what is the order? list(d) => [2, 1] ??
Or do replacements not count as "insertions"? The devil is always going
to be in the details. Maybe you want a model that works more like a list
of key:value pairs with just optimized access to a pair by key name as
well as position in the list. Or maybe you want to permit append and
NOT prevent [('a',1), ('a',2)] and maybe d['a'] => [1, 2] ???

The point is that Python is a nice lego set, and pre-molded castles
don't re-use well, even if they suit a particular you to a t ;-)

Note that it isn't hard to snap a few pieces together to make an ordered
dict to your own specs. But IMO it belongs in PyPI or such, not in the system
library. At least until it gets a lot of mileage -- and MMV ;-)

I also wanted to mention the ugliness in the definition (nested tuples),
but then I understood that even an ordered dictionary would not
eliminate that ugliness, since the curly braces are part of the Python
syntax and cannot be used for creating ordered dictionaries anyway. I
would have to define the ordered dictionary in the very same ugly way:

d = odict(('pid', ('Employee ID', 'int')),
          ('name', ('Employee name', 'varchar')),
          ('sal', ('Salary', 'float')))

(Unless the Python syntax were extended to use double curly braces or
something for ordered dictionaries - but I understand that this is not
an option.)

Whatever your odict does, if I had to type a lot of definitions for it
I think I would write a QnD helper to make this work:

d = odict(prep("""

pid, Employee ID, int
name, Employee name, varchar # (comments to be ignored)
sal, Salary, float # alignment as above not mandatory
other, Something else, long, additional elements, allowed in second tuple?
"""))

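The post names the helper prep but does not show it, so the following is only a guess at what such a quick-and-dirty function could do: split each non-empty line on commas, drop "#" comments, and return (key, tuple_of_remaining_fields) pairs that an odict constructor could consume.

```python
def prep(text):
    """Hypothetical QnD parser: 'key, field, field # comment' per line."""
    pairs = []
    for line in text.splitlines():
        line = line.split('#', 1)[0].strip()   # strip trailing comment
        if not line:
            continue                           # skip blank lines
        fields = [field.strip() for field in line.split(',')]
        pairs.append((fields[0], tuple(fields[1:])))
    return pairs


definition = prep("""

pid,  Employee ID,   int
name, Employee name, varchar  # (comments to be ignored)
sal,  Salary,        float
""")
```

A value that itself contains a comma or a "#" would break this, which is about what one expects from a QnD helper.
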
( posting delayed >12 hrs due to news server prob ;-/ )

Regards,
Bengt Richter
 
Bengt Richter

Yes. But whether LIST aspect or DICT is important is well, opinion. So
let's leave it there.
Again, best way is decided by ME. If I am entering a coding contest
which is organized by YOU, that is a different story. As for related to
the subject line, since when I said my preference or use case has
anything to do with the subject line ? I have said in another post that
I don't think there should be one in the standard library, which is
directly about the subject line.

Ok, so if not in the standard library, what is the problem? Can't find what
you want with Google and PyPI etc.? Or haven't really settled on what your
_requirements_ are? That seems to be the primary problem of people who complain
with "why no sprollificator mode?" questions. They don't know what they really
mean when it comes down to a DYFR (Define Your Felicitous Requirements) challenge.
So DYFR ;-)

Then someone can take less time than many of these posts take to make a
list subclass that also acts like a dict when you want or a dict subclass that
also acts like a list when you want. Which methods from which would you like
as-is, and which modified? Any additional methods or properties? DYFR ;-)
So you'd like the mechanics to be automated and hidden? Then you need to
DYFR for using the black box you want. Methods, semantics.

doesn't cost me anything ? That is good news to me.

Well, if you want something specific, it WILL cost you the effort to DYFR in detail ;-)

( posting delayed >12 hrs due to news server prob ;-/ )

Regards,
Bengt Richter
 
Alex Martelli

Christoph Zwerschke said:
But I think the following rule is "natural" enough to consider it as THE
standard behavior of ordered dictionaries:

"Insertion: If the key exists: Don't change the order. If it does not
exist: Append it to the sequence of keys. Deletion: Remove from the
sequence of keys."

I think this is also the behavior of associative arrays in PHP or Perl

Perl hashes now keep track of 'order of keys'? That's new to me, they
sure didn't back when I used Perl! It's been a while, but a little
googling shows me, e.g. at
<http://www.openarchives.org/pipermail/oai-implementers/2002-September/000642.html>,
assertions such as:
"""
Hashes don't maintain key order. To get them in sorted order try:

foreach $i (sort keys(%afiliacao))
"""
which fully match my memories. Could you produce a URL to support the
hypothesis that Perl has changed its behavior? What about PHP? Thanks!
and could be considered as the "ONE unambiguous definition".

"first insertion (since the last deletion if any)" is ONE unambiguous
definition, but surely not "_the_ ONE" with emphasis on ``the''. I see
nothing _ambiguous_ (nor _unnatural_) in being interested in the *last*
insertion, for example; indeed if phrased as "upon insertion, put the
key at the end of the sequence" (whether it was already elsewhere in the
sequence or not), with no need for conditionals regarding previous
existence, it might appear more "conceptually compact".
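
For readers on a current Python, the "last insertion" ordering can be sketched on top of collections.OrderedDict, which joined the standard library well after this thread. The subclass name LastInsertionDict is an assumption; the technique follows the pattern shown in the OrderedDict documentation, using move_to_end.

```python
from collections import OrderedDict


class LastInsertionDict(OrderedDict):
    """Every assignment moves the key to the end, new or not."""

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.move_to_end(key)


d = LastInsertionDict()
d['a'] = 1
d['b'] = 2
d['a'] = 3          # 'a' already exists, but is moved to the end anyway
order = list(d)
```

There is indeed no conditional on previous existence here, which is the "conceptually compact" property mentioned above.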

Anyway -- subclassing dict to implement your definition is reasonably
easy, and we could put the resulting package on the Cheese Shop. I hope
python.org keeps good enough statistics to be able to tell us, a couple
months later, how many people downloaded said package, vs how many
people downloaded a complete Python distro; of course, that ratio is
biased (in favour of the package) by the fact that many people already
have a complete distro available, while initially nobody would have the
package, but if we measure when things settle, after letting a month or
two of 'transient' pass, that effect might be lessened.

If we ran such an experiment, what fraction do you think would serve to
convince Guido that a dict 'ordered' by your definition is necessary in
Python 2.5's standard library (presumably in module 'collections')?


Alex
 
Fredrik Lundh

Tom said:
Incidentally, can we call that the "Larosa-Foord ordered mapping"?

The implementation, sure.
Then it sounds like some kind of rocket science discrete mathematics stuff

But math folks usually name things after the person(s) who came
up with the idea, not just some random implementer. The idea of
combining unordered mappings and ordered sequences is older than
Python itself.

</F>
 
Christoph Zwerschke

Alex said:
Perl hashes now keep track of 'order of keys'? That's new to me, they
sure didn't back when I used Perl!

Maybe I shouldn't have talked about Perl when I'm an ignoramus about
that language... You're right, Perl has unordered arrays. That was new
to me since I associate the term "array" always with "ordered" and I
just remembered that PHP's assoc arrays are similar to Perl's but in
fact, and the examples I have read did not mention that problem.
What about PHP?

You can conclude that PHP's assoc arrays are ordered from the fact that
the language provides a ksort() function to order the keys. And I think
PHP's insertion order is the one I mentioned in my last post.

Anyway, it would be interesting to examine this in detail and how this
is implemented in other languages.

> "first insertion (since the last deletion if any)" is ONE unambiguous
> definition, but surely not "_the_ ONE" with emphasis on ``the''.
> I see nothing _ambiguous_ (nor _unnatural_) in being interested in the
> *last* insertion, for example; indeed if phrased as "upon insertion, put
> the key at the end of the sequence" (whether it was already elsewhere in
> the sequence or not), with no need for conditionals regarding previous
> existence, it might appear more "conceptually compact".

But it would not satisfy the concept of "keys of a dictionary" which are
always unique.

BTW, there are some boundary conditions that should be fulfilled for the
insertion order, most obviously:

If you define an ordered dict that way:

d = odict()
d['a'] = 1
d['b'] = 2
d['c'] = 3

The keys should then be ordered as ('a', 'b', 'c').
Anyway -- subclassing dict to implement your definition is reasonably
easy, and we could put the resulting package on the Cheese Shop. I hope
python.org keeps good enough statistics to be able to tell us, a couple
months later, how many people downloaded said package, vs how many
people downloaded a complete Python distro; of course, that ratio is
biased (in favour of the package) by the fact that many people already
have a complete distro available, while initially nobody would have the
package, but if we measure when things settle, after letting a month or
two of 'transient' pass, that effect might be lessened.

That would also be biased (in favour of Python) by the fact that
probably very few people would look for and use the package in the
cheese shop if they were looking for ordered dicts. I for example would
probably use ordered dicts if they were part of the standard lib, but
not if I have to install it as an obscure separate package. So I don't
think that will give us a real clue how many people would like ordered
dicts in the standard lib.

But anyway, if I find some time, I will research a little bit more about
the issue and create such a package, because it seems to me that the
existing packages and recipes are not really satisfying and you're right
it seems to be reasonably easy. It's on my todo list now...

-- Christoph
 
Christoph Zwerschke

Bengt said:
Ok, so if not in the standard library, what is the problem? Can't find what
you want with google and PyPI etc.? Or haven't really settled on what your
_requirements_ are? That seems to be the primary problem people who complain
with "why no sprollificator mode?" questions.

What I don't understand is why legitimate questions such as "why are
there no ordered dictionaries" are immediately interpreted as
*complaints* and not just as questions. If I ask such a question, I am
not complaining but simply trying to figure out *why* there is no such
thing. Probably there are reasons, and all I want is to find these
reasons and learn a little bit more about Python in doing so.

Why can't such questions be discussed in a factual, calm and friendly way?
> They don't know what they really mean when it comes down to a DYFR
> (Define Your Felicitous Requirements) challenge.

I don't think that this was true in this case, and even if this is the
outcome, those who asked the question will have learned something.

I think a discussion group is not there only for presenting mature,
sophisticated thoughts and concepts, but also for "thinking aloud"
together with others about these issues. We all know that clarifying our
thoughts often works best if you discuss them with others. And I think
that's one purpose of discussion lists. Asking questions should not
immediately be discouraged, even silly questions. If it is really a FAQ,
you can simply point to the FAQ or add the answer to the FAQ list if it
is missing there.

-- Chris
 
Fredrik Lundh

Alex said:
What about PHP? Thanks!

according to some random PHP documentation I found on the intarweb:

An array in PHP is actually an ordered map. A map is a type that
maps values to keys.

and later:

A key may be either an integer or a string. If a key is the standard
representation of an integer, it will be interpreted as such (i.e. "8"
will be interpreted as 8, while "08" will be interpreted as "08"). Floats
in key are truncated to integer.

and later:

You cannot use arrays or objects as keys. Doing so will result in a
warning: Illegal offset type.


at which point my brain raised an exception.

</F>
 
Fuzzyman

Christoph said:
Fredrik Lundh wrote: [snip..]
You're right; I found creating a Larosa/Foord OrderedDict in this
example to be even 8 times slower than an ordinary dict. However, two
things need to be said here: 1) The dictionary in my example was pretty
small (only 3 items), so you are not really measuring the performance of
the ordered dict, but mainly the overhead of using a user-derived class
in comparison with the built-in dict type. And 2) the implementation by
Larosa/Foord is very slow and can be easily improved, for instance like
this:

def __init__(self, init_val=()):
    dict.__init__(self, init_val)
    self.sequence = [x[0] for x in init_val]

But that doesn't allow you to create an ordered dict from another
ordered dict.

It also allows duplicates in the sequence attribute. It's a useful idea
though.
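
Both objections can be addressed with a few more lines. The following is a sketch, not the actual Larosa/Foord or Zwerschke code; the class name SeqDict is invented. It copies the order from another instance when given one, and records each key only once when the input pairs contain duplicates.

```python
class SeqDict(dict):
    """Hypothetical ordered dict whose __init__ handles both edge cases."""

    def __init__(self, init_val=()):
        if isinstance(init_val, SeqDict):
            # Another ordered dict: walk its sequence to keep the order.
            pairs = [(key, init_val[key]) for key in init_val.sequence]
        else:
            pairs = list(init_val)
        dict.__init__(self, pairs)   # later duplicates overwrite earlier values
        seen = set()
        self.sequence = []
        for key, _value in pairs:
            if key not in seen:      # record each key at first appearance only
                seen.add(key)
                self.sequence.append(key)


first = SeqDict([('a', 1), ('b', 2), ('a', 3)])
copy = SeqDict(first)
```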

Thanks

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml
 
Christoph Zwerschke

Bengt said:
> d = OrderedDict(); d[1]='one'; d[2]='two' =>> list(d) => [1, 2]
> ok, now we do d[1]='ein' and what is the order? list(d) => [2, 1] ??
> Or do replacements not count as "insertions"?

If you simply set a value for a key that already exists, the order
should not be changed. I think this is intuitive.

> Or maybe you want to permit append and NOT prevent
> [('a',1), ('a',2)] and maybe d['a'] => [1, 2] ???

You could ask the same question about dict. I think that is not an
option. Why would you want odict to behave differently from dict?

I still believe that the concept of an "ordered dictionary" ("behave
like dict, only keep the order of the keys") is intuitive and doesn't
give you so much scope for ambiguity. But probably I need to work on an
implementation to become more clear about possible hidden subtleties.

-- Christoph
 
Kay Schluehr

Christoph said:
That would also be biased (in favour of Python) by the fact that
probably very few people would look for and use the package in the
cheese shop if they were looking for ordered dicts.

Does anyone actually use this site? While the Vaults offered a nice
place and a nice interface the Cheese Shop has the appeal of a code
slum.

Kay
 
Magnus Lycka

Christoph said:
But please see my other reply: If the dictionary has more than 3 items
(say 10 or 20), and an efficient ordered dict is used, it's not really
"a lot" slower. At least if we are talking about a situation where "on
demand" is "always". So, on the other hand, there isn't such a big
performance loss when using ordered dictionaries either.

There is no performance issue involved with this use case at all!

It doesn't matter if it's hundreds of tuples of strings in a list
if it's supposed to be presented in a GUI. Recreating a dict from
that is bound to be orders of magnitude faster than getting the stuff
visible on the screen, at least if we're talking web apps. So is
using a reasonable odict implementation, if that's what you want.

I think the issue is not to overload the already extensive standard
library with trivial things that can easily be replaced by your own
three line wrapper, especially if there are a number of different
semantics that could be imagined for such a thingie.

The C++ std lib has an ordered "dict" class called map, and that's
ordered by key. Others suggested ordering by value, and there are a
number of different interpretations of the 'order by insertion time'
theme in the air. This clearly smells like "fix your own thing and
leave it out of the standard library".
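
The key-ordered and value-ordered readings are each a one-line sorted() call over a plain dict, which illustrates how cheaply the different semantics can be had without a dedicated class. The data below is made up for the example.

```python
prices = {'pear': 3, 'apple': 5, 'fig': 1}

# "Ordered by key", as with C++ std::map iteration.
by_key = sorted(prices.items())

# "Ordered by value", the other interpretation mentioned above.
by_value = sorted(prices.items(), key=lambda pair: pair[1])
```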

With one of these solutions picked as Python's "ordered dict", we'll
make things slightly more convenient for a few programmers and just
burden others with something that sounds right for them, but turns
out not to solve their problems. Or, we could try to make a bunch
of different solutions, increasing the cognitive burden for all who
learn Python while solving non-problems. This just isn't going to
happen!
 
Steven D'Aprano

The implementation, sure.


But math folks usually name things after the person(s) who came
up with the idea, not just some random implementer.

No no no! In maths things are usually named after Euler, or the first
person to discover them after Euler.
 
