sorteddict PEP proposal [started off as orderedict]

M

Mark Summerfield

Hi,

Below is a PEP proposal for a sorteddict. It arises out of a
discussion on this list that began a few weeks ago with the subject of
"An ordered dictionary for the Python library?", and a similar one on
the P3K mailing list with the subject "ordered dict for p3k
collections?".

If there is positive feedback I will submit the PEP to the reviewers,
so if you think it is a good idea please say so. (I'm sure that if you
_don't_ like it you'll tell me anyway:)

PEP: XXX
Title: Sorted Dictionary
Version: $Revision$
Last-Modified: $Date$
Author: Mark Summerfield
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 25-Sep-2007
Python-Version: 2.6
Post-History:


Abstract

This PEP proposes the addition of a sorted dictionary class to the
standard library's collections module.


Rationale

When handling collections of key-value data, it is often
convenient to
access the data in key order. One way to do this is to use an
unsorted
data structure (e.g., a Python dict), and then sort the keys as
needed.
Another way is to use a sorted data structure (e.g., a sorteddict
as
proposed in this PEP), and simply access the keys, knowing that
they
are always in sorted order.

Both approaches make sense in the right circumstances, but Python
currently only supports the first approach out of the box. A
sorteddict never needs sorting and is ideal for creating indexes
to
data where the keys are based on some property of the data.
Adding a
sorteddict to the collections module would not add significantly
to the
size of Python's standard library, but would provide a very useful
data
structure.


Specification

The proposed sorteddict has the same API to the builtin dict, so
can be
used as a drop-in replacement. The only behavioural difference
being
that any list returned by the sorteddict (whether of keys, values,
or
items) will be in key order, and similarly any iterator returned
will
iterate in key order.

In addition, the keys() method has two optional arguments:

keys(firstindex : int = None, secondindex : int = None) -> list of
keys

The parameter names aren't nice, but using say "start" and "end"
would
be misleading since the intention is for the parameters to work
like
they do in range(), e.g.

sd.keys() # returns a list of all the sorteddict's keys
sd.keys(10) # returns a list of the first 10 keys
sd.keys(19, 35) # returns a list of the 19th-34th keys
inclusive

If an out of range index is given, an IndexError exception is
raised.

Since the sorteddict's data is always kept in key order, indexes
(integer offsets) into the sorteddict make sense. Five additional
methods are proposed to take advantage of this:

key(index : int) -> value

item(index : int) -> (key, value)

value(index : int) -> key

set_value(index : int, value)

delete(index : int)

Items and values can still be accessed using the key, e.g.,
sd[key],
since all the dict's methods are available.


Examples

To keep a collection of filenames on a case-insensitive file
system in
sorted order, we could use code like this:

files = collections.sorteddict.sorteddict()
for name in os.listdir("."):
files[name.lower()] = name

We can add new filenames at any time, safe in the knowledge that
whenever we call files.values(), we will get the files in
case-insensitive alphabetical order.

To be able to iterate over a collection of data items in various
predetermined orders, we could maintain two or more sorteddicts:

itemsByDate = collections.sorteddict.sorteddict()
itemsByName = collections.sorteddict.sorteddict()
itemsBySize = collections.sorteddict.sorteddict()
for item in listOfItems:
itemsByDate["%s\t%17X" % (item.date, id(item))] = item
itemsByName["%s\t%17X" % (item.name, id(item))] = item
itemsBySize["%s\t%17X" % (item.size, id(item))] = item

Here we have used the item IDs to ensure key uniqueness.

Now we can iterate in date or name order, for example:

for item in itemsByDate:
print item.name, item.date


Implementation

A pure Python implementation is available at:

http://pypi.python.org/pypi/sorteddict


Copyright

This document has been placed in the public domain.
 
A

A.T.Hofkamp

If there is positive feedback I will submit the PEP to the reviewers,
so if you think it is a good idea please say so. (I'm sure that if you
_don't_ like it you'll tell me anyway:)

I like the idea, ie +1.
This PEP proposes the addition of a sorted dictionary class to the
standard library's collections module.

You don't seem to mention the sort criterium. I'd suggest to use the __lt__
operator, which you probably intended since it is commonly used.

Also, can I specify a custom sort function, as with list.sort() ?
In addition, the keys() method has two optional arguments:

keys(firstindex : int = None, secondindex : int = None) -> list of

Not sure this is a good idea. Wouldn't simply

mysorteddict.keys()[firstindex:secondindex]

be much better? It can do all you propose, and more (with iterators/generators
and such). I see no need to implement it inside the sorteddict as well.

Since the sorteddict's data is always kept in key order, indexes
(integer offsets) into the sorteddict make sense. Five additional
methods are proposed to take advantage of this:

key(index : int) -> value

item(index : int) -> (key, value)

value(index : int) -> key

set_value(index : int, value)

delete(index : int)

I wouldn't do this. It breaks compability with the normal dict. A beginning
user will expect dict and sorteddict to behave the same (except for
sortedness), including their set of supporting functions imho.

Otherwise, please make a case for them (and maybe even a new PEP to get them in
both types of dicts).
Examples

To keep a collection of filenames on a case-insensitive file
system in
sorted order, we could use code like this:

files = collections.sorteddict.sorteddict()
for name in os.listdir("."):
files[name.lower()] = name

The more interesting case would be to you preserve case of the files within the
keys.

I usually need this data structure with an A* algorithm implementation, where I
need to obtain the cheapest solution found so far.
for item in listOfItems:
itemsByDate["%s\t%17X" % (item.date, id(item))] = item
itemsByName["%s\t%17X" % (item.name, id(item))] = item
itemsBySize["%s\t%17X" % (item.size, id(item))] = item

Wouldn't a line like "itemsBySize[(item.size, id(item))] = item" do as well?

(or with a custom sort function on items?)
Now we can iterate in date or name order, for example:

for item in itemsByDate:
print item.name, item.date

Hmm, not with dict:
.... print x
....
1
2

So you should get your "%s\t%17X" strings here.


Sincerely,
Albert
 
A

Andrew Durdin

Since the sorteddict's data is always kept in key order, indexes
(integer offsets) into the sorteddict make sense. Five additional
methods are proposed to take advantage of this:

key(index : int) -> value

item(index : int) -> (key, value)

value(index : int) -> key

set_value(index : int, value)

delete(index : int)

But what about using non-sequential integer keys (something I do quite often)?

e.g. sorteddict({1:'a', 3:'b': 5:'c', 99:'d'})[3] should return 'b', not 'd'.

Andrew
 
M

Mark Summerfield

I like the idea, ie +1.


You don't seem to mention the sort criterium. I'd suggest to use the __lt__
operator, which you probably intended since it is commonly used.

You are right that I implicitly use __lt__ (or __cmp__ if there's no
__lt__).
Also, can I specify a custom sort function, as with list.sort() ?

No. That would make comparing two sorteddicts problematic. You can
always use munged string keys as my second example shows.
In addition, the keys() method has two optional arguments:
keys(firstindex : int = None, secondindex : int = None) -> list of

Not sure this is a good idea. Wouldn't simply

mysorteddict.keys()[firstindex:secondindex]

be much better? It can do all you propose, and more (with iterators/generators
and such). I see no need to implement it inside the sorteddict as well.

The reason for making it part of th
Since the sorteddict's data is always kept in key order, indexes
(integer offsets) into the sorteddict make sense. Five additional
methods are proposed to take advantage of this:
key(index : int) -> value
item(index : int) -> (key, value)
value(index : int) -> key
set_value(index : int, value)
delete(index : int)

I wouldn't do this. It breaks compability with the normal dict. A beginning
user will expect dict and sorteddict to behave the same (except for
sortedness), including their set of supporting functions imho.

Otherwise, please make a case for them (and maybe even a new PEP to get them in
both types of dicts).
To keep a collection of filenames on a case-insensitive file
system in
sorted order, we could use code like this:
files = collections.sorteddict.sorteddict()
for name in os.listdir("."):
files[name.lower()] = name

The more interesting case would be to you preserve case of the files within the
keys.

I usually need this data structure with an A* algorithm implementation, where I
need to obtain the cheapest solution found so far.
for item in listOfItems:
itemsByDate["%s\t%17X" % (item.date, id(item))] = item
itemsByName["%s\t%17X" % (item.name, id(item))] = item
itemsBySize["%s\t%17X" % (item.size, id(item))] = item

Wouldn't a line like "itemsBySize[(item.size, id(item))] = item" do as well?

(or with a custom sort function on items?)
Now we can iterate in date or name order, for example:
for item in itemsByDate:
print item.name, item.date

Hmm, not with dict:

... print x
...
1
2

So you should get your "%s\t%17X" strings here.

Sincerely,
Albert
 
M

Mark Summerfield

Since the sorteddict's data is always kept in key order, indexes
(integer offsets) into the sorteddict make sense. Five additional
methods are proposed to take advantage of this:

key(index : int) -> value

item(index : int) -> (key, value)

value(index : int) -> key

set_value(index : int, value)

delete(index : int)

But what about using non-sequential integer keys (something I do quite
often)?

e.g. sorteddict({1:'a', 3:'b': 5:'c', 99:'d'})[3] should return 'b', not
'd'.

Andrew

The sorteddict really does work in key order, so:

d = sorteddict({1:'a', 3:'b', 5:'c', 99:'d'})
d.items()
[(1, 'a'), (3, 'b'), (5, 'c'), (99, 'd')]

If you do d[3] you will get 'd' since that is the 4th sequential item.
If you want to get the item with _key_ 3 then use value():

d.value(3)
'd'


PS In my previous reply I got my example wrong a second time: should have
been:
for item in itemsByDate.values():...
 
M

Mark Summerfield

Since the sorteddict's data is always kept in key order, indexes
(integer offsets) into the sorteddict make sense. Five additional
methods are proposed to take advantage of this:

key(index : int) -> value

item(index : int) -> (key, value)

value(index : int) -> key

set_value(index : int, value)

delete(index : int)

But what about using non-sequential integer keys (something I do quite
often)?

e.g. sorteddict({1:'a', 3:'b': 5:'c', 99:'d'})[3] should return 'b', not
'd'.

Andrew

Hmmm, managed to confuse myself with 'b' and 'd'!

d = sorteddict({1:"one", 3:"three", 5:"five", 99:"ninetynine"})

d.items()

[(1, 'one'), (3, 'three'), (5, 'five'), (99, 'ninetynine')]

d[3], d.value(3)

('three', 'ninetynine')

So using [] returns the value for the given key and using value()
returns the value for the given index position.
 
D

Duncan Booth

Mark Summerfield said:
When handling collections of key-value data, it is often
convenient to access the data in key order. One way to do this is
to use an unsorted data structure (e.g., a Python dict), and then
sort the keys as needed. Another way is to use a sorted data
structure (e.g., a sorteddict as proposed in this PEP), and simply
access the keys, knowing that they are always in sorted order.

Please define 'sorted order' and make sure it is customisable.
keys(firstindex : int = None, secondindex : int = None) ->
list of keys

I don't really see the point of this, but general support for slicing
might be interesting:

sd[start:end] -> list of values for all keys such that start <= key <
end (using the appropriate definition of 'sorted order').

key(index : int) -> value

item(index : int) -> (key, value)

value(index : int) -> key

I'm confused: the key method returns a value and the value method
returns a key???
set_value(index : int, value)

delete(index : int)

All of those can of course be replaced with a single method that returns
the key at a specified index and then using that key. Yes that would be
less efficient, but I'd rather have as close to a standard dictionary
interface as possible.
Examples

To keep a collection of filenames on a case-insensitive file
system in sorted order, we could use code like this:

files = collections.sorteddict.sorteddict()
for name in os.listdir("."):
files[name.lower()] = name

We can add new filenames at any time, safe in the knowledge that
whenever we call files.values(), we will get the files in
case-insensitive alphabetical order.

I don't see this as a terribly useful example since you don't explain
why we need to keep filenames in case-insensitive alphabetical order at
all. If we want to print a list of filenames we can sort them once when
printing them, why do we need to keep them 'always sorted'?
To be able to iterate over a collection of data items in various
predetermined orders, we could maintain two or more sorteddicts:

itemsByDate = collections.sorteddict.sorteddict()
itemsByName = collections.sorteddict.sorteddict()
itemsBySize = collections.sorteddict.sorteddict()
for item in listOfItems:
itemsByDate["%s\t%17X" % (item.date, id(item))] = item
itemsByName["%s\t%17X" % (item.name, id(item))] = item
itemsBySize["%s\t%17X" % (item.size, id(item))] = item

I think perl might make you use strings as keys, fortunately Python
doesn't. What I really don't like about this is that you've copied the
date, so if you mutate item.date the itemsByDate are no longer sorted.

Again it sounds like sorting for printing will be less overhead that
keeping them always sorted. You need to come up with examples where
there is an actual benefit to not sorting for output.
Here we have used the item IDs to ensure key uniqueness.

If you are going to do this then it really does need to be:

itemsByDate = sorteddict(items, key=lambda self, k: self[k].date)

so that you can maintain sorted order when mutating an item. How the
dictionary knows that its contents have been mutated is another question
entirely.
Now we can iterate in date or name order, for example:

for item in itemsByDate:
print item.name, item.date

Which we can do just as well with an ordinary dictionary:

for item in sorted(items, key=byDate):
...

given an appropriate definition of byDate.

The cases I can think of where a sorteddict might be useful would be
things like collecting web server statistics and maintaining a regularly
updated display of the top n pages. You need a combination of factors
for this to be useful: frequent updates, and frequent extraction of
sorted elements + the requirement to access elements as a dictionary.

Also, none of your use cases target any of the additional fripperies you
threw into your proposal.
 
J

Jeremy Sanders

Mark said:
If there is positive feedback I will submit the PEP to the reviewers,
so if you think it is a good idea please say so. (I'm sure that if you
_don't_ like it you'll tell me anyway:)

It would be nice to have the ability to use numerical indexes and the key,
but I don't think they should share the same methods.

A useful use case would be to make a LRU (least recently used) dictionary,
where the keys are time-based (e.g. an incrementing counter). You should be
able to identify the least recently used object for discarding by just
accessing the last item in the dictionary.

By the way, I think a LRU cache dictionary would be a great addition to the
standard library.

Is there any speed advantage from implementing the sorteddict as a red-black
tree or something similar in C?

Jeremy
 
P

Paul Hankin

Recall sorted...
sorted(iterable, cmp=None, key=None, reverse=False) --> new sorted
list

So why not construct sorteddicts using the same idea of sortedness?

sorteddict((mapping | sequence | nothing), cmp=None, key=None,
reverse=None)

Then we can specify the exact behaviour of sorteddicts: it's the same
as a regular dict, except when the order of keys is important they're
ordered as if they were sorted by sorted(keys, cmp=sd._cmp,
key=sd._key, reverse=sd._reverse). Here I imagine cmp, key and reverse
are stored in the new sorteddict as attributes _cmp, _key, _reverse -
obviously the actual implementation may differ.

This has the benefit of making sorteddict's behaviour explicit and
easy to understand, and doesn't introduce a new sorting API when we
already have a perfectly decent one.

The only problem here is the **kwargs form of the dict constructor
doesn't translate as we're using keyword args to pass in the sort
criterion. Perhaps someone has an idea of how this can be solved. If
nothing else, this constructor could be dropped for sorteddicts, and
sorteddict(dict(**kwargs), cmp=..., key=..., reverse=...) when that
behaviour is wanted.

I don't like the integral indexing idea or the slicing: indexing and
slicing are already part of python and it's bad to have different
syntax for the same concept on sorteddicts. I'd say it's not an
important enough for sorteddicts anyway.
 
M

Mark Summerfield

It would be nice to have the ability to use numerical indexes and the key,
but I don't think they should share the same methods.

They don't; you use [] for key index and key(), value(), or item() for
index access.
A useful use case would be to make a LRU (least recently used) dictionary,
where the keys are time-based (e.g. an incrementing counter). You should be
able to identify the least recently used object for discarding by just
accessing the last item in the dictionary.

You could do that with a sorteddict simply by using datetime.date
objects or timestamp strings for keys. You could get the first or last
item (no matter what its key). For example:

d = sorteddict({1200:"lunch", 1100:"elevensies", 1600:"tea",
1900:"dinner"})
d.items()
[(1100, 'elevensies'), (1200, 'lunch'), (1600, 'tea'), (1900,
'dinner')]
d.item(0), d.item(-1)
((1100, 'elevensies'), (1900, 'dinner'))

So you could remove the last item using d.delete(-1).
By the way, I think a LRU cache dictionary would be a great addition to the
standard library.

Is there any speed advantage from implementing the sorteddict as a red-black
tree or something similar in C?

I'm sure that a C-based implementation using red-black or AVL trees or
skiplists would be faster, but right now I'm just trying to get an
acceptable API.

Corrections to my original PEP:

value(index : int) -> value # I incorrectly had the return value as
key

Also, now in the second example I use a tuple of string, int ID rather
than strings.
 
M

Mark Summerfield

Recall sorted...
sorted(iterable, cmp=None, key=None, reverse=False) --> new sorted
list

So why not construct sorteddicts using the same idea of sortedness?

sorteddict((mapping | sequence | nothing), cmp=None, key=None,
reverse=None)

Then we can specify the exact behaviour of sorteddicts: it's the same
as a regular dict, except when the order of keys is important they're
ordered as if they were sorted by sorted(keys, cmp=sd._cmp,
key=sd._key, reverse=sd._reverse). Here I imagine cmp, key and reverse
are stored in the new sorteddict as attributes _cmp, _key, _reverse -
obviously the actual implementation may differ.

This has the benefit of making sorteddict's behaviour explicit and
easy to understand, and doesn't introduce a new sorting API when we
already have a perfectly decent one.

The only problem here is the **kwargs form of the dict constructor
doesn't translate as we're using keyword args to pass in the sort
criterion. Perhaps someone has an idea of how this can be solved. If
nothing else, this constructor could be dropped for sorteddicts, and
sorteddict(dict(**kwargs), cmp=..., key=..., reverse=...) when that
behaviour is wanted.

I don't like the integral indexing idea or the slicing: indexing and
slicing are already part of python and it's bad to have different
syntax for the same concept on sorteddicts. I'd say it's not an
important enough for sorteddicts anyway.


This makes a lot of sense to me. But how do you envisage it would be
implemented?
 
S

Steven Bethard

Mark said:
PEP: XXX
Title: Sorted Dictionary [snip]
In addition, the keys() method has two optional arguments:

keys(firstindex : int = None, secondindex : int = None) -> list of keys

The parameter names aren't nice, but using say "start" and "end" would
be misleading since the intention is for the parameters to work like
they do in range(), e.g.

sd.keys() # returns a list of all the sorteddict's keys
sd.keys(10) # returns a list of the first 10 keys
sd.keys(19, 35) # returns a list of the 19th-34th keys inclusive

You should use the range() names, "start" and "stop":
Help on built-in function range in module __builtin__:

range(...)
range([start,] stop[, step]) -> list of integers

But I also agree that this is probably not necessary given the existence
of slicing and itertools.islice().
Since the sorteddict's data is always kept in key order, indexes
(integer offsets) into the sorteddict make sense. Five additional
methods are proposed to take advantage of this:

key(index : int) -> value

item(index : int) -> (key, value)

value(index : int) -> key

set_value(index : int, value)

delete(index : int)

Items and values can still be accessed using the key, e.g., sd[key],
since all the dict's methods are available.

I agree with others that these APIs are really not necessary. Simply
call keys(), values() or items() and then use that list if you really
want indexing. If you're dealing with a lot of these kind of indexing
behaviors, working with the plain list object will be faster anyway.
Examples

To keep a collection of filenames on a case-insensitive file system in
sorted order, we could use code like this:

files = collections.sorteddict.sorteddict()

The type should probably just be in collections directly. No need for a
sub-package. So that would look like::

files = collections.sorteddict()

I also strongly support Paul Hankin's proposal for the constructor API::

sorteddict((mapping|sequence|..), cmp=None, key=None, reverse=False)

Being able to sort by a key= function would make sorteddict()
dramatically more useful. I almost never use the builtin heap() because
it doesn't support a key= function and I always need one. I suspect use
cases for sorteddict() will be similar. For example::

files = collections.sorteddict(key=lambda s: s.lower())
for name in os.listdir("."):
files[name] = ... some calculated value ...

Now the file names are in lower-case sorted order, but if you actually
retrieve the keys, e.g. through .items(), you'll see the real filename,
not the lower-cased ones.

STeVe
 
T

thebjorn

Hi,

Below is a PEP proposal for a sorteddict. It arises out of a
discussion on this list that began a few weeks ago with the subject of
"An ordered dictionary for the Python library?", and a similar one on
the P3K mailing list with the subject "ordered dict for p3k
collections?".

If there is positive feedback I will submit the PEP to the reviewers,
so if you think it is a good idea please say so. (I'm sure that if you
_don't_ like it you'll tell me anyway:)

I can't see much advantage over:

for key in sorted(mydict):
...

A much simpler data-structure, that is also very useful, is a
dictionary that keeps insertion order -- it's also something that's a
tad bit more difficult to implement yourself than the above. My
personal implementation is documented at http://blog.tkbe.org/archive/python-property-set/
It's very tempting to add value access by numerical index (I did as
well), however I now think it's a mistake.

-1 from me.

-- bjorn
 
P

Paul Hankin

This makes a lot of sense to me. But how do you envisage it would be
implemented?

Here's a first go. Sorting occurs when the keys are iterated over,
making it fast (almost as a dict) for construction, insertion, and
deletion, but slow if you're iterating a lot. You should look at some
use cases to decide if this approach is best, or if a sorted
datastructure should be used instead, but my instinct is that this is
a decent approach. Certainly, you're unlikely to get a simpler
implementation :)

class sorteddict(dict):
"A sorted dictionary"
def __init__(self, arg=None, cmp=None, key=None, reverse=False):
if arg:
super(sorteddict, self).__init__(arg)
else:
super(sorteddict, self).__init__()
self._cmp = cmp
self._key = key
self._reverse = reverse
def keys(self):
return sorted(super(sorteddict, self).keys(), cmp=self._cmp,
key=self._key, reverse=self._reverse)
def iter_keys(self):
return (s for s in self.keys())
def items(self):
return [(key, self[key]) for key in self.keys()]
def iter_items(self):
return ((key, self[key]) for key in self.keys())
def values(self):
return [self[key] for key in self.keys()]
def iter_values(self):
return (self[key] for key in self.keys())
def __str__(self):
return '{' + ', '.join('%s: %s' % (repr(k), repr(v))
for k, v in self.iter_items()) + '}'
def __repr__(self):
return str(self)
def __iter__(self):
return self.iter_keys()
 
S

Steven Bethard

Paul said:
This makes a lot of sense to me. But how do you envisage it would be
implemented?

Here's a first go. Sorting occurs when the keys are iterated over,
making it fast (almost as a dict) for construction, insertion, and
deletion, but slow if you're iterating a lot. You should look at some
use cases to decide if this approach is best, or if a sorted
datastructure should be used instead, but my instinct is that this is
a decent approach. Certainly, you're unlikely to get a simpler
implementation :)

class sorteddict(dict):
"A sorted dictionary"
def __init__(self, arg=None, cmp=None, key=None, reverse=False):
if arg:
super(sorteddict, self).__init__(arg)
else:
super(sorteddict, self).__init__()
self._cmp = cmp
self._key = key
self._reverse = reverse
def keys(self):
return sorted(super(sorteddict, self).keys(), cmp=self._cmp,
key=self._key, reverse=self._reverse)
def iter_keys(self):
return (s for s in self.keys())
def items(self):
return [(key, self[key]) for key in self.keys()]
def iter_items(self):
return ((key, self[key]) for key in self.keys())
def values(self):
return [self[key] for key in self.keys()]
def iter_values(self):
return (self[key] for key in self.keys())
def __str__(self):
return '{' + ', '.join('%s: %s' % (repr(k), repr(v))
for k, v in self.iter_items()) + '}'
def __repr__(self):
return str(self)
def __iter__(self):
return self.iter_keys()

With this is the implementation, I'm definitely -1. Not because it's a
bad implementation, but because if the iteration is always doing a sort,
then there's no reason for a separate data structure. Just use sorted()
directly on a regular dict. That has the advantage of being much more
obvious about where the expensive operations are::

for key in sorted(d, ...):
... whatever you want to do ...

IMHO, the only reason to introduce a sorteddict() is if it minimizes the
cost of sorting the keys.

STeVe
 
H

Hrvoje Niksic

Steven Bethard said:
With this is the implementation, I'm definitely -1. Not because it's a
bad implementation, but because if the iteration is always doing a
sort, then there's no reason for a separate data structure.

Agreed. A true sorted dict would keep its keys sorted in the first
place, a la C++ std::map.
 
P

Paul Hankin

With this is the implementation, I'm definitely -1. Not because it's a
bad implementation, but because if the iteration is always doing a sort,
then there's no reason for a separate data structure. Just use sorted()
directly on a regular dict. That has the advantage of being much more
obvious about where the expensive operations are::

for key in sorted(d, ...):
... whatever you want to do ...

IMHO, the only reason to introduce a sorteddict() is if it minimizes the
cost of sorting the keys.

I disagree with your reason for introducing a sorteddict - the main
reason should be if it makes code that uses it simpler and more
readable, with efficiency (within reasonable limits) being of
secondary importance. Unfortunately for sorteddict, your code is
simpler and more explicit for most obvious use cases.

But, with Bill's cached sorted key list it'll be reasonably efficient,
and it's likely possible to update the sorted cache more efficiently
than resorting if there's only been a small number of insertions since
the last sort, but I haven't the time to do a thorough job.
 
C

chris.monsanto

I can't see much advantage over:

for key in sorted(mydict):
...

A much simpler data-structure, that is also very useful, is a
dictionary that keeps insertion order -- it's also something that's a
tad bit more difficult to implement yourself than the above. My
personal implementation is documented athttp://blog.tkbe.org/archive/python-property-set/
It's very tempting to add value access by numerical index (I did as
well), however I now think it's a mistake.

-1 from me.

-- bjorn

I agree on both counts. When I saw this topic, I was hoping it would
be about a structure that remembers insertion order. Now if you
drafted a pep on THAT, I would stand behind it.
 
J

James Stroud

Mark said:
e.g. sorteddict({1:'a', 3:'b': 5:'c', 99:'d'})[3] should return 'b', not
'd'.

The sorteddict really does work in key order, so:

d = sorteddict({1:'a', 3:'b', 5:'c', 99:'d'})
d.items()
[(1, 'a'), (3, 'b'), (5, 'c'), (99, 'd')]

If you do d[3] you will get 'd' since that is the 4th sequential item.
If you want to get the item with _key_ 3 then use value():

d.value(3)
'd'

If a quorum opinion might sway you, I have found, after prolonged and
almost metaphysical type meditation on and experimentation with the
subject, that it would be undesirable to support the type of ambiguous
item access that you suggest.

1. It would break symmetry with __setitem__:
>>> d = sorteddict({1:'a', 3:'b', 5:'c', 99:'d'})
>>> d[2] = 'x'
>>> d[2]
'b'

The need for a work-around in this case (i.e. value()) sends
a flag that something is clumsy about the interface. It seems that
the presence of the value() method exists simply to compensate
for the "broken" design. More symmetrical would be to make value()
or some other well named method return a value at an index.

2. Such ambiguity breaks 100% compatiblity with the built-in
dict. I think the success of a sorted dict would be tied quite
closely to its being a superset of dict. This compatibility
not only makes it easy for people to start using sorteddict
without remembering the distinctions but should also reduce
the need for explicit type checking in libraries that use
sorteddict.


James
 
J

James Stroud

Mark said:
Hmmm, managed to confuse myself with 'b' and 'd'!

d = sorteddict({1:"one", 3:"three", 5:"five", 99:"ninetynine"})

d.items()

[(1, 'one'), (3, 'three'), (5, 'five'), (99, 'ninetynine')]

d[3], d.value(3)

('three', 'ninetynine')

So using [] returns the value for the given key and using value()
returns the value for the given index position.

I didn't read enough messages. My last post is redundant.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,968
Messages
2,570,154
Members
46,702
Latest member
LukasConde

Latest Threads

Top