M
Mark Summerfield
Hi,
Below is a PEP proposal for a sorteddict. It arises out of a
discussion on this list that began a few weeks ago with the subject of
"An ordered dictionary for the Python library?", and a similar one on
the P3K mailing list with the subject "ordered dict for p3k
collections?".
If there is positive feedback I will submit the PEP to the reviewers,
so if you think it is a good idea please say so. (I'm sure that if you
_don't_ like it you'll tell me anyway
PEP: XXX
Title: Sorted Dictionary
Version: $Revision$
Last-Modified: $Date$
Author: Mark Summerfield
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 25-Sep-2007
Python-Version: 2.6
Post-History:
Abstract
This PEP proposes the addition of a sorted dictionary class to the
standard library's collections module.
Rationale
When handling collections of key-value data, it is often
convenient to
access the data in key order. One way to do this is to use an
unsorted
data structure (e.g., a Python dict), and then sort the keys as
needed.
Another way is to use a sorted data structure (e.g., a sorteddict
as
proposed in this PEP), and simply access the keys, knowing that
they
are always in sorted order.
Both approaches make sense in the right circumstances, but Python
currently only supports the first approach out of the box. A
sorteddict never needs sorting and is ideal for creating indexes
to
data where the keys are based on some property of the data.
Adding a
sorteddict to the collections module would not add significantly
to the
size of Python's standard library, but would provide a very useful
data
structure.
Specification
The proposed sorteddict has the same API to the builtin dict, so
can be
used as a drop-in replacement. The only behavioural difference
being
that any list returned by the sorteddict (whether of keys, values,
or
items) will be in key order, and similarly any iterator returned
will
iterate in key order.
In addition, the keys() method has two optional arguments:
keys(firstindex : int = None, secondindex : int = None) -> list of
keys
The parameter names aren't nice, but using say "start" and "end"
would
be misleading since the intention is for the parameters to work
like
they do in range(), e.g.
sd.keys() # returns a list of all the sorteddict's keys
sd.keys(10) # returns a list of the first 10 keys
sd.keys(19, 35) # returns a list of the 19th-34th keys
inclusive
If an out of range index is given, an IndexError exception is
raised.
Since the sorteddict's data is always kept in key order, indexes
(integer offsets) into the sorteddict make sense. Five additional
methods are proposed to take advantage of this:
key(index : int) -> value
item(index : int) -> (key, value)
value(index : int) -> key
set_value(index : int, value)
delete(index : int)
Items and values can still be accessed using the key, e.g.,
sd[key],
since all the dict's methods are available.
Examples
To keep a collection of filenames on a case-insensitive file
system in
sorted order, we could use code like this:
files = collections.sorteddict.sorteddict()
for name in os.listdir("."):
files[name.lower()] = name
We can add new filenames at any time, safe in the knowledge that
whenever we call files.values(), we will get the files in
case-insensitive alphabetical order.
To be able to iterate over a collection of data items in various
predetermined orders, we could maintain two or more sorteddicts:
itemsByDate = collections.sorteddict.sorteddict()
itemsByName = collections.sorteddict.sorteddict()
itemsBySize = collections.sorteddict.sorteddict()
for item in listOfItems:
itemsByDate["%s\t%17X" % (item.date, id(item))] = item
itemsByName["%s\t%17X" % (item.name, id(item))] = item
itemsBySize["%s\t%17X" % (item.size, id(item))] = item
Here we have used the item IDs to ensure key uniqueness.
Now we can iterate in date or name order, for example:
for item in itemsByDate:
print item.name, item.date
Implementation
A pure Python implementation is available at:
http://pypi.python.org/pypi/sorteddict
Copyright
This document has been placed in the public domain.
Below is a PEP proposal for a sorteddict. It arises out of a
discussion on this list that began a few weeks ago with the subject of
"An ordered dictionary for the Python library?", and a similar one on
the P3K mailing list with the subject "ordered dict for p3k
collections?".
If there is positive feedback I will submit the PEP to the reviewers,
so if you think it is a good idea please say so. (I'm sure that if you
_don't_ like it you'll tell me anyway
PEP: XXX
Title: Sorted Dictionary
Version: $Revision$
Last-Modified: $Date$
Author: Mark Summerfield
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 25-Sep-2007
Python-Version: 2.6
Post-History:
Abstract
This PEP proposes the addition of a sorted dictionary class to the
standard library's collections module.
Rationale
When handling collections of key-value data, it is often
convenient to
access the data in key order. One way to do this is to use an
unsorted
data structure (e.g., a Python dict), and then sort the keys as
needed.
Another way is to use a sorted data structure (e.g., a sorteddict
as
proposed in this PEP), and simply access the keys, knowing that
they
are always in sorted order.
Both approaches make sense in the right circumstances, but Python
currently only supports the first approach out of the box. A
sorteddict never needs sorting and is ideal for creating indexes
to
data where the keys are based on some property of the data.
Adding a
sorteddict to the collections module would not add significantly
to the
size of Python's standard library, but would provide a very useful
data
structure.
Specification
The proposed sorteddict has the same API to the builtin dict, so
can be
used as a drop-in replacement. The only behavioural difference
being
that any list returned by the sorteddict (whether of keys, values,
or
items) will be in key order, and similarly any iterator returned
will
iterate in key order.
In addition, the keys() method has two optional arguments:
keys(firstindex : int = None, secondindex : int = None) -> list of
keys
The parameter names aren't nice, but using say "start" and "end"
would
be misleading since the intention is for the parameters to work
like
they do in range(), e.g.
sd.keys() # returns a list of all the sorteddict's keys
sd.keys(10) # returns a list of the first 10 keys
sd.keys(19, 35) # returns a list of the 19th-34th keys
inclusive
If an out of range index is given, an IndexError exception is
raised.
Since the sorteddict's data is always kept in key order, indexes
(integer offsets) into the sorteddict make sense. Five additional
methods are proposed to take advantage of this:
key(index : int) -> value
item(index : int) -> (key, value)
value(index : int) -> key
set_value(index : int, value)
delete(index : int)
Items and values can still be accessed using the key, e.g.,
sd[key],
since all the dict's methods are available.
Examples
To keep a collection of filenames on a case-insensitive file
system in
sorted order, we could use code like this:
files = collections.sorteddict.sorteddict()
for name in os.listdir("."):
files[name.lower()] = name
We can add new filenames at any time, safe in the knowledge that
whenever we call files.values(), we will get the files in
case-insensitive alphabetical order.
To be able to iterate over a collection of data items in various
predetermined orders, we could maintain two or more sorteddicts:
itemsByDate = collections.sorteddict.sorteddict()
itemsByName = collections.sorteddict.sorteddict()
itemsBySize = collections.sorteddict.sorteddict()
for item in listOfItems:
itemsByDate["%s\t%17X" % (item.date, id(item))] = item
itemsByName["%s\t%17X" % (item.name, id(item))] = item
itemsBySize["%s\t%17X" % (item.size, id(item))] = item
Here we have used the item IDs to ensure key uniqueness.
Now we can iterate in date or name order, for example:
for item in itemsByDate:
print item.name, item.date
Implementation
A pure Python implementation is available at:
http://pypi.python.org/pypi/sorteddict
Copyright
This document has been placed in the public domain.