B
bearophileHUGS
Choosing the right data structure is usually a matter of compromises,
and sometimes the best you can do is to change some data structures
and look for the faster running time. To do this it helps to have a
language that allows you to swap data structures in the more
transparent way possible.
It's not just a matter of the specific program, it's also a matter of
the nature of the programming language. Python isn't meant to be high-
performance in running time, it's meant to be easy to use, flexible,
and easy to understand. (Ruby shares almost the same purposes, but it
looks a bit less easy, a bit more flexible, and offers a bit more
freedom, and it used to be a little slower).
Now Ruby dicts are ordered by default:
http://www.igvita.com/2009/02/04/ruby-19-internals-ordered-hash/
Of course Python can grow odicts into its collections module, that can
be used everywhere a normal dict is used. But while this may be a good
solution for faster languages like Java, C# and D, for me it doesn't
sound like the best solution for Python.
That page about Ruby dicts show a higher traversal speed (probably
just because the CPU has to scan less memory, but I am not sure,
because mordern CPUs are very complex) but lower insertion speed (I
think mostly not because the added management of two pointers, but
because the memory used increases, so there are more cache misses. If
this turns out as true, then using the xor trick may halve the extra
memory needed). I have no idea if such different balance of
performance will lead to on average faster or slower Python programs,
this has to be tested. But even if the average performance becomes a
little worse I think making the default Python dict as ordered is a
positive change for Python too, because built-ins are meant to be as
flexible as possible, even if they aren't the fastest ones or the less
memory hungry ones (and keeping dicts ordered decreases the surprises
a newbie finds, makes some code cleaner, and doctests become simpler
to write because the order of items may be more defined).
Once the default dicts are ordered, it can be possible to add an
unordereddict to the collections module to be used by programmers when
max performance or low memory usage is very important
Namespaces and other internal things of Python can keep using the un-
ordereddicts.
I don't know what to do regarding Python sets yet. In mathematics sets
aren't ordered, so they may be kept as they are now.
Bye,
bearophile
and sometimes the best you can do is to change some data structures
and look for the faster running time. To do this it helps to have a
language that allows you to swap data structures in the more
transparent way possible.
It's not just a matter of the specific program, it's also a matter of
the nature of the programming language. Python isn't meant to be high-
performance in running time, it's meant to be easy to use, flexible,
and easy to understand. (Ruby shares almost the same purposes, but it
looks a bit less easy, a bit more flexible, and offers a bit more
freedom, and it used to be a little slower).
Now Ruby dicts are ordered by default:
http://www.igvita.com/2009/02/04/ruby-19-internals-ordered-hash/
Of course Python can grow odicts into its collections module, that can
be used everywhere a normal dict is used. But while this may be a good
solution for faster languages like Java, C# and D, for me it doesn't
sound like the best solution for Python.
That page about Ruby dicts show a higher traversal speed (probably
just because the CPU has to scan less memory, but I am not sure,
because mordern CPUs are very complex) but lower insertion speed (I
think mostly not because the added management of two pointers, but
because the memory used increases, so there are more cache misses. If
this turns out as true, then using the xor trick may halve the extra
memory needed). I have no idea if such different balance of
performance will lead to on average faster or slower Python programs,
this has to be tested. But even if the average performance becomes a
little worse I think making the default Python dict as ordered is a
positive change for Python too, because built-ins are meant to be as
flexible as possible, even if they aren't the fastest ones or the less
memory hungry ones (and keeping dicts ordered decreases the surprises
a newbie finds, makes some code cleaner, and doctests become simpler
to write because the order of items may be more defined).
Once the default dicts are ordered, it can be possible to add an
unordereddict to the collections module to be used by programmers when
max performance or low memory usage is very important
Namespaces and other internal things of Python can keep using the un-
ordereddicts.
I don't know what to do regarding Python sets yet. In mathematics sets
aren't ordered, so they may be kept as they are now.
Bye,
bearophile