No, when you program you know how you will be using the data structure,
so you can choose the implementation that's right for the application.
I don't understand why you have prefixed that sentence with "No". It
seems like you are agreeing with bearophile's point.
That's what the container libraries for other languages do. At the
moment, you just have one implementation, and have to hope it's right
for the job.
To the extent that this is true, it is a sign of the poverty of the
standard Python library. But in fact it isn't true. Python has quite a
decent collection of collection types. Just off the top of my head:
tuple, namedtuple, list, set, frozenset, dict, defaultdict, queue, deque
set, list and dict are highly optimized for general use, and are very
suitable for building new data structures. And you are wrong to say that
we have to "hope it's right for the job", we can build perfectly
acceptable pure-Python data structures from these building blocks, or add
our own C extensions. You're never forced to use a standard library data
structure, it's a trade-off of your time and effort against runtime
efficiency. And because the standard library types are generally so
excellent, that trade-off is often (but not always) a no-brainer.
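For instance (a quick sketch of my own, not anything in the standard
library, and the names are purely illustrative): here is a perfectly
serviceable multi-valued mapping built out of nothing but dict and list.

class MultiDict(object):
    def __init__(self):
        # a plain dict does all the heavy lifting
        self._data = {}
    def add(self, key, value):
        # append the value to the list stored under key, creating it if needed
        self._data.setdefault(key, []).append(value)
    def get(self, key):
        return self._data.get(key, [])

md = MultiDict()
md.add('a', 1)
md.add('a', 2)
print md.get('a')    # prints [1, 2]

A dozen lines of pure Python, and nobody had to wait for a new
"implementation" argument on dict to get it.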
Adding an *optional* parameter that says, in effect, "I
want this list optimised for writes and reads at both ends" or "I want
this list optimised for fully random reads but writes only at the end"
doesn't *lose* you any information about what you're programming with.
It might not lose information, but it does obfuscate it.
Right now, we can do this:
if isinstance(obj, list):
    print "optimized for random reads and writes at the end"
elif isinstance(obj, deque):
    print "optimized for writes at both the start and end"
If you care about such things, it's easy to find out, because there is a
one-to-one mapping from type to behaviour.
Under Lie's suggestion, we would have a one-to-many mapping from type to
behaviour, forcing us to do this:
if isinstance(obj, list):
    if obj.implementation == 'deque':
        print "optimized for writes at both the start and end"
    else:
        print "optimized for random reads and writes at the end"
That's assuming the implementation is available at runtime at all; if
it's not, then you have lost information.
Of course it's essential that the data structure has identical
/functional/ behaviour whatever optimisation you use.
"Essential"? That's an interesting point. Let's look at an actual example.
Consider one of Lie's initial examples (paraphrased):
D1 = dict({1: 'a', 2: 'b'}) # standard hash-table
D2 = dict({1: 'a', 2: 'b'}, implementation="binary tree")
You say it's "essential" that the binary tree implementation has
identical functional behaviour to standard hash-table implementation of
dict. Okay, let's see where that leads us. The functional behaviour of
hash-tables is that they are O(1) for reads, while binary trees are
O(log(N)). The only way for them to have identical functional behaviour
is to *artificially* slow down dicts to the lowest common speed. That is
a Bad Thing, and I don't understand why you think it is essential.
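(If you want to see that difference in numbers rather than in big-O
notation, here is a rough-and-ready timing sketch of mine. The standard
library has no binary tree, so bisect on a sorted list stands in for the
O(log N) lookup; repeat it at a few different sizes of N and the dict
times should stay essentially flat while the bisect times slowly grow.)

import timeit

setup = """
import bisect
keys = list(range(10**6))
d = dict.fromkeys(keys)
"""

# average-case O(1): hash lookup in a dict
print timeit.timeit("999999 in d", setup=setup, number=100000)
# O(log N): binary search in a sorted list, standing in for a tree lookup
print timeit.timeit("bisect.bisect_left(keys, 999999)", setup=setup, number=100000)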
But perhaps you didn't actually mean identical behaviour, but merely an
identical *interface*. That's more reasonable: we no longer have to
artificially change the behaviour of the type -- but we do have to
artificially cripple the API of the type!
The advantage of binary trees as a mapping is that they are ordered.
Binary trees have a rich set of traversal methods: you can walk the tree
in pre-order, in-order, post-order or breadth-first order. But hash-tables don't
have *any* of those, so now we have to forbid all binary-tree traversal
methods. In which case, why bother paying the cost of binary trees if you
don't get the extra functionality?
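To make that concrete, here is a bare-bones sketch of my own (there is no
such type in the standard library; Node, insert and inorder are made-up
names) showing the kind of ordered traversal a binary search tree gives
you for free and a hash table simply cannot:

class Node(object):
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None

def insert(root, key, value):
    # plain unbalanced binary-search-tree insertion
    if root is None:
        return Node(key, value)
    if key < root.key:
        root.left = insert(root.left, key, value)
    else:
        root.right = insert(root.right, key, value)
    return root

def inorder(root):
    # yield (key, value) pairs in sorted key order
    if root is not None:
        for pair in inorder(root.left):
            yield pair
        yield (root.key, root.value)
        for pair in inorder(root.right):
            yield pair

root = None
for k, v in [(2, 'b'), (1, 'a'), (3, 'c')]:
    root = insert(root, k, v)
print list(inorder(root))    # prints [(1, 'a'), (2, 'b'), (3, 'c')]

None of that makes sense for a hash table, which is exactly the point:
forcing both behind one interface means throwing the traversal methods away.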
But maybe you'll argue that Lie's examples are bad examples. Perhaps what
you have in mind is something like the difference between "dicts
implemented as hash tables with chaining" and "dicts implemented as hash
tables with open addressing". (Aside: what sort of open addressing?
Linear? Quadratic? Something else?) At least now we're talking about
different implementations with the same API and more-or-less the same
functional behaviour.
I'd suggest that 99% of the time, client code simply shouldn't care about
the differences. The existing implementations are Good Enough. Have you
*ever* found yourself saying "I can't use the built-in dict, because it
uses chaining and for my application I need linear open addressing"? (Or
vice versa.)
I doubt you've ever said that. I doubt you know anyone who has said it.
I doubt you know *of* anyone who has said it. And even if you have, then
you are free to go and write your own dict type as a C extension type and
call it dict_with_linear_openaddressing and write this:
D = dict_with_linear_openaddressing(data)
instead of:
D = dict(data, implementation="linear open addressing")
Yet again Lie's proposal gains you nothing. In fact, as an application
writer, you're better off writing your own type rather than waiting for
it to be added to the built-in implementation of dict.
The one exception to this is when you're developing a *new type* that
doesn't already exist in the standard library, and you haven't decided
which implementation(s) you should use, so you write multiple
implementations for testing. Here's the wrong way to do it:
class Foo(object):
    def __init__(self, data, implementation='standard'):
        self.implementation = implementation
        if implementation == 'standard':
            # save data one way
            self.data = foo(data)
        elif implementation == 'magic':
            # save data another way
            self.data = bar(data)
        elif implementation == 'green':
            # save data a third way
            self.data = baz(data)
        else:
            raise ValueError('unknown implementation')
    def __len__(self):
        if self.implementation == 'standard':
            return len(self.data)
        elif self.implementation == 'magic':
            return len(self.data[0]) + len(self.data[1:])
        elif self.implementation == 'green':
            return self._len
Why would you write it that way? Worst Design Ever -- it's confusing and
error-prone, and it fails to obey the Don't Repeat Yourself principle,
which increases the risk of errors. And testing and documentation are
significantly more complicated.
Here's a better way:
class Foo(object):
    class _StandardFoo(object):
        def __init__(self, data):
            self.data = foo(data)
        def __len__(self):
            return len(self.data)
    class _MagicFoo(object):
        def __init__(self, data):
            self.data = bar(data)
        def __len__(self):
            return len(self.data[0]) + len(self.data[1:])
    class _GreenFoo(object):
        def __init__(self, data):
            self.data = baz(data)
        def __len__(self):
            return self._len
    _mapping = {'standard': _StandardFoo,
                'magic': _MagicFoo, 'green': _GreenFoo}
    def __new__(cls, data, implementation='standard'):
        try:
            return cls._mapping[implementation](data)
        except KeyError:
            raise ValueError('unknown implementation')
Here's an even better way:
class StandardFoo(object):
    def __init__(self, data):
        self.data = foo(data)
    def __len__(self):
        return len(self.data)

class MagicFoo(object):
    def __init__(self, data):
        self.data = bar(data)
    def __len__(self):
        return len(self.data[0]) + len(self.data[1:])

class GreenFoo(object):
    def __init__(self, data):
        self.data = baz(data)
    def __len__(self):
        return self._len
and let the client code call whichever implementation they want.
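(For what it's worth, the difference at the call site is tiny. Assuming
the placeholder foo/bar/baz helpers above are filled in, the factory
version is spelled

f = Foo(data, implementation='magic')

and the separate-classes version is spelled

f = MagicFoo(data)

Either way the client ends up holding the 'magic' implementation; the
only question is whether that choice is written as a string argument or
as a class name.)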
[snip]
They're not different data structures from the client point of view.
But of course they are. They have different functional behaviour, and
different APIs. Unless we artificially cripple the implementations and
make them all identical (which defeats the purpose of having different
implementations!) they are different data structures, and those
differences will be visible to the client code.
This scheme is a case of over-generalisation. It's also impractical: who
is going to spend the time and effort to do it? It's hard enough to get
somebody to write a binary tree implementation for the standard library,
without expecting them to *also* write a wrapper for it to make it look
like a dict. It's working against the grain of Python, which is well
suited for the one-type = one-API model instead of the one-type = many-
APIs model which Lie is suggesting. And for what benefit? The only
benefit suggested so far is that we can write:
dict({1: 'a', 2: 'b'}, implementation="binary tree")
instead of
binarytree({1: 'a', 2: 'b'})
If you call that a benefit. I don't.