R
Raymond Hettinger
Hello Folks.
It doesn't seem to be common knowledge when and how a[x] gets
translated to a[x+len(x)]. So, here's a short info post on how Python
supports negative indices for sequences.
I've put the answer below, but if you want to quickly test your own
knowledge, ask yourself which of these support negative indices:
'abcde'[-2] # builtin string class, bracket
syntax
'abcde'.__getitem__(-2) # builtin class, magic method
collections.deque('abcde')[-2] # extension class, bracket syntax
collections.deque('abcde').__getitem__[-2] # extension class, magic
method
class S(str):
pass
class T(str):
def __getitem__(self, i):
print(i)
return 'd'
class U(str):
def __getitem__(self, i):
print(i)
return 'd'
def __len__(self):
return 5
class V(str):
'str subclass overriding key methods'
def __getitem__(self, i):
print(i)
return 'd'
def __len__(self):
return 5
def __iter__(self):
return iter('abcde')
def __contains__(self, val):
return val in 'abcde'
class W:
'object subclass registered as Sequence'
def __init__(self, data):
self.data = data
def __getitem__(self, i):
print(i)
return 'd'
def __len__(self):
return 5
def __iter__(self):
return iter('abcde')
def __contains__(self, val):
return val in 'abcde'
collections.Sequence.register(V)
class X(collections.Sequence):
'Subclass the Sequence abstract base class'
def __init__(self, data):
self.data = data
def __getitem__(self, i):
print(i)
return 'd'
def __len__(self):
return 5
def __iter__(self):
return iter('abcde')
def __contains__(self, val):
return val in 'abcde'
Essentially, the example raise these questions:
* Are negative indices tied to the grammar (i.e. binary subscription)?
* Does it matter if you call a magic method instead of using
brackets?
* Are they tied to particular concrete classes (str, list, tuple,
deque, dict)?
* If so, how does subclassing affect index handling?
* What parts of user defined classes affect handling of indices?
* Are they part of the abstract layer (PyObject_GetItem)?
* Are they tied to the Sequence ABC?
* How are mapping lookups differentiated (d[-2] where d is a Mapping)?
------------- Answers below --------------
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The first five examples support negative indices. The builtin
sequence types: str, deque, list and tuple, all have negative index
support built in to their concrete implementations (such as
str.__getitem__) and that behavior persists for subclassers as long as
they don't override __getitem__.
The parse of "a['x']" matches the "subscription" rule in Grammar/
Grammar. That compiles the BINARY_SUBSCR opcode which pops off the
evaluated expressions for "a" and "x" and calls the abstract layer,
PyObject_GetItem() which routes most calls to the concrete
implementation (i.e. myobj.__getitem__).
The exception to concrete dispatch is a fast path for builtin classes
or their subclasses that
1) are recognized as sequences (i.e. not a dictionary and have not
overridden __getitem__)
2) where "x" is an index (i.e. not a float)
If so, it is passed to an intermediate abstract layer,
PySequence_SetItem() which will apply negative index handling iff the
sequence defines a __len__ method.
In other words, the handling of negative indices is always up to the
concrete class.
It is not part of the grammar, or the definition of subscriptions, or
the dispatch sequence.
The docs guarantee that Python's builtin sequences implement support
for negative indices ( http://docs.python.org/dev/reference/expressions.html#subscriptions
) and they all do so through their individual __getitem__
implementations. Builtin Python sequences also have a fast path that
make negative index handling automatic, but this is just an invisible
optimization and can be bypassed by calling myobj.__getitem__
directly.
Note on slicing: slices are handled the same way (at least in 3.x),
so it is the responsibility of concrete classes to add support if they
choose to do so. Some, like lists tuples and string do support
slicing and some objects like deque do not. Interestingly, the grammar
has rules for slicing, but that is implemented as making a slice
instance as the argument to the lookup. The actual lookup work is
dispatched to the concrete class and there is no fast path along the
way.
Hope you all found this to be informative,
Raymond
It doesn't seem to be common knowledge when and how a[x] gets
translated to a[x+len(x)]. So, here's a short info post on how Python
supports negative indices for sequences.
I've put the answer below, but if you want to quickly test your own
knowledge, ask yourself which of these support negative indices:
'abcde'[-2] # builtin string class, bracket
syntax
'abcde'.__getitem__(-2) # builtin class, magic method
collections.deque('abcde')[-2] # extension class, bracket syntax
collections.deque('abcde').__getitem__[-2] # extension class, magic
method
class S(str):
pass
class T(str):
def __getitem__(self, i):
print(i)
return 'd'
class U(str):
def __getitem__(self, i):
print(i)
return 'd'
def __len__(self):
return 5
class V(str):
'str subclass overriding key methods'
def __getitem__(self, i):
print(i)
return 'd'
def __len__(self):
return 5
def __iter__(self):
return iter('abcde')
def __contains__(self, val):
return val in 'abcde'
class W:
'object subclass registered as Sequence'
def __init__(self, data):
self.data = data
def __getitem__(self, i):
print(i)
return 'd'
def __len__(self):
return 5
def __iter__(self):
return iter('abcde')
def __contains__(self, val):
return val in 'abcde'
collections.Sequence.register(V)
class X(collections.Sequence):
'Subclass the Sequence abstract base class'
def __init__(self, data):
self.data = data
def __getitem__(self, i):
print(i)
return 'd'
def __len__(self):
return 5
def __iter__(self):
return iter('abcde')
def __contains__(self, val):
return val in 'abcde'
Essentially, the example raise these questions:
* Are negative indices tied to the grammar (i.e. binary subscription)?
* Does it matter if you call a magic method instead of using
brackets?
* Are they tied to particular concrete classes (str, list, tuple,
deque, dict)?
* If so, how does subclassing affect index handling?
* What parts of user defined classes affect handling of indices?
* Are they part of the abstract layer (PyObject_GetItem)?
* Are they tied to the Sequence ABC?
* How are mapping lookups differentiated (d[-2] where d is a Mapping)?
------------- Answers below --------------
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The first five examples support negative indices. The builtin
sequence types: str, deque, list and tuple, all have negative index
support built in to their concrete implementations (such as
str.__getitem__) and that behavior persists for subclassers as long as
they don't override __getitem__.
The parse of "a['x']" matches the "subscription" rule in Grammar/
Grammar. That compiles the BINARY_SUBSCR opcode which pops off the
evaluated expressions for "a" and "x" and calls the abstract layer,
PyObject_GetItem() which routes most calls to the concrete
implementation (i.e. myobj.__getitem__).
The exception to concrete dispatch is a fast path for builtin classes
or their subclasses that
1) are recognized as sequences (i.e. not a dictionary and have not
overridden __getitem__)
2) where "x" is an index (i.e. not a float)
If so, it is passed to an intermediate abstract layer,
PySequence_SetItem() which will apply negative index handling iff the
sequence defines a __len__ method.
In other words, the handling of negative indices is always up to the
concrete class.
It is not part of the grammar, or the definition of subscriptions, or
the dispatch sequence.
The docs guarantee that Python's builtin sequences implement support
for negative indices ( http://docs.python.org/dev/reference/expressions.html#subscriptions
) and they all do so through their individual __getitem__
implementations. Builtin Python sequences also have a fast path that
make negative index handling automatic, but this is just an invisible
optimization and can be bypassed by calling myobj.__getitem__
directly.
Note on slicing: slices are handled the same way (at least in 3.x),
so it is the responsibility of concrete classes to add support if they
choose to do so. Some, like lists tuples and string do support
slicing and some objects like deque do not. Interestingly, the grammar
has rules for slicing, but that is implemented as making a slice
instance as the argument to the lookup. The actual lookup work is
dispatched to the concrete class and there is no fast path along the
way.
Hope you all found this to be informative,
Raymond