How Python works: What do you know about support for negative indices?

R

Raymond Hettinger

Hello Folks.

It doesn't seem to be common knowledge when and how a[x] gets
translated to a[x+len(x)]. So, here's a short info post on how Python
supports negative indices for sequences.

I've put the answer below, but if you want to quickly test your own
knowledge, ask yourself which of these support negative indices:

'abcde'[-2] # builtin string class, bracket
syntax

'abcde'.__getitem__(-2) # builtin class, magic method

collections.deque('abcde')[-2] # extension class, bracket syntax

collections.deque('abcde').__getitem__[-2] # extension class, magic
method

class S(str):
pass

class T(str):
def __getitem__(self, i):
print(i)
return 'd'

class U(str):
def __getitem__(self, i):
print(i)
return 'd'
def __len__(self):
return 5

class V(str):
'str subclass overriding key methods'
def __getitem__(self, i):
print(i)
return 'd'
def __len__(self):
return 5
def __iter__(self):
return iter('abcde')
def __contains__(self, val):
return val in 'abcde'

class W:
'object subclass registered as Sequence'
def __init__(self, data):
self.data = data
def __getitem__(self, i):
print(i)
return 'd'
def __len__(self):
return 5
def __iter__(self):
return iter('abcde')
def __contains__(self, val):
return val in 'abcde'
collections.Sequence.register(V)

class X(collections.Sequence):
'Subclass the Sequence abstract base class'
def __init__(self, data):
self.data = data
def __getitem__(self, i):
print(i)
return 'd'
def __len__(self):
return 5
def __iter__(self):
return iter('abcde')
def __contains__(self, val):
return val in 'abcde'

Essentially, the example raise these questions:

* Are negative indices tied to the grammar (i.e. binary subscription)?
* Does it matter if you call a magic method instead of using
brackets?
* Are they tied to particular concrete classes (str, list, tuple,
deque, dict)?
* If so, how does subclassing affect index handling?
* What parts of user defined classes affect handling of indices?
* Are they part of the abstract layer (PyObject_GetItem)?
* Are they tied to the Sequence ABC?
* How are mapping lookups differentiated (d[-2] where d is a Mapping)?

------------- Answers below --------------
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|

The first five examples support negative indices. The builtin
sequence types: str, deque, list and tuple, all have negative index
support built in to their concrete implementations (such as
str.__getitem__) and that behavior persists for subclassers as long as
they don't override __getitem__.

The parse of "a['x']" matches the "subscription" rule in Grammar/
Grammar. That compiles the BINARY_SUBSCR opcode which pops off the
evaluated expressions for "a" and "x" and calls the abstract layer,
PyObject_GetItem() which routes most calls to the concrete
implementation (i.e. myobj.__getitem__).

The exception to concrete dispatch is a fast path for builtin classes
or their subclasses that
1) are recognized as sequences (i.e. not a dictionary and have not
overridden __getitem__)
2) where "x" is an index (i.e. not a float)
If so, it is passed to an intermediate abstract layer,
PySequence_SetItem() which will apply negative index handling iff the
sequence defines a __len__ method.

In other words, the handling of negative indices is always up to the
concrete class.
It is not part of the grammar, or the definition of subscriptions, or
the dispatch sequence.

The docs guarantee that Python's builtin sequences implement support
for negative indices ( http://docs.python.org/dev/reference/expressions.html#subscriptions
) and they all do so through their individual __getitem__
implementations. Builtin Python sequences also have a fast path that
make negative index handling automatic, but this is just an invisible
optimization and can be bypassed by calling myobj.__getitem__
directly.

Note on slicing: slices are handled the same way (at least in 3.x),
so it is the responsibility of concrete classes to add support if they
choose to do so. Some, like lists tuples and string do support
slicing and some objects like deque do not. Interestingly, the grammar
has rules for slicing, but that is implemented as making a slice
instance as the argument to the lookup. The actual lookup work is
dispatched to the concrete class and there is no fast path along the
way.

Hope you all found this to be informative,


Raymond
 
M

Mark Tolonen

Ben Finney said:
Raymond Hettinger said:
It doesn't seem to be common knowledge when and how a[x] gets
translated to a[x+len(x)]. So, here's a short info post on how Python
supports negative indices for sequences.

Thanks for this. Could you post your messages using a channel that
doesn't arbitrarily split your paragraphs into long-short-long-short
lines? It makes paragraphs burdensome to read, and I skipped most of the
message because of that.

I encourage anyone whose messages are munged like that to seek
correction from their mail service provider, and switch to a different
one until it's fixed.

--
\ “Oh, I realize it's a penny here and a penny there, but look at |
`\ me: I've worked myself up from nothing to a state of extreme |
_o__) poverty.†—Groucho Marx |
Ben Finney

It came across fine for me (on much maligned Outlook Express, no less).

-Mark
 
N

Neil Hodgson

Mark Tolonen:
It came across fine for me (on much maligned Outlook Express, no less).

Yes, looks fine to me both in Thunderbird (news, not mailing list)
and at Google Groups. There is a single text part with all lines except
an URL easily within 80 columns. Perhaps there is a problem in Ben's
reader or in the mailing list gateway.

Neil
 
U

Ulrich Eckhardt

Raymond said:
collections.deque('abcde').__getitem__[-2] # extension class, magic
method

Small nit: You don't mean [square] brackets here, right?

Otherwise, good posting, thank you!

Uli
 
G

Giacomo Boffi

Ben Finney said:
Raymond Hettinger said:
It doesn't seem to be common knowledge when and how a[x] gets
translated to a[x+len(x)]. So, here's a short info post on how Python
supports negative indices for sequences.

Thanks for this. Could you post your messages using a channel that
doesn't arbitrarily split your paragraphs into long-short-long-short
lines?

hi Ben, i see that you uses gnus... well, it's gnus that does the
unwanted formatting

try C-u g, as dettailed below, ciao

g runs `gnus-summary-show-article'

`gnus-summary-show-article' is an interactive compiled Lisp function
-- loaded from "gnus-sum"
(gnus-summary-show-article &optional ARG)

Documentation:
Force redisplaying of the current article.
If ARG (the prefix) is a number, show the article with the charset
defined in `gnus-summary-show-article-charset-alist', or the charset
input.
If ARG (the prefix) is non-nil and not a number, show the raw article
without any article massaging functions being run. Normally, the key
strokes are `C-u g'.
 
S

Steven D'Aprano

Hello Folks.

It doesn't seem to be common knowledge when and how a[x] gets translated
to a[x+len(x)]. So, here's a short info post on how Python supports
negative indices for sequences. [...]
Hope you all found this to be informative,


Thanks Raymond!
 
D

Daniel Fetchinson

Raymond Hettinger said:
It doesn't seem to be common knowledge when and how a[x] gets
translated to a[x+len(x)]. So, here's a short info post on how Python
supports negative indices for sequences.

Thanks for this. Could you post your messages using a channel that
doesn't arbitrarily split your paragraphs into long-short-long-short
lines?

It came across fine for me as well (gmail with basic html interface).
It makes paragraphs burdensome to read, and I skipped most of the
message because of that.

You might want to switch to a client where you do not have this problem.
I encourage anyone whose messages are munged like that to seek
correction from their mail service provider, and switch to a different
one until it's fixed.

I encourage anyone who has problems with reading various emails,
newsgroup postings, forums and what not, to start using modern tools
that work with the vast majority of other tools.

Cheers,
Daniel
 
T

Terry Reedy

The docs guarantee that Python's builtin sequences implement support
for negative indices (

http://docs.python.org/dev/reference/expressions.html#subscriptions

The relevant paragraphs are
"
For built-in objects, there are two types of objects that support
subscription:

If the primary is a mapping, the expression list must evaluate to an
object whose value is one of the keys of the mapping, and the
subscription selects the value in the mapping that corresponds to that
key. (The expression list is a tuple except if it has exactly one item.)

If the primary is a sequence, the expression (list) must evaluate to an
integer. If this value is negative, the length of the sequence is added
to it (so that, e.g., x[-1] selects the last item of x.) The resulting
value must be a nonnegative integer less than the number of items in the
sequence, and the subscription selects the item whose index is that
value (counting from zero).
"

Reading the third paragraph out of context, one can miss the restriction
to built-in objects. I had assumed that the conversion using len(), when
available, happened prior to the __getitem__ call. I believe I need to
add the restriction in my discussion of negative indexing in my book. I
would like the above rewritten something like the following:
"
Two types of built-in objects support subscription as primaries:
mappings and sequences.

For built-in mappings, the....

For built-in sequences, the ...
"

The second paragraph was written before defaultdict and does not apply
to them. I presume that it is an extension rather than built-in class
for the purpose of the Reference.
Hope you all found this to be informative,

Definitely. I save a copy for future reference.
 
A

Aahz

Attribution missing:

I encourage anyone who has problems with reading various emails,
newsgroup postings, forums and what not, to start using modern tools
that work with the vast majority of other tools.

Why? Raymond's post worked fine for me with trn3.6....
--
Aahz ([email protected]) <*> http://www.pythoncraft.com/

[on old computer technologies and programmers] "Fancy tail fins on a
brand new '59 Cadillac didn't mean throwing out a whole generation of
mechanics who started with model As." --Andrew Dalke
 
A

Aahz

Ben Finney said:
Raymond Hettinger said:
It doesn't seem to be common knowledge when and how a[x] gets
translated to a[x+len(x)]. So, here's a short info post on how
Python supports negative indices for sequences.

Thanks for this. Could you post your messages using a channel that
doesn't arbitrarily split your paragraphs into long-short-long-short
lines? It makes paragraphs burdensome to read, and I skipped most of
the message because of that.

For those who think the problem may be with the recipient's software, I
see the same annoying line-wrapping problems in the archived message
<URL:http://mail.python.org/pipermail/python-list/2010-September/1255167.html>.

Still looks like *your* problem to me; except for exactly one paragraph,
I don't see comb-style formatting in Lynx at that URL.
--
Aahz ([email protected]) <*> http://www.pythoncraft.com/

[on old computer technologies and programmers] "Fancy tail fins on a
brand new '59 Cadillac didn't mean throwing out a whole generation of
mechanics who started with model As." --Andrew Dalke
 
N

Neil Hodgson

Ben Finney:
For those who think the problem may be with the recipient's software, I
see the same annoying line-wrapping problems in the archived message
<URL:http://mail.python.org/pipermail/python-list/2010-September/1255167.html>.

That looks well-formatted to me and just the same as I see in a news
reader. There appear to be deliberate wraps at sentence end or automatic
wraps to fit <80 columns. Which lines are wrong and why are they wrong?

Neil
 
A

Aahz

The automatic wraps in the code presented in the message are wrong. The
automatic wraps in the bullet point list are, if not wrong, at least
presumably unintended.

That is considerably different from your original claim that the post
contained formatting that was "long-short-long-short", aka comb
formatting.
I hope that clears up what I meant in this case. The effort devoted to
explaining this issue far outweighs the burden it caused initially. I'll
try to be more explicit when presenting it in future, to forestall this.

s/explicit/accurate/

Had you noted that some lines of code were wrapped short, I would have
agreed with you, but I also would have noted that it's not a big deal.
--
Aahz ([email protected]) <*> http://www.pythoncraft.com/

[on old computer technologies and programmers] "Fancy tail fins on a
brand new '59 Cadillac didn't mean throwing out a whole generation of
mechanics who started with model As." --Andrew Dalke
 
R

Raymond Hettinger

[Ben Finney]
I encourage anyone whose messages are munged like that to seek
correction from their mail service provider, and switch to a different
one until it's fixed.

The post was typed on a mobile device into the text window on Google
Groups.

It's too bad that inane concerns with newline placement overwhelmed
the substance of the post.


Raymond
 
R

Raymond Hettinger

Reading the third paragraph out of context, one can miss the restriction
to built-in objects. I had assumed that the conversion using len(), when
available, happened prior to the __getitem__ call.

Yes, that's a common misconception. It is probably based on
the current wording of the docs and on the PySequence_GetItem()
optimized fast-path for builtin and extension sequences.

From the users point-of-view, the important thing is that it
(effectively) occurs in the __getitem__() code, that builtin
sequences and extensions support it, and that else where it is
optional.

Another way of saying it is: if you are ever writing a __getitem__()
method in Python (even for a subclass of a builtin sequence), then it
is up to you to decide whether to add support for negative indices
and slicing.


Raymond
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,982
Messages
2,570,185
Members
46,736
Latest member
AdolphBig6

Latest Threads

Top