Terry said:
> "Paul Rubin" wrote:
>
>>Really it's x[-1]'s behavior that should go, not find/rfind.
>
> I complete disagree, x[-1] as an abbreviation of x[len(x)-1] is extremely
> useful, especially when 'x' is an expression instead of a name.
Hear us out; your disagreement might not be so complete as you
think. From-the-far-end indexing is too useful a feature to
trash. If you look back several posts, you'll see that the
suggestion here is that the index expression should explicitly
call for it, rather than treat negative integers as a special
case.
I wrote up and sent off my proposal, and once the PEP-Editors
respond, I'll be pitching it on the python-dev list. Below is
the version I sent (not yet a listed PEP).
--
--Bryan
PEP: -1
Title: Improved from-the-end indexing and slicing
Version: $Revision: 1.00 $
Last-Modified: $Date: 2005/08/26 00:00:00 $
Author: Bryan G. Olson <
[email protected]>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 26 Aug 2005
Post-History:
Abstract
To index or slice a sequence from the far end, we propose
using a symbol, '$', to stand for the length, instead of
Python's current special-case interpretation of negative
subscripts. Where Python currently uses:
sequence[-i]
We propose:
sequence[$ - i]
Python's treatment of negative indexes as offsets from the
high end of a sequence causes minor obvious problems and
major subtle ones. This PEP proposes a consistent meaning
for indexes, yet still supports from-the-far-end
indexing. Use of new syntax avoids breaking existing code.
Specification
We propose a new style of slicing and indexing for Python
sequences. Instead of:
sequence[start : stop : step]
new-style slicing uses the syntax:
sequence[start ; stop ; step]
It works like current slicing, except that negative start or
stop values do not trigger from-the-high-end interpretation.
Omissions and 'None' work the same as in old-style slicing.
Within the square-brackets, the '$' symbol stands for the
length of the sequence. One can index from the high end by
subtracting the index from '$'. Instead of:
seq[3 : -4]
we write:
seq[3 ; $ - 4]
When square-brackets appear within other square-brackets,
the inner-most bracket-pair determines which sequence '$'
describes. The length of the next-outer sequence is denoted
by '$1', and the next-out after than by '$2', and so on. The
symbol '$0' behaves identically to '$'. Resolution of $x is
syntactic; a callable object invoked within square brackets
cannot use the symbol to examine the context of the call.
The '$' notation also works in simple (non-slice) indexing.
Instead of:
seq[-2]
we write:
seq[$ - 2]
If we did not care about backward compatibility, new-style
slicing would define seq[-2] to be out-of-bounds. Of course
we do care about backward compatibility, and rejecting
negative indexes would break way too much code. For now,
simple indexing with a negative subscript (and no '$') must
continue to index from the high end, as a deprecated
feature. The presence of '$' always indicates new-style
indexing, so a programmer who needs a negative index to
trigger a range error can write:
seq[($ - $) + index]
Motivation
From-the-far-end indexing is such a useful feature that we
cannot reasonably propose its removal; nevertheless Python's
current method, which is to treat a range of negative
indexes as special cases, is warty. The wart bites novice or
imperfect Pythoners by not raising an exceptions when they
need to know about a bug. For example, the following code
prints 'y' with no sign of error:
s = 'buggy'
print s[s.find('w')]
The wart becomes an even bigger problem with more
sophisticated use of Python sequences. What is the 'stop'
value for a slice when the step is negative and the slice
includes the zero index? An instance of Python's slice type
will report that the stop value is -1, but if we use this
stop value to slice, it gets misinterpreted as the last
index in the sequence. Here's an example:
class BuggerAll:
def __init__(self, somelist):
self.sequence = somelist[:]
def __getitem__(self, key):
if isinstance(key, slice):
start, stop, step = key.indices(len(self.sequence))
# print 'Slice says start, stop, step are:', start,
stop, step
return self.sequence[start : stop : step]
print range(10) [None : None : -2]
print BuggerAll(range(10))[None : None : -2]
The above prints:
[9, 7, 5, 3, 1]
[]
Un-commenting the print statement in __getitem__ shows:
Slice says start, stop, step are: 9 -1 -2
The slice object seems to think that -1 is a valid exclusive
bound, but when using it to actually slice, Python
interprets the negative number as an offset from the high
end of the sequence.
Steven Bethard offered the simpler example:
py> range(10)[slice(None, None, -2)]
[9, 7, 5, 3, 1]
py> slice(None, None, -2).indices(10)
(9, -1, -2)
py> range(10)[9:-1:-2]
[]
The double-meaning of -1, as both an exclusive stopping
bound and an alias for the highest valid index, is just
plain whacked. So what should the slice object return? With
Python's current indexing/slicing, there is no value that
just works. 'None' will work as a stop value in a slice, but
index arithmetic will fail. The value 0 - (len(sequence) +
1) will work as a stop value, and slice arithmetic and
range() will happily use it, but the result is not what the
programmer probably intended.
The problem is subtle. A Python sequence starts at index
zero. There is some appeal to giving negative indexes a
useful interpretation, on the theory that they were invalid
as subscripts and thus useless otherwise. That theory is
wrong, because negative indexes were already useful, even
though not legal subscripts, and the reinterpretation often
breaks their exiting use. Specifically, negative indexes are
useful in index arithmetic, and as exclusive stopping
bounds.
The problem is fixable. We propose that negative indexes not
be treated as a special case. To index from the far end of a
sequence, we use a syntax that explicitly calls for far-end
indexing.
Rationale
New-style slicing/indexing is designed to fix the problems
described above, yet live happily in Python along-side the
old style. The new syntax leaves the meaning of existing
code unchanged, and is even more Pythonic than current
Python.
Semicolons look a lot like colons, so the new semicolon
syntax follows the rule that things that are similar should
look similar. The semicolon syntax is currently illegal, so
its addition will not break existing code. Python is
historically tied to C, and the semicolon syntax is
evocative of the similar start-stop-step expressions of C's
'for' loop. JPython is tied to Java, which uses a similar
'for' loop syntax.
The '$' character currently has no place in a Python index,
so its new interpretation will not break existing code. We
chose it over other unused symbols because the usage roughly
corresponds to its meaning in the Python library's regular
expression module.
We expect use of the $0, $1, $2 ... syntax to be rare;
nevertheless, it has a Pythonic consistency. Thanks to Paul
Rubin for advocating it over the inferior multiple-$ syntax
that this author initially proposed.
Backwards Compatibility
To avoid braking code, we use new syntax that is currently
illegal. The new syntax more-or-less looks like current
Python, which may help Python programmers adjust.
User-defined classes that implement the sequence protocol
are likely to work, unchanged, with new-style slicing.
'Likely' is not certain; we've found one subtle issue (and
there may be others):
Currently, user-defined classes can implement Python
subscripting and slicing without implementing Python's len()
function. In our proposal, the '$' symbol stands for the
sequence's length, so classes must be able to report their
length in order for $ to work within their slices and
indexes.
Specifically, to support new-style slicing, a class that
accepts index or slice arguments to any of:
__getitem__
__setitem__
__delitem__
__getslice__
__setslice__
__delslice__
must also consistently implement:
__len__
Sane programmers already follow this rule.
Copyright:
This document has been placed in the public domain.