Why Python does *SLICING* the way it does??


Torsten Bronger

Hi there!

Bernhard Herzog said:
There are very good reasons for half-open intervals and starting
at 0 apart from memory organization. Dijkstra explained this
quite well in
http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF

I see only one argument there: "Inclusion of the upper bound would
then force the latter to be unnatural by the time the sequence has
shrunk to the empty one." While this surely is unaesthetic, I
don't think that one should optimise syntax for this very
special case. Besides, no language I know of has problems with
negative values.

Well, and the argument for "0" derives from that, according to
Dijkstra.

Bye,
Torsten.
 

Peter Hansen

Terry said:
However, I used to make "off by one" errors all the time in both C and Fortran,
whereas I hardly ever make them in Python.

This should probably be the overriding concern in this
case.

I can't remember the last time I made an off-by-one error
in Python (or, really, whether I ever have), whereas I
can't remember the last C program I wrote which didn't have
one.

So I like Python's slicing because it "bites *less*" than intervals in C or Fortran.

+1 QOTW
 

Bernhard Herzog

Torsten Bronger said:
I see only one argument there: "Inclusion of the upper bound would
then force the latter to be unnatural by the time the sequence has
shrunk to the empty one." While this surely is unaesthetic, I
don't think that one should optimise syntax for this very
special case.

The other main argument for starting at 0 is that if you do not include
the upper bound and start at 1 then the indices i of a sequence of N
values are 1 <= i < N + 1 which is not as nice as 0 <= i < N. The extra
"+ 1" is one more opportunity for an off by one error.

Then there's also that, starting at 0, "an element's ordinal (subscript)
equals the number of elements preceding it in the sequence."


Bernhard
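The two properties quoted above can be checked directly. This is only a small sketch; the list contents and the names `seq`, `start` and `stop` are made up for illustration.

```python
# Half-open, zero-based ranges on an illustrative list.
seq = ["a", "b", "c", "d", "e"]
n = len(seq)

# Valid indices are exactly 0 <= i < n, which is what range(n) yields.
indices = list(range(n))

# An element's index equals the number of elements that precede it.
preceding_counts = [len(seq[:i]) for i in indices]

# The length of a slice is stop - start, with no +1 correction.
start, stop = 1, 4
slice_length = len(seq[start:stop])

# The empty slice needs no unnatural bound: start == stop is enough.
empty = seq[2:2]

print(indices, preceding_counts, slice_length, empty)
```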
 

Torsten Bronger

Hi there!

Bernhard Herzog said:
Torsten Bronger said:

I see only one argument there: "Inclusion of the upper bound
would then force the latter to be unnatural by the time the
sequence has shrunk to the empty one." [...]

The other main argument for starting at 0 is that if you do not
include the upper bound and start at 1 then the indices i of a
sequence of N values are 1 <= i < N + 1 which is not as nice as 0
<= i < N. The extra "+ 1" is one more opportunity for an off by
one error.

The alternative is starting with 1 and using "lower <= i <= upper".
(Dijkstra's second choice.)

Then there's also that, starting at 0, "an element's ordinal
(subscript) equals the number of elements preceding it in the
sequence."

Granted, but you trade such elegancies for other uglinesses. A
couple of times I changed the lower limit of some data structure
from 0 to 1 or vice versa, and ended up exchanging a "+1" here for a
"-1" there.

It's a matter of what you are accustomed to, I suspect. We
(programmers) think in the 0-notation, but unspoiled minds
probably do not.

Bye,
Torsten.
 

Roy Smith

Part of the reason may be that most loops over lists involve
iterators, where the details of the index limits are hidden. In
Python, you write:

    for item in myList:
        blah

but in C and Fortran you would write:

    for (i = 0; i < MAXLIST; ++i) {
        blah;
    }

          do 10 i = 1, MAXLIST
       10 blah

Here both endpoints are mentioned explicitly. C++/STL also uses iterators,
but the syntax is repulsive.
 

James Stroud

Many people I know ask why Python does slicing the way it does.....

Can anyone /please/ give me a good defense/justification???

Here you go, no remembering "+1" or "-1". Also, see the hundreds of other
times this topic has graced this list.

    >>> i = 4
    >>> s = "asdfjkl;"
    >>> print(s[:i] + s[i:])
    asdfjkl;

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
 

James Carroll

If you have five elements, numbered 0,1,2,3,4 and you ask for the
elements starting with the first one, and so on for the length you
would have [0:length]. So [0:5] gives you elements 0,1,2,3,4. Think
of the weirdness if you had to ask for [0:length-1] to get length
elements...

One-based 1...n are what I call _counting numbers_.
Zero-based 0...n-1 are the _indexes_ (offsets) into the collection.
The first element is at offset 0.

It is a little weird that slicing does [index:count] instead of
[index:index] or [count:count], I agree, but Python really does just
flow wonderfully once you see how clean code written with
[index:count] is.

In C++ the STL also has the idea that there's an 'end()' iterator that
is really one element past the end of your container. It makes things
flow really well there too. All code iterates up to but not
including the last element you specify. Always.

-Jim
 

Javier Bezos

Many people I know ask why Python does slicing the way it does.....

Can anyone /please/ give me a good defense/justification???

I'm referring to why mystring[:4] gives me
elements 0, 1, 2 and 3 but *NOT* mystring[4] (5th element).

Many people don't like the idea that the 5th element is not invited.

(BTW, yes I'm aware of the explanation where slicing
is shown to involve slices _between_ elements. This
doesn't explain why this is the *best* way to do it.)

Recently there was a short (sub)thread about that.
One of my messages (against half-open slices) is,
for example

http://groups-beta.google.com/group/comp.lang.python/msg/5532dd50b57853b1

Javier

___________________________________________________________
Javier Bezos | TeX y tipografía
jbezos at wanadoo dot es | http://perso.wanadoo.es/jbezos
.............................|...............................
CervanTeX (Spanish TUG) | http://www.cervantex.org
 

Javier Bezos

James Stroud said:
I like this, it works for any integer.

    >>> s = "asdfjkl;"
    >>> i = -400
    >>> print(s[:i] + s[i:])
    asdfjkl;
    >>> i = 65534214
    >>> print(s[:i] + s[i:])
    asdfjkl;

Actually, this has nothing to do with half-open
slices but with the fact that if i goes beyond
the limit of the string then Python, wisely, doesn't
raise an error but instead returns the string up to
the end. When people say that half-open slices work
for every i, they are thinking of the case i=0.

Javier
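The distinction drawn above can be made concrete. In this rough sketch the helper `index_fails` is a hypothetical name introduced only for the demonstration: slice bounds are clamped to the string, plain indexing is not.

```python
s = "asdfjkl;"

# Splitting at any in-range position and rejoining restores the string.
in_range_ok = all(s[:i] + s[i:] == s for i in range(len(s) + 1))

# Out-of-range bounds are clamped by slicing, so the identity still holds...
clamped_ok = all(s[:i] + s[i:] == s for i in (-400, 65534214))


def index_fails(text, i):
    """Return True if text[i] raises IndexError (illustrative helper)."""
    try:
        text[i]
    except IndexError:
        return True
    return False


# ...whereas plain indexing with the same values raises IndexError.
indexing_raises = index_fails(s, -400) and index_fails(s, 65534214)

print(in_range_ok, clamped_ok, indexing_raises)
```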

 

Terry Hancock

Part of the reason may be that most loops over lists involve
iterators, [...]
both endpoints are mentioned explicitly. C++/STL also uses iterators,
but the syntax is repulsive.

That's true of course. It's more likely to show up in manipulating
lists or strings. And Python provides a much richer environment for
processing strings, so one has to deal with explicit indexing much
less.

But I still think that I make fewer errors per instance of dealing with
intervals. It's rare that I even have to think about it much when
writing such a thing. Negative indexing also helps a lot.

Terry
 

Peter Hansen

James said:
If you have five elements, numbered 0,1,2,3,4 and you ask for the
elements starting with the first one, and so on for the length you
would have [0:length]. So [0:5] gives you elements 0,1,2,3,4. Think
of the weirdness if you had to ask for [0:length-1] to get length
elements... [...]
It is a little weird that slicing does [index:count] instead of
[index:index] or [count:count], I agree, but Python really does just
flow wonderfully once you see how clean code written with
[index:count] is.

I think you got confused part way through that. Python's
slices are *not* index:count, but are index:index. It's
just that for the example you gave, starting at 0, they
happen to amount to the same thing...

-Peter
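A quick sketch of the index:index reading, using a made-up string and variable names. The two interpretations only agree when the slice starts at 0.

```python
letters = "abcdefgh"

start, stop = 2, 5
# stop is an exclusive index, not a count: this yields stop - start = 3 items.
by_index = letters[start:stop]            # 'cde'

# To take `count` items beginning at `start`, the stop index is start + count.
count = 5
by_count = letters[start:start + count]   # 'cdefg'

# Only when start == 0 do "index" and "count" happen to be the same number.
first_five = letters[0:5]                 # 'abcde'

print(by_index, by_count, first_five)
```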
 

Mike Meyer

Antoon Pardon said:
Op 2005-04-20 said:
Hallöchen!

Many people I know ask why Python does slicing the way it does.....

Can anyone /please/ give me a good defense/justification???

I'm referring to why mystring[:4] gives me elements 0, 1, 2 and 3
but *NOT* mystring[4] (5th element).

mystring[:4] can be read as "the first four characters of
mystring". If it included mystring[4], you'd have to read it as
"the first five characters of mystring", which wouldn't match the
appearance of '4' in the slice.

[...]

It all makes perfect sense when you look at it this way!

Well, also in my experience every variant has its warts. You'll
never avoid the "i+1" or "i-1" expressions in your indices or loops
(or your mind ;).

It's interesting to muse about a language that starts at "1" for all
arrays and strings, as some more or less obsolete languages do. I
think this is more intuitive, since most people (including
mathematicians) start counting at "1". The reason for starting at
"0" is easier memory address calculation, so nothing for really high
level languages.

Personally I would like to have the choice. Sometimes I prefer to
start at 0, sometimes at 1 and other times at -13 or +7.

Some HLLs have had arrays that let you declare the first index as well
as the size for decades now. Algol-68, for instance. Modern languages
still offer such features, but none seem to have been very popular.

Of course, in any reasonable OO language, you can roll your own. So
the question comes down to whether or not these get into the standard
library.

<mike
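As a rough idea of what "rolling your own" might look like, here is a minimal sketch. The class name `OffsetList`, its `start` parameter and its `indices()` method are all hypothetical, and slicing and negative indices are deliberately left out.

```python
class OffsetList:
    """Hypothetical list wrapper whose first index is `start` instead of 0."""

    def __init__(self, items, start=1):
        self._items = list(items)
        self.start = start

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        # Iterate over the values directly; the start index plays no role here.
        return iter(self._items)

    def _to_zero_based(self, index):
        # Translate a user-facing index into an offset into the real list.
        offset = index - self.start
        if not 0 <= offset < len(self._items):
            raise IndexError("index %r out of range" % (index,))
        return offset

    def __getitem__(self, index):
        return self._items[self._to_zero_based(index)]

    def __setitem__(self, index, value):
        self._items[self._to_zero_based(index)] = value

    def indices(self):
        """Return the valid indices, first to last."""
        return range(self.start, self.start + len(self._items))


breakfast = OffsetList(["spam", "eggs", "bacon"], start=3)
print(breakfast[3])               # spam
print(list(breakfast.indices()))  # [3, 4, 5]
```

Every access pays for one subtraction and a bounds check, which is the "+1 here for a -1 there" trade made once inside the class instead of at every call site.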
 

Antoon Pardon

Op 2005-04-20 said:
Propose one, and I won't write it off without thinking, but my bias is
way against it from experience. Knowledge gets scattered across the
program,

Knowledge always gets scattered across the program. The end
index can vary endlessly but that doesn't seem to worry
you. So why is a varying start index so worrisome?

unless you're defining the start index every time you use the
list, which seems no better than adding an offset to me.

I don't see why the start index can't be accessible through
a method or function just like the length of a list is now.

My favourite would be a range method so we would have
the following idiom:

    for i in lst.range():
        do something with lst[i]
 

Antoon Pardon

Op 2005-04-20 said:
Antoon Pardon said:
Op 2005-04-20 said:
Personally I would like to have the choice. Sometimes I prefer to
start at 0, sometimes at 1 and other times at -13 or +7.

Argggh. Having two (or more!) ways to do it, would mean that every time I
read somebody else's code, I would have to figure out which flavor they are
using before I could understand what their code meant. That would be evil.

This is nonsense. table[i] = j just associates value j with key i.
That is the same regardless of whether the keys can start from
0 or some other value. Do you also consider it more ways because
the keys can end in different values?


There are certainly many examples where the specific value of the
first key makes no difference. A good example would be

    for element in myList:
        print(element)

On the other hand, what output does

    myList = ["spam", "eggs", "bacon"]
    print(myList[1])

produce? In a language where some lists start with 0 and some start
with 1, I don't have enough information just by looking at the above
code.


Yes you have. The fact that a language allows a choice doesn't
contradict there being a default when no choice is specified.

My preference would be that it would produce "spam", because
if you want the *first* element, you want the element
associated with the key 1.

Or maybe the language would force you to give a start index,
so that you would have to write:

MyList = [3 -> "spam", "eggs", "bacon"]

And of course the language would provide instances or methods
so you could ask what the first index was.
 

Dan Bishop

What languages besides Python use the Python slicing convention?

Java uses it for the "substring" method of strings.

In C starting at 0 may be justified because of the connection between
array subscripting and pointer arithmetic, but Python is a higher-level
language where such considerations are less relevant.

Maybe less relevant, but relevant nonetheless.

First, there's the desire for familiarity. Many Python programmers are
also C programmers, and that fact has had an influence on the
development of the language. That's why we write "x += 0x20" rather
than "x := x + $20". Why not array indexing as well?

More importantly, there are reasons for counting from zero that have
nothing to do with pointers.

The biggest reason involves modular arithmetic: r=n%d satisfies 0 <= r <
d, which conveniently matches Python's array syntax.

    DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
                 "Thursday", "Friday", "Saturday"]

    def weekday_name(date):
        return DAY_NAMES[date.toordinal() % 7]

Modular arithmetic's preference for 0-based counting goes beyond array
indexing. For example, consider our notation for time, from 00:00:00
to 23:59:59.

    def time_add(time, delta):
        """
        time = an (hour, minute, second) tuple for a time of day
        delta = an (hour, minute, second) tuple for an amount of time

        Returns time+delta, as an (hour, minute, second) tuple.
        """
        hour = time[0] + delta[0]
        minute = time[1] + delta[1]
        second = time[2] + delta[2]
        # Normalize the time
        second = ((hour * 60) + minute) * 60 + second
        minute, second = divmod(second, 60)
        hour, minute = divmod(minute, 60)
        hour %= 24
        return hour, minute, second

Imagine that the time notation went from 01:01:01 to 24:60:60. Try
writing a time_add function for that. The only simple way I can think
of is to temporarily convert to zero-based notation!

    def time_add(time, delta):
        # Add like terms and convert to zero-based notation.
        hour = time[0] + delta[0] - 1
        minute = time[1] + delta[1] - 1
        second = time[2] + delta[2] - 1
        # Normalize the time
        second = ((hour * 60) + minute) * 60 + second
        minute, second = divmod(second, 60)
        hour, minute = divmod(minute, 60)
        hour %= 24
        # Convert back to one-based notation on output
        return hour + 1, minute + 1, second + 1
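
The same shift-down, wrap, shift-up pattern shows up for any single one-based quantity. A minimal sketch, with the hypothetical helper names `wrap_zero_based` and `wrap_one_based`:

```python
def wrap_zero_based(n, d):
    """Wrap n into 0..d-1: a bare modulo is all that is needed."""
    return n % d


def wrap_one_based(n, d):
    """Wrap n into 1..d: shift to zero-based, take the modulo, shift back."""
    return (n - 1) % d + 1


# Two hours after hour 23 on a 0..23 clock.
print(wrap_zero_based(23 + 2, 24))   # 1

# One hour after hour 24 on a 1..24 clock.
print(wrap_one_based(24 + 1, 24))    # 1

# A bare modulo would wrongly turn hour 24 itself into 0.
print(24 % 24, wrap_one_based(24, 24))   # 0 24
```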

Along the same lines, I think the REQUIREMENT that x[0] rather than
x[1] be the first element of list x is a mistake. At least the
programmer should have a choice, as in Fortran or VBA.

If you really want 1-based arrays, just do what most BASIC programmers
do: Ignore x[0].

    >>> months = [None, "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    ...           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
    >>> print(months[4])
    Apr
 

Antoon Pardon

Op 2005-04-20 said:
This should probably be the overriding concern in this
case.

I can't remember the last time I made an off-by-one error
in Python (or, really, whether I ever have), whereas I
can't remember the last C program I wrote which didn't have
one.

I do so frequently.

I often have to process files where each line is a record,
each record having a number of fields separated with a
delimiter.

So the idiom for treating such files has become

    for line in the_file:
        lst = line.split(delimiter)

The problem is that the fields in lst are associated
with a number that is off by one from the way they are normally
counted. If I go and ask my colleague which field
contains some specific data and he answers
"the 5th", I have to remind myself I want lst[4].

This is often a cause for errors.
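
One way to keep the "5th field" conversation and the code in step is to hide the offset in a single place. This is just a sketch: the helper name `field`, the sample record and the "|" delimiter are invented for illustration.

```python
def field(record, n):
    """Return the n-th field of a split record, counting from 1."""
    if n < 1:
        raise IndexError("field numbers start at 1")
    return record[n - 1]


line = "alice|42|london|engineer|2005-04-20"
lst = line.split("|")

# "The 5th field" in conversation is lst[4] in zero-based terms.
fifth = field(lst, 5)
print(fifth)   # 2005-04-20
```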
 

Antoon Pardon

Op 2005-04-20 said:
Although I would classify that as a "rare use case". So, it "ought
to be possible to do it, but not necessarily easy".

The -13 and +7 may be rare cases, but I wouldn't call one as a
start index a rare case. I often get data in files where each line
is a record with delimited fields. When I ask which field contains
certain data, I get an answer that depends on counting starting
with one. So starting with one as index would seem the more natural
choice here.

As far as I'm concerned, 0 as a starting index is a rare use
case. In the programs I write about 45% is a natural 1 for
first index, about 50% is a don't care and 5% is a natural
0 first index and some fraction is others. So yes in most
of my programs 0 as a starting index works fine, but that
is because in a lot of programs it doesn't matter much
which would be the first index.

Now I can understand that for others the 0 as a natural
start index is more frequent, but if we have two frequent
use cases, it seems to me the language should provide the
choice more easily than it does now.
 

Raymond Hettinger

[Antoon Pardon]
I don't see why the start index can't be accessible through
a method or function just like the length of a list is now.

My favourite would be a range method so we would have
the following idiom:

    for i in lst.range():
        do something with lst[i]


After going to all that trouble, you might as well also get the value at that
position:

    for i, x in enumerate(lst):
        do something with lst[i] also known as x


Raymond Hettinger
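
For reference, a small runnable version of the enumerate idiom, with made-up list contents. The built-in also accepts a `start` argument, which changes the numbering it reports without changing how the list itself is indexed.

```python
lst = ["spam", "eggs", "bacon"]

# Index and value together, counting from 0 by default.
zero_based = list(enumerate(lst))

# start=1 only relabels the pairs; lst[0] is still the first element.
one_based = list(enumerate(lst, start=1))

for i, x in one_based:
    print(i, x)
```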
 
