Suggested feature: slice syntax within tuples (or even more generally)?


stephenwlin

Hello,

Would it be feasible to modify the Python grammar to allow ':' to generate slice objects everywhere rather than just indexers and top-level tuples of indexers?

Right now in Py2.7, Py3.3:
"obj[:,2]" yields "obj[slice(None),2]"
but
"obj[(:,1),2]" is an error, instead of "obj[(slice(None), 1), 2]"
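To make the current behaviour concrete, here is a minimal Python 3 sketch. The `Echo` class is a hypothetical helper (not from the post or any library) whose `__getitem__` simply hands back the key it receives; `compile()` is used to check that the nested form is rejected by the parser itself rather than at run time.

```python
class Echo:
    """Hypothetical helper: __getitem__ returns the key it is given."""
    def __getitem__(self, key):
        return key

obj = Echo()

# A top-level tuple of indexers may contain slice syntax.
key = obj[:, 2]
print(key)  # (slice(None, None, None), 2)

# The nested form has to be spelled out with slice() ...
nested = obj[(slice(None), 1), 2]
print(nested)  # ((slice(None, None, None), 1), 2)

# ... because ':' inside a parenthesized tuple is a syntax error.
try:
    compile("obj[(:, 1), 2]", "<example>", "eval")
    nested_syntax_ok = True
except SyntaxError:
    nested_syntax_ok = False
print(nested_syntax_ok)  # False
```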

Also, more generally, you could imagine this working in (almost?) any expression without ambiguity:
"a = (1:2)" could yield "a = slice(1,2)"

See motivating discussion for this at:
https://github.com/pydata/pandas/issues/2866

There might not be very many use cases for this currently outside of pandas, but apparently the grammar was already modified to allow '...' outside indexers, and there are probably even fewer valid use cases for that:
"e = ..." yields "e = Ellipsis"
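A small sketch of the asymmetry described here, assuming Python 3 (in Python 2, `...` was only accepted inside subscripts, so `e = ...` is a syntax error there). The `parses` function is a hypothetical helper that just reports whether a snippet compiles.

```python
# In Python 3, '...' is an ordinary expression that evaluates to Ellipsis.
e = ...
print(e is Ellipsis)  # True

def parses(source):
    """Return True if `source` compiles as a module, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

# A bare slice expression outside a subscript is still rejected.
print(parses("a = (1:2)"))        # False
print(parses("a = slice(1, 2)"))  # True
```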

Would there be any downside to changing the handling of ':' as well? It might even make the grammar simpler, in some ways, since indexers won't have to be treated specially.

Let me know if you have any thoughts.

Thanks!
Stephen
 

Terry Reedy

> Hello,
>
> Would it be feasible to modify the Python grammar to allow ':' to generate slice objects everywhere rather than just indexers and top-level tuples of indexers?
>
> Right now in Py2.7, Py3.3:
> "obj[:,2]" yields "obj[slice(None),2]"
> but
> "obj[(:,1),2]" is an error, instead of "obj[(slice(None), 1), 2]"
>
> Also, more generally, you could imagine this working in (almost?) any expression without ambiguity:
> "a = (1:2)" could yield "a = slice(1,2)"

I believe the idea of slice literals has been rejected.

> See motivating discussion for this at:
> https://github.com/pydata/pandas/issues/2866
>
> There might not be very many use cases for this currently outside of pandas, but apparently the grammar was already modified to allow '...' outside indexers and there's probably even fewer valid use cases for that:
> "e = ..." yields "e = Ellipsis"

One dubious move does not justify another.
 

stephenwlin

> I believe the idea of slice literals has been rejected.

That's too bad...do you have a link to prior discussion on this and what the reasoning was for rejection? There doesn't seem to be any particular downside and things would be more consistent with slice syntax allowed anywhere.

It would be helpful in other cases as well other than the one linked to, since there's good reason to be able to succinctly create and reuse the same indexer object multiple times without having to convert everything into slice() calls.

-Stephen
 

Steven D'Aprano

> That's too bad...do you have a link to prior discussion on this and what
> the reasoning was for rejection? There doesn't seem to be any particular
> downside and things would be more consistent with slice syntax allowed
> anywhere.

There's *always* downside to new syntax. The question is, does the
benefit exceed the downside?

- new syntax requires more complexity in the parser;

- new tests for it;

- more documentation;

- increase in the number of things people have to learn before they can
read and write Python code;

- takes the language further away from "executable pseudo-code" and
closer to "line noise";

- somebody has to actually write the code, write the tests, write the
documentation; and somebody else has to review it; for a chronically
short-handed dev team where there are hundreds of bugs and feature
requests in the queue, this is a significant cost.

Now, you might argue that these are all *low* cost, but they're not zero,
and how do they stack up against the benefits?

What are the benefits of syntactic sugar for slice objects?

Personally, there's not much difference to my eye between:


S = slice
S(1, 20, 3)

versus

(1:20:3)

so I'm skeptical that the benefit is much greater than the cost, as low
as that cost may be.

> It would be helpful in other cases as well other than the one linked to,
> since there's good reason to be able to succinctly create and reuse the
> same indexer object multiple times without having to convert everything
> into slice() calls.

I don't quite understand this argument. If you mean that you can do this:

s = (1:2)  # Like slice(1, 2).
print alist[s]
print blist[s]  # reuse the slice object
print clist[s]

you can replace the line s = (1:2) with a call to slice, and still reuse
the slice object. So I don't understand what the syntactic sugar gains
you.
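A runnable version of the point above, written for Python 3: the slice is built once with `slice()` (aliased to `S` as suggested earlier) and reused on three sequences. The names `alist`, `blist` and `clist` and their contents are placeholder data, not from the original post.

```python
S = slice            # short alias, as suggested above
s = S(1, 20, 3)      # equivalent to the subscript 1:20:3

alist = list(range(30))
blist = "abcdefghijklmnopqrstuvwxyz"
clist = tuple(range(100, 130))

# The same slice object is reused on three different sequences.
print(alist[s])  # [1, 4, 7, 10, 13, 16, 19]
print(blist[s])  # behknqt
print(clist[s])  # (101, 104, 107, 110, 113, 116, 119)

# Using it is identical to writing the slice syntax inline.
assert alist[s] == alist[1:20:3]
```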
 

stephenwlin

> There's *always* downside to new syntax. The question is, does the
> benefit exceed the downside?

Fair enough points w.r.t. the costs of implementation, etc. I just meant that, from my perspective, it seems like a simplification of the grammar rather than making it more complex, since it just makes ':' work the same way outside of [] as within it, instead of treating [] as a special case. I'm aware there could be some ambiguities with dictionary construction, but it should be pretty easy to resolve them with precedence rules.

As for making the language more complicated to learn: to be honest, the fact that ':' was treated specially in [] made things more confusing to me until I realized there was a special grammar specifically for that case (since this is not something I would expect coming from other languages). A dedicated grammar for this case would make more sense if there were something about the __getitem__/__setitem__ protocol that inherently required it, but there isn't really: you're just passing an object to __getitem__, just like you can pass an object to any function, so why would you parse expressions differently in the former case than in the latter? Obviously, this depends on one's individual intuition, so maybe I'm in the minority here in finding it weird.

> What are the benefits of syntactic sugar for slice objects?
>
> Personally, there's not much difference to my eye between:
>
> S = slice
> S(1, 20, 3)
>
> versus
>
> (1:20:3)
>
> so I'm skeptical that the benefit is much greater than the cost, as low
> as that cost may be.

But if there's no difference, then why have ':' work specially for '[]' operations at all instead of requiring the user to build slice objects manually all the time? It obviously is a convenient and simpler syntax, and there doesn't seem to be any real reason for the artificial restriction that this happens only inside '[]' (and only at the top level, so no nested slices or slices within tuples).

>> It would be helpful in other cases as well other than the one linked to,
>> since there's good reason to be able to succinctly create and reuse the
>> same indexer object multiple times without having to convert everything
>> into slice() calls.
>
> I don't quite understand this argument. If you mean that you can do this:
>
> s = (1:2)  # Like slice(1, 2).
> print alist[s]
> print blist[s]  # reuse the slice object
> print clist[s]
>
> you can replace the line s = (1:2) with a call to slice, and still reuse
> the slice object. So I don't understand what the syntactic sugar gains
> you.


Yes, but you can't just directly use what you wrote within '[]' outside of it, and there doesn't seem to be any real technical reason for this except the historical evolution of the language. Obviously it's not that hard to convert everything to call slice() manually, but it can get verbose quickly for complex multidimensional slicing cases (which are common in numpy, which is why Travis Oliphant wants this feature as well...)

You can do something hackish like make a dummy object whose __getitem__ simply returns its argument:

class Indexer(object):
    def __getitem__(self, obj):
        return obj

I = Indexer()

s = I[1:2, ..., 3:4:-1, ::-1]

but that seems mostly to highlight the fact that this is an artificial problem to begin with...'[]' just translates to a function call anyway (more or less), so why treat it differently?
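The claim that '[]' "just translates to a function call" can be checked directly. This Python 3 sketch reuses the `Indexer` trick from the post and compares the subscript form with the same key spelled out by hand with `slice()` and `Ellipsis`, then passed to an explicit `__getitem__` call. It also shows how much longer the spelled-out form gets for a four-axis index.

```python
class Indexer(object):
    """Returns whatever key the subscript expression produces."""
    def __getitem__(self, obj):
        return obj

I = Indexer()

# Slice syntax inside [] ...
via_subscript = I[1:2, ..., 3:4:-1, ::-1]

# ... is shorthand for a tuple built from explicit slice() calls ...
spelled_out = (slice(1, 2), Ellipsis, slice(3, 4, -1), slice(None, None, -1))

# ... which is exactly what an explicit __getitem__ call receives.
via_call = I.__getitem__(spelled_out)

print(via_subscript == spelled_out == via_call)  # True
```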

Thanks,

-Stephen
 

stephenwlin

> http://osdir.com/ml/python.python-3000.devel/2006-05/msg00686.html
>
> http://mail.python.org/pipermail/python-list/2001-August/094909.html
>
> E.g.:
>
> if x:
>     pass
>
> Is that intended as "if slice(x, None, None)" with a missing colon, or
> "if x" with colon supplied?

I don't mean to argue with Guido, but unless I'm missing something, the former would be a syntax error and the latter would not be, so even if it might be ambiguous in isolation, it wouldn't be in context. Isn't this something that could be resolved without requiring a mathematically more complex parser than Python already requires (i.e. one that corresponds to a more complex minimal automaton)?

Also, you could easily forbid ':' in top-level expressions, so slices would have to be enclosed in either () or [] (and the latter becomes just a specific case of a more general rule).

> With the addition of one extra letter, you can use slice notation to
> return slice objects:
>
> class SlicerAndDicer:
>     def __getitem__(self, item):
>         return item
>
> s = SlicerAndDicer()
>
> And some examples:
>
> py> s[2::5]
> slice(2, None, 5)
> py> s[::-1]
> slice(None, None, -1)
> py> s[3, 4]
> (3, 4)
> py> s[3, 4:6]
> (3, slice(4, 6, None))
> py> s[7::, 9]
> (slice(7, None, None), 9)

Hah, yes. I basically wrote that exact example in my reply to Steven at the same time you replied. numpy.s_ is similar (although I think it does some extra work for odd reasons).

Anyway, this is an okay workaround, but it just seems to highlight the fact that restricting ':' to [] is arbitrary to begin with, since all you're doing is wrapping a function call. Right now, everything that is parsed within f() is also parsed within f[], but the latter parses more things just by virtue of being [] instead of ().

To be honest, this feels more like an artifact of historical evolution than anything else. It just doesn't make much sense to create a special syntax for parsing expressions into objects in one particular context but not others when there's nothing special about the underlying object protocol that requires it to be handled separately (and perhaps this was not always the case...I've not been with Python long enough to know the particulars of how this was implemented in the past...)
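The asymmetry between f() and f[] can be observed with `compile()` alone, without defining `f`. In this Python 3 sketch, `compiles` is a hypothetical helper that reports whether an expression is accepted by the parser.

```python
def compiles(expr):
    """Return True if `expr` is accepted by the parser as an expression."""
    try:
        compile(expr, "<example>", "eval")
        return True
    except SyntaxError:
        return False

# Call syntax and subscript syntax accept the same plain expressions ...
plain = ("f(1, 2)", "f[1, 2]", "f((1, 2))", "f[(1, 2)]")
for expr in plain:
    print(expr, compiles(expr))   # all True

# ... but only the subscript form accepts ':' slice syntax.
print(compiles("f[1:2, ::3]"))  # True
print(compiles("f(1:2, ::3)"))  # False
```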

Anyway, thanks for the feedback!

- Stephen
 

stephenwlin

> Hah, yes. I basically wrote that exact example in my reply to Steven at the same time you replied. numpy.s_ is similar (although I think it does some extra work for odd reasons).

Oops, this is silly in retrospect...sorry, wasn't looking at the From: line carefully enough and didn't realize I was responding to you again.
 

stephenwlin

> You can't just allow ':' to generate slice objects everywhere without
> introducing ambiguity, so your proposal would have to be to allow slice
> objects in wider but still restricted contexts.

Yeah, I mentioned that in a follow-up. I'm pretty sure that just allowing it within [] and () would work, though, and still be a pretty consistent/simple grammar.

This would also remove Steven's (i.e. Guido's) objection about

if x:

This would still preserve the main benefit of allowing you to succinctly create slices in any context you need an expression in, because you can always add () around any expression.

-Stephen
 

Ian Kelly

> E.g.:
>
> if x:
>     pass
>
> Is that intended as "if slice(x, None, None)" with a missing colon, or
> "if x" with colon supplied?

That's not ambiguous, because the former is simply invalid syntax.
However, consider the following.

if 1: 2:

That could be either a one-line if statement where the condition is 1
and the body is slice(2, None), or it could be the beginning of a
multi-line if block where the condition is slice(1, 2). If the parser
sees that, should it expect the next line to be indented or not? If
it relies on indentation to determine this, then it loses some ability
to warn the user of incorrect indentation.

Then we have dictionary literals:

{1:2:3}

Should that be read as dict([(slice(1, 2), 3)]) or dict([(1, slice(2,
3))])? Or even set([slice(1, 2, 3)])?
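For reference, this Python 3 sketch shows that `{1:2:3}` is simply rejected today, and spells out the three readings explicitly. One detail worth knowing when trying this: slice objects only became hashable in Python 3.12, so on older versions the readings that use a slice as a dict key or set member fail with `TypeError` even when written out by hand; the sketch guards for that. `compiles` is a hypothetical helper.

```python
def compiles(expr):
    """Return True if `expr` is accepted by the parser as an expression."""
    try:
        compile(expr, "<example>", "eval")
        return True
    except SyntaxError:
        return False

# Today the literal itself is a syntax error.
print(compiles("{1:2:3}"))  # False

# Reading 2: key 1 mapped to slice(2, 3). Always constructible.
value_is_slice = {1: slice(2, 3)}

try:
    # Readings 1 and 3 need hashable slice objects (Python 3.12+).
    key_is_slice = {slice(1, 2): 3}
    set_of_slice = {slice(1, 2, 3)}
except TypeError:
    key_is_slice = set_of_slice = None

print(value_is_slice)
print(key_is_slice, set_of_slice)
```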
 

stephenwlin

> That's not ambiguous, because the former is simply invalid syntax.
> However, consider the following.
>
> if 1: 2:
>
> That could be either a one-line if statement where the condition is 1
> and the body is slice(2, None), or it could be the beginning of a
> multi-line if block where the condition is slice(1, 2). If the parser
> sees that, should it expect the next line to be indented or not? If
> it relies on indentation to determine this, then it loses some ability
> to warn the user of incorrect indentation.
>
> Then we have dictionary literals:
>
> {1:2:3}
>
> Should that be read as dict([(slice(1, 2), 3)]) or dict([(1, slice(2,
> 3))])? Or even set([slice(1, 2, 3)])?

Restricting this to the top level of ()-enclosed expressions would be sufficient to eliminate all ambiguities, though, right? Basically, all that needs to change is for expressions within '()' to be parsed the same way they are currently parsed within '[]'.
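One way to see what "parse '()' like '[]'" would mean is to look at the syntax tree. In this Python 3 sketch, the `ast` module shows that `ast.Slice` nodes are produced inside a subscript, while the same text inside parentheses has no grammar rule to match. The exact node layout around the slice differs between Python versions (3.8 wraps it in `ExtSlice`, 3.9+ in `Tuple`), so the sketch only counts `Slice` nodes.

```python
import ast

# Inside a subscript, the parser builds ast.Slice nodes for ':' syntax.
tree = ast.parse("obj[1:2, 3]", mode="eval")
slices = [node for node in ast.walk(tree) if isinstance(node, ast.Slice)]
print(len(slices))  # 1

# The same text inside parentheses is rejected by the parser.
try:
    ast.parse("(1:2, 3)", mode="eval")
    parens_ok = True
except SyntaxError:
    parens_ok = False
print(parens_ok)  # False
```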

-Stephen
 

Rick Johnson

On Thursday, February 14, 2013 1:58:06 PM UTC-5, Ian wrote:

[snip: quote noise!]

Dude! Please trim this quote noise from your posts. I know Google's quoting mechanism is buggy, but dammit man YOU'RE A PROGRAMMER! There is no excuse for not trimming excessive newlines.

============================================================
As to your slicing request.
============================================================

Anybody who knows me KNOWS that i love consistency! So i'm all for applying a slicing syntax consistently, however, i don't think your approach is the correct approach.

To get you going in the correct direction: Ruby uses the "s..e" and "s...e" (where "s" represents the start of the range and "e" represents the end of a range) as syntactic sugar for Range.new(s, e). Two dots create an /inclusive/ range and three dots create an /exclusive/ range. Anyway, enough tutorials, read the doc:

http://www.ruby-doc.org/core-1.9.3/Range.html

Now, i am not suggesting that python should adopt the /exact/ syntax of Ruby, however, i /am/ suggesting that Ruby is more consistent with the range object than Python.

In Ruby:

....you can slice arrays with the range:

rb> a = [1,2,3,4,5]
rb> a[0..-1]
[1,2,3,4,5]
rb> a[0...-1]
[1,2,3,4]

....you can create a range of integers:

rb> r = 1..10
rb> r.to_a()
[1,2,3,4,5,6,7,8,9,10]

....you can create a range of chars:

rb> r = "a".."d"
rb> r.to_a()
["a", "b", "c", "d"]

....you can use range in a loop:

rb> for x in 0...5;puts "#{x}th iteration";end
0th iteration
1th iteration
2th iteration
3th iteration
4th iteration

....but most importantly, you can do all these things in a consistent manner using a consistent syntax!

Python however has the stupid slice function and then sequence indexing, and no consistency between the two! Plus, the for loop uses the range function to create "lazy iterators" instead of employing a consistent "range" syntax.

Consistent syntax and consistent application are the biggest issues with Python ranges as they exist today. That's the starting point.
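For comparison, Python does have a bridge between the two objects discussed here: `slice.indices(length)` resolves a slice against a sequence length and returns a `(start, stop, step)` triple that can be passed straight to `range()`. A short Python 3 sketch (the list `data` is placeholder data):

```python
# A slice by itself has no length; indices() resolves it against one.
s = slice(1, None, 2)
n = 10
print(s.indices(n))  # (1, 10, 2)

# Those resolved values are exactly what range() takes ...
r = range(*s.indices(n))
print(list(r))  # [1, 3, 5, 7, 9]

# ... and they pick out the same items as slicing a length-n sequence.
data = list("abcdefghij")
assert [data[i] for i in r] == data[s]

# Python ranges exclude the stop value, so Ruby's inclusive 1..10
# corresponds to range(1, 11).
print(list(range(1, 11)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```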
 

Andrew Robinson

> Notice I specifically said an "infinite *for* loop".

OK, so tit for tat.

Notice I already showed an effective *accidental* "infinite" for loop
because I did notice you spoke about a *for* loop.

And, obviously, in the case of the while loop I showed, it was not
meant to be True forever.
It's a variable, which is subject to change.

I really do respect your opinion; but yours is one of about 5 voices that
dominate this list, albeit those same people spend a lot of time helping others;
Stephen is someone new to me, and I want to encourage his probing of the
issue more than I want to advance my view.

P.S.
I apologize about the e-mail clock; it seems I am sending my local time
again, and it's different from your timezone. I *wish* the python list
computer would politely adjust it when *accidents* happen, or my OS's
distribution would fix their bug, but c'est la vie. I limp along with
the status quo for now.
 

Ian Kelly

> I've read through the whole of the subject, and the answer is no, although I
> think allowing it in (::) is a *very* good idea, including as a replacement
> for range or xrange.
>
> s=1:2:3
> for i in s:
> for i in (1:2:3) :

Eww, no. I can appreciate the appeal of this syntax, but the problem
is that ranges and slices are only superficially similar. For one,
ranges require a stop value; slices do not. What should Python do
with this:

for i in (:):

Intuitively, it should result in an infinite loop starting at 0. But
ranges require a stop value for a very good reason -- it should not be
this easy to accidentally create an infinite for loop. So I would
advocate that this should raise an error instead. If the user really
wants an unlimited counting loop, let them continue to be explicit
about it by using itertools.count. On the other hand, this would mean
that the semantics of (:) would be different depending on whether the
slice is used as a slice or a range.
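The explicit alternative mentioned above looks like this in Python 3: `itertools.count` is an unbounded counter, and `itertools.islice` is used here only so the example terminates.

```python
from itertools import count, islice

# An explicit, unbounded counter starting at 0.
counter = count()

# islice() stops it after five items so this example terminates.
first_five = list(islice(counter, 5))
print(first_five)  # [0, 1, 2, 3, 4]

# count() also takes a start and a step, like a stop-less range().
by_fives = list(islice(count(10, 5), 3))
print(by_fives)  # [10, 15, 20]
```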

The next problem you run into is that the semantics of negative
numbers are completely different between slices and ranges. Consider
this code:

s = (-5:6)
for i in s:
    print(i)
for i in range(6)[s]:
    print(i)

Intuitively, both loops should print the same thing. After all, one
is using the slice s as a range, and the other is using the very same
slice s as a slice of a sequence where the indices and values are the
same. This expectation fails, however. The first loop prints the
integers from -5 to 5 inclusive, and the second loop only prints the
integers from 1 to 5 inclusive.

For these reasons, I disagree that allowing slices to be implicitly
converted to ranges or vice versa is a good idea.
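The two behaviours described above can be reproduced in current Python 3 with an explicit `slice(-5, 6)` standing in for the proposed `(-5:6)`. Reading the fields as `range()` bounds gives -5 through 5, while applying the slice to `range(6)` treats -5 as "five from the end", which `slice.indices()` makes visible.

```python
s = slice(-5, 6)

# Reading the slice's fields as range() bounds: -5 through 5.
as_range = list(range(s.start, s.stop))
print(as_range)  # [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]

# Applying the slice to range(6): -5 counts from the end, i.e. index 1.
as_slice = list(range(6)[s])
print(as_slice)  # [1, 2, 3, 4, 5]

# indices() shows how the negative start was resolved for length 6.
print(s.indices(6))  # (1, 6, 1)
```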
> This is not a new idea: e.g., 2002 (which is still status OPEN).
> http://osdir.com/ml/python.patches/2002-06/msg00319.html

It's not still open. What you've linked above is an archived mailing
list message concerning the patch. I've linked below the actual tracker
issue that the patch was attached to; it was rejected by Guido in
2002.
http://bugs.python.org/issue575515
 

Ian Kelly

> ...
> and, besides, the same is true with other constructions of loops....
>
> while a: # Damn easy, if a is accidentally true!

Notice I specifically said an "infinite *for* loop". While loops are
meant to be indeterminate in the number of iterations they will take
going into the loop; for loops are not.

>> The next problem you run into is that the semantics of negative
>> numbers are completely different between slices and ranges. Consider
>> this code:
>>
>> s = (-5:6)
>> for i in s:
>>     print(i)
>> for i in range(6)[s]:
>>     print(i)
>
> I don't find this difference to be necessary, nor objectionable.
>
> It is less inconsistent, in my view, to allow that
> ([ 1,2,3,4,5 ])[-1:2] produce [5,1,2] than an empty list;
> and ([ 1,2,3,4,5])[2:-1] does produce an empty list.
>
> I have been looking for actual programs that this would break for over two
> months now, and I haven't been finding any. I am willing to run any
> mainstream application you can find on test-patched python!

Yes, I recollect now that we have already had this discussion.
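For reference, this Python 3 sketch shows what current Python actually returns for the two list expressions quoted above; note that `a[2:-1]` gives `[3, 4]` rather than an empty list. The `wrap_slice` function is a hypothetical helper, not part of Python or of the proposed patch; it only illustrates the wrap-around result `[5, 1, 2]` being argued for.

```python
a = [1, 2, 3, 4, 5]

# Current behaviour: negative indices are resolved against len(a) first.
print(a[-1:2])  # []      (start resolves to 4, already past stop 2)
print(a[2:-1])  # [3, 4]  (stop resolves to 4)

def wrap_slice(seq, start, stop):
    """Hypothetical wrap-around slice: indices taken modulo len(seq)."""
    n = len(seq)
    return [seq[i % n] for i in range(start, stop)]

print(wrap_slice(a, -1, 2))  # [5, 1, 2]
```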
 

Nobody

> Would it be feasible to modify the Python grammar to allow ':' to generate
> slice objects everywhere rather than just indexers and top-level tuples of
> indexers?

If you need to be able to easily construct indexing objects, create a
helper like:

>>> class Slicer(object):
...     def __getitem__(self, s):
...         return s
...
>>> s_ = Slicer()
>>> s_[1,2,3]
(1, 2, 3)
>>> s_[:]
slice(None, None, None)
>>> s_[1:2:3,4:5:6]
(slice(1, 2, 3), slice(4, 5, 6))
>>> s_[...]
Ellipsis
 
