String#[] behaviour

Robert Klemme

2007/12/18 said:
'asd'[0...10] returns 'asd' while 'asd'[-10..-1] returns nil.

As far as I understand, such behaviour completely satisfies ruby
documentation (http://www.ruby-doc.org/core/classes/String.html), but
it seems inconsistent to me.

Any thoughts?

On one hand you are right. On the other hand, begin and end indexes
are asymmetric anyway: you know that the starting index is always 0
but the ending index can have arbitrary values. I cannot say I have
come across this so far, so for me personally this is a non-issue. On
a larger scale it is probably a minor issue. Let's hear what others
say.

Kind regards

robert
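
For reference, a minimal sketch of the two calls from the original question. The variable names are only for illustration; the results are what current Ruby returns for these expressions.

```ruby
s = 'asd'

# End index past the end of the string: the range is clipped to the string.
from_start = s[0...10]

# Start index before the beginning of the string: treated as out of range.
from_end = s[-10..-1]

p from_start  # "asd"
p from_end    # nil
```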
 
yermej

'asd'[0...10] returns 'asd' while 'asd'[-10..-1] returns nil.
As far as I understand, such behaviour completely satisfies ruby
documentation (http://www.ruby-doc.org/core/classes/String.html), but
it seems inconsistent to me.
Any thoughts?

From the docs for String#[] at that link:

"Returns nil if the initial offset falls outside the string..."

Never mind what I said. Some days I don't read so good and stuff.
 
DNNX

On one hand you are right. On the other hand, begin and end indexes
are asymmetric anyway: you know that the starting index is always 0
but the ending index can have arbitrary values. ...

Hm... On the other hand, end and begin indexes are asymmetric anyway:
you know that the ending index is always -1 but the starting index can
have arbitrary values.

Isn't this a symmetry?

Best regards,
Viktar
 
MonkeeSage

Hm... On the other hand, end and begin indexes are asymmetric anyway:
you know that the ending index is always -1 but the starting index can
have arbitrary values.

Isn't this a symmetry?

Best regards,
Viktar

No, because "-1" is a special value...it's got magic in it. It can
magically mean 5, or 10, or even 12 (because it's magic). ;) "-1" is
just sugar for #length, and #length is always a side-effect of a
container, whereas '0' is a constant entry point.

Regards,
Jordan
 
DNNX

No, because "-1" is a special value...it's got magic in it. It can
magically mean 5, or 10, or even 12 (because it's magic). ;) "-1" is
just sugar for #length, and #length is always a side-effect of a
container, whereas '0' is a constant entry point.

Regards,
Jordan

-1 is more special and magic than 0? Hm... 0 also can magically mean
-6, -11, or even -13 (because it's magic too).

-1 is sugar for #length? Not sure I understand correctly. I have never
heard such an interpretation of -1 before. Why #length and not #length-1?
Why is 0 not sugar for -#length? What do you mean by saying -1 is sugar
for something?

0 is a constant entry point? Great, -1 is a constant exit point.

Anyway, whether there is any symmetry or not, I still believe that
returning 'asd' in one case and nil in the other is not consistent
(please see my example in the first message).

Regards,
Viktar
 
Michal Suchanek

No, because "-1" is a special value...it's got magic in it. It can
magically mean 5, or 10, or even 12 (because it's magic). ;) "-1" is
just sugar for #length, and #length is always a side-effect of a
container, whereas '0' is a constant entry point.

-1 is as constant as 0. And because of its magic when used as
container index it always means the end. And really length - 1, not
just length. And at that place there is always the last object unless
the container is empty. The same way as the first object is at 0.

Thanks

Michal
 
Pasha Nigerish

Michal said:
The asymmetry is in that you can chop off "at most 10 characters from
the start" with 0...10 but not "at most 10 characters from the end"
with -10..-1 because the start that has to be inside the string is the
one of which you cannot be sure. You cannot swap the bounds because
you get an empty string then.

So the symmetric rule for range indexing would be something like this:
...skipped...

So we need to be clear about range indexing: I think it's perfectly
legitimate to return the intersection of an array/string with the range
instead of nil in the case of a negative start.
This can be done via a one-line patch in range.c:615 (as in trunk): just
assume `beg = 0` instead of `goto out_of_range`.
Thus we'll have at least more Perl-compatible behavior =) i.e. just as
'abc'[0..6] is 'abc' now, so 'abc'[-6..-1] will be 'abc' as well.

One problem I see in this assumption: 'abc'[4..6] and 'abc'[-6..-4] will
return '' instead of nil.
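
As a point of comparison, here is a small sketch of what unpatched Ruby returns for the boundary cases mentioned above. The `results` hash is only an illustrative way to collect them; note that a start index equal to the length gives an empty string, while a start index beyond it gives nil.

```ruby
s = 'abc'

# Current (unpatched) results for ranges that overlap or miss the string.
results = {
  '0..6'   => s[0..6],    # end clipped to the string
  '3..6'   => s[3..6],    # start == length
  '4..6'   => s[4..6],    # start > length
  '-6..-1' => s[-6..-1],  # start before the beginning
  '-6..-4' => s[-6..-4]   # whole range before the beginning
}

results.each { |range, value| puts "'abc'[#{range}] => #{value.inspect}" }
# 'abc'[0..6] => "abc"
# 'abc'[3..6] => ""
# 'abc'[4..6] => nil
# 'abc'[-6..-1] => nil
# 'abc'[-6..-4] => nil
```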
 
Michal Suchanek

Michal said:
The asymmetry is in that you can chop off "at most 10 characters from
the start" with 0...10 but not "at most 10 characters from the end"
with -10..-1 because the start that has to be inside the string is the
one of which you cannot be sure. You cannot swap the bounds because
you get an empty string then.

So the symmetric rule for range indexing would be something like this:
...skipped...

So we need to be clear about range indexing: I think it's perfectly
legitimate to return the intersection of an array/string with the range
instead of nil in the case of a negative start.
This can be done via a one-line patch in range.c:615 (as in trunk): just
assume `beg = 0` instead of `goto out_of_range`.
Thus we'll have at least more Perl-compatible behavior =) i.e. just as
'abc'[0..6] is 'abc' now, so 'abc'[-6..-1] will be 'abc' as well.

One problem I see in this assumption: 'abc'[4..6] and 'abc'[-6..-4] will
return '' instead of nil.

You can still test the lower bound is inside the string. It's just
that with negative ranges the lower bound is the second number, not
the first.

Thanks

Michal
 
MonkeeSage

-1 is more special and magic than 0? Hm... 0 also can magically mean
-6, -11, or even -13 (because it's magic too).

-1 is sugar for #length? Not sure I understand correctly. I have never
heard such an interpretation of -1 before. Why #length and not #length-1?
Why is 0 not sugar for -#length? What do you mean by saying -1 is sugar
for something?

0 is a constant entry point? Great, -1 is a constant exit point.

Anyway, whether there is any symmetry or not, I still believe that
returning 'asd' in one case and nil in the other is not consistent
(please see my example in the first message).

Regards,
Viktar

I think you (and Michal) missed my point. And yes, I should have said
#length-1. The point is, since there is *no such thing* as a negative
index -- 0 is the *first* index -- and "-1" (or -anynumber) is just
sugar (i.e., just a more convenient syntax for writing
#length-whatever), what you're asking is for ranges such as [-7..2] and
[1..0]
to be meaningful. Taking your example, "'asd'[-10..-1]", this means
'asd'[-7..2] when you de-sugar it. Now in the other case,
"'asd'[0...10]", once you reach #length-1, you can stop and return
0..#length-1. But with 'asd'[-7..2], what are you supposed to do when
the start index is less than the first index (0)? Well, you could skip
ahead to the first index, sure, but it makes just as much sense (if
not more) to return nil/empty string. Same goes for cases such as
'asd'[-2..-3] (i.e., 'asd'[1..0]), where the start index is greater
than the end index.

Regards,
Jordan
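
The de-sugaring described above can be sketched as follows. `desugar` is a hypothetical helper written for this illustration, not a Ruby method, and it assumes integer endpoints; it simply adds the length to any negative endpoint.

```ruby
# Hypothetical helper: rewrite negative range endpoints as offsets
# from the start by adding the container length to them.
def desugar(range, length)
  first = range.begin < 0 ? range.begin + length : range.begin
  last  = range.end   < 0 ? range.end   + length : range.end
  range.exclude_end? ? (first...last) : (first..last)
end

p desugar(-10..-1, 3)  # -7..2   start is still below the first index
p desugar(-2..-3, 3)   # 1..0    start is greater than the end
p desugar(0...10, 3)   # 0...10  nothing to rewrite
```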
 
Pasha Nigerish

Jordan said:
Taking your example, "'asd'[-10..-1]", this means
'asd'[-7..2] when you de-sugar it. Now in the other case,
"'asd'[0...10]", once you reach #length-1, you can stop and return
0..#length-1. But with 'asd'[-7..2], what are you supposed to do when
the start index is less than the first index (0)? Well, you could skip
ahead to the first index, sure, but it makes just as much sense (if
not more) to return nil/empty string. Same goes for cases such as
'asd'[-2..-3] (i.e., 'asd'[1..0]), where the start index is greater
than the end index.

IMHO, the main goal of such a construct (some_string[-10..-1]) is to
return the last 10 chars of some_string. In that case, returning
'asd' for 'asd'[-10..-1] seems as logical as returning 'asd' for
'asd'[0..10] (as implemented now).

right now (1.8.6) we have:
1) 'asd'[0..10] => 'asd'
2) 'asd'[2..1] => ''
3) 'asd'[-1..-2] => ''
-BUT-
4) 'asd'[-10..-1] => nil

I think that, by the "Principle of Least Astonishment" (c), we can unify
these cases, i.e. return either 'asd' or nil in both cases 1) and 4). All
we need is to adjust the start index of the range to 0, if negative, right
after the de-sugaring.
 
MonkeeSage

Jordan said:
Taking your example, "'asd'[-10..-1]", this means
'asd'[-7..2] when you de-sugar it. Now in the other case,
"'asd'[0...10]", once you reach #length-1, you can stop and return
0..#length-1. But with 'asd'[-7..2], what are you supposed to do when
the start index is less than the first index (0)? Well, you could skip
ahead to the first index, sure, but it makes just as much sense (if
not more) to return nil/empty string. Same goes for cases such as
'asd'[-2..-3] (i.e., 'asd'[1..0]), where the start index is greater
than the end index.

IMHO, the main goal of such a construct (some_string[-10..-1]) is to
return the last 10 chars of some_string. In that case, returning
'asd' for 'asd'[-10..-1] seems as logical as returning 'asd' for
'asd'[0..10] (as implemented now).

right now (1.8.6) we have:
1) 'asd'[0..10] => 'asd'
2) 'asd'[2..1] => ''
3) 'asd'[-1..-2] => ''
-BUT-
4) 'asd'[-10..-1] => nil

I think that, by the "Principle of Least Astonishment" (c), we can unify
these cases, i.e. return either 'asd' or nil in both cases 1) and 4). All
we need is to adjust the start index of the range to 0, if negative, right
after the de-sugaring.

In thinking about it, I guess that does make some sense. Unless one
were to assume that negative starting indexes were more likely to be
programmer errors than larger-than-#length-1 end indexes (does anyone
claim this?), it seems to me that setting negative indexes to 0 would
be consistent with setting larger-than-#length-1 indexes to #length-1.
Maybe you should start an RCR for this.

Regards,
Jordan
 
