[rcr] String#first / String#last

Robert Klemme

trans. (T. Onoma) said:
On Sunday 24 October 2004 11:54 pm, Yukihiro Matsumoto wrote:
| Hi,
|
| In message "Re: [rcr] String#first / String#last"
|
| on Mon, 25 Oct 2004 09:18:51 +0900, Hal Fulton
| |I think I'd be opposed to that. But adding a to_a might be good -- I
| |frequently do a gratuitous split operation for that purpose.
|
| It already has "to_a" which works line-wise. Perhaps something called
| "explode" in other languages is what you want.

I believe there is an RCR for #chars:

def chars
  split(//)
end

This does not yield characters but strings with length 1. Note also that
there is String#each_byte which is often sufficient.

Kind regards

robert
 
Florian Gross

Robert said:
This does not yield characters but strings with length 1. Note also that
there is String#each_byte which is often sufficient.

The problem with String#each_byte is that nobody wants to handle
characters as Integers in Ruby, IMHO. (I think one-character Strings are
preferred, because they still let you use lots of String's useful methods.)

Robert said:
Kind regards
robert

More regards,
Florian Gross
 
Robert Klemme

Florian Gross said:
The problem with String#each_byte is that nobody wants to handle
characters as Integers in Ruby, IMHO. (I think one-character Strings are
preferred, because they still let you use lots of String's useful
methods.)

Yeah possibly. Another problem with each_byte is that byte != char for
many encodings. But AFAIK #each_byte and #split(//) share this problem.

I believe a drawback of using "foo".split(//) is that it's less
performant: you need to create the tmp array plus all the string instances
(although they share the buffer AFAIK).
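
To make the byte/character distinction concrete, here is a small sketch. It assumes a Ruby 1.9 or later interpreter, where strings carry an encoding and String#each_char exists (neither is true of the Ruby discussed in this thread), and the sample string is made up for illustration.

```ruby
# Assumes Ruby 1.9+ (encoding-aware strings); the sample text is illustrative.
s = "h\u00e9llo"                 # "hello" with an e-acute: two bytes in UTF-8

bytes = []
s.each_byte { |b| bytes << b }   # yields Integers, one per byte

chars = s.split(//)              # builds a temporary Array of 1-char Strings

# each_char walks the characters without building the intermediate Array.
count = 0
s.each_char { |_c| count += 1 }

puts bytes.length                # number of bytes
puts chars.length                # number of characters
```

With multibyte text the byte count and the character count differ, which is the mismatch described above.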

Florian Gross said:
More regards,
Florian Gross

Even more regards

robert

:)
 
trans. (T. Onoma)

On Monday 25 October 2004 09:49 am, Robert Klemme wrote:
| Yeah possibly. Another problem with each_byte is that byte != char for
| many encodings. But AFAIK #each_byte and #split(//) share this problem.
|
| I believe a drawback of using "foo".split(//) is that it's less
| performant: you need to create the tmp array plus all the string instances
| (although they share the buffer AFAIK).

I think this is a very interesting point. I wonder how Ruby 2 will progress in
this area. Isn't i18n (not that I really know what that entails) on the map?
How does that affect things? Is it then prudent to create an actual Character
class, such that a String is essentially an Array of Characters? (Not to say
there won't be differences between Array and String, but in essence.)

T.
 
Markus

On Monday 25 October 2004 09:49 am, Robert Klemme wrote:
| Yeah possibly. Another problem with each_byte is that byte != char for
| many encodings. But AFAIK #each_byte and #split(//) share this problem.
|
| I believe a drawback of using "foo".split(//) is that it's less
| performant: you need to create the tmp array plus all the string instances
| (although they share the buffer AFAIK).

trans. (T. Onoma) said:
I think this is a very interesting point. I wonder how Ruby 2 will progress in
this area. Isn't i18n (not that I really know what that entails) on the map?
How does that affect things? Is it then prudent to create an actual Character
class, such that a String is essentially an Array of Characters? (Not to say
there won't be differences between Array and String, but in essence.)

I would like that. Characters are an interesting class in their
own right--and they are _not_ bytes. Strings are much more like arrays
of Characters than Characters are like Integers, so that would be a much
cleaner way to go.

*smile* We could also resolve the discrepancies by deprecating
strings in favor of Bignums, which I suspect Kurt Gödel would like, if
no one else.

-- Markus
 
Daniel Berger

Robert Klemme said:
trans. (T. Onoma) said:
On Sunday 24 October 2004 11:54 pm, Yukihiro Matsumoto wrote:
| Hi,
|
| In message "Re: [rcr] String#first / String#last"
|
| on Mon, 25 Oct 2004 09:18:51 +0900, Hal Fulton
| |I think I'd be opposed to that. But adding a to_a might be good -- I
| |frequently do a gratuitous split operation for that purpose.
|
| It already has "to_a" which works line-wise. Perhaps something called
| "explode" in other languages is what you want.

I believe there is an RCR for #chars:

def chars
  split(//)
end

This does not yield characters but strings with length 1. Note also that
there is String#each_byte which is often sufficient.

Kind regards

robert

I prefer String#unpack("C*"). I'm not sure what encoding issues there
are with that, however.

Dan
 
Hal Fulton

Robert said:
This does not yield characters but strings with length 1. Note also that
there is String#each_byte which is often sufficient.

In my mind, I blur the distinction between chars and "length 1" strings.

I am fully aware of the difference, but I rarely have need of characters
as Fixnums.

Ruby in the future (as I understand it) will also blur this distinction
further: Apparently ?x will be the same as "x" and "abc"[1] will be "b".
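
For readers on Ruby 1.9 or later, the change anticipated here can be checked directly. A minimal sketch, assuming such an interpreter (under the Ruby 1.8 of this thread, both expressions produce Integers instead):

```ruby
# Assumes Ruby 1.9+; on 1.8 these expressions return Fixnum character codes.
char_literal = ?x          # character literal
indexed      = "abc"[1]    # single-index lookup

puts char_literal == "x"   # true on 1.9+
puts indexed == "b"        # true on 1.9+

# The integer code is still available when it is really wanted.
code = "abc"[1].ord        # 98
```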


Hal
 
Florian Gross

Robert said:
Yeah possibly. Another problem with each_byte is that byte != char for
many encodings. But AFAIK #each_byte and #split(//) share this problem.

I think .split(//) works correctly (returning characters) with -Ku, but
I'm not sure about #each_byte.

Robert said:
I believe a drawback of using "foo".split(//) is that it's less
performant: you need to create the tmp array plus all the string instances
(although they share the buffer AFAIK).

Agreed, and this is a big problem in current Ruby -- we don't really
need the Array if Strings themselves let you easily do character-based
operations instead of line-based ones.

The overhead required for individual Character objects could, on the
other hand, be quite low. They could also be value objects, meaning you
only need a single object per character, which you then store in a big
hash. Plus, they would not need much of the reallocation overhead of
Strings. (They would just need to support most of String's interface.) I
think it is enough for them to be a wrapper around a char-trait in C.
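
The "one object per character, stored in a hash" idea could look roughly like the sketch below. It is only an illustration in plain Ruby: the Character class, its POOL constant, and the Character.for method are hypothetical names, and a real implementation would presumably live in C as suggested above.

```ruby
# Hypothetical sketch: one shared, frozen object per distinct character.
class Character
  # Cache of already-created instances, keyed by the one-character string.
  POOL = {}

  attr_reader :value

  # Return the single shared instance for str, creating it on first use.
  def self.for(str)
    raise ArgumentError, "expected exactly one character" unless str.length == 1
    POOL[str] ||= new(str.dup.freeze)
  end

  def initialize(value)
    @value = value
    freeze                      # value objects never change after creation
  end
  private_class_method :new     # force callers through Character.for

  # A small slice of String's interface, delegated to the wrapped value.
  def to_s
    value
  end

  def ord
    value.ord
  end
end

a = Character.for("x")
b = Character.for("x")
puts a.equal?(b)                # true: both names refer to the same object
```

Because instances are immutable and shared, a string of n characters would need at most one Character object per distinct character rather than n separate one-character Strings.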

Robert said:
Even more regards
robert

Yet more regards,
Florian Gross

;)
 
Charles Mills

Hal Fulton said:
In my mind, I blur the distinction between chars and "length 1" strings.

I am fully aware of the difference, but I rarely have need of characters
as Fixnums.

Ruby in the future (as I understand it) will also blur this distinction
further: Apparently ?x will be the same as "x" and "abc"[1] will be "b".

That is good to hear. I remember when I began learning Ruby I was put
off/confused by the fact that String#[] returned a Fixnum when given a
single index.

-Charlie
 
Brian Candler

Daniel Berger said:
I prefer String#unpack("C*"). I'm not sure what encoding issues there
are with that, however.

If it's a UTF8 string then use unpack("U*")
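
A short sketch of the difference between the two directives, assuming the string holds UTF-8 data (the sample text is made up for illustration):

```ruby
# Assumes the string holds UTF-8 data; the sample text is illustrative.
s = "h\u00e9"                  # "h" followed by e-acute

byte_values = s.unpack("C*")   # one Integer per byte
code_points = s.unpack("U*")   # one Integer per UTF-8 character

p byte_values                  # [104, 195, 169]
p code_points                  # [104, 233]

# Array#pack with the same directive reverses the conversion.
round_trip = code_points.pack("U*")
```

"C*" splits the two-byte character into two unrelated numbers, while "U*" decodes it into a single code point.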

As for #first and #last:

a = "abcdefg"
a[0...2]    # first two characters
a[0,2]      # first two characters
a[-2..-1]   # last two characters
a[-2,9999]  # last two characters (cheating)

I think the last case shows where a sanctioned "infinity" would be nice. But
otherwise:

class String
  def first(n=1)
    self[0,n]
  end
  def last(n=1)
    return nil if n < 0
    return "" if n == 0
    return self if n > self.size
    self[-n..-1]
  end
end
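
To try the same logic without reopening String, it can be wrapped in a helper module. This is only a sketch: the StringEnds module name and the explicit str parameter are hypothetical and not part of the proposal.

```ruby
# Hypothetical helper module (not from the thread) mirroring the logic above
# without reopening String.
module StringEnds
  module_function

  # First n characters of str.
  def first(str, n = 1)
    str[0, n]
  end

  # Last n characters of str, with the same edge-case rules as above.
  def last(str, n = 1)
    return nil if n < 0
    return ""  if n == 0
    return str if n > str.size   # str[-n..-1] would be nil here
    str[-n..-1]
  end
end

a = "abcdefg"
p StringEnds.first(a, 2)   # "ab"
p StringEnds.last(a, 2)    # "fg"
p StringEnds.last(a, 0)    # ""
p StringEnds.last(a, 99)   # "abcdefg"
p StringEnds.last(a, -1)   # nil
```

The guard for n > size matters because a negative start index that reaches past the beginning of the string makes the slice return nil rather than the whole string.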

I still find that string operations sit uncomfortably together.

string[a..b]  # Start pos is a, end pos is b
              # If either a or b is negative, it's an offset from the end
              # (which means it's not a Range in a useful sense)
              # Return nil if a is not within string; return "" if b is
              # to the 'left' of a, after resolving negative offsets

string[a,b]   # Start pos is a, length is b
              # If a is negative, it's an offset from the end
              # If b is negative, nil is returned

string[a]     # return the a'th byte of string as an integer (ick)

I have to remind myself of these with irb each time I use them. "How do I
get from character a to the end of the string?" => str[a..-1]

"How do I get just the a'th character by itself (as a string)?" => str[a,1]

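As a quick reference for the cases above, the following sketch runs each form against a sample string (the string itself is arbitrary):

```ruby
s = "abcdefg"

p s[2..-1]    # "cdefg"  from index 2 to the end
p s[2, 1]     # "c"      one character starting at index 2
p s[-2..-1]   # "fg"     last two characters
p s[-2, 2]    # "fg"     same, given as start and length
p s[7..-1]    # ""       start equal to the length gives an empty string
p s[8..-1]    # nil      start beyond the length
p s[2, -1]    # nil      negative length
```

On Ruby 1.9 and later, s[2] also returns "c" rather than an Integer, and from Ruby 2.6 an endless range such as s[2..] can stand in for s[2..-1].
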
Regards,

Brian.
 
