first and last char

botp

I always get tripped up when working with arrays and strings together, especially on

string.first and string.last

Of course, they raise an error :)

wish there were #first and #last in String

just a thought
kind regards -botp
 
Stefan Rusterholz

Felix said:
irb(main):001:0> class String
irb(main):002:1> def first
irb(main):003:2> self.split('').first
irb(main):004:2> end
irb(main):005:1> def last
irb(main):006:2> self.split('').last
irb(main):007:2> end
irb(main):008:1> end
=> nil
irb(main):009:0> "testing".first
=> "t"
irb(main):010:0> "testing".last
=> "g"

Ew, that's awfully complex. You create n new objects from which you
throw n-1 away again...
Think of the memory! ;-)
def first; self[0,1]; end; def last; self[-1,1]; end
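
To make the allocation point concrete, here is a minimal sketch (the variable names are illustrative only). `split('')` builds an Array holding one new String per character, while slicing creates a single one-character String. Note that on Ruby 1.9 and later `self[0,1]` counts characters; on Ruby 1.8 it counted bytes.

```ruby
str = "testing"

# split('') allocates an Array plus one String per character.
pieces = str.split('')
puts pieces.length   # number of intermediate one-character Strings
puts pieces.first
puts pieces.last

# Slicing allocates only the single String that is returned.
puts str[0, 1]
puts str[-1, 1]
```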

Regards
Stefan
 
Daniel DeLorme

Stefan said:
Ew, that's awfully complex. You create n new objects from which you
throw n-1 away again...
Think of the memory! ;-)
def first; self[0,1]; end; def last; self[-1,1]; end

wrong, that's the first and last *bytes*, not characters.

def first; self[/\A./m]; end
def last; self[/.\z/m]; end
$KCODE='u'
=> "u"
"日本語".first
=> "日"
"日本語".last
=> "語"
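
The byte-versus-character distinction is specific to Ruby 1.8, where String was indexed by byte. A short sketch of how the same distinction looks on Ruby 1.9 and later, where strings carry an encoding and indexing is by character (the string is written with `\u` escapes so the example does not depend on the source file's encoding):

```ruby
str = "\u65E5\u672C\u8A9E"   # "日本語"

puts str.length     # counts characters
puts str.bytesize   # counts bytes (UTF-8 uses three per character here)

# byteslice is the explicit byte-level slice; a lone lead byte is not valid UTF-8.
first_byte = str.byteslice(0, 1)
puts first_byte.valid_encoding?

# Character-level access, by index and by regex.
puts str[0, 1]
puts str[/\A./m]
puts str[/.\z/m]
```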

Daniel
 
Stefan Rusterholz

Daniel said:
Stefan said:
Ew, that's awfully complex. You create n new objects from which you
throw n-1 away again...
Think of the memory! ;-)
def first; self[0,1]; end; def last; self[-1,1]; end

wrong, that's the first and last *bytes*, not characters.

def first; self[/\A./m]; end
def last; self[/.\z/m]; end
$KCODE='u'
=> "u"
"日本語".first
=> "日"
"日本語".last
=> "語"

Daniel

You are right, your solution is better.

Regards
Stefan
 
Daniel DeLorme

Stefan said:
Daniel said:
def first; self[/\A./m]; end
def last; self[/.\z/m]; end
$KCODE='u'
=> "u"
"日本語".first
=> "日"
"日本語".last
=> "語"

Daniel

You are right, your solution is better.

Only partly. Unfortunately, end-anchored regular expressions have pretty
abysmal performance.
=> 5.35788202285767

:-(
Daniel
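
The benchmark code did not survive in the post above. The following is a rough sketch of how such a measurement could be set up, not a reconstruction of the original. The test string and iteration count are assumptions, and timings will vary with machine and Ruby version. It compares the end-anchored regex with plain negative indexing, which is character-aware on Ruby 1.9+.

```ruby
require 'benchmark'

# Assumed test data: a long ASCII run followed by one multibyte character.
str = ("a" * 50_000) + "\u8A9E"

# Time the end-anchored regex against plain negative indexing.
regex_time = Benchmark.realtime { 100.times { str[/.\z/m] } }
index_time = Benchmark.realtime { 100.times { str[-1] } }

puts format("end-anchored regex: %.4fs", regex_time)
puts format("negative index:     %.4fs", index_time)
```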
 
Trans

Stefan said:
Ew, that's awfully complex. You create n new objects from which you
throw n-1 away again...
Think of the memory! ;-)
def first; self[0,1]; end; def last; self[-1,1]; end

wrong, that's the first and last *bytes*, not characters.

For Ruby 1.9+ it will just be:

def first; self[0]; end; def last; self[-1]; end


my own version is (basically):

def first(pattern=//)
split(pattern).at(0)
end

which is a little more versatile. But I see the point about the
memory, and I'll add an optimization clause come 1.9.
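
A small sketch of what the pattern argument buys. `SplitFirst` is a hypothetical wrapper module used here only to avoid reopening String; the body is the same `split(pattern).at(0)` call.

```ruby
# Hypothetical helper module, for illustration only.
module SplitFirst
  def self.first(str, pattern = //)
    # With the default // pattern this splits into characters;
    # with any other pattern it returns the first field.
    str.split(pattern).at(0)
  end
end

puts SplitFirst.first("testing")              # first character
puts SplitFirst.first("hello world", /\s+/)   # first word
puts SplitFirst.first("a,b,c", ",")           # first comma-separated field
p SplitFirst.first("")                        # empty string yields nil
```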

T.
 
Stefan Rusterholz

Daniel said:
Only partly. Unfortunately, end-anchored regular expressions have pretty
abysmal performance.

=> 5.35788202285767

:-(
Daniel

That can be helped. Assuming that there is no encoding with 1 char > 8
bytes:

def first; self[/\A./m]; end
def last; self[/.\z/m]; end
def last2; self[-8,8][/.\z/m]; end

Benchmark.measure{10000.times{str.first}}.real
=> 0.0643939971923828
Benchmark.measure{10000.times{str.last}}.real
=> 7.3151650428772
Benchmark.measure{10000.times{str.last2}}.real
=> 0.167464017868042

That's a 40x improvement for that string. For short strings it will
probably be slightly slower, but I'd say it's worth it.
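
One caveat on short strings: `self[-8,8]` returns nil when the string has fewer than 8 elements, so `last2` as written would fail with a NoMethodError on nil rather than just run slower. A guarded sketch of the same windowing idea follows. `TailWindow` and its `window` parameter are hypothetical names; on Ruby 1.9+ the window counts characters, on 1.8 it counted bytes.

```ruby
# Hypothetical helper module, for illustration only.
module TailWindow
  def self.last_char(str, window = 8)
    # Fall back to the whole string when it is shorter than the window.
    tail = str[-window, window] || str
    tail[/.\z/m]
  end
end

puts TailWindow.last_char("abc")
puts TailWindow.last_char(("a" * 1000) + "\u8A9E")
p TailWindow.last_char("")
```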

Regards
Stefan
 
Bertram Scharpf

Hi,

On Sunday, 12 Aug 2007, 02:43:00 +0900, Felix Windt wrote:
-----Original Message-----
From: botp [mailto:[email protected]]
Sent: Saturday, August 11, 2007 10:30 AM

wish there were #first and #last in String

irb(main):001:0> class String
irb(main):002:1> def first
irb(main):003:2> self.split('').first
irb(main):004:2> end
irb(main):005:1> def last
irb(main):006:2> self.split('').last
irb(main):007:2> end
irb(main):008:1> end
=> nil
irb(main):009:0> "testing".first
=> "t"
irb(main):010:0> "testing".last
=> "g"

Sometimes I wish every young programmer was forced to do a
month in Assembler and another one in C just to see what
cost in time and space some constructions cause.

Sorry, Felix!

Bertram
 
Stephan Kämper

Hi,

On 13.08.2007 at 08:04, Bertram Scharpf wrote:
Hi,

On Sunday, 12 Aug 2007, 02:43:00 +0900, Felix Windt wrote:
-----Original Message-----
From: botp [mailto:[email protected]]
Sent: Saturday, August 11, 2007 10:30 AM

wish there were #first and #last in String

irb(main):001:0> class String
irb(main):002:1> def first
irb(main):003:2> self.split('').first
irb(main):004:2> end
irb(main):005:1> def last
irb(main):006:2> self.split('').last
irb(main):007:2> end
irb(main):008:1> end
=> nil
irb(main):009:0> "testing".first
=> "t"
irb(main):010:0> "testing".last
=> "g"

Sometimes I wish every young programmer was forced to do a
month in Assembler and another one in C just to see what
cost in time and space some constructions cause.

Sorry, Felix!

Well, here's something that should be a little bit less cycle
intensive (depending on how String#[](arg) is implemented):

class String
def last
self[-1].chr
end
end
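
For context, a brief sketch of why `.chr` is there. On Ruby 1.8, `self[-1]` returned a Fixnum (the byte value), and Integer#chr turned it back into a String. On Ruby 1.9+ `self[-1]` is already a one-character String, and String#chr returns the first character, so the same expression still works.

```ruby
# Ruby 1.9+ behaviour is shown; on 1.8 the first line would print a number.
puts "testing"[-1]       # already a one-character String
puts "testing"[-1].chr   # String#chr returns the first character
puts 103.chr             # Integer#chr maps a byte value to a String
```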

Cheers


Stephan


-- 
Stephan Kämper/IT-Beratung http://www.stephankaemper.de
Softwaretest / Datenanalyse / Entwicklung
 
Bertram Scharpf

Hi,

On Monday, 13 Aug 2007, 15:15:54 +0900, Stephan Kämper wrote:
On 13.08.2007 at 08:04, Bertram Scharpf wrote:

Well, here's something that should be a little bit less cycle intensive
(depending on how String#[](arg) is implemented):

class String
def last
self[-1].chr
end
end

This is what I would have implemented, too.

The difficult point is that it raises some questions:

- Should it return a Fixnum or a String of length 1?
- Should it be able to return UTF-8 characters?
- Should I define String#shift and String#pop now?

I don't recommend discussing such questions in an open forum
since I saw what happened to my String#notempty? proposal.

Bertram

-- 
Bertram Scharpf
Stuttgart, Deutschland/Germany
http://www.bertram-scharpf.de
 
dblack

Hi --

Hi,

On Monday, 13 Aug 2007, 15:15:54 +0900, Stephan Kämper wrote:
On 13.08.2007 at 08:04, Bertram Scharpf wrote:

Well, here's something that should be a little bit less cycle intensive
(depending on how String#[](arg) is implemented):

class String
def last
self[-1].chr
end
end

This is what I would have implemented, too.

The difficult point is that it raises some questions:

- Should it return a Fixnum or a String of length 1?

If it ever gets added to Ruby, it will presumably be in 1.9/2.0, where
str[x] gives you a character anyway. If it doesn't get added, then
everyone will write their own, hopefully in a safe way, and can do
whatever they like :)

- Should it be able to return UTF-8 characters?
- Should I define String#shift and String#pop now?

There's already #chop. I don't know whether there are plans for
#lchop or equivalent.
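
A short sketch of what is available without new methods: String#chop drops the last character, and String#slice! can stand in for shift and pop by removing and returning a character at either end. The variable names are illustrative.

```ruby
s = "testing".dup     # dup so the example also works with frozen string literals

puts s.chop           # copy without the last character; s is unchanged

first = s.slice!(0)   # removes and returns the first character (a "shift")
last  = s.slice!(-1)  # removes and returns the last character (a "pop")

puts first
puts last
puts s                # what remains after both removals
```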


David

-- 
* Books:
RAILS ROUTING (new! http://www.awprofessional.com/title/0321509242)
RUBY FOR RAILS (http://www.manning.com/black)
* Ruby/Rails training
& consulting: Ruby Power and Light, LLC (http://www.rubypal.com)
 
