1.8.7 String#lines keeps new-line chars (say it ain't so in 1.9)

Intransition

Ruby 1.8.7 p72
>> "A\nB\nC".lines.to_a
=> ["A\n", "B\n", "C"]

Please, tell me that's a mishap, and not how 1.9 works. I'd expect:
>> "A\nB\nC".lines.to_a
=> ["A", "B", "C"]

Thanks.
 
Brian Candler

Thomas said:
Ruby 1.8.7 p72
>> "A\nB\nC".lines.to_a
=> ["A\n", "B\n", "C"]

Please, tell me that's a mishap, and not how 1.9 works. I'd expect:
>> "A\nB\nC".lines.to_a
=> ["A", "B", "C"]

Why would you expect that? The documentation is very clear.

--------------------------------------------------------------- IO#lines
ios.lines(sep=$/) => anEnumerator
ios.lines(limit) => anEnumerator
ios.lines(sep, limit) => anEnumerator
------------------------------------------------------------------------
Returns an enumerator that gives each line in _ios_. The stream
must be opened for reading or an +IOError+ will be raised.

f = File.new("testfile")
f.lines.to_a #=> ["foo\n", "bar\n"]
f.rewind
f.lines.sort #=> ["bar\n", "foo\n"]

If it changed in 1.9, that would be another source of incompatibilities.

Perhaps most importantly of all, if the newlines were stripped, there
would be loss of data and it would be impossible to reconstruct the
original file exactly from the members of the lines array (e.g.
lines.join), as your example shows nicely.
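The round-trip argument can be checked directly. A minimal sketch, with variable names chosen only for illustration:

```ruby
# Keeping the terminators makes lines/join a lossless round trip.
original = "A\nB\nC"
kept     = original.lines.to_a        # ["A\n", "B\n", "C"]
stripped = kept.map { |l| l.chomp }   # ["A", "B", "C"]

p kept.join == original               # true

# Once the terminators are stripped, a different input
# collapses to the same array, so the original is not recoverable.
p "A\nB\nC\n".lines.map { |l| l.chomp } == stripped   # true
```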
 
7stud --

Thomas said:
Ruby 1.8.7 p72
>> "A\nB\nC".lines.to_a
=> ["A\n", "B\n", "C"]

Please, tell me that's a mishap, and not how 1.9 works. I'd expect:
>> "A\nB\nC".lines.to_a
=> ["A", "B", "C"]

Thanks.


$ ruby19 r1test.rb
["A\n", "B\n", "C\n"]

$ ri19 String#lines
----------------------------------------------------------- String#lines
str.lines(separator=$/) => anEnumerator
str.lines(separator=$/) {|substr| block } => str

From Ruby 1.9.1
------------------------------------------------------------------------
Returns an enumerator that gives each line in the string. If a
block is given, it iterates over each line in the string.

"foo\nbar\n".lines.to_a #=> ["foo\n", "bar\n"]
"foo\nb ar".lines.sort #=> ["b ar", "foo\n"]
 
7stud --

...oh, yeah:

$ ruby19 -v
ruby 1.9.1p243 (2009-07-16 revision 24175) [i386-darwin8.11.1]
 
Gregory Brown

Ruby 1.8.7 p72

>> "A\nB\nC".lines.to_a
=> ["A\n", "B\n", "C"]

Please, tell me that's a mishap, and not how 1.9 works. I'd expect:

>> "A\nB\nC".lines.to_a
=> ["A", "B", "C"]

I know I'm going to be accused of bullying you again, but...
Install the latest 1.9.1 and try it yourself (or if you feel fancy,
the 1.9.2 preview).

These aren't questions irb can't answer for you.

Alternatively, try out ruby-versions:
http://ruby-versions.net/
 
David A. Black

Hi --

Ruby 1.8.7 p72
>> "A\nB\nC".lines.to_a
=> ["A\n", "B\n", "C"]

Please, tell me that's a mishap, and not how 1.9 works. I'd expect:
>> "A\nB\nC".lines.to_a
=> ["A", "B", "C"]

String#lines is essentially the same as String#each, which is gone in
1.9. You get, instead of #each and friends (Enumerable), a whole
toolkit of ways to enumerate through strings:

* lines
* bytes
* chars
* codepoints

There's no auto-chomping, but there never has been in any string
operation I can think of.
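A short sketch of the four enumeration methods listed above, run against an arbitrary sample string. The `.to_a` calls are there so the same lines work whether the method returns an enumerator or an array:

```ruby
s = "hi\nyo"

p s.lines.to_a        # ["hi\n", "yo"]
p s.bytes.to_a        # [104, 105, 10, 121, 111]
p s.chars.to_a        # ["h", "i", "\n", "y", "o"]
p s.codepoints.to_a   # [104, 105, 10, 121, 111]
```

None of the four drops the newline; it shows up as a terminator, a byte, a character, or a codepoint.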


David

--
David A. Black / Ruby Power and Light, LLC / http://www.rubypal.com
Q: What's the best way to get a really solid knowledge of Ruby?
A: Come to our Ruby training in Edison, New Jersey, September 14-17!
Instructors: David A. Black and Erik Kastner
More info and registration: http://rubyurl.com/vmzN
 
Stephen Bannasch

David A. Black said:
Hi --

Ruby 1.8.7 p72
"A\nB\nC".lines.to_a
=> ["A\n", "B\n", "C"]

Please, tell me that's a mishap, and not how 1.9 works. I'd expect:
"A\nB\nC".lines.to_a
=> ["A", "B", "C"]

String#lines is essentially the same as String#each, which is gone in
1.9. You get, instead of #each and friends (Enumerable), a whole
toolkit of ways to enumerate through strings:

* lines
* bytes
* chars
* codepoints

There's no auto-chomping, but there never has been in any string
operation I can think of.

It isn't the same but in many places where I might use String#lines
I'd use code like this in Ruby 1.8.6:
>> "first line\nsecond line\nthird line".split("\n")
=> ["first line", "second line", "third line"]
 
Radosław Bułat

Ruby 1.8.7 p72

>> "A\nB\nC".lines.to_a
=> ["A\n", "B\n", "C"]

Please, tell me that's a mishap, and not how 1.9 works. I'd expect:

>> "A\nB\nC".lines.to_a
=> ["A", "B", "C"]

Thanks.

$ ruby -v -e 'p "A\nB\nC".lines.to_a'
ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux]
["A\n", "B\n", "C"]

$ ruby-trunk -v -e 'p "A\nB\nC".lines.to_a'
ruby 1.9.2dev (2009-08-23 trunk 24631) [x86_64-linux]
["A\n", "B\n", "C"]


--
Regards,

Radosław Bułat
http://radarek.jogger.pl - my blog
 
Trans

Thomas said:
Ruby 1.8.7 p72
>> "A\nB\nC".lines.to_a
=> ["A\n", "B\n", "C"]
Please, tell me that's a mishap, and not how 1.9 works. I'd expect:
>> "A\nB\nC".lines.to_a
=> ["A", "B", "C"]

Why would you expect that? The documentation is very clear.

--------------------------------------------------------------- IO#lines
ios.lines(sep=$/) => anEnumerator
ios.lines(limit) => anEnumerator
ios.lines(sep, limit) => anEnumerator
------------------------------------------------------------------------
Returns an enumerator that gives each line in _ios_. The stream
must be opened for reading or an +IOError+ will be raised.

f = File.new("testfile")
f.lines.to_a #=> ["foo\n", "bar\n"]
f.rewind
f.lines.sort #=> ["bar\n", "foo\n"]

If it changed in 1.9, that would be another source of incompatibilities.

Perhaps most importantly of all, if the newlines were stripped, there
would be loss of data and it would be impossible to reconstruct the
original file exactly from the members of the lines array (e.g.
lines.join), as your example shows nicely.

I'd expect it from a StringIO, but not a String.

T.
 
Trans

Perhaps most importantly of all, if the newlines were stripped, there
would be loss of data and it would be impossible to reconstruct the
original file exactly from the members of the lines array (e.g.
lines.join), as your example shows nicely.

How is there loss of data, when you know what was removed? Just join
("\n").
 
Trans

It isn't the same but in many places where I might use String#lines
I'd use code like this in Ruby 1.8.6:

>> "first line\nsecond line\nthird line".split("\n")
=> ["first line", "second line", "third line"]

Exactly. And I guess I'll just have to keep on doing that then.
 
James Edward Gray II

How is there loss of data, when you know what was removed? Just join
("\n").

What do we do for lines ending in \r\n? Do we take both or just the
\n? I say both would be the most consistent, but then you don't know
if you need to put back a \r\n or just an \n.

Also, how do you know if the last line ended in a \n? join("\n")
wouldn't put it back in either case.

James Edward Gray II
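Both ambiguities can be seen in a small sketch. The three sample strings are made up for illustration; `chomp` with no argument removes a trailing "\n", "\r\n", or "\r":

```ruby
unix    = "A\nB\n"      # LF terminators, final newline present
windows = "A\r\nB\r\n"  # CRLF terminators
no_eol  = "A\nB"        # LF terminators, no final newline

# All three inputs chomp down to the same array of lines.
chomped = [unix, windows, no_eol].map { |s| s.lines.map { |l| l.chomp } }
p chomped            # [["A", "B"], ["A", "B"], ["A", "B"]]

# join("\n") can therefore rebuild at most one of them.
rebuilt = chomped.first.join("\n")
p [unix, windows, no_eol].map { |s| s == rebuilt }   # [false, false, true]
```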
 
Trans

String#lines is essentially the same as String#each, which is gone in
1.9. You get, instead of #each and friends (Enumerable), a whole
toolkit of ways to enumerate through strings:

* lines
* bytes
* chars
* codepoints

There's no auto-chomping, but there never has been in any string
operation I can think of.

String#lines wasn't defined in 1.8.6 so I did not think there was any
precedent for it. My use case has always been (as Radoslaw said):

"first line\nsecond line\nthird line".split("\n")

Wanting my program to read better, I have often defined #lines to do
just that. In my experience that's the frequent case. Wanting to keep
the separator I think is the lesser need --for which I would be
happier with a less concise method name. As it stands #lines does me
no good now.

"first line\nsecond line\nthird line".lines.map{ |s| s.chomp("\n") }

Is even worse than before! ;)
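For anyone who wants the chomped form without redefining `lines`, here is a minimal sketch. The helper name `chomped_lines` is hypothetical and not part of Ruby:

```ruby
# Hypothetical helper: same splitting as String#each_line, terminators removed.
def chomped_lines(str)
  str.each_line.map { |line| line.chomp }
end

text = "first line\nsecond line\nthird line"
p chomped_lines(text)      # ["first line", "second line", "third line"]
p text.split("\n")         # same result for this input

# The two approaches differ when the text ends in blank lines:
p chomped_lines("a\n\n")   # ["a", ""]
p "a\n\n".split("\n")      # ["a"]
```

Later Ruby releases (2.4 onward, as far as I know) also accept `lines(chomp: true)`, which covers the same need without a helper.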
 
Trans

Ruby 1.8.7 p72
>> "A\nB\nC".lines.to_a
=> ["A\n", "B\n", "C"]
Please, tell me that's a mishap, and not how 1.9 works. I'd expect:
>> "A\nB\nC".lines.to_a
=> ["A", "B", "C"]

I know I'm going to be accused of bullying you again, but...
Install the latest 1.9.1 and try it yourself (or if you feel fancy,
the 1.9.2 preview).

These aren't questions irb can't answer for you.

Looking it up isn't the main issue mate. It was the "wherefore?" that
I pondered upon finding it to be the case.

Alternatively, try out ruby-versions: http://ruby-versions.net/

Cool, thanks.
 
Brian Candler

Thomas said:
How is there loss of data, when you know what was removed? Just join
("\n").

"Loss of data" means "you don't get back exactly what you started with".
Taking your example, I believe that you want both "A\nB\nC" and
"A\nB\nC\n" to result in lines ["A","B","C"], so this operation is not
reversible.

Or did you want "A\nB\nC\n" to result in ["A","B","C",""] ? That would
surprise me more. Most inputs have terminating newlines on the final
line.
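The two outcomes described above correspond to `split` with and without a negative limit. A sketch, assuming "\n" is the only terminator in play:

```ruby
with_eol    = "A\nB\nC\n"
without_eol = "A\nB\nC"

p with_eol.split("\n")          # ["A", "B", "C"]  (trailing empty field dropped)
p with_eol.split("\n", -1)      # ["A", "B", "C", ""]
p without_eol.split("\n", -1)   # ["A", "B", "C"]

# With the negative limit, join("\n") restores each input exactly.
p with_eol.split("\n", -1).join("\n") == with_eol         # true
p without_eol.split("\n", -1).join("\n") == without_eol   # true
```

The reversible form pays for it with the extra empty string at the end of the array.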
 
Trans

How is there loss of data, when you know what was removed? Just join
("\n").

"Loss of data" means "you don't get back exactly what you started with".
Taking your example, I believe that you want both "A\nB\nC" and
"A\nB\nC\n" to result in lines ["A","B","C"], so this operation is not
reversible.

Or did you want "A\nB\nC\n" to result in ["A","B","C",""] ? That would
surprise me more. Most inputs have terminating newlines on the final
line.

Granting that perfect reversibility is a requirement here, then yes
the latter makes sense. I do not think it surprising.

To me returning the newline code with #lines would be like returning
the spaces for a method #words. E.g.

"show it\nto me".words => ["show ", "it\n", "to ", "me "]

I think the broader issue here is the question of whether or not
String is intended for use by code-point (ie. low-level character)
manipulators, or for higher-level human-oriented textual manipulation.
I always thought StringIO was for the former case. But now I am seeing
Ruby's String is some sort of hodge-podge mixture of the two.




T.
 
Brian Candler

Thomas said:
To me returning the newline code with #lines would be like returning
the spaces for a method #words. Eg.

"show it\nto me".words => ["show ", "it\n", "to ", "me "]

...except there is no built-in method 'words' so you can't accuse it of
being inconsistent :)

Some people would want lines with trailing whitespace stripped as well
as terminators. Some people would want leading whitespace stripped too.
I don't think you can please everyone, so IMO the most flexible and
consistent approach is to give the line complete with its terminator,
and let the user apply whatever post-processing they like.

I think the broader issue here is the question of whether or not
String is intended for use by code-point (ie. low-level character)
manipulators, or for higher-level human-oriented textual manipulation.
I always thought StringIO was for the former case. But now I am seeing
Ruby's String is some sort of hodge-podge mixture of the two.

I think StringIO is for when you want to duck-type a File, but with
in-RAM backing.

Ruby is certainly lacking consistency in this area. In ruby 1.9, for
example, IO still has #each (meaning #each_line), whereas String doesn't
any more.
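The asymmetry is easy to observe from 1.9 onward. A sketch using StringIO as a stand-in for an IO object, which is an assumption made to keep the example self-contained:

```ruby
require "stringio"

io = StringIO.new("foo\nbar\n")
from_io = []
io.each { |line| from_io << line }   # IO-style #each iterates by line
p from_io                            # ["foo\n", "bar\n"]

str = "foo\nbar\n"
p str.respond_to?(:each)             # false on 1.9 and later
p str.respond_to?(:each_line)        # true
```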
 
Trans

Thomas said:
To me returning the newline code with #lines would be like returning
the spaces for a method #words. Eg.
"show it\nto me".words => ["show ", "it\n", "to ", "me "]

...except there is no built-in method 'words' so you can't accuse it of
being inconsistent :)

ok ;) ...just making an analogy.

Some people would want lines with trailing whitespace stripped as well
as terminators. Some people would want leading whitespace stripped too.
I don't think you can please everyone, so IMO the most flexible and
consistent approach is to give the line complete with its terminator,
and let the user apply whatever post-processing they like.

Sure, but at that point we are moving into a realm of narrower
usecases. I believe the short, more concise method name should go to
the most frequent use. I have no definitive statistics, but I'd wager
that split("\n") is by far the more common case. Based on that, I'd
rather see the current def be called something else, like #newlines or
#rawlines. But Ruby is ultimately Matz' baby so maybe his more common
use is otherwise.

I think StringIO is for when you want to duck-type a File, but with
in-RAM backing.

ruby is certainly lacking consistency in this area. In ruby 1.9, for
example, IO still has #each (meaning #each_line), whereas String doesn't
any more.

Yeah, I think that's because StringIO is an IO first and foremost. So I
don't think String should aspire to be like StringIO per se. And StringIO
can only be like String insofar as it doesn't interfere with it being
an IO. But I may be presuming too much.

Appreciate the discussion.
 
Robert Klemme

2009/8/24 Trans said:
Thomas said:
To me returning the newline code with #lines would be like returning
the spaces for a method #words. Eg.
"show it\nto me".words => ["show ", "it\n", "to ", "me "]

...except there is no built-in method 'words' so you can't accuse it of
being inconsistent :)

ok ;) ...just making an analogy.

Some people would want lines with trailing whitespace stripped as well
as terminators. Some people would want leading whitespace stripped too.
I don't think you can please everyone, so IMO the most flexible and
consistent approach is to give the line complete with its terminator,
and let the user apply whatever post-processing they like.

Sure, but at that point we are moving into a realm of narrower
usecases. I believe the short, more concise method name should go to
the most frequent use. I have no definitive statistics, but I'd wager
that split("\n") is by far the more common case. Based on that, I'd
rather see the current def be called something else, like #newlines or
#rawlines. But Ruby is ultimately Matz' baby so maybe his more common
use is otherwise.

I don't think it is a good idea to change the default behavior. If
you frequently need line endings stripped, you can always define your
own method for this, for example:

class LineEnum
  include Enumerable

  def initialize(obj, meth = case obj
                             when String, IO then :each_line
                             else :each
                             end)
    @obj = obj
    @meth = meth
  end

  def each
    @obj.send(@meth) do |elem|
      elem.chomp!
      yield elem
    end

    self
  end
end

$ irb19 -r lineenum.rb
Ruby version 1.9.1
irb(main):001:0> s = "foo\nbar\n"
=> "foo\nbar\n"
irb(main):002:0> se = LineEnum.new s
=> #<LineEnum:0x10169bc0 @obj="foo\nbar\n", @meth=:each_line>
irb(main):003:0> se.each {|l| p l}
"foo"
"bar"
=> #<LineEnum:0x10169bc0 @obj="foo\nbar\n", @meth=:each_line>
irb(main):004:0>

We could also extend Enumerator to honor blocks passed to them so we could
do

$stdin.to_enum(:each_line) {|l| l.strip!}.each do |line|
  p line # no \n present
end

But frankly, I'd rather just add a "line.chomp!" to my block body and
be done. :)
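Something close to the block-honoring enumerator can be had without changing Enumerator itself. A minimal sketch, assuming Ruby 1.9 or later for `Enumerator.new` with a block; the helper name `chomped` is hypothetical:

```ruby
require "stringio"

# Hypothetical helper: wraps any object that responds to #each_line
# and yields each line with its terminator removed.
def chomped(source)
  Enumerator.new do |y|
    source.each_line { |line| y << line.chomp }
  end
end

p chomped("foo\nbar\n").to_a                # ["foo", "bar"]
p chomped(StringIO.new("a\r\nb\n")).to_a    # ["a", "b"]
```

It uses the non-destructive `chomp`, so the yielded lines are copies and the source is left untouched.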

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 
