Why doesn't IO#readlines accept a Regexp?

gabriele renzi

As in the subject, I just noticed that readlines accepts only a string
as the line separator, and I wonder why it works this way.
Any explanations?

BTW, if I want to read a file into an array of 'words' I have to do:

File.new('myfile').gets(nil).split

Is there no better way?

On a side note, what are the efficiency issues related to using
IO#each vs IO#foreach(anIO) vs a simple 'while line=gets..'?
 
Gavin Sinclair

gabriele renzi said:
As in the subject, I just noticed that readlines accepts only a string
as the line separator, and I wonder why it works this way.
Any explanations?

Sorry, none from me ;)

BTW, if I want to read a file into an array of 'words' I have to do:

File.new('myfile').gets(nil).split

Is there no better way?

File.read('myfile').split


On a side note, what are the efficiency issues related to using
IO#each vs IO#foreach(anIO) vs a simple 'while line=gets..'?

All of these read the file one line at a time and present that line to
the user. It's hard to imagine any performance difference between
them.

Gavin
 
Robert Klemme

gabriele renzi said:
As in the subject, I just noticed that readlines accepts only a string
as the line separator, and I wonder why it works this way.
Any explanations?

Just a guess: a regexp separator is rarely needed, and performance might be
another reason, since the overhead of regexp matching could be significant
for large files.

However, you can simulate it if you read a complete file into a string and
then split with a regexp.
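
A minimal sketch of that workaround. The helper name `read_records` and the sample separator are made up for illustration; they are not part of Ruby's IO API. Note that it loads the whole file into memory first.

```ruby
require 'tempfile'

# Hypothetical helper (not part of IO): slurp the file, then split on a Regexp.
# Unlike readlines, the separators are dropped from the returned records.
def read_records(path, separator)
  File.read(path).split(separator)
end

# Demo on a throwaway file whose records are separated by ';' or ','
# with optional surrounding whitespace.
Tempfile.create('records') do |f|
  f.write("alpha; beta ,gamma;delta")
  f.flush
  p read_records(f.path, /\s*[;,]\s*/)   # => ["alpha", "beta", "gamma", "delta"]
end
```

If the separators themselves are needed, a capture group in the Regexp makes String#split keep them in the result.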

BTW, if I want to read a file into an array of 'words' I have to do:

File.new('myfile').gets(nil).split

Is there no better way?

For large files this is more efficient:

words=[]
IO.foreach("myfile") do |line|
  words.push( *line.scan( /\w+/oi ) )
end

If you have many repeating words you can save even more memory:

cache = Hash.new {|h,k| h[k]=k}
words = []

IO.foreach("myfile") do |line|
  words.push( *( line.scan( /\w+/oi ).map {|w| cache[w]} ) )
end

On a side note, what are the efficiency issues related to using
IO#each vs IO#foreach(anIO) vs a simple 'while line=gets..'?

Try ruby -r profile with each method and see what happens. I'd guess that
there is not much difference.
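
As an alternative to the profiler, here is a rough timing sketch using the standard Benchmark library. The method names, file size, and line contents are arbitrary choices for illustration, and the numbers will vary by machine and Ruby version. File.foreach is the same method as IO.foreach, inherited by File.

```ruby
require 'benchmark'
require 'tempfile'

# Count lines three ways so each variant does the same trivial work.
def count_with_each(path)
  n = 0
  File.open(path) { |io| io.each { |_line| n += 1 } }
  n
end

def count_with_foreach(path)
  n = 0
  File.foreach(path) { |_line| n += 1 }
  n
end

def count_with_gets(path)
  n = 0
  File.open(path) do |io|
    while io.gets
      n += 1
    end
  end
  n
end

# Time all three against the same throwaway file.
Tempfile.create('lines') do |f|
  50_000.times { |i| f.puts "line number #{i}" }
  f.flush

  Benchmark.bm(10) do |x|
    x.report('each')    { count_with_each(f.path) }
    x.report('foreach') { count_with_foreach(f.path) }
    x.report('gets')    { count_with_gets(f.path) }
  end
end
```

All three read the file line by line, so any gap that shows up is probably block-call or loop overhead rather than I/O.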

Regards

robert
 
Sabby and Tabby

Robert Klemme said:
For large files this is more efficient:

words=[]
IO.foreach("myfile") do |line|
  words.push( *line.scan( /\w+/oi ) )
end

The /oi modifiers aren't necessary.

If you have many repeating words you can save even more memory:

cache = Hash.new {|h,k| h[k]=k}
words = []

IO.foreach("myfile") do |line|
  words.push( *( line.scan( /\w+/oi ).map {|w| cache[w]} ) )
end

The #map isn't doing what you think it is doing. To remove repeating
words from the list:

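# The default block marks a word as seen and returns false the first time;
# later lookups find the stored true, so reject drops the repeats.
saw = Hash.new {|h,k| h[k] = true; false}
words = []

IO.foreach("myfile") do |line|
  words.push(*( line.scan(/\w+/).reject {|w| saw[w]} ))
end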

Or if word order isn't a concern:

cache = {}

IO.foreach("myfile") do |line|
  line.scan(/\w+/).each {|w| cache[w] = 1}
end

words = cache.keys
 
Robert Klemme

Sabby and Tabby said:
Robert Klemme said:
For large files this is more efficient:

words=[]
IO.foreach("myfile") do |line|
  words.push( *line.scan( /\w+/oi ) )
end

The /oi modifiers aren't necessary.

Granted. I just got used to putting "o" in there whenever the rx doesn't
change over time. Kind of a documentation thingy.

If you have many repeating words you can save even more memory:

cache = Hash.new {|h,k| h[k]=k}
words = []

IO.foreach("myfile") do |line|
  words.push( *( line.scan( /\w+/oi ).map {|w| cache[w]} ) )
end

The #map isn't doing what you think it is doing. To remove repeating
words from the list:

It does exactly what I think it's doing. :) I don't want to remove
repeated words from the list but replace all identical strings with the
same *instance* to save memory. map fits the job perfectly. Of course,
you could also use collect... :)
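
A small sketch of the difference, using an in-memory sample string instead of a file (the sample text is made up). It checks object identity with equal? to show that the cached list keeps every occurrence but shares one String object per distinct word.

```ruby
# The default block stores and returns the first-seen String for each word.
cache = Hash.new { |h, k| h[k] = k }

line  = "the cat and the hat"
words = line.scan(/\w+/).map { |w| cache[w] }

# Both occurrences of "the" are still in the list...
p words.length                 # => 5
# ...but they now refer to one shared String object.
p words[0].equal?(words[3])    # => true

# Without the cache, scan returns a separate String for each match.
plain = line.scan(/\w+/)
p plain[0].equal?(plain[3])    # => false
```

One consequence of sharing instances: mutating one of the shared strings in place changes every position in the list that refers to it.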

Regards

robert
 
gabriele renzi

On Thu, 18 Sep 2003 07:51:25 GMT, gabriele renzi
<[email protected]> wrote:

Thanks for all the answers.
BTW, yet another solution that comes to mind now:

ary=(File.new('bf.rb').map { |l| l.scan(/\w+/) }).flatten
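
A variant of the same idea, sketched under the assumption of a Ruby version that has Enumerable#flat_map. File.new leaves the file open until it is garbage collected, while File.foreach closes it once iteration finishes. The helper name `words_in` is hypothetical.

```ruby
require 'tempfile'

# Hypothetical helper: collect every \w+ run in the file, in order.
def words_in(path)
  # Without a block, File.foreach returns an Enumerator over the lines.
  File.foreach(path).flat_map { |line| line.scan(/\w+/) }
end

# Demo on a throwaway file.
Tempfile.create('words') do |f|
  f.write("one two\nthree, four!\n")
  f.flush
  p words_in(f.path)   # => ["one", "two", "three", "four"]
end
```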
 
