efficient regex scanning

Trochalakis Christos

Hello there,

I want to extract all the words from a file, so I wrote the
following code:

file = ARGV[0]
File.open('output','w') {|f|
  IO.read(file).scan(/\w+/).each{|w| f.print w}
}

The problem with this code is that scan builds an array of all the
words before anything is written, which wastes memory on large files.
Is there a better way to do it?
Something like IO.read(file).each_scan { foo }

Thanks
Christos
 
Ola Bini

Scan has a block form that yields each match instead of collecting
them into an array (see ri String#scan):

IO.read(file).scan(/\w+/) {|w| f.print w}
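
For reference, here is a minimal sketch of the block form as a complete script fragment. The helper name copy_words and the choice to write one word per line (so the words don't run together) are assumptions for illustration, not part of the original post:

```ruby
# Hypothetical helper: writes every \w+ match from src_path to dst_path,
# one word per line, and returns the number of words written.
def copy_words(src_path, dst_path)
  count = 0
  File.open(dst_path, 'w') do |out|
    # The block form of String#scan yields each match as it is found,
    # so no array of all words is built. Note that File.read still
    # loads the whole input file into a single string.
    File.read(src_path).scan(/\w+/) do |word|
      out.puts word
      count += 1
    end
  end
  count
end
```

Calling copy_words(ARGV[0], 'output') would then play the role of the original script.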


Cheers

--
Ola Bini (http://ola-bini.blogspot.com)
JRuby Core Developer
Developer, ThoughtWorks Studios (http://studios.thoughtworks.com)

"Yields falsehood when quined" yields falsehood when quined.
 
dblack

Hi --

You could do something like this (untested, and reversing your logic
somewhat):

File.open(file).each {|line| f.print(line.scan(/\w+/)) }

(You might want to join them with a space or something so they don't
all run together.)
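
A hedged sketch of how that one-liner might look as a complete, line-by-line version with the words joined by spaces. The helper name copy_words_by_line is hypothetical. On Ruby 1.9 and later, passing the array straight to print writes its inspect form (brackets and quotes), so the join is done explicitly here:

```ruby
# Hypothetical helper: reads src_path one line at a time and writes the
# words of each line to dst_path, separated by single spaces.
def copy_words_by_line(src_path, dst_path)
  File.open(dst_path, 'w') do |f|
    # File.foreach yields one line at a time, so only the current line
    # and its words are held in memory.
    File.foreach(src_path) do |line|
      words = line.scan(/\w+/)
      # Skip lines without any words instead of writing empty lines.
      f.puts words.join(' ') unless words.empty?
    end
  end
end
```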


David

--
Q. What is THE Ruby book for Rails developers?
A. RUBY FOR RAILS by David A. Black (http://www.manning.com/black)
(See what readers are saying! http://www.rubypal.com/r4rrevs.pdf)
Q. Where can I get Ruby/Rails on-site training, consulting, coaching?
A. Ruby Power and Light, LLC (http://www.rubypal.com)
 
Trochalakis Christos

Ola Bini said:

Scan takes a block form:

ri String.scan

IO.read(file).scan(/\w+/) {|w| f.print w}

Cheers

Thanks a lot!
I suppose I should have checked first :)
 
Robert Klemme

dblack said:

You could do something like this (untested, and reversing your logic
somewhat):

File.open(file).each {|line| f.print(line.scan(/\w+/)) }

(You might want to join them with a space or something so they don't
all run together.)

You're not closing the IO. I know it's not an issue for a small script
but...

I'd do this:

ARGF.each {|line| puts line.scan(/\w+/) }

:)
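
A small sketch of the same idea wrapped in a method, so the input and output can be swapped out. The name print_words and its parameters are assumptions for illustration. ARGF reads the files named on the command line (or standard input if there are none), and the output can be redirected to a file from the shell:

```ruby
# Hypothetical helper: prints each word on its own line.
# input defaults to ARGF, output defaults to $stdout.
def print_words(input = ARGF, output = $stdout)
  input.each_line do |line|
    words = line.scan(/\w+/)
    # puts writes each element of an array on its own line.
    output.puts words unless words.empty?
  end
end
```

Run as a script that calls print_words, something like "ruby words.rb input.txt > output" would cover the original use case (words.rb being a made-up file name).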

Kind regards

robert
 
dblack

Hi --

Robert Klemme said:

You're not closing the IO. I know it's not an issue for a small script
but...

It's not a complete script; I was only showing one line. At the very
least it's not going to run unless you assign something to f :)


David

 
Joel VanderWerf

Here's a thought. Note that it doesn't handle //m regexen, since it
matches one line at a time. Like David's and Robert's solutions, it
doesn't read the whole file at once. (I guess one could check for
pat.options & Regexp::MULTILINE, and read the whole IO in that case.)

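class IO
  # With a block: scans one line at a time and yields each match.
  # Without a block: reads everything and returns the array of matches.
  def scan(pat)
    if block_given?
      each {|line| line.scan(pat) {|s| yield s} }
    else
      read.scan(pat)
    end
  end
end

File.open(filename) do |f|
  f.scan(/\w+/) {|word| puts word}
end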
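
The MULTILINE check mentioned above could be sketched roughly as follows. This is written as a standalone helper rather than a patch to IO, and the name scan_io is hypothetical. It only looks at the /m flag, so a pattern without /m that still matches across newlines (for example one containing \n) would be scanned line by line and could miss matches:

```ruby
# Hypothetical helper (not a core method): yields every match of pat in io.
# A block is required.
def scan_io(io, pat)
  if (pat.options & Regexp::MULTILINE) != 0
    # /m patterns may match across line boundaries, so fall back to
    # reading the whole IO into one string.
    io.read.scan(pat) { |m| yield m }
  else
    # Otherwise scan one line at a time to keep memory use low.
    io.each_line { |line| line.scan(pat) { |m| yield m } }
  end
end
```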
 
