Help processing a file or array

  • Thread starter Eduardo Yáñez Parareda

Eduardo Yáñez Parareda

Hello, I'm a newbie, and need some help to process a file.
The file has lines with something like this:

rewrewrwer rrrrrrrrrrrrr aa1 rrrrrrrrrr
rewrwerwer rrrrrrrrrrrrr bb1 rrrrrrrrrr
rwerfwdffsd rrrrrrrrrrrrr cc1 rrrrrrrrrr
ewrwerwerwer rrrrrrrrrrrrr dd1 rrrrrrrrrr
trtretertert rrrrrrrrrrrrr ee1 rrrrrrrrrr

and another file with

aa1
cc1

I'd like to create a new file without lines containing aa1 and cc1

Reading the files into arrays is easy:

lines = File.readlines("file1")
tags = File.readlines("file2")

Is there a Ruby way to remove lines from 'lines' variable which contain tags from 'tags' variable?
 
Augie De Blieck Jr.

You'd want to use a regular expression, I think. Probably a nested
loop. I could do this in Perl in about two minutes, but I'm still
adjusting my thinking for Ruby.

final = []
lines.each do |line|
  # keep the line only if none of the tags appears in it
  final << line unless tags.any? { |tag| line =~ /#{Regexp.escape(tag.chomp)}/ }
end

Then write the "final" array to whatever file you want to.

I threw in the ".chomp" there to get rid of the newline character on the tag.
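
The last step, writing the result out to a new file, could be sketched end to end roughly like this. It is only a minimal sketch: the helper name filter_file and the output name "file3" are made up for illustration, it uses a plain substring check instead of a regular expression, and the chomp: true option assumes Ruby 2.4 or later.

```ruby
# Hypothetical helper (not from the thread): copies lines_path to out_path,
# skipping every line that contains one of the tags listed in tags_path.
def filter_file(lines_path, tags_path, out_path)
  # chomp: true strips the trailing newline from each tag; blank tags are skipped
  tags = File.readlines(tags_path, chomp: true).reject(&:empty?)
  kept = File.readlines(lines_path).reject do |line|
    tags.any? { |tag| line.include?(tag) }
  end
  File.write(out_path, kept.join)
  kept
end

# Example call, assuming "file1" and "file2" exist in the current directory:
# filter_file("file1", "file2", "file3")
```

Note that a substring check also drops lines where a tag appears inside a longer word, which may or may not be what you want.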
 
Jan Svitok

On 3/22/07, Eduardo Yáñez Parareda wrote:
Is there a Ruby way to remove lines from 'lines' variable which contain tags from 'tags' variable?

tags_re = Regexp.new("\\b(?:#{tags.map { |t| Regexp.escape(t.chomp) }.join("|")})\\b")
lines.delete_if { |l| l =~ tags_re }

Explanation:

tags_re will have the form "\b(?:tag1|tag2|tag3|...|tagn)\b".
The \b anchors make it match only whole words, not parts of words.
Note that if you put in too many tags you may get errors (regexp too
long or something similar).

The last line deletes all lines that match the regexp.
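
To see what the \b anchors buy you, here is a small sketch with made-up sample data. It builds the pattern with Regexp.union, which escapes each string and joins them with "|", rather than the hand-written map/join; the method name tags_regexp is invented for illustration.

```ruby
# Illustrative helper: builds /\b(?:tag1|tag2|...)\b/ from an array of tags.
def tags_regexp(tags)
  # Regexp.union escapes each string and joins the alternatives with "|"
  alternatives = Regexp.union(tags.map(&:chomp))
  /\b(?:#{alternatives})\b/
end

re = tags_regexp(["aa1\n", "cc1\n"])

lines = [
  "foo aa1 bar\n",   # contains the whole word aa1, so it is removed
  "foo xaa1x bar\n", # aa1 only appears inside a longer word, so it is kept
  "foo bb1 bar\n"    # no tag at all, so it is kept
]
lines.delete_if { |l| l =~ re }
```

After this runs, lines should hold only the second and third entries.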
 
Eduardo Yáñez Parareda

tags_re = Regexp.new("\\b(?:#{tags.map {|t|
Regexp.escape(t.chomp)}.join("|")})\\b")
lines.delete_if {|l| l =~ tags_re }

One more time I have to praise Ruby...
Thanks Jan.
 
Jan Svitok

Now that I look at it: this is more like Perl than Ruby... Ron's
version is probably a bit slower but much more readable...
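
For comparison, a regexp-free variant might look like the sketch below. This is not Ron's version (that post is not shown here), just one readable alternative. It assumes tags are set off from the rest of the line by whitespace, as in the sample data, and the method name is invented.

```ruby
require "set"

# Illustrative helper: keeps only the lines in which no whitespace-separated
# word is one of the tags.
def without_tagged_lines(lines, tags)
  # strip removes the trailing newline that readlines leaves on each tag
  tag_set = tags.map(&:strip).reject(&:empty?).to_set
  lines.reject { |line| line.split.any? { |word| tag_set.include?(word) } }
end

# Example call with the arrays from the original question:
# remaining = without_tagged_lines(lines, tags)
```

Because each word is looked up in a Set, this should also sidestep the worry about a regexp growing too long when there are many tags.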
 
