Ruby multiline regex problem

Gregg Yows

Code:

"<td align="left" ><div style="width: 165px; height: 175px;"><a
href="http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1">testPit
something here Best</td>"


Pattern:

<td.*?>.*?<\/td\s*>


I'm trying to match this whole block and use it for further parsing.
This started from an example in Brian Marick's book "Everyday
Scripting..." that had to be modified because Amazon has changed their
presentation to tables instead of lists.

Anyway, the regex works fine as a single line. As soon as I introduce
this:

"<td align="left" ><div style="width: 165px; height: 175px;"><a
href="http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1">testPit
something here

Best</td>"

it fails.

When I try this same expression with Perl using the //s mode, it works.
I understand Ruby uses //m (multi-line mode) in nearly the same fashion,
so that the dot also matches newlines, so it should work,
right? Can anyone tell me what I am doing wrong here? Why isn't
"multiline" mode working?
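
For reference, here is a minimal sketch of what the m flag changes in Ruby: with it, the dot also matches newlines (the role Perl's //s plays). The string `html` is a shortened, made-up stand-in for the block above, not the real Amazon markup, and the variable names are only illustrative.

```ruby
# Hypothetical, shortened stand-in for the table cell above.
html = "<td align=\"left\"><a href=\"#\">testPit\nsomething here\n\nBest</td>"

# The pattern from the post, kept as a string so both variants share it.
pattern = '<td.*?>.*?<\/td\s*>'

without_m = Regexp.new(pattern)                     # "." stops at "\n"
with_m    = Regexp.new(pattern, Regexp::MULTILINE)  # same as writing /.../m

no_m_result = html[without_m]  # ".*?" cannot cross the newlines
m_result    = html[with_m]     # "." may now match "\n" as well

puts no_m_result.inspect
puts m_result.inspect
```

If the second result is also nil in some tool, the tool is probably not passing the flag through, or the test string does not contain the newlines you think it does.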

Thanks!
 
Todd Benson

<CODE>

s = '<td align="left" ><div style="width: 165px; height: 175px;"><a
href="http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1">testPit
something here

Best</td>'

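puts "######\ns:"
puts s

# r1 has no capture group; r2 captures the cell contents in group 1.
r1 = /<td.*?>.*?<\/td.*?>/m
r2 = /<td.*?>(.*?)<\/td.*?>/m

puts "######\nscan with r1:"
puts s.scan(r1)        # no groups: scan returns the whole matches
puts
puts "######\nmatch with r1:"
puts s.match(r1)[0]    # [0] is the whole match
puts

s =~ r1
puts "######\n=~ and $1 with r1:"
puts $1.inspect        # nil, because r1 has no group 1

puts

puts "######\nscan with r2:"
puts s.scan(r2)        # with a group: scan returns the captures
puts
puts "######\nmatch with r2:"
puts s.match(r2)[0]
puts

s =~ r2
puts "######\n=~ and $1 with r2:"
puts $1                # the text between <td ...> and </td>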

</CODE>

Hmm, I'm not sure if the regexp /<td[^>]*>.*?<\/td[^>]*>/m would be
more appropriate or not.
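
A small sketch for comparing the two, assuming a made-up string `rows` with two cells (the second opening tag contains a line break). The practical difference is that [^>]* can never run past the first ">", while .*? is allowed to when backtracking needs it; a negated class also matches newlines regardless of /m.

```ruby
# Two hypothetical cells; the second opening tag spans a line break.
rows = "<td align=\"left\">first\ncell</td><td\nalign=\"right\">second</td>"

lazy_dot = /<td.*?>(.*?)<\/td.*?>/m       # r2 from the script above
negated  = /<td[^>]*>(.*?)<\/td[^>]*>/m   # the alternative in question

# scan returns one array of captures per match; flatten gives the cell bodies.
lazy_cells    = rows.scan(lazy_dot).flatten
negated_cells = rows.scan(negated).flatten

puts lazy_cells.inspect
puts negated_cells.inspect
```

On this input both patterns capture the same two cell bodies, so the choice mostly matters for messier markup.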

Todd
 
Robert Klemme

2008/4/8 said:
When I try this same expression with Perl using the //s mode, it works.
I understand Ruby uses //m (multi-line mode) in nearly the same fashion,
so that the dot also matches newlines, so it should work,
right? Can anyone tell me what I am doing wrong here? Why isn't
"multiline" mode working?

Works for me: no match without /m, match with /m:

irb(main):004:0> s=%q{<td align="left" ><div style="width: 165px; height: 175px;"><a
irb(main):005:0' href="http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1">testPit
irb(main):006:0' something here Best</td>}
=> "<td align=\"left\" ><div style=\"width: 165px; height: 175px;\"><a\nhref=\"http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1\">testPit\nsomething here Best</td>"
irb(main):007:0> s[%r{<td.*?</td\s*>}]
=> nil
irb(main):008:0> s[%r{<td.*?</td\s*>}m]
=> "<td align=\"left\" ><div style=\"width: 165px; height: 175px;\"><a\nhref=\"http://www.amazon.com/Rails-Recipes/dp/0977616606/ref=pd_sim_b_njs_img_1\">testPit\nsomething here Best</td>"
irb(main):009:0>

Cheers

robert
 
Ransom Tullis

Thanks folks for all your help... It turns out that I was using the regex
test view in Eclipse (RDT), which was obviously not behaving properly in
multi-line mode. I guess I need to go out and get the Aptana/RadRails
plugin that has the latest RDT and ruby-debug built in. I identified the
issue using Mike Lovitt's Rubular regex tester. Thanks Mike for
restarting that server!

http://www.rubular.com/
 
Robert Klemme

2008/4/10 said:
Thanks folks for all your help...turns out that I was using the regex
test view in Eclipse (RDT) which was obviously not behaving properly in
multi-line mode. I guess I need to go out and get the Aptana/Radrails
plugin that has the latest RDT and ruby-debug built in. I identified the
issue using Mike Lovitt's Rubular regex tester. Thanks Mike for
restarting that server!

Why look so far? IRB serves the same purpose.

Cheers

robert
 
Ransom Tullis

Robert said:
Why look so far? IRB serves the same purpose.

Cheers

robert

I'm a newb with Ruby and IRB. I did test the regex in IRB, but did not
know that I could set a literal string up with \n characters like you
did above through the interface. So, of course, it was passing
every time. That is very cool! I am growing fonder of IRB every day...
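
A minimal sketch of why a test string can pass "every time": it only exercises /m if it really contains a newline character. The strings below are short made-up examples, not the Amazon markup. In single quotes, \n stays a backslash followed by "n"; in double quotes it becomes a newline; and a %q{...} literal continued on the next line (as in the IRB session above) contains a real one too.

```ruby
pattern = /<td.*?>.*?<\/td\s*>/   # deliberately without /m

single_quoted = '<td>testPit\nBest</td>'   # backslash + "n", no line break
double_quoted = "<td>testPit\nBest</td>"   # a real newline character
percent_q     = %q{<td>testPit
Best</td>}                                 # a real newline typed into the literal

single_match = single_quoted[pattern]   # no newline in the string to stop "."
double_match = double_quoted[pattern]   # "." will not cross the newline

puts single_quoted.include?("\n")
puts double_quoted.include?("\n")
puts percent_q == double_quoted
puts single_match.inspect
puts double_match.inspect
```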
 
