Using gsub to remove embedded newlines in HTML file

Wes Gamble

I have an HTML file that is in a string.

I want to use gsub! to remove any embedded newlines and whitespace
between two known delimiters.

Given a string that includes this kind of string:

~^LNK:http://slashdot.org/login.pl?op=newuserform~
Create a new account
^~

I want to replace the above with:

~^LNK:http://slashdot.org/login.pl?op=newuserform~Create a new account^~

(stripping out the newlines and whitespace)

Having trouble writing the regex for this.

I think I want something like:

/~\^LNK:.*?([\s\r\n])+.*?\^~/

that I could use in:

str.gsub!(/~\^LNK:.*?([\s\r\n])+.*?\^~/, '')

to replace all of the whitespace, or potential newline characters with
null strings.

But I don't think this will work because I really need to loop _within_
each substring of my large HTML string. The thing about gsub is that it
will substitute the entire matched string.
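A small sketch of the problem described above: with a plain string as the replacement, gsub swaps out the whole match, so the delimited block vanishes instead of being tidied up. The sample text and the method names are made up for illustration, and the /m flag is an addition here (an assumption, so that . can cross the newlines).

```ruby
# Hypothetical sample input; the words around the link are made up.
def sample_html
  "before ~^LNK:http://slashdot.org/login.pl?op=newuserform~\nCreate a new account\n^~ after"
end

# With a string replacement, gsub replaces the entire matched span,
# so the whole ~^LNK:...^~ block is deleted, not just its newlines.
def delete_whole_match(text)
  text.gsub(/~\^LNK:.*?([\s\r\n])+.*?\^~/m, '')
end

puts delete_whole_match(sample_html).inspect
```

The link text is gone entirely, which is why the replacement has to be computed from each match instead of being a fixed string.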

Do I need to scan out the ~^LNK.*?^~, operate on those and then put them
back into the larger string?

I'm not sure I'm asking this very well, so I apologize if that's the
case.

Thanks,
Wes
 
Wes Gamble

Something like:

@html.scan(/~\^LNK:.*?\^~/mi).each do |link_line|
  new_link_line = link_line.gsub(/[\s\r\n]/, '')
  @html.gsub!(/#{link_line}/mi, new_link_line)
end
 
Wes Gamble

This seems to work well:

@html.scan(/~\^LNK:.*?\^~/mi).each do |link_line|
  new_link_line = link_line.gsub(/[\t\r\n]/, '')
  @html.gsub!(/#{Regexp.escape(link_line)}/mi, new_link_line) if link_line != new_link_line
end
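A quick sketch of why Regexp.escape matters in that loop: the matched text contains characters such as ^, ? and . that are regex syntax, so interpolating it raw builds a pattern that no longer matches the text it came from. The method names below are hypothetical, and the link text is the example from the question.

```ruby
# Link text from the question, used as both the subject and the pattern source.
def sample_link
  "~^LNK:http://slashdot.org/login.pl?op=newuserform~\nCreate a new account\n^~"
end

# Raw interpolation: ^ becomes a line anchor, ? makes the "l" optional, etc.
def raw_pattern_matches?(text)
  !!(text =~ /#{text}/mi)
end

# Regexp.escape turns every metacharacter into a literal first.
def escaped_pattern_matches?(text)
  !!(text =~ /#{Regexp.escape(text)}/mi)
end

puts raw_pattern_matches?(sample_link)
puts escaped_pattern_matches?(sample_link)
```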

I wonder if I could have done this with one @html.gsub!() call, but
this is much more understandable to me anyway, so I'll stick with it.

Thanks,
Wes
 
Carlos

You can use a block with gsub:
@html.gsub!(/~\^LNK:.*?\^~/mi) { |s| s.gsub(/[\t\r\n]/, '') }

or something like that.
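For what it's worth, here is a self-contained sketch of the block form. It assumes each span runs from ~^LNK: to the next ^~ and that only line breaks (plus any indentation around them) should go, so the spaces inside the link text survive. The method name join_link_lines and the sample input are made up for illustration.

```ruby
# Collapse every ~^LNK:...^~ span onto a single line.
# Assumption: a span ends at the first ^~ after its ~^LNK: opener.
def join_link_lines(html)
  html.gsub(/~\^LNK:.*?\^~/mi) do |span|
    # Drop each line break together with the whitespace around it;
    # spaces between words are not next to a line break, so they stay.
    span.gsub(/\s*[\r\n]+\s*/, '')
  end
end

# Hypothetical input: the <p> wrapper is only there to show that text
# outside the delimiters is left untouched.
def sample_page
  "<p>\n~^LNK:http://slashdot.org/login.pl?op=newuserform~\n  Create a new account\n^~\n</p>"
end

puts join_link_lines(sample_page)
```

The block's return value becomes the replacement for that one match, so there is no need to scan the spans out and patch them back in.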

Good luck.
 
Wes Gamble

Thanks. That is the _Ruby_ way to do it, and that's what I wanted to
know :).

I've used blocks with gsub but I keep forgetting that I can put anything
in there - so far I've only used backrefs to pull out pieces of the
matching regex to rearrange things.
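The two habits combine, too. A rough sketch, assuming the URL itself never contains a ~ (true for the example, not guaranteed in general): capture groups are still available inside the block through Regexp.last_match, so the pieces can be cleaned up one at a time. The method name tidy_links is hypothetical.

```ruby
# Rebuild each link from its captured parts.
# Assumption: the URL contains no "~", so the first "~" ends it.
def tidy_links(html)
  html.gsub(/~\^LNK:(.*?)~(.*?)\^~/mi) do
    url   = Regexp.last_match(1)
    label = Regexp.last_match(2)
    # strip trims leading/trailing whitespace only; a line break in the
    # middle of the label would be kept.
    "~^LNK:#{url.strip}~#{label.strip}^~"
  end
end

puts tidy_links("~^LNK:http://slashdot.org/login.pl?op=newuserform~\nCreate a new account\n^~")
```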

Wes
 
