regex woes with breaklines

Aa Wilson

Hello, all. I've been banging my head against this for a while,
and I'm no closer to a solution. I have a string in which I would
like to gsub every line end ('\n' or '\r\n') into '<br />' while
preserving the line ends themselves. The catch is that I don't want
to touch anything that already looks like '<br />\r\n' or '<br />\n',
so that running the gsub again on later edits of the string doesn't
insert extra line breaks. I've tried gsub with the regex
/(?!<br\s*\/>)($)[^\z]/, but I haven't managed to get it to match
correctly. For example, on the string "test\r\ntest<br>\r\ntest" it
matches both '\r\n' pairs instead of only the first one (which is
what I want).

Thanks in advance for your time.

Andrew Timberlake

Replace the <br> along with the newline.
Also remember that a newline can be CR (classic Mac OS), LF (Linux/Unix) or CRLF (Windows).

s = "test\r\ntest<br>\r\ntest\ntest\rtest<br/>\ntest"
s.gsub(/(?:<br *\/*>)*(?:\r\n|\n|\r)/, "<br />\n")
#=> "test<br />\ntest<br />\ntest<br />\ntest<br />\ntest<br />\ntest"
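Because the pattern also swallows any <br> tag already sitting in front of the line end, running the same gsub a second time leaves the text unchanged, which is what keeps repeated edits from stacking up breaks. A minimal check, assuming Ruby 1.9 or later; `BR_AND_NEWLINE` and `normalize_breaks` are illustrative names, not part of the original post:

```ruby
# Andrew's pattern: an optional run of <br>-style tags followed by one
# line end (CRLF, LF or CR) is replaced by a single "<br />\n".
# BR_AND_NEWLINE and normalize_breaks are hypothetical names for this sketch.
BR_AND_NEWLINE = /(?:<br *\/*>)*(?:\r\n|\n|\r)/

def normalize_breaks(text)
  text.gsub(BR_AND_NEWLINE, "<br />\n")
end

s = "test\r\ntest<br>\r\ntest\ntest\rtest<br/>\ntest"
once  = normalize_breaks(s)
twice = normalize_breaks(once)

p once           # => "test<br />\ntest<br />\ntest<br />\ntest<br />\ntest<br />\ntest"
p(once == twice) # => true: the second pass rewrites each "<br />\n" with itself
```

Note that every CR or CRLF comes out as a plain "\n", so this approach normalizes line ends rather than preserving them.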

Andrew Timberlake
http://ramblingsonrails.com
http://www.linkedin.com/in/andrewtimberlake

"I have never let my schooling interfere with my education" - Mark Twain

Aa Wilson

Andrew said:
s = "test\r\ntest<br>\r\ntest\ntest\rtest<br/>\ntest"
s.gsub(/(?:<br *\/*>)*(?:\r\n|\n|\r)/, "<br />\n")
#=> "test<br />\ntest<br />\ntest<br />\ntest<br />\ntest<br />\ntest"

Thanks a million. It feels like a waste replacing something that's
already there, but I suppose that's just my conceptions of real life
getting in the way of my abstractions.

I was actually hoping that '$' (or possibly '$$?') would cover all the
CR/LF/CRLF cases for me. Was I mistaken about that?

Robert Klemme

> Thanks a million. It feels like a waste replacing something that's
> already there, but I suppose that's just my conceptions of real life
> getting in the way of my abstractions.

Not necessarily: Ruby 1.9's regular expression engine (Oniguruma) also
supports negative lookbehind. You could use that to avoid substituting
newlines that are already preceded by a <br/>.

http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt

> I was actually hoping that '$' (or possibly '$$?') would cover all the
> CR/LF/CRLF cases for me. Was I mistaken about that?

I believe it does not cover \r alone. In any case it is safer to be
explicit, IMHO. Btw, I believe you can simplify to (?:\r\n?|\n).
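Robert's answer about `$` can be checked in irb. The sketch below assumes Ruby 1.9 or later, and `first_line_end` is a made-up helper: it reports where `$` first matches for LF, CRLF and CR text. In Ruby, `$` matches only just before a "\n" or at the end of the string, so a lone CR is never treated as a line end, and in CRLF text the match lands between the CR and the LF. The last line shows the simplified (?:\r\n?|\n) form splitting all three kinds of line end.

```ruby
# Hypothetical helper: index of the first position where Ruby's $ anchor matches.
def first_line_end(str)
  str.index(/$/)
end

p first_line_end("a\nb")    # => 1: just before the LF
p first_line_end("a\r\nb")  # => 2: between CR and LF, so the CR is left behind
p first_line_end("a\rb")    # => 3: only at end of string; a lone CR is not a line end

# Robert's simplified pattern, equivalent to (?:\r\n|\n|\r)
NEWLINE = /\r\n?|\n/
p "a\r\nb\rc\nd".split(NEWLINE)  # => ["a", "b", "c", "d"]
```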

Kind regards

robert
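Robert's lookbehind suggestion, combined with his simplified line-end pattern, could look roughly like the sketch below. It assumes Ruby 1.9 or later and that existing tags are spelled exactly <br>, <br/> or <br />; `NEWLINE_WITHOUT_BR` and `add_breaks` are hypothetical names. Unlike the gsub Andrew posted, it inserts "<br />" in front of the original line end instead of rewriting it, so CR, LF and CRLF survive as they were.

```ruby
# Assumes Ruby 1.9+ (Oniguruma/Onigmo). Look-behind bodies must be fixed-width,
# so each accepted <br> spelling gets its own negative look-behind.
# The LF branch also refuses a LF that is the second half of a CRLF.
NEWLINE_WITHOUT_BR = /(?<!<br>)(?<!<br\/>)(?<!<br \/>)(?:\r\n?|(?<!\r)\n)/

# Hypothetical helper: put "<br />" in front of every line end that lacks one.
# '\0' re-inserts the matched line end itself, so CR, LF and CRLF are preserved.
def add_breaks(text)
  text.gsub(NEWLINE_WITHOUT_BR, '<br />\0')
end

once = add_breaks("test\r\ntest<br>\r\ntest")
p once                       # => "test<br />\r\ntest<br>\r\ntest"
p(add_breaks(once) == once)  # => true: a second pass changes nothing
```

The extra (?<!\r) on the LF branch matters: without it, a CRLF that is already tagged would be rejected at the CR but then matched at the bare LF, producing "<br>\r<br />\n". The look-behinds are written out one per spelling because the engine only accepts fixed-width look-behind bodies, so a variable-width form such as (?<!<br\s*\/>) is not an option.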