RegExp help needed!

ric.castagna

Greetings, all...

I've got an issue that I'm trying to solve and RegExp looks to be my
only avenue.

Background: We are using a popular third party control for input into a
textarea. By default, the text is HTML but the UI does not display the
HTML tags. Because we are using this in a web-messaging scenario, the
user may choose to send their message in plain text and we want to
display that on the screen.

The third party control we're using does not have any built-in ability
to switch from their rich-text editing environment to a "plain-text"
environment.

Proposed Resolution: 1) Read the data input into the third-party
control, 2) strip all HTML markup from the data while maintaining
carriage return/line feed data, 3) Hide the third-party control and
reveal an HTML text area, and, 4) copy the reformatted text into the
text area.
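
The four steps above could be wired together roughly as follows. This is only a sketch: `switchToPlainText`, `readEditorHtml`, `editorElement`, `textArea` and `toPlainText` are all hypothetical names, since the post does not say what API the third-party control exposes. The conversion function is passed in so that any stripping approach can be plugged in.

```javascript
// Hypothetical sketch of steps 1-4. None of these names come from the
// third-party control; they stand in for whatever it actually exposes.
function switchToPlainText(readEditorHtml, editorElement, textArea, toPlainText) {
  var html = readEditorHtml();           // 1) read the HTML held by the editor
  var text = toPlainText(html);          // 2) strip markup, keep line breaks
  editorElement.style.display = "none";  // 3) hide the rich-text control...
  textArea.style.display = "";           //    ...and reveal the plain textarea
  textArea.value = text;                 // 4) copy the converted text across
  return text;
}
```

Switching back to HTML editing would be the mirror image: read `textArea.value`, convert it, hand it to the editor, and swap which element is hidden.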

Additional Feature: The user should be able to re-select HTML editing
effectively reversing this process. While any previous HTML formatting
would be lost, the cr/lf data would be preserved when switching back to
HTML.
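
For the reverse direction, one possible approach is to escape the characters that are special in HTML and then turn each line break into a `<br>`. The function name `plainTextToHtml` is made up for this sketch, and it assumes the editor accepts `<br>` as its line-break markup.

```javascript
// Hypothetical helper: plain text -> minimal HTML with <br> line breaks.
function plainTextToHtml(text) {
  return text
    .replace(/&/g, "&amp;")           // escape ampersands first
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/\r\n|\r|\n/g, "<br>");  // CRLF, CR or LF all become <br>
}

console.log(plainTextToHtml("a < b\r\nnext line"));
// prints: a &lt; b<br>next line
```

Escaping `&` before `<` and `>` matters, otherwise the `&` in a freshly written `&lt;` would be escaped a second time.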

I have played around with some regular expressions with only moderate
luck. None of the freely available patterns on RegExLib.com seem to do
what is needed.

Here's the current pattern I'm working with: <[^>]+?>
This pattern strips all the HTML, but does not preserve the cr/lf data.
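
To make the problem concrete, here is a small sketch of what that pattern does when applied globally. The sample markup is invented for illustration. The tags are removed, but nothing is put in their place, so text from separate paragraphs and lines runs together.

```javascript
// Sample markup, invented for illustration.
var html = "<p>Hello,</p><p>second paragraph<br>third line</p>";

// The pattern from the post, applied to every match.
var stripped = html.replace(/<[^>]+?>/g, "");

console.log(stripped);
// prints: Hello,second paragraphthird line
```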

I would really appreciate any assistance that can be offered. If you
know of a pattern that would reliably do the conversion from HTML to
plain-text, and the companion that would do the plain-text to HTML, I
would be very grateful!

Thanks in advance,
Ric Castagna
 
Evertjan.

wrote on 13 Oct 2005 in comp.lang.javascript:
> 2) strip all HTML markup from the data while maintaining
> carriage return/line feed data

http://www.google.com/search?q=strip.html regex
and
http://groups.google.com/groups?q=strip.html+regex
will give you ample examples of the needed regex

However, which line breaks do you want to save?
The <br> and the <hr>, or also the <div>?

You could detect them first and swap them for a placeholder,
then restore the <br> at the end:


=====================================================
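// Swap line-breaking tags for a placeholder that contains no < or >
t = t.replace(/<br>|<hr>|<\/div>/gi, "*\\*")

// Strip the remaining tags (pattern from the question; substitute your own)
t = t.replace(/<[^>]+?>/g, "")

// Turn the placeholder back into <br>
t = t.replace(/\*\\\*/g, "<br>")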
=====================================================
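
If the target is a plain textarea rather than HTML, a variation on this idea is to skip the placeholder and convert the line-breaking tags straight to a newline before stripping. The sketch below is not from the reply. The name `htmlToPlainText`, the choice of which tags count as line breaks, and the short list of decoded entities are all assumptions that would need adjusting to the markup the control really produces.

```javascript
// Hypothetical helper: HTML -> plain text, keeping line breaks.
function htmlToPlainText(html) {
  return html
    // Newlines in HTML source are only whitespace, so flatten them first.
    .replace(/\r\n|\r|\n/g, " ")
    // Tags that end a line become a real newline.
    .replace(/<br\s*\/?>|<hr\s*\/?>|<\/(?:div|p)\s*>/gi, "\n")
    // Strip every remaining tag.
    .replace(/<[^>]+>/g, "")
    // Decode a few common entities; &amp; goes last to avoid double decoding.
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&");
}

var sample = "<div>Hello <b>world</b></div><div>a &lt; b<br>end</div>";
console.log(JSON.stringify(htmlToPlainText(sample)));
// prints: "Hello world\na < b\nend\n"
```

Note the trailing newline left by the final `</div>`. Whether to trim it depends on how the textarea content is used afterwards.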
 
