Max Benjamin
Is there an easy way to strip html tags from strings?
Thanks
Max Benjamin said:Is there an easy way to strip html tags from strings?
Daniel said:the problem is, it's not always the _correct_ way.
<div id="weird>id"></div>
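To make the objection concrete, here is a minimal sketch of the obvious one-line approach and what it does to the tag above. The pattern and the `naive_strip` helper are assumptions for illustration; nobody in the thread posted this exact code.

```ruby
# Naive tag stripper: treats everything from "<" to the next ">" as a tag.
NAIVE_TAG = /<[^>]*>/

# Hypothetical helper name, used only for this illustration.
def naive_strip(html)
  html.gsub(NAIVE_TAG, '')
end

puts naive_strip('<p>Hello <b>world</b></p>')   # prints: Hello world
puts naive_strip('<div id="weird>id"></div>')   # prints: id">
```

The second call stops at the ">" inside the attribute value, so part of the attribute leaks into the output.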
Daniel said:That is true.. if the original poster has the luxury of only dealing with correct html, he's a lucky fellow, and can kludge up some regexen that will do the job. Even in a well-coded site, it's not unthinkable that you could forget to do some encoding and end up with angle-brackets inside a textarea or something, though.
How useful a regex approach is depends on the data. I have used a bit of regex-type html parsing before and it worked fine, for the data that I was parsing. Horses for courses.
;D
Thanks for the quick replies.
I should have been more explicit in my question. I want to strip html
tags in order to sanitize form input. I'm a bit of a ruby noob and I
was hoping to find a function similar to PHP's strip_tags, one that
would remove both html and ruby code.
Best
Mat said:For sanitizing input, just escaping might be a better idea because it has less chance of being destructive. If you're on rails there's an h() function for this. If you're doing something else, maybe check out how rails does it and replicate it. There might be something easy that someone on this list knows that I don't.
If you really want to strip them, I'd bet the regexp solution is no less effective than PHP's strip_tags.
-Mat
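For anyone not on Rails, a minimal sketch of the escaping approach using `CGI.escapeHTML` from Ruby's standard library. The `comment` value is a made-up stand-in for a form field, and this is only one way to do it, not necessarily what Rails does internally.

```ruby
require 'cgi'

# Hypothetical form value; a real one would come from the request parameters.
comment = '<script>alert("hi")</script> & more'

# Escaping keeps the text but turns markup characters into entities,
# so the browser displays the tags instead of interpreting them.
escaped = CGI.escapeHTML(comment)
puts escaped
# prints: &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt; &amp; more
```

Nothing is thrown away here, which is why escaping is less destructive than stripping: the original input can still be recovered with `CGI.unescapeHTML`.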
Christian said:It's valid XHTML:
$ echo '<bar quux="foo>bar" />' | xmllint -
<?xml version="1.0"?>
<bar quux="foo>bar"/>
However, '<' needs to be escaped:
$ echo '<bar quux="foo<bar" />' | xmllint -
-:1: parser error : Unescaped '<' not allowed in attributes values
<bar quux="foo<bar" />
Christian said:William James said:
re = %r{
  <
  (?:
    # Any characters but > or " .
    [^>"] +
    |
    # Characters within quotes.
    # Allow escaped quotes.
    "
    (?:
      # Accept any escaped character.
      \\.
      |
      [^"\\] +
    ) *
    "
  ) *
  # Closing bracket of the tag.
  >
}xm
# DATA is the text after __END__ in the script file.
print DATA.read.gsub( re, '' )
<foo bar='"quux"' />
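The example above points at single-quoted attribute values, which the quoted pattern does not treat as quoted strings. A possible extension, as a sketch only: the `TAG` constant and `strip_tags` helper are hypothetical names (the latter borrowed from PHP), and the pattern still assumes reasonably well-formed markup.

```ruby
# Same idea as the pattern above, with single-quoted values added.
TAG = %r{
  <
  (?:
    [^>"']+              # anything except > and quote characters
    |
    "(?:\\.|[^"\\]+)*"   # double-quoted value, backslash escapes accepted
    |
    '(?:\\.|[^'\\]+)*'   # single-quoted value, same rules
  )*
  >
}xm

# Hypothetical helper: removes everything TAG recognises as a tag.
def strip_tags(html)
  html.gsub(TAG, '')
end

puts strip_tags(%q{<foo bar='"quux"' />text})       # prints: text
puts strip_tags(%q{<a title='x>y'>link</a>})        # prints: link
puts strip_tags(%q{<div id="weird>id">body</div>})  # prints: body
```

Comments, CDATA sections, and script bodies are not handled, so this is no substitute for a real parser when the input is untrusted.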