reg exp to clean html tags

ma.giorgi

Hi all!
I've tried everything I can think of, but I can't find a solution.
Here is an example.
I have the following HTML code:

<div id="aaa">text inside<br/> a div</div><span class="bbb"> text
inside a span</span><img src="blabla"/><p class='ccc'
style='blablabla'>text inside a <b style="FONT-SIZE:
14px">paragraph</b> with some<!--Hey I'm a f$*!#ng comment--> bold
text</p>

I need to transform this text into clean HTML, like this:

text inside a div text inside a span<p>text inside a paragraph with
some bold text</p>

I've tried this:

function cleareTags(){
var stringToReplace = String(document.theform.ArticleText.value);
var re=/(<\/?p)(?:\s[^>]*)?(>)|<[^>]*>/gi;
document.theform.ArticleText.value = stringToReplace.replace(re,'');
}

but it doesn't do what I want: it removes all the tags, including <p>.

I've also tried this:

function cleareTags(){
var r=/<(\w{1})([^>]*)>(.*)<\/\1>/gmi;
document.theform.ArticleText.value = stringToReplace.replace(r,
function(s,tag,attr,inner) {
//alert (inner)
return
(tag.toUpperCase()=="P"?"<"+tag+">"+inner+"</"+tag+">":inner);
}
);
}

but nothing happens...

Please help me,
I've run out of ideas.
Bye,
marco
 
Jc

You can probably do this using regular expressions and a replace()
call, using the buffer and non-greedy operators (as one option), but
rather than write such a regex for you, it would be best if you could
allow an actual HTML parser (such as a browser) to do the work for you.
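
For reference, the regex route can be sketched in a single replace() call. This is only a sketch under assumptions: it reuses the pattern from the original post, the function name stripAllButP is made up for illustration, and it assumes no ">" characters appear inside attribute values or comments. The capture groups hold "<p" (or "</p") and ">" for paragraph tags; for every other tag both groups are empty, so the replacement '$1$2' deletes it.

```javascript
// Hypothetical helper (the name is illustrative, not from the thread).
// Groups 1 and 2 capture "<p" or "</p" and the closing ">" of a paragraph
// tag. The second alternative matches any other tag or comment and leaves
// both groups unmatched.
function stripAllButP(html) {
  var re = /(<\/?p)(?:\s[^>]*)?(>)|<[^>]*>/gi;
  // An unmatched group expands to an empty string in the replacement,
  // so non-paragraph tags are replaced by nothing.
  return html.replace(re, '$1$2');
}

// The example markup from the original post (comment text shortened).
var sample =
  '<div id="aaa">text inside<br/> a div</div><span class="bbb"> text ' +
  'inside a span</span><img src="blabla"/>' +
  "<p class='ccc' style='blablabla'>text inside a " +
  '<b style="FONT-SIZE: 14px">paragraph</b> with some' +
  "<!--Hey I'm a comment--> bold text</p>";

console.log(stripAllButP(sample));
// text inside a div text inside a span<p>text inside a paragraph with some bold text</p>
```

The only change from the first attempt in the original post is the replacement string: '' discards the captured pieces of the <p> tags along with everything else, while '$1$2' puts them back.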

If you don't care about browser compatibility (you are using this just
for yourself), you could use innerHTML and innerText in IE. For
example, add the HTML to an object in the document using innerHTML or
insertAdjacentHTML, and then read it back using innerText.
 
ma.giorgi

Thanks for the answer...
It only needs to be compatible with IE6.

Sorry, but I didn't quite understand your answer... JavaScript isn't
my language, and my boss has given me this problem to solve anyway.

How can I do it with innerHTML?
 
Jc

Here's an example:

<body>

<div id="divContainer"></div>

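<script type="text/javascript">
// Let the browser parse the markup, then read back only the text.
var sHTML = "<div>a<div>b</div></div>";
var container = document.getElementById("divContainer");
container.innerHTML = sHTML;
alert(container.innerText); // innerText: fine for the IE-only case here
</script>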

</body>
 
Stephen Chalmers

I assume you're trying to remove all tags except <p></p>.
Unless someone knows another way, you can do it in two separate operations.
The first removes all tags except <p></p> but leaves the attributes within the opening <p> tag; the second strips those attributes.

str=str.replace(/<(?!\s*p\s*|\w>|\s*\/\s*p\s*>)[^>]+>/gi, "");

str=str.replace(/<\s*p[^>]+>/gi, "<p>");

This isn't a universal solution, but on recent browsers it works on your example text.
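
As a rough illustration of the two operations above, here is a sketch that wraps them in one function and runs them on the example text. The function name keepOnlyParagraphs is an assumption made for this example; the two patterns are the ones quoted above, unchanged. The second call shows one way the "not universal" caveat plays out: the lookahead also spares bare single-letter tags and any tag whose name starts with "p".

```javascript
// Hypothetical wrapper; both patterns are taken verbatim from the post above.
function keepOnlyParagraphs(str) {
  // Step 1: remove every tag that the negative lookahead does not protect.
  str = str.replace(/<(?!\s*p\s*|\w>|\s*\/\s*p\s*>)[^>]+>/gi, "");
  // Step 2: reduce an opening <p ...> that carries attributes to a bare <p>.
  return str.replace(/<\s*p[^>]+>/gi, "<p>");
}

// The example markup from the original post (comment text shortened).
var sample =
  '<div id="aaa">text inside<br/> a div</div><span class="bbb"> text ' +
  'inside a span</span><img src="blabla"/>' +
  "<p class='ccc' style='blablabla'>text inside a " +
  '<b style="FONT-SIZE: 14px">paragraph</b> with some' +
  "<!--Hey I'm a comment--> bold text</p>";

console.log(keepOnlyParagraphs(sample));
// text inside a div text inside a span<p>text inside a paragraph with some bold text</p>

// Input outside the example: a bare <b> survives step 1, and <pre> is
// kept by step 1 and then rewritten to <p> by step 2.
console.log(keepOnlyParagraphs("<b>x</b><pre>y</pre><p>z</p>"));
// <b>x<p>y<p>z</p>
```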
 
Stephen Chalmers

The first operation removes all tags except <p></p>; the second removes attributes within the opening <p> tag.
It won't handle nested <> characters.
I have avoided the use of lookahead assertions.

str=str.replace(/<\s*([a-oq-z]|p\w|\!)[^>]*>|<\s*\/\s*([a-oq-z]|p\w)[^>]*>/gi, "");

str=str.replace(/<\s*p[^>]+>/gi, "<p>");
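
A sketch of this lookahead-free version in use, again with a made-up function name (keepOnlyParagraphsNoLookahead) and the two patterns copied as posted. The extra calls are assumptions chosen to probe the behaviour: one with bare <b> and <pre> tags, and one with a ">" inside a comment, which is the "nested <> characters" case the post says it won't handle.

```javascript
// Hypothetical wrapper; both patterns are taken verbatim from the post above.
function keepOnlyParagraphsNoLookahead(str) {
  // Step 1: remove opening tags whose name starts with a letter other than
  // "p", or with "p" plus another word character, or with "!" (comments,
  // doctypes); the second alternative does the same for closing tags.
  str = str.replace(
    /<\s*([a-oq-z]|p\w|\!)[^>]*>|<\s*\/\s*([a-oq-z]|p\w)[^>]*>/gi, "");
  // Step 2: reduce an opening <p ...> that carries attributes to a bare <p>.
  return str.replace(/<\s*p[^>]+>/gi, "<p>");
}

// The example markup from the original post (comment text shortened).
var sample =
  '<div id="aaa">text inside<br/> a div</div><span class="bbb"> text ' +
  'inside a span</span><img src="blabla"/>' +
  "<p class='ccc' style='blablabla'>text inside a " +
  '<b style="FONT-SIZE: 14px">paragraph</b> with some' +
  "<!--Hey I'm a comment--> bold text</p>";

console.log(keepOnlyParagraphsNoLookahead(sample));
// text inside a div text inside a span<p>text inside a paragraph with some bold text</p>

// Bare single-letter tags and <pre> are removed; <p> is kept.
console.log(keepOnlyParagraphsNoLookahead("<b>x</b><pre>y</pre><p>z</p>"));
// xy<p>z</p>

// The stated limitation: a ">" inside a comment ends the match early,
// so the tail of the comment is left behind.
console.log(keepOnlyParagraphsNoLookahead("<!-- a > b -->text"));
//  b -->text
```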
 
