vbfoobar
Hello,
I have HTML input to which I apply some changes.
Feature 1:
=======
I want to transform all the text, but if the text is inside
an "a href" tag, I want to leave the text as it is.
The HTML is not necessarily well-formed, so
I would like to do that using BeautifulSoup (or
maybe another tolerant parser).
As a test case, suppose I want to uppercase all the text
except the text that is within "a href" tags:
ExampleString = """
<footag>Lorem Ipsum</footag> is simply
dummy text of <a href="junk.html">the printing</a> and
<a href="junk2.html">typesetting <b>industry</b>.</a>
Thanks."""
When applying the text transform, I want to obtain:
<footag>LOREM IPSUM</footag> IS SIMPLY
DUMMY TEXT OF <a href="junk.html">the printing</a> AND
<a href="junk2.html">typesetting <b>industry</b>.</a>
THANKS.
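
To make the test case concrete, here is a minimal sketch of Feature 1. It uses the standard library's html.parser rather than BeautifulSoup (a swap for illustration only; html.parser also accepts malformed markup). The names LinkAwareTransformer and transform_outside_links are hypothetical, and the sketch does not special-case the contents of script or style elements.

```python
from html.parser import HTMLParser


class LinkAwareTransformer(HTMLParser):
    """Re-emit HTML, transforming only text that is not inside <a href=...>."""

    def __init__(self, transform):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.transform = transform
        self.out = []
        self.a_stack = []  # one bool per open <a>: does it carry an href?

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.a_stack.append(any(name == "href" for name, _ in attrs))
        self.out.append(self.get_starttag_text())  # raw tag text, unchanged

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self.a_stack:
            self.a_stack.pop()
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        inside_link = any(self.a_stack)
        self.out.append(data if inside_link else self.transform(data))

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def transform_outside_links(html, transform=str.upper):
    """Apply `transform` to every text run outside <a href=...> elements."""
    parser = LinkAwareTransformer(transform)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Calling transform_outside_links(ExampleString) on the input above should reproduce the expected output. Unclosed tags are simply passed through as they appear.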
Feature 2:
========
Another thing I may want to do: If the text I would normally
transform is inside an "a href" tag, then do not transform it,
but insert the result of text transformation just after the "</a>".
Using the same example as input, applying
Feature 2 would give something like this:
<footag>LOREM IPSUM</footag> IS SIMPLY
DUMMY TEXT OF <a href="junk.html">the printing</a><feat2>THE PRINTING</feat2> AND
<a href="junk2.html">typesetting <b>industry</b>.</a><feat2>TYPESETTING <b>INDUSTRY</b>.</feat2>
THANKS.
========
Thanks for your help