How to apply text changes to HTML, keeping it intact if inside "a" tags

vbfoobar

Hello,

I have HTML input to which I apply some changes.

Feature 1:
=======
I want to transform all the text, but if the text is inside
an "a href" tag, I want to leave the text as it is.

The HTML is not necessarily well-formed, so
I would like to do that using BeautifulSoup (or
maybe another tolerant parser).

As a test case, suppose I want to uppercase all the text
except the text that is within "a href" tags:

ExampleString = """
<footag>Lorem Ipsum</footag> is simply
dummy text of <a href="junk.html">the printing</a> and
<a href="junk2.html">typesetting <b>industry</b>.</a>
Thanks."""

When applying the text transform, I want to obtain:

<footag>LOREM IPSUM</footag> IS SIMPLY
DUMMY TEXT OF <a href="junk.html">the printing</a> AND
<a href="junk2.html">typesetting <b>industry</b>.</a>
THANKS.
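
For illustration, here is a minimal sketch that produces the output above for this test case. It is only one possible approach: it uses html.parser from the Python standard library as the tolerant parser instead of BeautifulSoup, and the names UppercaseOutsideLinks and uppercase_outside_links are made up for this example. It also assumes that every "a" element counts as a link (it does not check for href) and that comments and doctype declarations can be dropped.

```python
from html.parser import HTMLParser

EXAMPLE = """
<footag>Lorem Ipsum</footag> is simply
dummy text of <a href="junk.html">the printing</a> and
<a href="junk2.html">typesetting <b>industry</b>.</a>
Thanks."""


class UppercaseOutsideLinks(HTMLParser):
    """Re-emit the markup, uppercasing text that is not inside <a>...</a>."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as they are
        super().__init__(convert_charrefs=False)
        self.out = []
        self.link_depth = 0  # > 0 while we are inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1
        self.out.append(self.get_starttag_text())  # raw tag, attributes intact

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth:
            self.link_depth -= 1
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text inside a link is copied as is, everything else is transformed
        self.out.append(data if self.link_depth else data.upper())

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)


def uppercase_outside_links(markup):
    parser = UppercaseOutsideLinks()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(uppercase_outside_links(EXAMPLE))
```

Replacing data.upper() with another function would apply a different text transform in the same way.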


Feature 2:
========
Another thing I may want to do: if the text I would normally
transform is inside an "a href" tag, then do not transform it,
but insert the result of the text transformation just after the "</a>".

Using the same example as input, applying
Feature 2 would give something like this:

<footag>LOREM IPSUM</footag> IS SIMPLY
DUMMY TEXT OF <a href="junk.html">the printing</a><feat2>THE
PRINTING</feat2> AND
<a href="junk2.html">typesetting
<b>industry</b>.</a><feat2>TYPESETTING <b>INDUSTRY</b>.</feat2>
THANKS.
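
A rough sketch of Feature 2 follows, again only as one possible approach. It uses html.parser from the Python standard library as the tolerant parser instead of BeautifulSoup; the names CopyAfterLinks and copy_after_links are hypothetical. It assumes links are not nested, that every "a" element counts as a link whether or not it has an href, and that comments and doctype declarations can be dropped. Unlike the hand-wrapped output above, it adds no extra line breaks.

```python
from html.parser import HTMLParser

EXAMPLE = """
<footag>Lorem Ipsum</footag> is simply
dummy text of <a href="junk.html">the printing</a> and
<a href="junk2.html">typesetting <b>industry</b>.</a>
Thanks."""


class CopyAfterLinks(HTMLParser):
    """Leave link content alone, then append a transformed copy in <feat2>."""

    def __init__(self, transform=str.upper):
        super().__init__(convert_charrefs=False)  # keep entities untouched
        self.transform = transform
        self.out = []         # pieces of the final document
        self.copy = []        # transformed copy of the current link's content
        self.in_link = False

    def _markup(self, raw):
        # Tags and entities go to the output and, inside a link, to the copy
        self.out.append(raw)
        if self.in_link:
            self.copy.append(raw)

    def handle_starttag(self, tag, attrs):
        raw = self.get_starttag_text()
        if tag == "a" and not self.in_link:
            self.in_link = True
            self.copy = []
            self.out.append(raw)  # the <a ...> tag itself is not copied
        else:
            self._markup(raw)

    def handle_startendtag(self, tag, attrs):
        self._markup(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link:
            self.in_link = False
            self.out.append("</a><feat2>%s</feat2>" % "".join(self.copy))
        else:
            self._markup("</%s>" % tag)

    def handle_data(self, data):
        changed = self.transform(data)
        if self.in_link:
            self.out.append(data)      # original text stays inside the link
            self.copy.append(changed)  # transformed text goes after </a>
        else:
            self.out.append(changed)

    def handle_entityref(self, name):
        self._markup("&%s;" % name)

    def handle_charref(self, name):
        self._markup("&#%s;" % name)


def copy_after_links(markup, transform=str.upper):
    parser = CopyAfterLinks(transform)
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(copy_after_links(EXAMPLE))
```

The transform argument defaults to str.upper to match the test case; any function from str to str could be passed instead.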

========
Thanks for your help
 
Diez B. Roggisch

Hello,

I have HTML input to which I apply some changes.

Feature 1:
=======
I want to transform all the text, but if the text is inside
an "a href" tag, I want to leave the text as it is.

The HTML is not necessarily well-formed, so
I would like to do that using BeautifulSoup (or
maybe another tolerant parser).

<snip/>

Use BeautifulSoup + XSLT. Writing your two features in XSLT is close to a
no-brainer, and it is certainly the best tool for the job.

There are a few XSLT implementations available for Python.
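
As a rough illustration of the XSLT route, here is a sketch of Feature 1. It assumes the third-party lxml package as the XSLT implementation (the suggestion above does not name one) and, to stay self-contained, uses lxml's own tolerant HTML parser rather than BeautifulSoup. The function name and the wrapping "div" element are made up for this example. XSLT 1.0 has no upper-case function, so translate() is used, which here covers ASCII letters only.

```python
from lxml import etree, html

EXAMPLE = """
<footag>Lorem Ipsum</footag> is simply
dummy text of <a href="junk.html">the printing</a> and
<a href="junk2.html">typesetting <b>industry</b>.</a>
Thanks."""

STYLESHEET = b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

  <!-- Identity template: copy every node and attribute unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>

  <!-- Text with no "a" ancestor is uppercased; the predicate gives this
       template a higher default priority than the identity template -->
  <xsl:template match="text()[not(ancestor::a)]">
    <xsl:value-of select="translate(., $lower, $upper)"/>
  </xsl:template>
</xsl:stylesheet>
"""


def uppercase_outside_links_xslt(markup):
    # Parse the (possibly malformed) fragment into a tree under a <div>
    root = html.fragment_fromstring(markup, create_parent="div")
    transform = etree.XSLT(etree.XML(STYLESHEET))
    return str(transform(root))


print(uppercase_outside_links_xslt(EXAMPLE))
```

Feature 2 could be added along the same lines with a template matching "a" that copies the element and then emits a feat2 element built from its children.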

Diez
 
