A little complex usage of Beautiful Soup Parsing Help!

SAKTHEESH · Jul 20, 2011

I am using Beautiful Soup to parse an HTML document and find all text
that is not contained inside any anchor element.

I came up with this code which finds all links within href but not the
other way around.

How can I modify this code to get only plain text using Beautiful
Soup, so that I can do some find and replace and modify the soup?

# Prints the href of every anchor that has one
for a in soup.findAll('a', href=True):
    print(a['href'])

Example:

<html><body>
<div> <a href="www.test1.com/identify">test1</a> </div>
<div><br></div>
<div><a href="www.test2.com/identify">test2</a></div>
<div><br></div><div><br></div>
<div>
This should be identified

Identify me 1

Identify me 2
<p id="firstpara" align="center"> This paragraph should be<b>
identified </b>.</p>
</div>
</body></html>

Output:

This should be identified
Identify me 1
Identify me 2
This paragraph should be identified.
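
One possible way to produce a listing like this, as a minimal sketch rather than a definitive answer: it assumes the newer `bs4` package (Beautiful Soup 4) and the standard-library `html.parser` backend, and the helper name `text_outside_anchors` is made up for illustration. It collects every text node and drops those that have an `<a>` ancestor. Each text node is reported separately, so the line breaks may differ from the listing above.

```python
from bs4 import BeautifulSoup

# Shortened version of the example document from the post.
html = """<html><body>
<div> <a href="www.test1.com/identify">test1</a> </div>
<div>
This should be identified

Identify me 1
<p id="firstpara" align="center"> This paragraph should be<b>
identified </b>.</p>
</div>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")


def text_outside_anchors(soup):
    """Return stripped, non-empty text nodes that have no <a> ancestor.

    Hypothetical helper, not part of Beautiful Soup itself.
    """
    result = []
    for node in soup.find_all(string=True):
        if node.find_parent("a") is not None:
            continue  # this text belongs to a link, skip it
        text = node.strip()
        if text:  # ignore whitespace-only nodes
            result.append(text)
    return result


for text in text_outside_anchors(soup):
    print(text)
```

Link text such as `test1` is skipped, and so is the `href` value, because attribute values are not text nodes.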

I am doing this to find the text that is not within `<a></a>`, then
find "Identify" in that text and replace it with "Replaced".

So the final output will be like this:

<html><body>
<div> <a href="www.test1.com/identify">test1</a> </div>
<div><br></div>
<div><a href="www.test2.com/identify">test2</a></div>
<div><br></div><div><br></div>
<div>
This should be identified

Replaced me 1

Replaced me 2
<p id="firstpara" align="center"> This paragraph should be<b>
identified </b>.</p>
</div>
</body></html>
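
A hedged sketch of how such a replacement might be done in place, assuming the newer `bs4` package and the `html.parser` backend. The helper name `replace_outside_anchors` and the shortened sample markup are illustrative assumptions. It swaps each matching text node for a modified copy and leaves link text and `href` values alone.

```python
from bs4 import BeautifulSoup

# Shortened sample markup, assumed for illustration.
html = (
    '<div><a href="www.test1.com/identify">Identify link</a></div>'
    '<div>Identify me 1 <p>Identify me 2</p></div>'
)
soup = BeautifulSoup(html, "html.parser")


def replace_outside_anchors(soup, old, new):
    """Replace `old` with `new` in every text node that is not inside an <a>.

    Hypothetical helper, not part of Beautiful Soup itself.
    """
    # Copy the result into a list, since the tree is modified while looping.
    for node in list(soup.find_all(string=True)):
        if node.find_parent("a") is None and old in node:
            # str.replace is case-sensitive, so "identified" stays untouched.
            node.replace_with(node.replace(old, new))


replace_outside_anchors(soup, "Identify", "Replaced")
print(soup)
```

Because the soup itself is modified, printing it afterwards gives the rewritten markup.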

Thanks for your time and help!

Thomas 'PointedEars' Lahn · Jul 22, 2011

SAKTHEESH said:
I am using Beautiful Soup to parse an HTML document and find all text
that is not contained inside any anchor element.

I came up with this code which finds all links within href

Rather, it finds anchors with an `href` attribute (commonly: links).

but not the other way around.

What would that be anyway?

How can I modify this code to get only plain text using Beautiful
Soup, so that I can do some find and replace and modify the soup?

RTFM:
<http://www.crummy.com/software/BeautifulSoup/documentation.html#contents>
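
The linked section covers the `contents` attribute, the list of an element's direct children. As a rough sketch of how it could be applied here (assuming the newer `bs4` package and the `html.parser` backend; the generator name `walk_text` and the sample markup are made up for illustration), one can walk the tree recursively and simply never descend into `a` elements:

```python
from bs4 import BeautifulSoup, NavigableString, Tag


def walk_text(node):
    """Yield text nodes under `node`, never descending into <a> elements.

    Hypothetical helper, not part of Beautiful Soup itself.
    """
    for child in node.contents:
        if isinstance(child, NavigableString):
            yield child
        elif isinstance(child, Tag) and child.name != "a":
            yield from walk_text(child)


soup = BeautifulSoup(
    '<div><a href="www.test2.com/identify">test2</a> Identify me 2 <b>too</b></div>',
    "html.parser",
)
found = [str(t).strip() for t in walk_text(soup)]
print(found)  # ['Identify me 2', 'too']
```

The yielded nodes are still attached to the tree, so each one can be passed to `replace_with` to modify the soup.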
