BeautifulSoup: getting the string inside a 'p' tag that also contains an 'a' tag

GinTon

I'm trying to get the 'FOO' string, but the problem is that inside the 'p'
tag there is another tag, 'a'. So:
from BeautifulSoup import BeautifulSoup
s = '<td width="88%" valign="TOP"> <p class="contentBody">FOO <a name="f"></a> </p></td>'
tree = BeautifulSoup(s)
print tree.first('p')
<p class="contentBody">FOO <a name="f"></a> </p>

So if I run 'print tree.first('p').string' to get the 'FOO' string, it
shows a Null value because of the 'a' tag:
print tree.first('p').string
Null

Any solution?
 
Marc 'BlackJack' Rintsch

In [53]: print tree.first('p').contents[0]
FOO
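
For readers on the current bs4 package rather than the old BeautifulSoup 3 module used in this thread, the same idea can be sketched roughly as below. This is only an illustration, assuming bs4 is installed and using the built-in "html.parser" backend; the variable names are made up for the example.

```python
from bs4 import BeautifulSoup, NavigableString

html = ('<td width="88%" valign="TOP"> '
        '<p class="contentBody">FOO <a name="f"></a> </p></td>')

# Assumption: the stdlib "html.parser" backend; other parsers may treat
# whitespace slightly differently.
soup = BeautifulSoup(html, "html.parser")
p = soup.find("p")

# .string is None here because <p> has several children
# (a text node, the <a> tag, and another text node).
no_single_string = p.string

# The approach from this reply: take the first child, which is the text node.
first_child = p.contents[0]

# Variant: join only the text nodes that are direct children of <p>,
# skipping nested tags such as <a>.
direct_text = "".join(
    child for child in p.contents if isinstance(child, NavigableString)
).strip()

# Variant: all text inside <p>, including text inside nested tags.
all_text = p.get_text(strip=True)

print(repr(first_child), repr(direct_text), repr(all_text))
```

Note that contents[0] keeps the trailing space from the markup, so it may need a .strip(), and it only works when the wanted text really is the first child.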

Ciao,
Marc 'BlackJack' Rintsch
 
Nick Vatamaniuc

Quick-n-dirty way:
After you get your whole p string, <p class="contentBody">FOO <a
name="f"></a> </p>, remove any tags delimited by '<' and '>' with a
regex. Your short example does _not_ show anything between the <a>
and </a> tags, so I assume either there won't be anything there or, if
there is, you also want it included in the final text. As in
'<p class="contentBody">FOO <a name="f">URLNAME</a> </p>' ==> 'FOO
URLNAME'

For the regex, start with something simple like <.*?>, see if it
works, then improve it. Use kiki or kodos, which are visual regex
helpers for Python.
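
A minimal sketch of that regex idea, using only the standard re module. The helper name strip_tags is made up for this example, and the whitespace cleanup at the end is an extra step not mentioned above.

```python
import re

# The simple pattern suggested above: anything between '<' and '>'.
# re.DOTALL lets a tag span a line break, as in the wrapped example.
TAG_RE = re.compile(r"<.*?>", re.DOTALL)

def strip_tags(fragment):
    """Remove everything that looks like a tag and tidy the whitespace."""
    text = TAG_RE.sub("", fragment)
    # Collapse runs of whitespace left behind by the removed tags.
    return " ".join(text.split())

print(strip_tags('<p class="contentBody">FOO <a name="f"></a> </p>'))
print(strip_tags('<p class="contentBody">FOO <a name="f">URLNAME</a> </p>'))
```

This is only good enough for small, well-behaved fragments like the one here. It will misfire on a '>' inside an attribute value, on comments, and on script content, which is where a real parser earns its keep.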

Hope this helps,
Nick V.
 
