beautiful soup library question


meyerkp

Hi all,

I'm trying to extract some information from an HTML file using
Beautiful Soup. The strings I want to get come after br tags, e.g.:

<font size='6'>
<br>this info
<br>more info
<br>and more info
</font>

I can navigate to the first br tag using find_next_sibling, but how do
I get the string that follows each br tag?
br.contents is empty.

Thanks for any ideas.
 

Erik Max Francis

I'm not familiar with Beautiful Soup specifically, but this isn't how
the <br> tag works. Unlike tags such as <li> or <p>, whose closing tags
are optional in HTML but which still enclose content, <br> encloses
nothing; it's just a line break. In XHTML it would be written <br />,
marking it as a self-closing tag.

Instead, you want to traverse the contents of the font tag and treat
each <br> you encounter as a line break.
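
A minimal sketch of that approach, assuming Python 3. It deliberately swaps in the standard-library html.parser instead of Beautiful Soup so it runs without extra installs; FontLineCollector and lines_in_font are hypothetical names for illustration. Text inside <font> is accumulated, each <br> closes off the current line, and text outside <font> is ignored.

```python
from html.parser import HTMLParser


class FontLineCollector(HTMLParser):
    """Hypothetical helper: gather the text inside <font> tags,
    starting a new line at every <br>. Uses the standard-library
    html.parser, not Beautiful Soup."""

    def __init__(self):
        super().__init__()
        self.depth = 0       # how many <font> tags we are currently inside
        self.lines = []      # finished, non-blank lines of text
        self._pending = []   # text fragments of the line being built

    def _flush(self):
        # Close off the current line; skip it if it is only whitespace.
        text = "".join(self._pending).strip()
        if text:
            self.lines.append(text)
        self._pending = []

    def handle_starttag(self, tag, attrs):
        if tag == "font":
            self.depth += 1
        elif tag == "br" and self.depth:
            # <br> has no contents; it only marks a line boundary.
            # (An XHTML-style <br /> also ends up here, because the
            # default handle_startendtag calls handle_starttag.)
            self._flush()

    def handle_endtag(self, tag):
        if tag == "font" and self.depth:
            self._flush()
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self._pending.append(data)


def lines_in_font(html):
    """Return the <br>-separated lines found inside <font> tags."""
    parser = FontLineCollector()
    parser.feed(html)
    parser.close()
    return parser.lines


html = """<font size='6'>
<br>this info
<br>more info
<br>and more info
</font>"""
print(lines_in_font(html))
```

With the snippet from the question, this prints ['this info', 'more info', 'and more info'].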
 

Enigma Curry

Here's how I print the text that follows each <br>:

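# Beautiful Soup 3 / Python 2 API
import BeautifulSoup as Soup
page = open("test.html").read()
soup = Soup.BeautifulSoup(page)
# fetch('br') returns every <br> tag in the document
for br in soup.fetch('br'):
    # br.next is the next parsed element, here the text after the <br>
    print br.next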