cannot get html content of tag with BeautifulSoup

someone · Jun 18, 2010

Hello,

does anyone know how to get the HTML contents of a <p> tag with
BeautifulSoup? For example, I'd like to get all the HTML inside the first
<p> tag, i.e. "This is paragraph one.", as a
unicode object.

p.contents gives me a list which I cannot join:
TypeError: sequence item 0: expected string, Tag found

Thanks!

from BeautifulSoup import BeautifulSoup
import re

doc = ['<html><head><title>Page title</title></head>',
       '<body><p>This is paragraph one.</p>',
       '<p>This is paragraph two.</p>',
       '</body></html>']
soup = BeautifulSoup(''.join(doc))
#print soup.prettify()
r = re.compile(r'<[^<]*?/?>')
for i, p in enumerate(soup.findAll('p')):
    #print type(p) #<class 'BeautifulSoup.Tag'>
    #print type(p.contents) #list
    content = "".join(p.contents) #fails

    p_without_html = r.sub(' ', content)
    print p_without_html

someone · Jun 18, 2010

p.renderContents() was what I was looking for.
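
For anyone reading this with a newer setup: below is a minimal sketch of the same idea, assuming Python 3 and BeautifulSoup 4 (the bs4 package) rather than the BeautifulSoup 3 import used in the question. In bs4, decode_contents() returns the markup inside a tag as a text string. The <b> element in the sample markup and the helper name inner_html are assumptions added for illustration; the thread does not show the inner markup of the paragraphs.

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 is installed

# Sample document; the <b> element is hypothetical, added so the
# paragraph contains a nested Tag like the one in the TypeError.
html = (
    "<html><head><title>Page title</title></head>"
    "<body><p>This is <b>paragraph</b> one.</p>"
    "<p>This is paragraph two.</p>"
    "</body></html>"
)
soup = BeautifulSoup(html, "html.parser")


def inner_html(tag):
    """Return the markup inside `tag` (hypothetical helper name)."""
    return tag.decode_contents()


first_p = soup.find("p")

# Inner HTML, nested tags included.
print(inner_html(first_p))

# Joining p.contents works once each child is converted to a string.
print("".join(str(child) for child in first_p.contents))

# Text only, without a regex to strip the tags.
print(first_p.get_text())
```

The join in the question fails because p.contents mixes strings and Tag objects; converting each child with str() first, or calling decode_contents(), avoids that. If the goal is only the text without markup, get_text() is likely a safer choice than a regular expression.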
