Hello All,
I have a Beautiful Soup question and I'd appreciate any guidance the
forum can provide.
Let's say I have a file that looks like file.html, pasted below.
My goal is to extract every element that is either <p align="left">
or <div align="center">.
The matches should be listed in the same order as they appear in the
file, so the output would look like output.txt below.
I experimented with something similar to this code:
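from bs4 import BeautifulSoup

with open('file.html') as f:
    soup = BeautifulSoup(f, 'html.parser')
for i in soup.find_all('p', align="left"):
    print(i)
for i in soup.find_all('div', align="center"):
    print(i)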
I get something like this, with all the <p> matches grouped together
and all the <div> matches after them:
<p align="left">P4</p>
<p align="left">P3</p>
<p align="left">P1</p>
<div align="center">div4b</div>
<div align="center">div3b</div>
<div align="center">div2b</div>
<div align="center">div2a</div>
Any guidance would be greatly appreciated.
Best,
Ira
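A minimal sketch of one way to keep document order, assuming Python 3 and Beautiful Soup 4 (bs4), where findAll is spelled find_all: replace the two searches with a single find_all call that takes a filter function. find_all walks the tree in document order, so one call that accepts both kinds of element returns them interleaved exactly as they appear in the file. The names WANTED, is_wanted and extract are hypothetical, and the sample HTML is copied from file.html.

```python
from bs4 import BeautifulSoup

# Sample input copied from file.html.
HTML = """<html>
<body>
<p align="left">P1</p>
<p align="right">P2</p>
<div align="center">div2a</div>
<div align="center">div2b</div>
<p align="left">P3</p>
<div align="right">div3a</div>
<div align="center">div3b</div>
<div align="left">div3c</div>
<p align="left">P4</p>
<div align="left">div4a</div>
<div align="center">div4b</div>
</body>
</html>"""

# Hypothetical mapping: tag name -> the align value that tag must carry.
WANTED = {"p": "left", "div": "center"}


def is_wanted(tag):
    """Return True for <p align="left"> or <div align="center">."""
    return tag.name in WANTED and tag.get("align") == WANTED[tag.name]


def extract(html):
    soup = BeautifulSoup(html, "html.parser")
    # A single find_all pass visits elements in document order,
    # so the two kinds of match are not grouped by tag name.
    return [str(tag) for tag in soup.find_all(is_wanted)]


if __name__ == "__main__":
    for line in extract(HTML):
        print(line)
```

Running it on file.html should print the seven lines in output.txt. Keeping the conditions in one dictionary means another tag/alignment pair can be added without writing another loop.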
##########begin: file.html############
<html>
<body>
<p align="left">P1</p>
<p align="right">P2</p>
<div align="center">div2a</div>
<div align="center">div2b</div>
<p align="left">P3</p>
<div align="right">div3a</div>
<div align="center">div3b</div>
<div align="left">div3c</div>
<p align="left">P4</p>
<div align="left">div4a</div>
<div align="center">div4b</div>
</body>
</html>
##########end: file.html############
===================begin: output.txt===================
<p align="left">P1</p>
<div align="center">div2a</div>
<div align="center">div2b</div>
<p align="left">P3</p>
<div align="center">div3b</div>
<p align="left">P4</p>
<div align="center">div4b</div>
===================end: output.txt===================
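If Beautiful Soup is not available, the same single-pass idea can be sketched with Python 3's standard-library html.parser, swapped in here in place of Beautiful Soup. The class AlignedTagCollector, the WANTED mapping and extract() are hypothetical names. The sketch assumes a wanted element contains only text (no nested tags), which holds for file.html. It uses HTMLParser.get_starttag_text() so each start tag is reproduced exactly as written.

```python
from html.parser import HTMLParser

# Sample input copied from file.html.
HTML = """<html>
<body>
<p align="left">P1</p>
<p align="right">P2</p>
<div align="center">div2a</div>
<div align="center">div2b</div>
<p align="left">P3</p>
<div align="right">div3a</div>
<div align="center">div3b</div>
<div align="left">div3c</div>
<p align="left">P4</p>
<div align="left">div4a</div>
<div align="center">div4b</div>
</body>
</html>"""

# Hypothetical mapping: tag name -> the align value that tag must carry.
WANTED = {"p": "left", "div": "center"}


class AlignedTagCollector(HTMLParser):
    """Collect wanted elements as HTML strings, in document order.

    Assumes a wanted element holds only text, with no nested tags.
    """

    def __init__(self):
        super().__init__()
        self.results = []
        # (tag name, raw start tag, text parts) while inside a match, else None
        self._current = None

    def handle_starttag(self, tag, attrs):
        align = dict(attrs).get("align")
        if tag in WANTED and align == WANTED[tag]:
            # get_starttag_text() returns the start tag exactly as written.
            self._current = (tag, self.get_starttag_text(), [])

    def handle_data(self, data):
        if self._current is not None:
            self._current[2].append(data)

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._current[0]:
            name, start, parts = self._current
            self.results.append(f"{start}{''.join(parts)}</{name}>")
            self._current = None


def extract(html):
    parser = AlignedTagCollector()
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__":
    for line in extract(HTML):
        print(line)
```

Because the parser reports tags in the order it reads them, the matches come out in file order with no sorting step.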