Parsing HTML with BeautifulSoup

Johann Spies

I am trying to get CSV output from an HTML file.

With this code I had a little success:
=========================
from BeautifulSoup import BeautifulSoup
from string import replace, join
import re

f = open("configuration.html","r")
g = open("configuration.csv",'w')
soup = BeautifulSoup(f)
t = soup.findAll('table')
for table in t:
    rows = table.findAll('tr')
    for th in rows[0]:
        t = th.find(text=True)
        g.write(t)
        g.write(',')
    # print(','.join(t))

    for tr in rows:
        cols = tr.findAll('td')
        for td in cols:
            try:
                t = td.find(text=True).replace(' ','')
                g.write(t)
            except:
                g.write('')
            g.write(",")
        g.write("\n")
===============================

producing output like this:

RULE,SOURCE,DESTINATION,SERVICES,ACTION,TRACK,TIME,INSTALL ON,COMMENTS,
1,,,,drop,Log,Any,,,
2,All Users@Any,,Any,clientencrypt,Log,Any,,,
3,Any,Any,,drop,None,Any,,,
4,,,,drop,None,Any,,,
....

It left out all the non-plain-text parts of the <td> cells.

I then tried using t.renderContents and got something like this (one
line broken into many for the sake of this email):

1,<img src=icons/group.png>&nbsp;<a href=#OBJ_sunetint>
Rainwall_Cluster</A> <BR>,
<img>src=icons/udp.png>&nbsp;<a href=#SVC_IKE >IKE</a><br>,
<img src=icons/drop.png>&nbsp;drop,
<img src=icons/log.png>&nbsp;Log&nbsp;,
Rainwall_Cluster</A> <BR>&nbsp;,&nbsp;

How do I get BeautifulSoup to render (taking the above line as an
example)

sunetint for <img src=icons/group.png>&nbsp;<a
href=#OBJ_sunetint>sunetint</A><BR>

and still provide the text parts of the <td>'s as plain text?

I have experimented a little with regular expressions, but so far
could not find a solution.
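
To make the goal concrete, here is a minimal sketch of the per-cell
output I am after. It is not BeautifulSoup code: it uses Python 3's
standard-library html.parser and csv modules instead, and the names
CellTextParser, table_rows, rows_to_csv and the sample string are made
up for illustration. It assumes simple, non-nested tables.

```python
import csv
import io
from html.parser import HTMLParser


class CellTextParser(HTMLParser):
    """Collect the visible text of every <td>/<th>, one list per <tr>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell being parsed

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased, so <TD> and <td> are the same.
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join all fragments, including text nested inside <a>,
            # and collapse whitespace (a decoded &nbsp; counts too).
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside a cell is ignored.
        if self._cell is not None:
            self._cell.append(data)


def table_rows(html):
    """Return the table cells of `html` as a list of rows of strings."""
    parser = CellTextParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Render rows as CSV text; the csv module handles quoting."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Hypothetical sample modelled on the markup shown above.
sample = (
    "<table><tr><td>1</td>"
    "<td><img src=icons/group.png>&nbsp;"
    "<a href=#OBJ_sunetint>sunetint</A><BR></td>"
    "<td><img src=icons/drop.png>&nbsp;drop</td></tr></table>"
)
print(rows_to_csv(table_rows(sample)), end="")  # prints: 1,sunetint,drop
```

The point of the sketch is that every text fragment inside a cell is
collected, not only the first one, so text wrapped in <a> survives
while the <img> and <br> tags drop out.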

Regards
Johann
--
Johann Spies Telefoon: 021-808 4599
Informasietegnologie, Universiteit van Stellenbosch

"Lo, children are an heritage of the LORD: and the
fruit of the womb is his reward." Psalms 127:3
 
