Parsing an HTML file

CodeGuru73

I am trying to find the best way to parse a bunch of HTML files. They
are all similar in structure and I need to get them into a database.
Their relevant structure is:
<html><head></head>
<body>
<h1>title</h1>
<address> authors </address>
<div> Main html content</div>

I basically need to get the values between <h1></h1>,
<address></address> and <div></div>

I am able to read the files into an array.
 
Paul McGuire

Check out this simple XML parsing code:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/157358
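
If the files really are as regular as described, the standard library's html.parser may be enough on its own. The following is only a minimal sketch and is not the code from the recipe linked above. It assumes each file has one <h1>, one <address> and one top-level <div>, and takes only the first of each. The names PageExtractor and extract are made up for illustration. Note that character entities are decoded in all three fields, so the div content would need re-escaping before being served as HTML again.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collect <h1> text, <address> text and the first <div>'s inner markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.fields = {"h1": [], "address": [], "div": []}
        self._current = None  # tag currently being captured, if any
        self._depth = 0       # nesting depth of same-named tags in the capture
        self._done = set()    # tags already captured once

    def handle_starttag(self, tag, attrs):
        if self._current is None:
            if tag in self.fields and tag not in self._done:
                self._current = tag
                self._depth = 1
            return
        if tag == self._current:
            self._depth += 1
        if self._current == "div":
            # Keep nested markup inside the div as it appeared in the source.
            self.fields["div"].append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied once, with no end tag.
        if self._current == "div":
            self.fields["div"].append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self._current is None:
            return
        if tag == self._current:
            self._depth -= 1
            if self._depth == 0:
                self._done.add(tag)
                self._current = None
                return
        if self._current == "div":
            self.fields["div"].append(f"</{tag}>")

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current].append(data)


def extract(html_text):
    """Return the title, authors and main content of one page as a dict."""
    parser = PageExtractor()
    parser.feed(html_text)
    parser.close()
    return {
        "title": "".join(parser.fields["h1"]).strip(),
        "authors": "".join(parser.fields["address"]).strip(),
        "content": "".join(parser.fields["div"]).strip(),
    }
```

Calling extract() on the text of each file gives a dict with the keys title, authors and content, which can then be passed to whatever database layer is in use.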

-- Paul
 
