Parse and clean ODT docs with lxml? Hints to get started?

kaer

Basically, I have to upgrade a website with a lot of new content, and I
received the documents in OpenOffice (.odt) format. If I open one of
those documents and save it as HTML, I can cut and paste the result
into the HTML page. That's not bad as a start, but I need to clean up
that HTML (remove tags, remove or change attributes, ...). My first
idea is to use lxml for that. My questions:
- Is there a better way?
- Is lxml the right tool for that?
- Are there any code examples for doing that?

Have a nice day.
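
For illustration, the kind of cleanup described above (remove tags, remove attributes) could look roughly like this with lxml. This is only a minimal sketch: the function name clean_fragment and the tag and attribute lists are hypothetical choices, not something taken from the exported documents, and would need adjusting to whatever markup OpenOffice actually produces.

```python
import lxml.html
from lxml import etree

# Hypothetical choices for illustration; adjust to the exported markup.
UNWRAP_TAGS = ("font", "span")   # drop the tag itself, keep its text and children
DROP_TAGS = ("style",)           # drop the element together with its content
DROP_ATTRS = ("style", "class", "lang", "align")


def clean_fragment(html_text):
    """Return a cleaned copy of an HTML fragment, wrapped in a <div>."""
    # Parse the pasted fragment; create_parent wraps everything in one <div>.
    root = lxml.html.fragment_fromstring(html_text, create_parent="div")

    # Remove unwanted elements entirely, but keep the text that follows them.
    etree.strip_elements(root, *DROP_TAGS, with_tail=False)

    # Remove presentational wrappers while keeping what they contain.
    etree.strip_tags(root, *UNWRAP_TAGS)

    # Remove unwanted attributes from every remaining element.
    for el in root.iter():
        if not isinstance(el.tag, str):
            continue  # skip comments and processing instructions
        for name in DROP_ATTRS:
            el.attrib.pop(name, None)

    return lxml.html.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        '<p class="western" style="margin: 0" align="justify">'
        '<font face="Arial"><span lang="en-US">Hello <b>world</b></span></font>'
        "</p>"
    )
    print(clean_fragment(sample))
```

Changing an attribute instead of removing it works the same way inside the loop, using el.set(name, value) on the elements of interest.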
 
