Suggestion for converting PDF files to HTML/txt files

  • Thread starter srinivasan srinivas

brad

srinivasan said:
Could someone suggest ways to convert PDF files to HTML files?
Does Python have any modules for that job?

Thanks,
Srini

Unless there has been some recent development, the answer is no, there is
no reliable way to do it. Getting text out of a PDF is difficult (to say the
least) and at times impossible: a PDF page can, for example, be a scanned
image of text with no extractable characters.
 

alex23

Very neat program. It would be cool if it could be integrated easily into
other Python apps instead of being a standalone CLI tool.

Perhaps, but I think you could get a long way using os.system().
 

brad

alex23 said:
Perhaps, but I think you could get a long way using os.system().

Yes, that is possible, but unfortunately spawning an external process for
every file adds a lot of overhead. Also, if os.system() is the answer, one
could just use xpdf's pdftotext program. A native Python solution that could
be called naturally from other Python apps would be awesome.
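
For reference, here is a minimal sketch of what wrapping pdftotext from Python might look like. It assumes the pdftotext binary (from xpdf or poppler) is installed and on PATH; the helper names build_pdftotext_command and pdf_to_text are made up for illustration. It uses subprocess rather than os.system() so the output can be captured directly instead of going through a temporary file.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, layout=False):
    """Return the argument list for one pdftotext call that writes to stdout."""
    command = ["pdftotext"]
    if layout:
        # Ask pdftotext to keep the physical page layout where it can.
        command.append("-layout")
    # Request UTF-8 output so the bytes can be decoded predictably.
    command += ["-enc", "UTF-8"]
    # A text-file argument of "-" makes pdftotext write to stdout.
    command += [pdf_path, "-"]
    return command


def pdf_to_text(pdf_path, layout=False):
    """Run pdftotext on pdf_path and return the extracted text as a str."""
    if shutil.which("pdftotext") is None:
        raise FileNotFoundError("pdftotext was not found on PATH")
    result = subprocess.run(
        build_pdftotext_command(pdf_path, layout),
        capture_output=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Calling pdf_to_text("some_file.pdf") should then return the text as a string. This still pays the process-spawning cost on every call, and a PDF that is only a scanned image will most likely come back as empty or near-empty text.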
 
