brad
I have a very crude Python script that extracts text from some (and I
emphasize some) PDF documents. On many PDF docs it extracts nothing,
most likely because I'm doing something wrong. The PDF spec is large and
complex, and there are various ways to store and encode text. I wanted
to post here and ask whether anyone is interested in helping make the
script better, meaning it should accurately extract text from almost
any PDF file... not just some.
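To give an idea of what "crude" means here, below is a minimal sketch of
this kind of approach. It is an illustration only, not the actual script.
It assumes the text sits in content streams that are either uncompressed
or Flate (zlib) compressed, and that it is written with literal strings
passed to the Tj and TJ operators. The names extract_text, STREAM_RE,
SHOW_RE and LITERAL_RE are made up for this example. Hex strings, custom
font encodings, object streams and encryption are all ignored, which is
roughly why a script like this fails on many files.

```python
import re
import zlib

# Bytes between the "stream" and "endstream" keywords of a PDF object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# A literal string "(...)" with backslash escapes, no nested parentheses.
LITERAL_RE = re.compile(rb"\(((?:\\.|[^\\()])*)\)", re.DOTALL)
# Text-showing operators: "(...) Tj" or "[(...) -20 (...)] TJ".
SHOW_RE = re.compile(
    rb"\((?:\\.|[^\\()])*\)\s*Tj|\[(?:\\.|[^\]\\])*\]\s*TJ", re.DOTALL
)


def unescape(raw):
    """Undo the \\( \\) and \\\\ escapes; other escapes are left as-is."""
    return re.sub(rb"\\([()\\])", rb"\1", raw)


def extract_text(pdf_bytes):
    """Return the literal strings shown by Tj/TJ, one operator per line."""
    chunks = []
    for match in STREAM_RE.finditer(pdf_bytes):
        data = match.group(1)
        try:
            # decompressobj tolerates the trailing newline before "endstream".
            data = zlib.decompressobj().decompress(data)
        except zlib.error:
            pass  # assume the stream was not Flate-compressed
        for op in SHOW_RE.finditer(data):
            parts = LITERAL_RE.findall(op.group(0))
            chunks.append(b"".join(unescape(p) for p in parts))
    # Latin-1 never fails to decode, but it is only a guess at the encoding.
    return b"\n".join(chunks).decode("latin-1")
```

Calling extract_text(open("some.pdf", "rb").read()) on a simple file
returns whatever literal strings it finds; on files that use other
filters or encodings it returns little or nothing.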
I know the topic of reading/extracting the text from a PDF document
natively in Python comes up every now and then on comp.lang.python...
I've posted about it in the past myself. After searching for other
solutions, I've resorted to attempting this on my own in my spare time.
Using apps external to Python (pdftotext, etc.) is not really an option
for me. If someone knows of a free native Python app that does this now,
let me know and I'll use that instead!
So, if other more experienced programmers are interested in helping make
the script better, please let me know. I can host a website with the
latest revision and do all of the grunt work.
Thanks,
Brad