How to read a PDF file using ActivePerl?

johny

Hi,
I am trying to read a PDF file using ActivePerl. I tried PDF::API2,
but with no success. For example, I need to get the text on the
third line of the first page.

or

Is there any way to save the PDF file as a .txt file and then read
that file?
Please help.

Thanks,
AJ
 
David Squire

Do you need to use Perl? The command-line utility pdftotext is available
on most UNIX-like systems (and no doubt under Cygwin).
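
If you do want to stay in Perl, one option is to let pdftotext write a
.txt file and read that, which is the route asked about above. The
following is a minimal sketch, assuming pdftotext (from Xpdf or Poppler)
is installed and on the PATH; the script name pdfline.pl, the helper
page_line and the "$pdf.txt" output name are hypothetical. Note that
"line 3" means the third line of pdftotext's output, which may not match
the third visual line of the page, and blank lines are counted.

```perl
use strict;
use warnings;

# Return line $line_no of page $page_no (both 1-based) from pdftotext
# output, which ends each page with a form feed ("\f").
# Returns undef if the text, page or line does not exist.
sub page_line {
    my ($text, $page_no, $line_no) = @_;
    return undef unless defined $text;
    my @pages = split /\f/, $text;
    my $page  = $pages[$page_no - 1];
    return undef unless defined $page;
    my @lines = split /\r?\n/, $page;   # tolerate DOS line endings
    return $lines[$line_no - 1];
}

# Usage: perl pdfline.pl input.pdf
if (@ARGV) {
    my $pdf = $ARGV[0];
    my $txt = "$pdf.txt";   # hypothetical name for the intermediate .txt file

    # Convert only page 1 (-f first page, -l last page) to a text file;
    # -layout asks pdftotext to keep the physical layout as far as it can.
    system('pdftotext', '-f', 1, '-l', 1, '-layout', $pdf, $txt) == 0
        or die "pdftotext failed on $pdf (status $?)\n";

    open my $fh, '<', $txt or die "Cannot open $txt: $!\n";
    my $text = do { local $/; <$fh> };   # slurp the whole file
    close $fh;

    my $line = page_line($text, 1, 3);
    print defined $line ? "$line\n" : "Page 1 has fewer than 3 lines\n";
}
```

The list form of system() avoids shell quoting problems with file names
containing spaces, and writing to a file rather than reading from a pipe
also works on older Windows Perls.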

You need to be aware that there is no guarantee that you can get text
out of a PDF document. The PDF standard allows arbitrary encodings to be
used, so you would have to know what the glyph names mean to reconstruct
the text. In some cases the glyph names are not meaningful. See
http://www.glyphandcog.com/textext.html

That being said, pdftotext works in the great majority of cases.
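
Because extraction can fail, it may be worth checking the result before
relying on it. This sketch (hypothetical helper has_text and script name
checktext.pl) flags pdftotext output that has no visible characters,
which is what scanned, image-only pages tend to produce. It is only a
heuristic: it cannot detect text that came out garbled because the
glyph names were not meaningful.

```perl
use strict;
use warnings;

# Heuristic: return 1 if pdftotext output has any visible (non-whitespace)
# character, else 0. Form feeds and newlines count as whitespace, so a
# file holding only page breaks is treated as "no text".
sub has_text {
    my ($text) = @_;
    my $ok = defined($text) && $text =~ /\S/;
    return $ok ? 1 : 0;
}

# Usage: perl checktext.pl input.pdf.txt   (a file written by pdftotext)
if (@ARGV) {
    my $txt = $ARGV[0];
    open my $fh, '<', $txt or die "Cannot open $txt: $!\n";
    my $text = do { local $/; <$fh> };   # undef if the file is empty
    close $fh;

    my $msg = has_text($text)
        ? "$txt contains extractable text\n"
        : "$txt has no visible text; the PDF may contain only images\n";
    print $msg;
}
```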


DS
 
