How to get text from PDF?

Shahid

Hi all,

My web server runs on Linux. I am working on a project for
which I need to extract the text from a PDF file, and I need to know
which text belongs to which PDF page number.

Is there any utility/tool that can be installed on Linux and called
from the command line in PHP through exec() or system() etc. for
this purpose?

Please reply urgently.

Thanks in advance.
 
smallpond

There is a module on CPAN called PDF::OCR::Thorough, which attempts
to extract text from PDF documents. I've never used it, and it looks like
a fair amount of work to set up. If the PDF file has a known, simple
structure, there may be easier ways.
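
As one possible "easier way", here is a minimal sketch that assumes the
PDF contains real text (not scanned images) and that the pdftotext
command from poppler-utils is installed. Neither the tool nor the
function names split_pages and extract_pages come from this thread; they
are illustrative assumptions. pdftotext ends each page with a form feed
character, so splitting on it gives the text per page number.

```python
import subprocess


def split_pages(raw_text):
    """Split pdftotext output into a {page_number: text} dict.

    pdftotext terminates every page with a form feed ("\f"), so the
    piece after the last form feed is empty and is dropped.
    """
    pages = raw_text.split("\f")
    if pages and pages[-1].strip() == "":
        pages.pop()
    return {number: text for number, text in enumerate(pages, start=1)}


def extract_pages(pdf_path):
    """Run pdftotext (assumed to be on PATH) and return text keyed by page."""
    result = subprocess.run(
        # "-" as the output file sends the extracted text to stdout
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return split_pages(result.stdout)
```

The same command line could be called from PHP with exec() or
shell_exec() and the output split on "\f" in the same way. To fetch a
single page instead, pdftotext also accepts -f and -l for the first and
last page to convert.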
 
