On Dec 22, 10:06 am, Shahid <mirzashahidmahm...@gmail.com> wrote:
> Hi all,
>
> My web server is based on Linux. I am working on a project for
> which I need to get text out of a PDF file. I need to know which text
> belongs to which PDF page number.
>
> Is there any utility/tool that can be installed on Linux and used
> from the command line in PHP through exec() or system() etc. for
> this purpose?
>
> Please reply urgently.
>
> Thanks in advance.
There is a module on CPAN called PDF::OCR::Thorough, which attempts
to extract text from PDF documents. I've never used it, and it looks
like a fair amount of work to set up. If the PDF file has a known
simple structure, there may be easier ways.
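
One possibly easier route, offered only as a sketch: if the PDF
contains real text rather than scanned images, the pdftotext command
(from poppler-utils or xpdf) writes a form feed character at the end
of each page, so the output can be split into pages. The example
below is in Python and assumes pdftotext is installed and on the
PATH. The function names split_pages and pdf_pages are made up for
illustration and are not part of any library.

```python
import subprocess


def split_pages(text):
    """Map 1-based page numbers to page text.

    Assumes the input is pdftotext output, where every page is
    terminated by a form feed character.
    """
    pages = text.split("\f")
    # The form feed after the last page leaves an empty final chunk.
    if pages and pages[-1].strip() == "":
        pages.pop()
    return {number: page for number, page in enumerate(pages, start=1)}


def pdf_pages(path):
    """Run pdftotext on `path` and return {page_number: text}.

    Assumption: pdftotext is installed. The "-" argument sends the
    extracted text to standard output instead of a file.
    """
    result = subprocess.run(
        ["pdftotext", path, "-"],
        check=True,
        capture_output=True,
        encoding="utf-8",
    )
    return split_pages(result.stdout)
```

The same command can be called from PHP through exec(). To pull out a
single page instead, pdftotext also accepts -f and -l for the first
and last page to convert, e.g. "pdftotext -f 3 -l 3 in.pdf -" for
page 3 only. None of this helps with scanned PDFs, which need OCR.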