Extracting images from a PDF file

 
 
Doug Farrell
 
      12-27-2007
Hi all,

Does anyone know how to extract images from a PDF file? What I'm looking
to do is use pdflib_py to open large PDF files on our Linux servers,
then use PIL to verify image data. I want to do this in order
to find corrupt images in the PDF files. If anyone could help
me out, or point me in the right direction, it would be most
appreciated!

Also, does anyone know of a way to validate a PDF file?

Thanks in advance,
Doug
 
Carl K
 
      12-27-2007
Doug Farrell wrote:
> Hi all,
>
> Does anyone know how to extract images from a PDF file? What I'm looking
> to do is use pdflib_py to open large PDF files on our Linux servers,
> then use PIL to verify image data. I want to do this in order
> to find corrupt images in the PDF files. If anyone could help
> me out, or point me in the right direction, it would be most
> appreciated!
>


If you are ok shelling out to a binary:

pdfimages - Portable Document Format (PDF) image extractor (version
3.00)
http://packages.ubuntu.com/gutsy/text/xpdf-utils

I am trying to convert the PDF to a PNG, but without having to run external
commands, so I will understand if you aren't happy with pdfimages.

Carl K
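
For anyone who does take the pdfimages route, here is a minimal sketch of shelling out from Python and then checking the results with PIL. It assumes the pdfimages binary is on the PATH and that PIL (or Pillow) is installed. The function names and the output prefix are made up for illustration and are not part of either tool.

```python
import glob
import subprocess


def build_pdfimages_command(pdf_path, out_prefix):
    """Return the argv list for pdfimages; -j keeps JPEG images as .jpg files."""
    return ["pdfimages", "-j", pdf_path, out_prefix]


def extract_images(pdf_path, out_prefix):
    """Run pdfimages and return the sorted list of image files it wrote."""
    subprocess.run(build_pdfimages_command(pdf_path, out_prefix), check=True)
    # pdfimages names its output <prefix>-NNN.<ext> (ppm, pbm or jpg).
    return sorted(glob.glob(glob.escape(out_prefix) + "-*"))


def find_corrupt_images(image_paths):
    """Return (path, error message) pairs for files PIL cannot verify."""
    from PIL import Image  # imported lazily; assumes PIL/Pillow is installed

    corrupt = []
    for path in image_paths:
        try:
            with Image.open(path) as img:
                img.verify()  # integrity check without decoding the pixel data
        except Exception as exc:
            corrupt.append((path, str(exc)))
    return corrupt
```

Usage would be along the lines of find_corrupt_images(extract_images("report.pdf", "img")), where both names are placeholders. Note that verify() does not decode the image data, so a stricter check would be to reopen each file and call load() on it.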
 
writeson
 
      12-27-2007
On Dec 27, 1:12 am, Carl K <(E-Mail Removed)> wrote:
> Doug Farrell wrote:
> > Hi all,
> >
> > Does anyone know how to extract images from a PDF file? What I'm looking
> > to do is use pdflib_py to open large PDF files on our Linux servers,
> > then use PIL to verify image data. I want to do this in order
> > to find corrupt images in the PDF files. If anyone could help
> > me out, or point me in the right direction, it would be most
> > appreciated!
>
> If you are ok shelling out to a binary:
>
> pdfimages - Portable Document Format (PDF) image extractor (version
> 3.00)
> http://packages.ubuntu.com/gutsy/text/xpdf-utils
>
> I am trying to convert the PDF to a PNG, but without having to run external
> commands, so I will understand if you aren't happy with pdfimages.
>
> Carl K


Carl,

Thanks for the feedback, and I don't mind shelling out to an external
command if it gets the job done. Thanks for the link to xpdf-utils,
I'm going to look into it this morning.

Doug
 
Max Erickson
 
      12-27-2007
Doug Farrell <(E-Mail Removed)> wrote:

> Hi all,
>
> Does anyone know how to extract images from a PDF file? What I'm
> looking to do is use pdflib_py to open large PDF files on our
> Linux servers, then use PIL to verify image data. I want to do
> this in order to find corrupt images in the PDF files. If anyone
> could help me out, or point me in the right direction, it would
> be most appreciated!
>
> Also, does anyone know of a way to validate a PDF file?
>
> Thanks in advance,
> Doug


There is some discussion here:

http://nedbatchelder.com/blog/200712...0071210T064608



max
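
One pure-Python approach in this vein is to scan the raw bytes of the PDF for JPEG start and end markers. The sketch below is only an illustration of that idea and is not taken from the linked post. It assumes the images are stored as unmodified JPEG (DCT) streams, it reads the whole file into memory, and the function names and output prefix are hypothetical.

```python
JPEG_START = b"\xff\xd8"  # SOI (start of image) marker
JPEG_END = b"\xff\xd9"    # EOI (end of image) marker


def find_jpeg_spans(data):
    """Return (start, end) byte offsets of candidate JPEG images in data."""
    spans = []
    pos = 0
    while True:
        start = data.find(JPEG_START, pos)
        if start < 0:
            break
        end = data.find(JPEG_END, start + 2)
        if end < 0:
            break
        end += 2  # include the EOI marker itself
        spans.append((start, end))
        pos = end
    return spans


def extract_jpegs(pdf_path, out_prefix="jpg"):
    """Write each candidate JPEG to <out_prefix>NNN.jpg and return the names."""
    with open(pdf_path, "rb") as f:
        data = f.read()
    written = []
    for i, (start, end) in enumerate(find_jpeg_spans(data)):
        name = "%s%03d.jpg" % (out_prefix, i)
        with open(name, "wb") as out:
            out.write(data[start:end])
        written.append(name)
    return written
```

This is deliberately naive: the marker bytes can also occur by chance in other compressed streams, and a JPEG with an embedded thumbnail contains an inner end marker, so every extracted file is only a candidate and should still be checked with PIL afterwards.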

 
writeson
 
      12-28-2007
On Dec 27, 10:13 am, writeson <(E-Mail Removed)> wrote:
> On Dec 27, 1:12 am, Carl K <(E-Mail Removed)> wrote:
> > Doug Farrell wrote:
> > > Hi all,
> > >
> > > Does anyone know how to extract images from a PDF file? What I'm looking
> > > to do is use pdflib_py to open large PDF files on our Linux servers,
> > > then use PIL to verify image data. I want to do this in order
> > > to find corrupt images in the PDF files. If anyone could help
> > > me out, or point me in the right direction, it would be most
> > > appreciated!
> >
> > If you are ok shelling out to a binary:
> >
> > pdfimages - Portable Document Format (PDF) image extractor (version
> > 3.00)
> > http://packages.ubuntu.com/gutsy/text/xpdf-utils
> >
> > I am trying to convert the PDF to a PNG, but without having to run external
> > commands, so I will understand if you aren't happy with pdfimages.
> >
> > Carl K
>
> Carl,
>
> Thanks for the feedback, and I don't mind shelling out to an external
> command if it gets the job done. Thanks for the link to xpdf-utils,
> I'm going to look into it this morning.
>
> Doug


Hi,

Our Linux servers run CentOS (4.x), I believe, and the repositories for
this version don't have xpdf-utils available. I'm going to look into
editing the sources.list file in order to get yum to install the
necessary dependencies for me, as xpdf-utils looks very useful!

Doug
 
writeson
 
      12-28-2007
On Dec 27, 2:17 pm, Max Erickson <(E-Mail Removed)> wrote:
> Doug Farrell <(E-Mail Removed)> wrote:
> > Hi all,
> >
> > Does anyone know how to extract images from a PDF file? What I'm
> > looking to do is use pdflib_py to open large PDF files on our
> > Linux servers, then use PIL to verify image data. I want to do
> > this in order to find corrupt images in the PDF files. If anyone
> > could help me out, or point me in the right direction, it would
> > be most appreciated!
> >
> > Also, does anyone know of a way to validate a PDF file?
> >
> > Thanks in advance,
> > Doug
>
> There is some discussion here:
>
> http://nedbatchelder.com/blog/200712...0071210T064608
>
> max


Max,

That's a very interesting snippet of code, thanks for posting the
link! Much appreciated!

Doug

 