Velocity Reviews - Computer Hardware Reviews


read pdf header in c

 
 
Rudra Banerjee
Guest
Posts: n/a
 
      09-18-2012
Can anyone kindly show me the steps required to read pdf headers in human readable format?
 
 
 
 
 
Paul
Guest
Posts: n/a
 
      09-18-2012
Rudra Banerjee wrote:
> Can anyone kindly show me the steps required to read pdf headers in human readable format?


Adobe offers documentation.

http://www.adobe.com/devnet/pdf/pdf_...e_archive.html

There is this 1310 page book.

http://wwwimages.adobe.com/www.adobe...erence_1-7.pdf

For comparison, you can also get yourself a copy of the
PostScript Language Reference Manual (PLRM.pdf).

http://www.adobe.com/products/postscript/pdfs/PLRM.pdf

In the beginning, there was PostScript. It's documented in
PLRM.pdf and it's a language of its own. PDF builds on those
concepts.

Many tools that produce PDF emit compressed binary output,
which is hard for humans to read. There is also the option
of writing an uncompressed, mostly textual form. A tool such
as Ghostscript can help with that transformation, and the
Ghostscript source code will teach you a lot about PDF and
PostScript in general.

http://stackoverflow.com/questions/3...t-in-a-text-ed

gswin32c.exe -- c:/path/to/pdfinflt.ps your-input.pdf deflated-output.pdf

It's unclear to me what you mean by "headers" in this context.
PDF defines setup and subroutines ahead of the definition
of the actual pages, but that's not particularly useful.
You could also be referring to tagging information, and
that may be OS-specific for all I know.
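
If "header" is taken literally, a PDF file starts with a comment line of the form %PDF-M.m that gives the version of the format. The following is a minimal sketch of reading that line in C, not a full parser. The function names pdf_parse_header and pdf_file_version are made up for this illustration, and the sketch assumes the header sits at byte 0 of the file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse a header such as "%PDF-1.7" from the first len bytes in buf.
 * Returns 1 and stores the version on success, 0 otherwise.
 * (Hypothetical helper, written for this example.) */
int pdf_parse_header(const char *buf, size_t len, int *major, int *minor)
{
    char line[16];
    size_t n = len < sizeof line - 1 ? len : sizeof line - 1;

    assert(buf != NULL && major != NULL && minor != NULL);

    memcpy(line, buf, n);
    line[n] = '\0';                 /* sscanf needs a terminated string */

    if (strncmp(line, "%PDF-", 5) != 0)
        return 0;                   /* not a PDF header */
    return sscanf(line + 5, "%d.%d", major, minor) == 2;
}

/* Read the first few bytes of a file and report its PDF version.
 * Returns 1 on success, 0 if the file cannot be read or is not a PDF. */
int pdf_file_version(const char *path, int *major, int *minor)
{
    char buf[16];
    size_t got;
    FILE *fp = fopen(path, "rb");   /* binary mode: PDF mixes text and bytes */

    if (fp == NULL)
        return 0;
    got = fread(buf, 1, sizeof buf, fp);
    fclose(fp);
    return pdf_parse_header(buf, got, major, minor);
}
```

A caller would pass a file name to pdf_file_version and print the two integers it fills in. Everything after the header line (objects, cross-reference table, trailer) needs the format specification.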

In any case, have fun.

Paul
 
 
 
 
 
Malcolm McLean
Guest
Posts: n/a
 
      09-18-2012
On Tuesday, 18 September 2012 16:54:23 UTC+1, Rudra Banerjee wrote:
> Can anyone kindly show me the steps required to read pdf headers in human
> readable format?
>

PDF is a binary format. To read any binary format, you need to have a copy
of the format specification. That tells you how the bits are to be interpreted.

With PDF, the gross file structure is quite straightforward. Whilst I forget
the details, basically you have a tag which tells you what type of data the
section is (text, image, font, copyright notice, etc), then you have the
length of the data, then you have the data itself.
However, the data itself is usually compressed, using zlib. Whilst it is
possible to write your own decompressor, this is a major undertaking.
Usually the only realistic option is to use a library.
What this means is that whilst you can get an idea of what a PDF file
contains, you can't easily read the actual data, certainly not with your own
little scratch program.
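
To make the "type, length, data" idea concrete: in the PDF specification a stream object carries a dictionary with a /Length entry and, when compressed, a /Filter entry such as /FlateDecode, followed by the bytes between the stream and endstream keywords. The sketch below only locates the first stream in a buffer and does not decompress it. The struct pdf_stream and the function pdf_first_stream are invented for this example, and the code assumes /Length is a direct integer (it may legally be an indirect reference, which this sketch does not resolve).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Find needle in a byte buffer that may contain NULs (so strstr is out). */
static const char *find_bytes(const char *hay, size_t hay_len,
                              const char *needle)
{
    size_t n = strlen(needle);
    size_t i;

    if (n == 0 || hay_len < n)
        return NULL;
    for (i = 0; i + n <= hay_len; i++)
        if (memcmp(hay + i, needle, n) == 0)
            return hay + i;
    return NULL;
}

/* Hypothetical result type for this example. */
struct pdf_stream {
    const char *data;   /* first byte after the "stream" line */
    long length;        /* value of /Length */
    int flate;          /* nonzero if the dictionary names /FlateDecode */
};

/* Locate the first stream in buf. Returns 1 on success, 0 otherwise. */
int pdf_first_stream(const char *buf, size_t len, struct pdf_stream *out)
{
    const char *kw, *key, *data;
    char *numend;
    size_t dict_len;
    long n;

    assert(buf != NULL && out != NULL);

    kw = find_bytes(buf, len, "stream");
    if (kw == NULL)
        return 0;
    dict_len = (size_t)(kw - buf);      /* bytes before the keyword */

    key = find_bytes(buf, dict_len, "/Length");
    if (key == NULL)
        return 0;
    /* strtol stops at the first non-digit, at the latest at "stream". */
    n = strtol(key + 7, &numend, 10);
    if (numend == key + 7 || n < 0)
        return 0;                       /* no direct integer found */

    data = kw + 6;                      /* skip the keyword itself */
    if (data < buf + len && *data == '\r')
        data++;
    if (data < buf + len && *data == '\n')
        data++;
    if ((size_t)(buf + len - data) < (size_t)n)
        return 0;                       /* buffer is truncated */

    out->data = data;
    out->length = n;
    out->flate = find_bytes(buf, dict_len, "/FlateDecode") != NULL;
    return 1;
}
```

When flate is set, the length bytes at data are what one would hand to a decompression library such as zlib. A real reader would walk the cross-reference table instead of scanning for keywords.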

--
http://www.malcolmmclean.site11.com/www

 
 
James Kuyper
Guest
Posts: n/a
 
      09-18-2012
On 09/18/2012 03:42 PM, Malcolm McLean wrote:
....
> However, the data itself is usually compressed, using zlib. Whilst it is
> possible to write your own decompressor, this is a major undertaking.
> Usually the only realistic option is to use a library.
> What this means is that whilst you can get an idea of what a PDF file
> contains, you can't easily read the actual data, certainly not with your own
> little scratch program.


I'd expect zlib to include decompression algorithms, and a quick look at
the zlib documentation seems to confirm this expectation, so it should
be relatively easy to write a scratch program linked to zlib for reading
the actual data. I've never actually tried it - am I missing something?

 
 
Rudra Banerjee
Guest
Posts: n/a
 
      09-18-2012
Thanks to all of you.
But first of all, it seems I need a thorough knowledge of the PDF file structure, as Paul suggested.

 
 
Jorgen Grahn
Guest
Posts: n/a
 
      09-20-2012
On Tue, 2012-09-18, Malcolm McLean wrote:
> On Tuesday, 18 September 2012 16:54:23 UTC+1, Rudra Banerjee wrote:
>> Can anyone kindly show me the steps required to read pdf headers in human
>> readable format?
>>

> PDF is a binary format. To read any binary format, you need to have a copy
> of the format specification.


And that's true for text-based formats as well. It's just more
tempting to rely on guesswork in that case: "all C programs start with
a few #include lines, because all of those I looked at did".

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
 