Velocity Reviews


Rudra Banerjee 09-18-2012 03:54 PM

read pdf header in c
 
Can anyone kindly show me the steps required to read pdf headers in human readable format?

Paul 09-18-2012 06:00 PM

Re: read pdf header in c
 
Rudra Banerjee wrote:
> Can anyone kindly show me the steps required to read pdf headers in human readable format?


Adobe offers documentation.

http://www.adobe.com/devnet/pdf/pdf_...e_archive.html

There is this 1310-page book.

http://wwwimages.adobe.com/www.adobe...erence_1-7.pdf

For comparison, you can also get yourself a copy of the
PostScript Language Reference Manual (PLRM.pdf).

http://www.adobe.com/products/postscript/pdfs/PLRM.pdf

In the beginning, there was PostScript. It's documented in
PLRM.pdf and it's a language of its own. PDF builds on those
concepts.

Many tools that produce PDF emit compressed, binary output,
which is hard for humans to read. But there is also an option
to write the content as text (uncompressed, so larger).
A tool such as Ghostscript can help with such a transformation.
And the source code for Ghostscript will teach you a lot about
PDF and PostScript in general.

http://stackoverflow.com/questions/3...t-in-a-text-ed

gswin32c.exe -- c:/path/to/pdfinflt.ps your-input.pdf deflated-output.pdf

It's unclear to me what you mean by "headers" in this context.
PDF defines setup and subroutines ahead of the definition
of the actual pages. But that's not particularly useful.
You could also be referring to tagging information, and
that may be OS-specific for all I know.
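
If "header" simply means the version line at the very start of the file, a PDF normally begins with a comment of the form %PDF-1.7. What follows is a minimal sketch in C that reads that line and extracts the version number. The function names parse_pdf_header and read_pdf_version are made up for illustration, and the code assumes the marker sits at byte 0 of the file, which may not hold for every file found in the wild.

```c
#include <stdio.h>
#include <string.h>

/* Parse a header line such as "%PDF-1.7" into major/minor numbers.
   Returns 1 on success, 0 if the line does not look like a PDF header. */
int parse_pdf_header(const char *line, int *major, int *minor)
{
    if (strncmp(line, "%PDF-", 5) != 0)
        return 0;
    return sscanf(line + 5, "%d.%d", major, minor) == 2;
}

/* Read the first line of the file at `path` and parse it.
   Returns 1 on success, 0 if the file cannot be opened or has no header. */
int read_pdf_version(const char *path, int *major, int *minor)
{
    char line[32];
    int ok;
    FILE *fp = fopen(path, "rb");   /* binary mode: PDF is not plain text */

    if (fp == NULL)
        return 0;
    ok = fgets(line, sizeof line, fp) != NULL
         && parse_pdf_header(line, major, minor);
    fclose(fp);
    return ok;
}
```

A caller would declare two ints, pass their addresses together with a file name such as your-input.pdf, and print the result only when the function returns 1.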

In any case, have fun.

Paul

Malcolm McLean 09-18-2012 07:42 PM

Re: read pdf header in c
 
On Tuesday, 18 September 2012 16:54:23 UTC+1, Rudra Banerjee wrote:
> Can anyone kindly show me the steps required to read pdf headers in human
> readable format?
>

PDF is a binary format. To read any binary format, you need to have a copy
of the format specification. That tells you how the bits are to be interpreted.

With PDF, the gross file structure is quite straightforward. Whilst I forget
the details, basically you have a tag which tells you what type of data the
section is (text, image, font, copyright notice, etc), then you have the
length of the data, then you have the data itself.
However, the data itself is usually compressed, using zlib. Whilst it is
possible to write your own decompressor, this is a major undertaking.
Usually the only realistic option is to use a library.
What this means is that whilst you can get an idea of what a PDF file
contains, you can't easily read the actual data, certainly not with your own
little scratch program.
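
To get that rough idea of what a file contains without decompressing anything, a scratch program can scan for the stream ... endstream keywords that bracket each block of data and check whether the dictionary in front of it names /FlateDecode, the filter that corresponds to zlib/deflate. The following is only a sketch under those assumptions. The names StreamSummary, find_bytes and summarize_streams are hypothetical, and because it is a plain byte search rather than a real parser, it can be fooled by keywords that happen to occur inside binary data and it does not look inside compressed object streams.

```c
#include <stddef.h>
#include <string.h>

#define NOT_FOUND ((size_t)-1)

typedef struct {
    int streams;   /* number of "stream" keywords seen */
    int flate;     /* how many of them were preceded by /FlateDecode */
} StreamSummary;

/* Find `needle` in buf[from..len). PDF data may contain NUL bytes,
   so strstr() cannot be used. Returns the offset or NOT_FOUND. */
static size_t find_bytes(const unsigned char *buf, size_t len,
                         size_t from, const char *needle)
{
    size_t n = strlen(needle);
    size_t i;

    if (n == 0 || len < n)
        return NOT_FOUND;
    for (i = from; i + n <= len; i++)
        if (memcmp(buf + i, needle, n) == 0)
            return i;
    return NOT_FOUND;
}

/* Walk the buffer stream by stream and count the FlateDecode ones. */
StreamSummary summarize_streams(const unsigned char *buf, size_t len)
{
    StreamSummary s = {0, 0};
    size_t region = 0;   /* start of the text preceding the next stream */

    for (;;) {
        size_t kw = find_bytes(buf, len, region, "stream");
        size_t end;

        if (kw == NOT_FOUND)
            break;
        if (kw >= 3 && memcmp(buf + kw - 3, "end", 3) == 0) {
            region = kw + 6;          /* stray "endstream": skip it */
            continue;
        }
        s.streams++;
        /* The stream dictionary lies between `region` and the keyword. */
        if (find_bytes(buf, kw, region, "/FlateDecode") != NOT_FOUND)
            s.flate++;
        end = find_bytes(buf, len, kw + 6, "endstream");
        if (end == NOT_FOUND)
            break;
        region = end + 9;             /* continue after "endstream" */
    }
    return s;
}
```

The bytes between a stream keyword and its endstream are what would be handed to a decompression library once a FlateDecode stream has been located.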

--
http://www.malcolmmclean.site11.com/www


James Kuyper 09-18-2012 08:02 PM

Re: read pdf header in c
 
On 09/18/2012 03:42 PM, Malcolm McLean wrote:
....
> However, the data itself is usually compressed, using zlib. Whilst it is
> possible to write your own decompressor, this is a major undertaking.
> Usually the only realistic option is to use a library.
> What this means is that whilst you can get an idea of what a PDF file
> contains, you can't easily read the actual data, certainly not with your own
> little scratch program.


I'd expect zlib to include decompression algorithms, and a quick look at
the zlib documentation seems to confirm this expectation, so it should
be relatively easy to write a scratch program linked to zlib for reading
the actual data. I've never actually tried it - am I missing something?


Rudra Banerjee 09-18-2012 09:04 PM

Re: read pdf header in c
 
Thanks to all of you.
But first of all, it seems I need to have a profound knowledge of PDF file structure, as Paul suggested.
:(

Jorgen Grahn 09-20-2012 11:30 AM

Re: read pdf header in c
 
On Tue, 2012-09-18, Malcolm McLean wrote:
> On Tuesday, 18 September 2012 16:54:23 UTC+1, Rudra Banerjee wrote:
>> Can anyone kindly show me the steps required to read pdf headers in human
>> readable format?
>>

> PDF is a binary format. To read any binary format, you need to have a copy
> of the format specification.


And that's true for text-based formats as well. It's just more
tempting to rely on guesswork in that case: "all C programs start with
a few #include lines, because all of those I looked at did".

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

