read pdf header in c
Can anyone kindly show me the steps required to read pdf headers in human readable format?
|
Re: read pdf header in c
Rudra Banerjee wrote:
> Can anyone kindly show me the steps required to read pdf headers in human readable format?

Adobe offers documentation.
http://www.adobe.com/devnet/pdf/pdf_...e_archive.html

There is this 1310 page book.
http://wwwimages.adobe.com/www.adobe...erence_1-7.pdf

For comparison, you can also get yourself a copy of the PostScript Language Reference Manual (PLRM.pdf).
http://www.adobe.com/products/postscript/pdfs/PLRM.pdf

In the beginning, there was PostScript. It's documented in PLRM.pdf and it's a language of its own. PDF builds on those concepts.

A lot of tools, when they produce PDF, emit output in a binary format which is hard for humans to read. But there is also an option to output in a text format (less compressed perhaps). A tool such as GhostScript can help with such a transformation. And the source code for GhostScript will teach you a lot about PDF and PostScript in general.
http://stackoverflow.com/questions/3...t-in-a-text-ed

    gswin32c.exe -- c:/path/to/pdfinflt.ps your-input.pdf deflated-output.pdf

It's unclear to me what you mean by "headers" in this context. PDF defines setup and subroutines, ahead of the definition of the actual pages. But that's not particularly useful. You could also be referring to tagging information. And that may be OS specific for all I know.

In any case, have fun.

Paul
|
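If "header" is taken to mean the literal first line of the file, a PDF normally begins with a marker of the form `%PDF-M.N` giving the version. The following is only a minimal sketch under that assumption; the helper names `parse_pdf_header` and `read_pdf_version` are made up for illustration and do not come from any library, and the sketch does not handle files that carry junk before the marker.

```c
#include <stdio.h>
#include <string.h>

/* Parse a header line such as "%PDF-1.7".
 * Returns 1 and fills *major / *minor on success, 0 otherwise.
 * (Hypothetical helper, named for this example only.) */
int parse_pdf_header(const char *line, int *major, int *minor)
{
    if (strncmp(line, "%PDF-", 5) != 0)
        return 0;
    /* The version follows the marker as two integers separated by a dot. */
    return sscanf(line + 5, "%d.%d", major, minor) == 2;
}

/* Read the first few bytes of the file at `path` and parse them.
 * Returns 1 on success, 0 if the file cannot be opened or does not
 * start with a recognisable header. (Hypothetical helper.) */
int read_pdf_version(const char *path, int *major, int *minor)
{
    char line[32];
    FILE *fp = fopen(path, "rb");   /* binary mode: PDF is not plain text */
    if (fp == NULL)
        return 0;
    size_t n = fread(line, 1, sizeof line - 1, fp);
    fclose(fp);
    line[n] = '\0';                 /* make the buffer a valid C string */
    return parse_pdf_header(line, major, minor);
}
```

A caller would pass a file name and two `int` variables to `read_pdf_version` and print them with `printf("%d.%d\n", major, minor)` when it returns 1. Anything beyond the version (document title, author and so on) lives in objects further into the file and needs the specification mentioned above.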
Re: read pdf header in c
On Tuesday, 18 September 2012 16:54:23 UTC+1, Rudra Banerjee wrote:
> Can anyone kindly show me the steps required to read pdf headers in human
> readable format?
>

PDF is a binary format. To read any binary format, you need to have a copy of the format specification. That tells you how the bits are to be interpreted.

With PDF, the gross file structure is quite straightforward. Whilst I forget the details, basically you have a tag which tells you what type of data the section is (text, image, font, copyright notice, etc), then you have the length of the data, then you have the data itself.

However the data itself is usually compressed, using zlib. Whilst it is possible to write your own decompressor, this is a major undertaking. Usually the only realistic option is to use a library.

What this means is that whilst you can get an idea of what a PDF file contains, you can't easily read the actual data, certainly not with your own little scratch program.

--
http://www.malcolmmclean.site11.com/www
|
Re: read pdf header in c
On 09/18/2012 03:42 PM, Malcolm McLean wrote:
....
> However the data itself is usually compressed, using zlib. Whilst it is
> possible to write your own decompressor, this is a major undertaking.
> Usually the only realistic option is to use a library.
> What this means is that whilst you can get an idea of what a PDF file
> contains, you can't easily read the actual data, certainly not with your own
> little scratch program.

I'd expect zlib to include decompression algorithms, and a quick look at the zlib documentation seems to confirm this expectation, so it should be relatively easy to write a scratch program linked to zlib for reading the actual data. I've never actually tried it - am I missing something?
|
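Before handing bytes to zlib, a scratch program can cheaply check whether they even look like a zlib stream. The sketch below assumes the stream data has already been located in the file and follows the two-byte header layout of the zlib format (RFC 1950). The function name `looks_like_zlib_stream` is invented for this example and is not part of zlib.

```c
#include <stddef.h>

/* Return 1 if the first two bytes of buf form a plausible zlib header,
 * 0 otherwise. This only inspects the header; it does not decompress. */
int looks_like_zlib_stream(const unsigned char *buf, size_t len)
{
    if (len < 2)
        return 0;

    unsigned cmf = buf[0];   /* compression method and window size */
    unsigned flg = buf[1];   /* flags, including a 5-bit check value */

    if ((cmf & 0x0F) != 8)   /* method 8 means "deflate" */
        return 0;
    if ((cmf >> 4) > 7)      /* window size exponent above 7 is not allowed */
        return 0;

    /* The two bytes, read as a big-endian 16-bit number, must be a
     * multiple of 31. */
    return ((cmf << 8) | flg) % 31 == 0;
}
```

If the check passes, the actual decompression would still be left to the library, for instance through zlib's `uncompress()` or `inflate()`, which needs the program to be linked against zlib.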
Re: read pdf header in c
Thanks to all of you.
But, first of all, it seems, I need to have a profound knowledge of pdf file structure, as Paul suggested. :( |
Re: read pdf header in c
On Tue, 2012-09-18, Malcolm McLean wrote:
> On Tuesday, 18 September 2012 16:54:23 UTC+1, Rudra Banerjee wrote:
>> Can anyone kindly show me the steps required to read pdf headers in human
>> readable format?
>>
> PDF is a binary format. To read any binary format, you need to have a copy
> of the format specification.

And that's true for text-based formats as well. It's just more tempting to rely on guesswork in that case: "all C programs start with a few #include lines, because all of those I looked at did".

/Jorgen

--
  // Jorgen Grahn <grahn@  Oo  o.   .     .
\X/     snipabacken.se>   O  o   .
|