Velocity Reviews - Computer Hardware Reviews

Suggestion for converting PDF files to HTML/txt files

 
 
srinivasan srinivas
      08-11-2008
Could someone suggest ways to convert PDF files to HTML files?
Does Python have any modules that do that job?

Thanks,
Srini


 
brad
      08-11-2008
srinivasan srinivas wrote:
> Could someone suggest ways to convert PDF files to HTML files?
> Does Python have any modules that do that job?
>
> Thanks,
> Srini


Unless there has been some recent development, the answer is no, it's not
possible. Getting text out of a PDF is difficult (to say the least) and at
times impossible; e.g. a PDF can be just an image that contains some text.
 
 
alex23
      08-12-2008
srinivasan srinivas wrote:
> Could someone suggest ways to convert PDF files to HTML files?
> Does Python have any modules that do that job?


PDFMiner is a set of CLI tools written in Python, one of which
converts PDF to text, HTML and more:
http://www.unixuser.org/~euske/pytho...ner/index.html


 
brad
      08-12-2008
alex23 wrote:

> PDFMiner is a set of CLI tools written in Python, one of which
> converts PDF to text, HTML and more:
> http://www.unixuser.org/~euske/pytho...ner/index.html


Very neat program. Would be cool if it could easily integrate into other
py apps instead of being a standalone CLI tool.
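
For readers who want the in-process integration described here: the sketch below assumes the later pdfminer.six fork of PDFMiner is installed, which exposes an `extract_text` function in `pdfminer.high_level`. That fork postdates this thread, so treat this as one possible approach rather than what was available at the time. The `extractor` parameter is a hypothetical hook added for illustration only.

```python
def pdf_to_text(pdf_path, extractor=None):
    """Return the text of a PDF as a string, without shelling out.

    `extractor` is a hypothetical hook (not part of any library) that lets
    callers swap in another backend or a stub for testing.
    """
    if extractor is None:
        # Assumption: the pdfminer.six package is installed. It is imported
        # lazily so this module still loads when the package is missing.
        from pdfminer.high_level import extract_text
        extractor = extract_text
    return extractor(pdf_path)


if __name__ == "__main__":
    # Stub backend so the sketch runs without pdfminer.six or a real PDF.
    print(pdf_to_text("example.pdf", extractor=lambda path: "stub text for " + path))
```

With the package installed, `pdf_to_text("some.pdf")` would go through pdfminer.six directly; image-only PDFs would still yield little or no text.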
 
alex23
      08-12-2008
On Aug 12, 11:13 pm, brad <byte8b...@gmail.com> wrote:
> Very neat program. Would be cool if it could easily integrate into other
> py apps instead of being a standalone CLI tool.


Perhaps, but I think you could get a long way using os.system().
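
A minimal sketch of that idea, using `subprocess.run` rather than `os.system()` because `os.system()` only returns an exit status, while `subprocess.run` can hand the converter's output back to the calling program. The helper name `run_converter` is made up for illustration, and the commented-out PDFMiner command line is an assumption about how the tool is installed, not something confirmed in this thread.

```python
import subprocess
import sys


def run_converter(command):
    """Run an external converter and return what it wrote to standard output.

    `command` is a list such as ["tool", "input.pdf"]. A non-zero exit
    status raises subprocess.CalledProcessError.
    """
    result = subprocess.run(
        command,
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise if the tool reports failure
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical invocation; the script name and input file are assumptions:
    # text = run_converter([sys.executable, "pdf2txt.py", "input.pdf"])

    # Stand-in command so the sketch runs without any PDF tooling installed.
    print(run_converter([sys.executable, "-c", "print('converted text')"]))
```

Each call still starts a new process, so the cost adds up when converting many files in a loop.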

 
brad
      08-12-2008
alex23 wrote:
> On Aug 12, 11:13 pm, brad <byte8b...@gmail.com> wrote:
>> Very neat program. Would be cool if it could easily integrate into other
>> py apps instead of being a standalone CLI tool.
>
> Perhaps, but I think you could get a long way using os.system().


Yes, that is possible, but unfortunately there's a lot of overhead in
doing that. Also, if using os.system() is the answer, then one could
just use the xpdf pdftotext program. A native Python solution that could
be called naturally from other Python apps would be awesome.
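
For comparison, a sketch of the pdftotext route mentioned above. It assumes a `pdftotext` binary (from xpdf or a compatible package) is on the PATH; passing `-` as the output file sends the text to standard output, `-enc UTF-8` fixes the output encoding, and `-layout` keeps the page layout. The function names are invented for this example.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, layout=False):
    """Build the argument list for pdftotext, writing the text to stdout."""
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if layout:
        cmd.append("-layout")  # try to preserve the physical page layout
    cmd += [pdf_path, "-"]     # "-" as the output file means standard output
    return cmd


def extract_text_with_pdftotext(pdf_path, layout=False):
    """Return the text of `pdf_path` by running the external pdftotext tool."""
    if shutil.which("pdftotext") is None:
        # Assumption: the binary is named "pdftotext" and lives on the PATH.
        raise RuntimeError("pdftotext was not found on the PATH")
    result = subprocess.run(
        pdftotext_command(pdf_path, layout),
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Only prints the command, so this runs even without pdftotext installed.
    print(pdftotext_command("input.pdf", layout=True))
```

The process-start overhead raised in the post applies here as well, since every file converted means one more external process.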
 