Velocity Reviews - Computer Hardware Reviews

Google and View as HTML

 
 
null
 
      01-24-2007
If you enter the search term "google.pdf" into google, the first listing
returned is the following.

[PDF]
The Anatomy of a Search Engine
File Format: PDF/Adobe Acrobat - View as HTML
The Anatomy of a Large-Scale Hypertextual. Web Search Engine. Sergey
Brin and Lawrence Page. Computer Science Department,. Stanford
University, Stanford ...
infolab.stanford.edu/pub/papers/google.pdf - Similar pages

Clicking on "View as HTML" brings up the pdf in html format. The url
displayed is

http://209.85.165.104/search?q=cache...s&ct=clnk&cd=1

My question: is there a way to encode the location (on the web or on my
computer) of an arbitrary pdf in the link above and have google convert
it to html for me?


 
 
 
 
 
Mike Easter
 
      01-24-2007
null wrote:
> If you enter the search term "google.pdf" into google, the first
> listing returned is the following.
>
> [PDF]
> The Anatomy of a Search Engine
> File Format: PDF/Adobe Acrobat - View as HTML
> The Anatomy of a Large-Scale Hypertextual. Web Search Engine. Sergey
> Brin and Lawrence Page. Computer Science Department,. Stanford
> University, Stanford ...
> infolab.stanford.edu/pub/papers/google.pdf - Similar pages


That's because your search term is the filename of the Stanford file, and
the hit was popular enough to put it at the top of the google search
engine's ranking.

The second-ranked google hit was a page in the google help pages that
links to the Stanford .pdf file:
http://www.google.com/help/features.html
File Types - Google has expanded the number of non-HTML file types
searched to 12 file formats. In addition to PDF documents, Google now
searches Microsoft Office, PostScript, Corel WordPerfect, Lotus 1-2-3,
and others. The new file types will simply appear in Google search
results whenever they are relevant to the user query. - Google also
offers the user the ability to "View as HTML", allowing users to examine
the contents of these file formats even if the corresponding application
is not installed. The "View as HTML" option also allows users to avoid
viruses which are sometimes carried in certain file formats.

> Clicking on "View as HTML" brings up the pdf in html format. The url
> displayed is
>
> http://209.85.165.104/search?q=cache...s&ct=clnk&cd=1
>
> My question: is there a way to encode the location (on the web or on my
> computer) of an arbitrary pdf in the link above and have google
> convert it to html for me?


If you can 'cause' a file located on the web to be found by the google
search engine, the function described above will work for that file. If
you cannot, you should use some other way to convert it, such as Adobe's
service:
http://www.adobe.com/products/acroba...linetools.html
Online conversion tools for Adobe PDF documents - Adobe PDF Conversion
by Email Attachment - To convert an Adobe® Portable Document Format
(PDF) file to HTML or text, simply type a URL for an Adobe PDF document
into this electronic form and select "Convert".




--
Mike Easter

 
 
 
 
 
null
 
      01-24-2007
>
> That's because your search term is the filename of the Stanford file, and
> the hit was of sufficient popularity to put it at the top of the google
> search engine's ranking.
>
> The second ranked google hit was a link to the Stanford .pdf file which
> is found in the google help pages.
> http://www.google.com/help/features.html File Types - Google has
> expanded the number of non-HTML file types searched to 12 file formats.
> In addition to PDF documents, Google now searches Microsoft Office,
> PostScript, Corel WordPerfect, Lotus 1-2-3, and others. The new file
> types will simply appear in Google search results whenever they are
> relevant to the user query. - Google also offers the user the ability to
> "View as HTML", allowing users to examine the contents of these file
> formats even if the corresponding application is not installed. The
> "View as HTML" option also allows users to avoid viruses which are
> sometimes carried in certain file formats
>


OK, but I did not have a question about what google returned or why.

>> Clicking on "View as HTML" brings up the pdf in html format. The url
>> displayed is
>>
>> http://209.85.165.104/search?q=cache...s&ct=clnk&cd=1
>>
>> My question: is there a way to encode the location (on the web or on my
>> computer) of an arbitrary pdf in the link above and have google
>> convert it to html for me?

>
> If you can 'cause' the file which is located on the web to be found by
> the google search engine, the function described above will work for the
> file in question. If you cannot cause the file to be found by the
> google search engine, you should use some other way to convert it, such
> as Adobe's service
> http://www.adobe.com/products/acroba...linetools.html Online
> conversion tools for Adobe PDF documents - Adobe PDF Conversion by Email
> Attachment - To convert an Adobe® Portable Document Format (PDF) file to
> HTML or text, simply type a URL for an Adobe PDF document into this
> electronic form and select "Convert".
>
>


The arbitrary pdf files I mentioned are actually files that I download
with a C++ program from various sites. I want them in html and/or text
so that I can extract data from them. I wanted to use google as my
conversion engine, but from your response it seems I will not be able to
do this. I was trying to avoid buying a utility program like the ones at
pdf2text.com. SourceForge hosts an interesting open source project named
PoDoFo. This library will *probably* do what I want, but it will be
like using a sledgehammer to drive a thumbtack.
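
On the "encode the location" part of the question: a URL can only be
embedded inside another URL's query string if it is percent-encoded
first. Below is a minimal C++ sketch of that one step. The host
converter.example and the "url" parameter are hypothetical placeholders,
not a real google or Adobe endpoint, and nothing in this thread suggests
that google's cache link accepts an arbitrary location.

```cpp
#include <cstdio>
#include <string>

// Percent-encode a string for use as a URL query value (RFC 3986).
// Unreserved characters (A-Z a-z 0-9 - . _ ~) pass through unchanged;
// every other byte is written as %XX in uppercase hex.
std::string percent_encode(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            char buf[4];  // '%' + two hex digits + terminating NUL
            std::snprintf(buf, sizeof buf, "%%%02X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// ASSUMPTION: "converter.example" and the "url" parameter are made up
// for illustration; substitute whatever service is actually used.
std::string build_request(const std::string& pdf_url) {
    return "http://converter.example/convert?url=" + percent_encode(pdf_url);
}
```

Calling build_request() with the address of a downloaded pdf gives a
single string that the downloading program could fetch, provided the
chosen service really does take the pdf location as a query parameter.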


 
 
Mike Easter
 
      01-24-2007
null wrote:
<my cite>
>> That's because your search term is the filename of the Stanford file,


> Ok. But I did not have a question about what or why something was
> returned to google.


You made a statement to which I responded:

---------
Mike Easter wrote:
> null wrote:
>> If you enter the search term "google.pdf" into google, the first
>> listing returned is the following.


> That's because your search term is the filename of the Stanford file,
> and the hit was of sufficient popularity to put it at the top of the
> google search engine's ranking.

---------

Then you asked a question:

>>> My question: is there a way to encode the location (on the web or on
>>> my computer) of an arbitrary pdf in the link above and have google
>>> convert it to html for me?


... which I answered.

>> If you can 'cause' the file which is located on the web to be found
>> by the google search engine,


I then told you how to use the free Adobe online converter and gave a
link to it.

> These arbitrary pdf files I mentioned are actually files that I
> download with a C++ program from various sites. I want them in html
> and/or text so that I can extract data from them.


The Adobe online/email converter would do that.

> I wanted to use
> google as my conversion engine,


I haven't found google's conversion engine to be better than Adobe's.

> I was trying to avoid buying a utility program
> like the ones at pdf2text.com.


That's why I pointed you to an online converter.


--
Mike Easter

 
 
 
 