Google and View as HTML

Discussion in 'Computer Support' started by null, Jan 24, 2007.

  1. null

    null Guest

    If you enter the search term "google.pdf" into Google, the first listing
    returned is the following.

    [PDF]
    The Anatomy of a Search Engine
    File Format: PDF/Adobe Acrobat - View as HTML
    The Anatomy of a Large-Scale Hypertextual. Web Search Engine. Sergey
    Brin and Lawrence Page. Computer Science Department,. Stanford
    University, Stanford ...
    infolab.stanford.edu/pub/papers/google.pdf - Similar pages

    Clicking on "View as HTML" brings up the pdf in html format. The url
    displayed is

    http://209.85.165.104/search?q=cach...oogle.pdf google.pdf&hl=en&gl=us&ct=clnk&cd=1

    My question: is there a way to encode the location (on the web or on my
    computer) of an arbitrary pdf in the link above and have Google convert
    it to html for me?
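
    Whatever service ends up receiving the location, a URL placed inside
    another URL's query string has to be percent-encoded first. Here is a
    minimal sketch of that encoding step only. The function name
    percent_encode is illustrative, and nothing here assumes that Google's
    "View as HTML" link accepts an arbitrary location.

```cpp
#include <string>

// Percent-encode a string so it can be embedded as a query-parameter value.
// Every byte outside the RFC 3986 "unreserved" set (letters, digits,
// '-', '.', '_', '~') is written as %XX with uppercase hex digits.
// percent_encode is a hypothetical helper name, not part of any library.
std::string percent_encode(const std::string& in) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];    // high nibble
            out += hex[c & 0x0F];  // low nibble
        }
    }
    return out;
}
```

    For example, percent_encode("a b") yields "a%20b", and the ':' and '/'
    characters of a web address become %3A and %2F.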
     
    null, Jan 24, 2007
    #1

  2. null

    Mike Easter Guest

    null wrote:
    > If you enter the search term "google.pdf" into google, the first
    > listing return is the following.
    >
    > [PDF]
    > The Anatomy of a Search Engine
    > File Format: PDF/Adobe Acrobat - View as HTML
    > The Anatomy of a Large-Scale Hypertextual. Web Search Engine. Sergey
    > Brin and Lawrence Page. Computer Science Department,. Stanford
    > University, Stanford ...
    > infolab.stanford.edu/pub/papers/google.pdf - Similar pages


    That's because your search term is the filename of the Stanford file, and
    the hit was popular enough to put it at the top of the Google search
    engine's ranking.

    The second-ranked Google hit was the Google help page below, which
    contains a link to the Stanford .pdf file.
    http://www.google.com/help/features.html
    File Types - Google has expanded the number of non-HTML file types
    searched to 12 file formats. In addition to PDF documents, Google now
    searches Microsoft Office, PostScript, Corel WordPerfect, Lotus 1-2-3,
    and others. The new file types will simply appear in Google search
    results whenever they are relevant to the user query. - Google also
    offers the user the ability to "View as HTML", allowing users to examine
    the contents of these file formats even if the corresponding application
    is not installed. The "View as HTML" option also allows users to avoid
    viruses which are sometimes carried in certain file formats.

    > Clicking on "View as HTML" brings up the pdf in html format. The url
    > displayed is
    >
    > http://209.85.165.104/search?q=cach...oogle.pdf google.pdf&hl=en&gl=us&ct=clnk&cd=1
    >
    > My question is there a way to encode the location (on the web or on my
    > computer) of an arbitrary pdf in the link above and have google
    > convert it to html for me?


    If you can 'cause' the file which is located on the web to be found by
    the google search engine, the function described above will work for the
    file in question. If you cannot cause the file to be found by the
    google search engine, you should use some other way to convert it, such
    as Adobe's service
    http://www.adobe.com/products/acrobat/access_onlinetools.html
    Online conversion tools for Adobe PDF documents - Adobe PDF Conversion by
    Email Attachment - To convert an Adobe® Portable Document Format (PDF)
    file to HTML or text, simply type a URL for an Adobe PDF document into
    this electronic form and select "Convert".




    --
    Mike Easter
     
    Mike Easter, Jan 24, 2007
    #2

  3. null

    null Guest

    >
    > That's because your searchterm is the filename of the Stanford file, and
    > the hit was of sufficient popularity to put it at the top of the google
    > search engine's ranking.
    >
    > The second ranked google hit was a link to the Stanford .pdf file which
    > is found in the google help pages.
    > http://www.google.com/help/features.html File Types - Google has
    > expanded the number of non-HTML file types searched to 12 file formats.
    > In addition to PDF documents, Google now searches Microsoft Office,
    > PostScript, Corel WordPerfect, Lotus 1-2-3, and others. The new file
    > types will simply appear in Google search results whenever they are
    > relevant to the user query. - Google also offers the user the ability to
    > "View as HTML", allowing users to examine the contents of these file
    > formats even if the corresponding application is not installed. The
    > "View as HTML" option also allows users to avoid viruses which are
    > sometimes carried in certain file formats
    >


    Ok. But I did not have a question about what was returned by Google or
    why.

    >> Clicking on "View as HTML" brings up the pdf in html format. The url
    >> displayed is
    >>
    >> http://209.85.165.104/search?q=cach...oogle.pdf google.pdf&hl=en&gl=us&ct=clnk&cd=1
    >>
    >> My question is there a way to encode the location (on the web or on my
    >> computer) of an arbitrary pdf in the link above and have google
    >> convert it to html for me?

    >
    > If you can 'cause' the file which is located on the web to be found by
    > the google search engine, the function described above will work for the
    > file in question. If you cannot cause the file to be found by the
    > google search engine, you should use some other way to convert it, such
    > as Adobe's service
    > http://www.adobe.com/products/acrobat/access_onlinetools.html Online
    > conversion tools for Adobe PDF documents - Adobe PDF Conversion by Email
    > Attachment - To convert an Adobe® Portable Document Format (PDF) file to
    > HTML or text, simply type a URL for an Adobe PDF document into this
    > electronic form and select "Convert".
    >
    >


    These arbitrary pdf files I mentioned are actually files that I download
    with a C++ program from various sites. I want them in html and/or text
    so that I can extract data from them. I wanted to use google as my
    conversion engine, but from your response I will not be able to do this.
    I was trying to avoid buying a utility program like the ones at
    pdf2text.com. SourceForge has an interesting open source project named
    PoDoFo. This library will *probably* do what I want, but it will be
    like using a sledgehammer to drive a thumbtack.
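
    A lighter-weight route for a C++ downloader is to shell out to a
    command-line converter after each file is saved. Below is a minimal
    sketch, assuming the pdftotext tool (shipped with Xpdf and Poppler) is
    installed and on the PATH. That tool is not one of the options discussed
    above, the helper names are illustrative, and the quoting assumes a
    POSIX shell.

```cpp
#include <cstdlib>
#include <string>

// Quote a string for a POSIX shell: wrap it in single quotes and replace
// each embedded single quote with '\'' (close quote, escaped quote, reopen).
// shell_quote is a hypothetical helper name.
std::string shell_quote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') {
            out += "'\\''";
        } else {
            out += c;
        }
    }
    out += "'";
    return out;
}

// Build the command line "pdftotext -layout <pdf> <txt>".
// -layout asks pdftotext to keep the page's physical layout, which tends
// to help when the data of interest sits in columns or tables.
std::string build_pdftotext_command(const std::string& pdf_path,
                                    const std::string& txt_path) {
    return "pdftotext -layout " + shell_quote(pdf_path) + " " +
           shell_quote(txt_path);
}

// Run the conversion through the system shell.
// Returns true when std::system reports a zero status.
bool convert_pdf_to_text(const std::string& pdf_path,
                         const std::string& txt_path) {
    const std::string cmd = build_pdftotext_command(pdf_path, txt_path);
    return std::system(cmd.c_str()) == 0;
}
```

    Calling convert_pdf_to_text("report.pdf", "report.txt") after a download
    would leave plain text in report.txt for the extraction step. The file
    names here are placeholders.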
     
    null, Jan 24, 2007
    #3
  4. null

    Mike Easter Guest

    null wrote:
    <my cite>
    >> That's because your searchterm is the filename of the Stanford file,


    > Ok. But I did not have a question about what or why something was
    > returned to google.


    You made a statement to which I responded:

    ---------
    Mike Easter wrote:
    > null wrote:
    >> If you enter the search term "google.pdf" into google, the first
    >> listing return is the following.


    > That's because your searchterm is the filename of the Stanford file,
    > and the hit was of sufficient popularity to put it at the top of the
    > google search engine's ranking.

    ---------

    Then you asked a question:

    >>> My question is there a way to encode the location (on the web or on
    >>> my computer) of an arbitrary pdf in the link above and have google
    >>> convert it to html for me?


    .... which I answered.

    >> If you can 'cause' the file which is located on the web to be found
    >> by the google search engine,


    I then told you how to use the free Adobe online converter and gave a
    link to it.

    > These arbitrary pdf files I mentioned are actually files that I
    > download with a C++ program from various sites. I want them in html
    > and/or text so that I can extract data from them.


    The Adobe online/email converter would do that.

    > I wanted to use
    > google as my conversion engine,


    I haven't found google's conversion engine to be better than Adobe's.

    > I was trying to avoid buying a utility program
    > like the ones at pdf2text.com.


    That's why I pointed you to an online converter.


    --
    Mike Easter
     
    Mike Easter, Jan 24, 2007
    #4
