Is my robots.txt file set up properly?

Discussion in 'Computer Support' started by Evan Platt, Sep 20, 2007.

  1. Evan Platt

    Evan Platt Guest

    http://www.espphotography.com/robots.txt is where it's at...

    # ls -al robots.txt
    4 -rw-r--r-- 1 root admin 27 May 22 10:10 robots.txt

    It's been the same:
    # cat robots.txt
    User-agent: *
    Disallow: /

    since May 22nd.

    I'm getting HAMMERED with robots.txt requests...

    If I grep my access_log for an IP requesting robots.txt, then grep for
    that IP:

    # grep 74.6.19.115 /var/log/httpd/access_log
    74.6.19.115 - - [13/Sep/2007:12:54:04 -0700] "GET /robots.txt
    HTTP/1.0" 200 27 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp;
    http://help.yahoo.com/help/us/ysearch/slurp)"
    74.6.19.115 - - [13/Sep/2007:14:44:30 -0700] "GET /robots.txt
    HTTP/1.0" 200 27 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp;
    http://help.yahoo.com/help/us/ysearch/slurp)"
    74.6.19.115 - - [14/Sep/2007:02:45:37 -0700] "GET /robots.txt
    HTTP/1.0" 200 27 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp;
    http://help.yahoo.com/help/us/ysearch/slurp)"
    74.6.19.115 - - [14/Sep/2007:19:10:57 -0700] "GET /robots.txt
    HTTP/1.0" 200 27 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp;
    http://help.yahoo.com/help/us/ysearch/slurp)"
    74.6.19.115 - - [15/Sep/2007:06:35:38 -0700] "GET /robots.txt
    HTTP/1.0" 200 27 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp;
    http://help.yahoo.com/help/us/ysearch/slurp)"
    74.6.19.115 - - [15/Sep/2007:15:34:40 -0700] "GET /robots.txt
    HTTP/1.0" 200 27 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp;
    http://help.yahoo.com/help/us/ysearch/slurp)"
    74.6.19.115 - - [16/Sep/2007:13:55:54 -0700] "GET /robots.txt
    HTTP/1.0" 200 27 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp;
    http://help.yahoo.com/help/us/ysearch/slurp)"
    74.6.19.115 - - [17/Sep/2007:17:48:55 -0700] "GET /robots.txt
    HTTP/1.0" 200 27 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp;
    http://help.yahoo.com/help/us/ysearch/slurp)"
    74.6.19.115 - - [18/Sep/2007:20:56:39 -0700] "GET /robots.txt
    HTTP/1.0" 200 27 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp;
    http://help.yahoo.com/help/us/ysearch/slurp)"
    74.6.19.115 - - [19/Sep/2007:07:19:48 -0700] "GET /robots.txt
    HTTP/1.0" 200 27 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp;
    http://help.yahoo.com/help/us/ysearch/slurp)"
    74.6.19.115 - - [20/Sep/2007:00:05:23 -0700] "GET /robots.txt
    HTTP/1.0" 200 27 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp;
    http://help.yahoo.com/help/us/ysearch/slurp)"
    74.6.19.115 - - [20/Sep/2007:04:52:21 -0700] "GET /robots.txt
    HTTP/1.0" 200 27 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp;
    http://help.yahoo.com/help/us/ysearch/slurp)"
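A rough way to check what a given address is actually fetching is to tally robots.txt requests against everything else, per IP. The sketch below is only illustrative: it assumes the usual one-request-per-line Apache log layout (the excerpt above is wrapped for posting), and the second address and its /index.html request are made up for comparison.

```python
import re
from collections import defaultdict

# Matches the start of a common/combined log line: IP, two ident fields,
# [timestamp], then the quoted request ("METHOD PATH PROTOCOL").
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*"'
)

def summarize(lines):
    """Count robots.txt requests vs. all other requests for each client IP."""
    summary = defaultdict(lambda: {"robots": 0, "other": 0})
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        kind = "robots" if match.group("path") == "/robots.txt" else "other"
        summary[match.group("ip")][kind] += 1
    return dict(summary)

# First two lines are taken from the log above (unwrapped); the third is a
# hypothetical client added only to show the "other" counter.
sample = [
    '74.6.19.115 - - [13/Sep/2007:12:54:04 -0700] "GET /robots.txt HTTP/1.0" 200 27 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '74.6.19.115 - - [13/Sep/2007:14:44:30 -0700] "GET /robots.txt HTTP/1.0" 200 27 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '192.0.2.10 - - [13/Sep/2007:15:00:00 -0700] "GET /index.html HTTP/1.0" 200 512 "-" "ExampleBrowser"',
]

summary = summarize(sample)
for ip, counts in summary.items():
    print(ip, counts)
```

An address that shows only "robots" hits and no "other" hits is re-reading the file but not fetching any pages.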

    What, does disallow only work for a day?

    Am I missing something?

    Any help appreciated.

    Thanks.
    --
    To reply via e-mail, remove The Obvious from my e-mail address.
    Evan Platt, Sep 20, 2007
    #1

  2. why?

    why? Guest

    On Thu, 20 Sep 2007 09:38:25 -0700, Evan Platt wrote:

    >
    >http://www.espphotography.com/robots.txt is where it's at...
    >
    ># ls -al robots.txt
    >4 -rw-r--r-- 1 root admin 27 May 22 10:10 robots.txt
    >
    >It's been the same:
    ># cat robots.txt
    >User-agent: *
    >Disallow: /
    >
    >since May 22nd.
    >
    >I'm getting HAMMERED with robots.txt requests...
    >
    >If I grep my access_log for an IP requesting robots.txt, then grep for
    >that IP:
    >
    ># grep 74.6.19.115 /var/log/httpd/access_log
    >74.6.19.115 - - [13/Sep/2007:12:54:04 -0700] "GET /robots.txt
    >HTTP/1.0" 200 27 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp;
    >http://help.yahoo.com/help/us/ysearch/slurp)"


    <snip>

    Maybe something like
    http://www.clockwatchers.com/robots_bad.html
    robots.txt Tutorial - Block Bad Bots

    Some bots will ignore robots.txt files as they don't care if you want
    them on your web site or not.

    These can be blocked by using a .htaccess file instead.


    Another site also mentions blocking a named bot, as above.
    http://www.thesitewizard.com/archive/robotstxt.shtml
    If you have a particular robot in mind, such as the picsearch robot, you
    may have lines like the following:

    User-agent: psbot
    Disallow: /

    This means that the picsearch robot, "psbot", should not try to access
    any file in the root directory "/" or any of its subdirectories. This
    effectively bans psbot from your entire website.
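As a quick sanity check, Python's standard urllib.robotparser can show how a compliant crawler would read these rules. This is only a sketch: the "/gallery/" path is an assumed example, and the result says nothing about bots that ignore robots.txt or about how often a crawler re-reads the file itself.

```python
from urllib.robotparser import RobotFileParser

def parse_rules(lines):
    """Build a parser from robots.txt lines without fetching anything."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser

# The file from the first post: every compliant robot is banned site-wide.
block_all = parse_rules(["User-agent: *", "Disallow: /"])

# The tutorial's example: only the robot named "psbot" is banned.
block_psbot = parse_rules(["User-agent: psbot", "Disallow: /"])

# "/gallery/" is a hypothetical path used purely for illustration.
url = "http://www.espphotography.com/gallery/"

slurp_under_block_all = block_all.can_fetch("Yahoo! Slurp", url)
psbot_under_block_all = block_all.can_fetch("psbot", url)
psbot_under_block_psbot = block_psbot.can_fetch("psbot", url)
slurp_under_block_psbot = block_psbot.can_fetch("Yahoo! Slurp", url)

print("block_all:   Slurp allowed?", slurp_under_block_all)
print("block_all:   psbot allowed?", psbot_under_block_all)
print("block_psbot: psbot allowed?", psbot_under_block_psbot)
print("block_psbot: Slurp allowed?", slurp_under_block_psbot)
```

With the "User-agent: *" file, nothing is allowed for any named robot. With the psbot-only file, other robots are still allowed.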

    You can have multiple Disallow lines for each user agent (i.e., for each
    spider). Here is an example of a longer robots.txt file:


    Or maybe have the server add the robots meta tag to each page
    automatically when it is requested?
    http://www.askapache.com/seo/updated-robotstxt-for-wordpress.html
    Robots Meta Tag


    HTH
    Me
    why?, Sep 20, 2007
    #2