Velocity Reviews - Computer Hardware Reviews


Is my robots.txt file set up properly?

 
 
Evan Platt
09-20-2007

http://www.espphotography.com/robots.txt is where it's at...

# ls -al robots.txt
4 -rw-r--r-- 1 root admin 27 May 22 10:10 robots.txt

It's been the same:
# cat robots.txt
User-agent: *
Disallow: /

since May 22nd.
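To check that the file means what you intend, you can feed it to Python's standard-library `urllib.robotparser`, which implements the original robots exclusion rules. This is a sketch; the user-agent tokens and paths below are only examples. The parser also reports `/robots.txt` itself as disallowed under `Disallow: /`, yet a crawler has to download that file to learn the rules at all, so periodic robots.txt requests are expected even on a fully blocked site.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from the post: applies to every crawler
# ("User-agent: *") and blocks every path on the site ("Disallow: /").
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network fetch
    return not parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Example agent tokens and paths; substitute your own.
    for agent in ("Slurp", "Googlebot", "psbot"):
        for path in ("/", "/index.html", "/gallery/photo1.jpg"):
            print(f"{agent:10} {path:22} blocked={is_blocked(ROBOTS_TXT, agent, path)}")
```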

I'm getting HAMMERED with robots.txt requests...

If I grep my access_log for an IP requesting robots.txt, then grep for
that IP:

# grep 74.6.19.115 /var/log/httpd/access_log
74.6.19.115 - - [13/Sep/2007:12:54:04 -0700] "GET /robots.txt HTTP/1.0" 200 27 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
74.6.19.115 - - [13/Sep/2007:14:44:30 -0700] "GET /robots.txt HTTP/1.0" 200 27 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
74.6.19.115 - - [14/Sep/2007:02:45:37 -0700] "GET /robots.txt HTTP/1.0" 200 27 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
74.6.19.115 - - [14/Sep/2007:19:10:57 -0700] "GET /robots.txt HTTP/1.0" 200 27 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
74.6.19.115 - - [15/Sep/2007:06:35:38 -0700] "GET /robots.txt HTTP/1.0" 200 27 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
74.6.19.115 - - [15/Sep/2007:15:34:40 -0700] "GET /robots.txt HTTP/1.0" 200 27 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
74.6.19.115 - - [16/Sep/2007:13:55:54 -0700] "GET /robots.txt HTTP/1.0" 200 27 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
74.6.19.115 - - [17/Sep/2007:17:48:55 -0700] "GET /robots.txt HTTP/1.0" 200 27 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
74.6.19.115 - - [18/Sep/2007:20:56:39 -0700] "GET /robots.txt HTTP/1.0" 200 27 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
74.6.19.115 - - [19/Sep/2007:07:19:48 -0700] "GET /robots.txt HTTP/1.0" 200 27 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
74.6.19.115 - - [20/Sep/2007:00:05:23 -0700] "GET /robots.txt HTTP/1.0" 200 27 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
74.6.19.115 - - [20/Sep/2007:04:52:21 -0700] "GET /robots.txt HTTP/1.0" 200 27 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
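The grep-by-IP approach can be turned into a per-client summary. This sketch assumes the Apache common/combined log format shown in the post: it counts each client's `/robots.txt` fetches per day and, separately, all of its other requests. The excerpt above shows twelve fetches from one address over eight days, never more than two a day; if that address also shows zero other requests, the crawler appears to be honoring `Disallow: /` and only re-reading the file. The script name in the usage comment is hypothetical.

```python
import re
import sys
from collections import Counter, defaultdict

# Start of a common/combined log line, e.g.
# 74.6.19.115 - - [13/Sep/2007:12:54:04 -0700] "GET /robots.txt HTTP/1.0" 200 27 ...
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:\]]+)[^\]]*\] "\S+ (?P<path>\S+)'
)


def summarize(lines):
    """Per client IP: robots.txt fetches per day, plus a count of all other requests."""
    robots = defaultdict(Counter)  # ip -> Counter({"13/Sep/2007": 2, ...})
    other = Counter()              # ip -> number of non-robots.txt requests
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines not in the expected format
        if m.group("path") == "/robots.txt":
            robots[m.group("ip")][m.group("day")] += 1
        else:
            other[m.group("ip")] += 1
    return robots, other


if __name__ == "__main__":
    # Usage (hypothetical script name): python robots_hits.py /var/log/httpd/access_log
    for log_path in sys.argv[1:]:
        with open(log_path, encoding="utf-8", errors="replace") as log:
            robots, other = summarize(log)
        for ip, days in sorted(robots.items()):
            print(f"{ip}: {sum(days.values())} robots.txt fetches on "
                  f"{len(days)} days, {other[ip]} other requests")
```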

What, does disallow only work for a day?

Am I missing something?

Any help appreciated.

Thanks.
 
why?
09-20-2007

On Thu, 20 Sep 2007 09:38:25 -0700, Evan Platt wrote:

>I'm getting HAMMERED with robots.txt requests...


<snip>

Maybe something like
http://www.clockwatchers.com/robots_bad.html
robots.txt Tutorial - Block Bad Bots

Some bots will ignore robots.txt files as they don't care if you want
them on your web site or not.

These can be blocked by using a .htaccess file instead.
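.htaccess rules run inside Apache itself, and the linked tutorial's exact rules are not reproduced here. As a rough analogue for a site served through Python, this sketch is WSGI middleware that answers 403 Forbidden when the User-Agent header contains a blocklisted substring. The entries in `BLOCKED_AGENTS` are hypothetical placeholders. User-Agent strings are trivial to fake, so this only stops bots that identify themselves honestly.

```python
# Hypothetical blocklist: case-insensitive substrings of User-Agent headers to refuse.
BLOCKED_AGENTS = ("badbot", "examplescraper")


def block_user_agents(app, blocked=BLOCKED_AGENTS):
    """Wrap a WSGI app so requests from matching User-Agents get 403 Forbidden."""
    blocked = tuple(b.lower() for b in blocked)

    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(b in agent for b in blocked):
            # Refuse without ever calling the wrapped application.
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return middleware
```

To try it locally, wrap any WSGI app and serve it with `wsgiref.simple_server.make_server("", 8000, block_user_agents(app)).serve_forever()`.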


Another site also mentions blocking a named bot, as above.
http://www.thesitewizard.com/archive/robotstxt.shtml
If you have a particular robot in mind, such as the picsearch robot, you
may have lines like the following:

User-agent: psbot
Disallow: /

This means that the picsearch robot, "psbot", should not try to access
any file in the root directory "/" and all its subdirectories. This
effectively means that psbot is banned from your entire website.
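The same standard-library parser shows how a per-robot record behaves: with only the `psbot` record present, psbot is refused, while agents with no matching record (and no `User-agent: *` fallback) are allowed. The agent names other than psbot are just examples.

```python
from urllib.robotparser import RobotFileParser

# The per-robot rule quoted above: only "psbot" is told to stay out.
PSBOT_ONLY = """\
User-agent: psbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(PSBOT_ONLY.splitlines())

for agent in ("psbot", "Slurp", "Googlebot"):
    # Agents with no matching record fall through to "allowed".
    print(agent, "may fetch /index.html:", parser.can_fetch(agent, "/index.html"))
# psbot may fetch /index.html: False
# Slurp may fetch /index.html: True
# Googlebot may fetch /index.html: True
```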

You can have multiple Disallow lines for each user agent (i.e., for each
spider). Here is an example of a longer robots.txt file:


Maybe add the robots meta tag to pages automatically when they are requested?
http://www.askapache.com/seo/updated...wordpress.html
Robots Meta Tag
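A minimal sketch of adding the tag on the fly, assuming your pages pass through a Python layer before being sent; `add_robots_meta` is a hypothetical helper, not the approach from the linked (truncated) page. It inserts the standard `<meta name="robots" content="noindex, nofollow">` tag right after the opening `<head>` tag and leaves pages that already carry a robots meta tag unchanged.

```python
import re

# Robots meta tag asking compliant crawlers not to index the page or follow its links.
ROBOTS_META = '<meta name="robots" content="noindex, nofollow">'

# Matches "<head>" or "<head ...>" but not "<header>".
_HEAD_OPEN = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)
_EXISTING = re.compile(r'<meta\s+name=["\']robots["\']', re.IGNORECASE)


def add_robots_meta(html: str) -> str:
    """Insert the robots meta tag right after <head>, unless one is already present."""
    if _EXISTING.search(html):
        return html  # page already has its own robots directive
    match = _HEAD_OPEN.search(html)
    if match is None:
        return html  # no <head> element; leave the page untouched
    return html[: match.end()] + ROBOTS_META + html[match.end():]
```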


HTH
Me
 