simple robots.txt question

 
 
CRON
 
      07-24-2006
Hi,
How do I disallow all search engines access to:

http://www.scouttalk.ie/user.php?userID=1


where 1 in the above line can be any number?


Thanks a lot
Ciaran

 
jojo
 
      07-24-2006
CRON wrote:

> How do I disallow all search engines access to:
>
> http://www.scouttalk.ie/user.php?userID=1
>
>
> where 1 in the above line can be any number?
>


Put all the pages you want to protect in one directory and disallow the
whole directory.
 
Kim André Akerø
 
      07-24-2006
CRON wrote:

> Hi,
> How do I disallow all search engines access to:
>
> http://www.scouttalk.ie/user.php?userID=1
>
>
> where 1 in the above line can be any number?


Closest thing would be:

User-agent: *
Disallow: /user.php

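You can disallow single files or entire directories. Disallow values are
matched as URL prefixes, so this rule covers user.php with any query
string, but the original standard has no wildcards for picking out
specific query strings.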

http://www.robotstxt.org/wc/exclusion-admin.html
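
One way to check what the rule above covers is Python's standard
urllib.robotparser module, which applies plain prefix matching to
Disallow values. This is only a sketch of how one parser behaves, not a
statement about any particular search engine. The user agent name
"ExampleBot" and the index.php URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt suggested above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /user.php
"""


def is_allowed(url, user_agent="ExampleBot"):
    """Return True if the rules above permit user_agent to fetch url.

    "ExampleBot" is a hypothetical crawler name; any name falls under
    the "User-agent: *" record.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


for url in (
    "http://www.scouttalk.ie/user.php?userID=1",
    "http://www.scouttalk.ie/user.php?userID=42",
    "http://www.scouttalk.ie/index.php",  # hypothetical page, for contrast
):
    print(url, "allowed" if is_allowed(url) else "blocked")
```

With this parser both user.php URLs come out blocked whatever the userID
is, while the other page stays allowed.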

Keep in mind that the robots.txt file is usually followed only by "good"
spiders, such as MSN, Google and Overture. It doesn't actually block
access for search engines; it only serves as a suggestion to the spiders
about what they should ignore on their journey, more of a "please
don't include these files/directories in your index".

Rogue spiders/bots might ignore your robots.txt file altogether or even
specifically go to the "disallowed" locations, just to grab exploitable
content.

--
Kim André Akerø
CRON
 
      07-24-2006
OK thanks,
I guess I'll leave it out then. It's strange that it can't be done. Is
it possible in the page header code to tell the spiders to ignore it?
Is there a meta tag maybe?

Cheers,
Ciaran

 
CRON
 
      07-24-2006
Found this:

<meta name="robots" content="noindex, nofollow">

Apparently only a few robots support it. Anyone know which ones?
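
For what it's worth, here is a rough sketch of how a cooperating robot
might read that tag before deciding what to do with a page. It uses
Python's standard html.parser module. The class and function names are
invented for this example, and real crawlers may well handle the tag
differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        # The content value is a comma-separated list such as "noindex, nofollow".
        for token in (attr.get("content") or "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# A stand-in for the user.php output; only the meta tag comes from the post above.
page = (
    '<html><head><meta name="robots" content="noindex, nofollow"></head>'
    "<body>profile</body></html>"
)
directives = robots_directives(page)
may_index = "noindex" not in directives
may_follow = "nofollow" not in directives
print(sorted(directives), may_index, may_follow)
```

Note that the robot has to fetch the page to see the tag at all, so
unlike robots.txt this does not stop the request itself.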

 
Nikita the Spider
 
      07-24-2006
CRON wrote:

> Found this:
>
> <meta name="robots" content="noindex, nofollow">
>
> Apparently only a few robots support it. Anyone know which ones?


Hi Cron,
What makes you say that only a few robots support it? I had always
assumed the opposite; that most robots support it. (Most decent ones,
anyway -- the same that would respect robots.txt.)

Just for the record, Nikita the Spider supports it. =)

--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more
 
Leonard Blaisdell
 
      07-25-2006
Nikita the Spider wrote:

> What makes you say that only a few robots support it? I had always
> assumed the opposite; that most robots support it. (Most decent ones,
> anyway -- the same that would respect robots.txt.)


I don't think robots are that difficult to create. I seem to remember
that I saw how to create a rudimentary one in a Perl book. If I wanted
to mine information from the net and was unscrupulous, I certainly
wouldn't worry about robots.txt; I'd configure the robot to look for what
I wanted.
I think there is a pile of robots you never see looking at your site
wherever holes in httpd.conf or .htaccess leave it available. But then
again, I'm often wrong.

leo

--
<http://web0.greatbasin.net/~leo/>
 
CRON
 
      07-25-2006

> What makes you say that only a few robots support it? I had always
> assumed the opposite; that most robots support it. (Most decent ones,
> anyway -- the same that would respect robots.txt.)


I saw it on http://www.robotstxt.org/wc/exclusion.html but I think it's
mentioned in a few places. Try a search for robots meta tag nofollow.

Ciaran

 
Nikita the Spider
 
      07-25-2006
CRON wrote:

> > What makes you say that only a few robots support it? I had always
> > assumed the opposite; that most robots support it. (Most decent ones,
> > anyway -- the same that would respect robots.txt.)

>
> I saw it on http://www.robotstxt.org/wc/exclusion.html but I think it's
> mentioned in a few places. Try a search for robots meta tag nofollow.


That page and all of the pages on robotstxt.org are very old. The site
is still the closest thing there is to an authoritative standard, but
only because the standard hasn't changed much, not because the site's
been kept up to date.

Given that the majors (Google, Yahoo & friends) even support the
non-standard nofollow on individual links
(http://blog.searchenginewatch.com/blog/050118-20472), I think it is
safe to assume that they respect it when applied to the whole page.

Cheers

--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more
 
Nikita the Spider
 
      07-25-2006
Leonard Blaisdell wrote:

> Nikita the Spider wrote:
>
> > What makes you say that only a few robots support it? I had always
> > assumed the opposite; that most robots support it. (Most decent ones,
> > anyway -- the same that would respect robots.txt.)

>
> I don't think robots are that difficult to create. I seem to remember
> that I saw how to create a rudimentary one in a Perl book. If I wanted
> to mine information from the net and was unscrupulous, I certainly
> wouldn't worry about robots.txt; I'd configure the robot to look for what
> I wanted.
>
> I think there is a pile of robots you never see looking at your site
> wherever holes in httpd.conf or .htaccess leave it available. But then
> again, I'm often wrong.


True, a quick and sloppy bot is not hard to create. But anyone looking
to use robots.txt or a META noindex/nofollow as security against
unscrupulous or sloppy bots is misguided, regardless of whether such
bots are numerous or few. I think (hope!) the OP understands that.
That's just not what robots.txt and noindex/nofollow were intended for:
dealing with evil or sloppy bots (or nosy human surfers for that matter)
is a job for other technology (like httpd.conf, as you suggest).

So, setting aside the issue that robots.txt doesn't do something it was
not intended to accomplish, it remains an effective way of controlling
well-behaved bots like Googlebot. (And Nikita!)

Cheers

--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more
 