Velocity Reviews - Computer Hardware Reviews


robot.txt

 
 
David Graham
 
      06-28-2003
Hi
I have a folder on my site that I use for practice, and I don't want robots
indexing it. I believe the meta tag is not as good as a robot.txt
file. I would like to use a robot.txt file but...

1. What is the syntax of the line that I write to prevent access to a folder?
(The folder is called 'sefriendly' and it lives off the root folder, which is
called 'www'.)

2. In which folder is the robot.txt file stored?

thanks

David


 
PeterMcC
 
      06-28-2003
David Graham wrote:
> Hi
> I have a folder on my site that I use to practice on, I don't want
> robots indexing this folder. I believe the meta tag is not as good as
> a robot.txt file. I would like to use a robot.txt file but...
>
> 1. What is the syntax of the line that I write to prevent access to a
> folder (the folder is called 'sefriendly' and it lives off the root
> folder which is called 'www'


User-agent: *
Disallow: /sefriendly/

> 2. In which folder is the robot.txt file stored?

in your root - in your case, www - folder

There's lots of info, and a script that checks your robot.txt file, at:
http://www.searchengineworld.com/cgi-bin/robotcheck.cgi
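
For a quick local check of the two rules above, here is a minimal sketch using Python's standard urllib.robotparser module. The host www.example.com and the robot name ExampleBot are placeholders, not anything from this thread, and the check only tells you how a rule-following robot would read the file.

```python
from urllib.robotparser import RobotFileParser

# The two rules from the reply above, as they would appear in the file.
RULES = """\
User-agent: *
Disallow: /sefriendly/
"""


def is_allowed(rules, user_agent, url):
    """Return True if a rule-following robot may fetch `url` under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


# www.example.com and ExampleBot are placeholders for illustration only.
for path in ("/sefriendly/index.html", "/index.html"):
    url = "http://www.example.com" + path
    print(path, is_allowed(RULES, "ExampleBot", url))
```

Anything under /sefriendly/ should come back as not allowed, and pages elsewhere on the site as allowed.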

--
PeterMcC
If you feel that any of the above is incorrect,
inappropriate or offensive in any way,
please ignore it and accept my apologies.
 
David Graham
 
      06-28-2003

"PeterMcC" <(E-Mail Removed)> wrote in message
news:uweLa.44927$(E-Mail Removed)9.net...
> David Graham wrote:
> > Hi
> > I have a folder on my site that I use to practice on, I don't want
> > robots indexing this folder. I believe the meta tag is not as good as
> > a robot.txt file. I would like to use a robot.txt file but...
> >
> > 1. What is the syntax of the line that I write to prevent access to a
> > folder (the folder is called 'sefriendly' and it lives off the root
> > folder which is called 'www'

>
> User-agent: *
> Disallow: /sefriendly/
>
> > 2. In which folder is the robot.txt file stored?

> in your root - in your case, www - folder
>
> There's lots of info at:
> http://www.searchengineworld.com/cgi-bin/robotcheck.cgi
> And a script that checks your robot.txt file


Thanks for the link

David


 
David Graham
 
      06-28-2003

"PeterMcC" <(E-Mail Removed)> wrote in message
news:uweLa.44927$(E-Mail Removed)9.net...
> David Graham wrote:
> > Hi
> > I have a folder on my site that I use to practice on, I don't want
> > robots indexing this folder. I believe the meta tag is not as good as
> > a robot.txt file. I would like to use a robot.txt file but...
> >
> > 1. What is the syntax of the line that I write to prevent access to a
> > folder (the folder is called 'sefriendly' and it lives off the root
> > folder which is called 'www'

>
> User-agent: *
> Disallow: /sefriendly/
>


I put the robot.txt file into the www folder containing the two lines above
(exactly as you indicate, i.e. on two lines), but I can still visit the site
using IE6. I thought those two lines banned access from all UAs. I have
cleared out my browser's cache in case that was what I was viewing, but that
made no difference. I will read up on this subject, but could you point out
where my thinking is a bit off here? Does the robot.txt file just ban spiders
and not browsers?

TIA
David


 
PeterMcC
 
      06-28-2003
David Graham wrote:
> "PeterMcC" <(E-Mail Removed)> wrote in message
> news:uweLa.44927$(E-Mail Removed)9.net...
>> David Graham wrote:
>>> Hi
>>> I have a folder on my site that I use to practice on, I don't want
>>> robots indexing this folder. I believe the meta tag is not as good
>>> as a robot.txt file. I would like to use a robot.txt file but...
>>>
>>> 1. What is the syntax of the line that I write to prevent access to
>>> a folder (the folder is called 'sefriendly' and it lives off the
>>> root folder which is called 'www'

>>
>> User-agent: *
>> Disallow: /sefriendly/
>>

>
> I put the robot.txt file into the www folder containing the two lines
> above (exactly as you indicate i.e. on two lines) but I can still
> visit the site using IE6. I thought those two lines ban access from
> all UA's. I have cleared out my browsers cache in case that was what
> I was viewing, but that made no difference. I will read up on this
> subject, but could you point out were my thinking is a bit off here.
> Does the robot.txt file just ban spiders and not browsers?


Just spiders.
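
To put that another way: the server never enforces the file. A well-behaved spider reads the rules and decides for itself not to ask, while a browser simply asks. A rough sketch of the difference, with a made-up in-memory SITE standing in for the web server and ExampleBot as a placeholder robot name:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = "User-agent: *\nDisallow: /sefriendly/\n"

# Hypothetical stand-in for the web server: it serves every page it has.
SITE = {
    "/index.html": "main page",
    "/sefriendly/index.html": "practice page",
}


def server_get(path):
    """The server does not consult the rules; it just returns the page."""
    return SITE.get(path)


def browser_get(path):
    """A browser requests whatever the user asks for."""
    return server_get(path)


def polite_spider_get(path, user_agent="ExampleBot"):
    """A well-behaved spider checks the rules itself and skips disallowed paths."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    if not parser.can_fetch(user_agent, path):
        return None  # the spider chooses not to make the request
    return server_get(path)
```

Here browser_get("/sefriendly/index.html") still returns the practice page, while polite_spider_get returns None for the same path. A spider that ignores the rules would behave like the browser.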

--
PeterMcC
If you feel that any of the above is incorrect,
inappropriate or offensive in any way,
please ignore it and accept my apologies.
 
PeterMcC
 
      06-28-2003
PeterMcC wrote:
> David Graham wrote:

<snip>
>> I put the robot.txt file into the www folder containing the two lines
>> above (exactly as you indicate i.e. on two lines) but I can still
>> visit the site using IE6. I thought those two lines ban access from
>> all UA's. I have cleared out my browsers cache in case that was what
>> I was viewing, but that made no difference. I will read up on this
>> subject, but could you point out were my thinking is a bit off here.
>> Does the robot.txt file just ban spiders and not browsers?

>
> Just spiders.


BTW - if you don't have a link to a page, it is unlikely to get spidered,
because spiders find pages by following links (including links from other
sites).

If you want to have links to the page but don't want it spidered or seen
by others, use .htaccess to password protect the directory that holds the
page.
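
As a rough idea of what that .htaccess file contains, the sketch below just builds the text in Python. It assumes an Apache server that allows authentication overrides in .htaccess and a password file already created with the htpasswd utility; the realm name and the password file path are invented for illustration.

```python
def htaccess_basic_auth(realm, password_file):
    """Build .htaccess text asking Apache for a username and password."""
    lines = [
        "AuthType Basic",
        f'AuthName "{realm}"',
        f"AuthUserFile {password_file}",
        "Require valid-user",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical realm name and password file location.
print(htaccess_basic_auth("Practice area", "/home/example/.htpasswd"))
```

The resulting text would be saved as .htaccess inside the directory to be protected. Unlike the robots file, this is enforced by the server, so it applies to browsers and spiders alike.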

HTH
--
PeterMcC
If you feel that any of the above is incorrect,
inappropriate or offensive in any way,
please ignore it and accept my apologies.

 
Jacqui or (maybe) Pete
 
      06-28-2003
In article <U%gLa.1981$(E-Mail Removed)>,
(E-Mail Removed) says...
>
> "PeterMcC" <(E-Mail Removed)> wrote in message
> news:uweLa.44927$(E-Mail Removed)9.net...
> > David Graham wrote:


> > > I have a folder on my site that I use to practice on, I don't want
> > > robots indexing this folder. I believe the meta tag is not as good as
> > > a robot.txt file. I would like to use a robot.txt file but...

....
> > User-agent: *
> > Disallow: /sefriendly/
> >

....
> Does the robot.txt file just ban spiders
> and not browsers?
>

Correct.
 
David Graham
 
      06-28-2003

"PeterMcC" <(E-Mail Removed)> wrote in message
news:SBhLa.44961$(E-Mail Removed)9.net...
> PeterMcC wrote:
> > David Graham wrote:

> <snip>
> >> I put the robot.txt file into the www folder containing the two lines
> >> above (exactly as you indicate i.e. on two lines) but I can still
> >> visit the site using IE6. I thought those two lines ban access from
> >> all UA's. I have cleared out my browsers cache in case that was what
> >> I was viewing, but that made no difference. I will read up on this
> >> subject, but could you point out were my thinking is a bit off here.
> >> Does the robot.txt file just ban spiders and not browsers?

> >
> > Just spiders.

>
> BTW - if you don't have a link to a page, it won't get spidered because the
> spider only follows links.
>
> If you want to have links to the page but don't want it spidered or seen
> by others, use .htaccess to password protect the directory that holds the
> page.
>
> HTH
> --
> PeterMcC
> If you feel that any of the above is incorrect,
> inappropriate or offensive in any way,
> please ignore it and accept my apologies.


Thanks for the help. I have one more question. Google indexed one of my
practice sites before I had a chance to use a robot.txt file. Do you know
how long it will be before Google deletes the cached version of this site,
which I never intended to be indexed? I ask because the unwanted site is
competing in the search results with the site that I do want indexed. The
unwanted site is doing better than the wanted one, as I have not yet got
round to optimising my main site for search engines.

TIA
David


 
Denise Enck
 
      06-28-2003
"David Graham" <(E-Mail Removed)> wrote in message
news:n6eLa.339$(E-Mail Removed)...
> Hi
> I have a folder on my site that I use to practice on, I don't want robots
> indexing this folder. I believe the meta tag is not as good as a robot.txt
> file. I would like to use a robot.txt file but...
>
> 1. What is the syntax of the line that I write to prevent access to a folder
> (the folder is called 'sefriendly' and it lives off the root folder which is
> called 'www'
>
> 2. In which folder is the robot.txt file stored?
>
> thanks
>
> David
>



The file should be called robots.txt rather than robot.txt, or else it won't
keep any spiders out.
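
The reason the name matters is that spiders ask for one fixed path at the top of the site, /robots.txt, and nothing else. A small sketch of how that address is derived from any page on the site (www.example.com is a placeholder host):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the address a spider would request for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Only the scheme and host are kept; the path is always /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.example.com/sefriendly/index.html"))
```

A file saved under any other name, or in a subfolder, is never requested by this lookup.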

Denise


 
David Graham
 
      06-28-2003

"Denise Enck" <(E-Mail Removed)> wrote in message
news:tQiLa.69023$(E-Mail Removed) thlink.net...
> "David Graham" <(E-Mail Removed)> wrote in message
> news:n6eLa.339$(E-Mail Removed)...
> > Hi
> > I have a folder on my site that I use to practice on, I don't want robots
> > indexing this folder. I believe the meta tag is not as good as a robot.txt
> > file. I would like to use a robot.txt file but...
> >
> > 1. What is the syntax of the line that I write to prevent access to a folder
> > (the folder is called 'sefriendly' and it lives off the root folder which is
> > called 'www'
> >
> > 2. In which folder is the robot.txt file stored?
> >
> > thanks
> >
> > David
> >

>
>
> the file should be called robots.txt rather than robot.txt else it won't
> keep any spiders out ~
>
> Denise
>

Thanks loads - didn't know it had to have the 's' on the name

David


 