"William Tasso" <(E-Mail Removed)> wrote in message
> Denise Enck wrote:
> > "David Graham" <(E-Mail Removed)> wrote in message
> >> I can't follow most of this thread. Could you very simply, in
> >> non-technical language, just confirm whether robots.txt is any good or not?
> > sure, it is good. It will keep spiders out of directories you don't
> > want crawled,
> and point the way for misbehaving crawlers at the same time.
> William Tasso - http://www.WilliamTasso.com
Not necessarily. There are numerous ways to block 'bad' spiders.
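
For anyone following along, here is a rough sketch of how a well-behaved spider reads robots.txt, using Python's standard `urllib.robotparser`. The rules, the `/private/` directory, the `ExampleBot` name and the example.com URLs are all made up for illustration. Note that this only models a crawler that chooses to obey the file; a misbehaving one can simply ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the directory name is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def allowed(url, agent="ExampleBot"):
    """Return True if a compliant crawler may fetch the URL."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)
```

With these rules, `allowed("http://www.example.com/private/page.html")` is refused while pages outside `/private/` are permitted.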
"PeterMcC" <(E-Mail Removed)> wrote in message
> And, if you really want to be safe, you could always password protect the
> directory with .htaccess - dead easy, and the spiders don't get past the
> password protection.
Thanks to everyone. I wonder if you could give me a few pointers on how to go
about password protecting a directory using .htaccess?
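
As a starting point, here is a small Python sketch that builds the text of a typical .htaccess file for HTTP Basic authentication. The four directives are standard Apache ones; the helper name `build_htaccess`, the realm text and the password file path are assumptions for illustration, and your host has to allow these overrides for it to take effect.

```python
def build_htaccess(passwd_file, realm="Restricted"):
    """Return .htaccess text that asks Apache for HTTP Basic authentication."""
    return (
        "AuthType Basic\n"
        f'AuthName "{realm}"\n'
        f"AuthUserFile {passwd_file}\n"
        "Require valid-user\n"
    )

# Hypothetical path; the password file is best kept outside the web root.
print(build_htaccess("/home/user/.htpasswd"))
```

The password file itself is normally created with Apache's `htpasswd` utility (for example `htpasswd -c /home/user/.htpasswd someuser`), and the resulting text is saved as `.htaccess` in the directory you want to protect.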
> Again there is a risk of ambiguity here, http://www.user.host.com
> should be labeled as a "sub-domain", it's not registered anywhere
> and it's not portable, so you certainly can not call it "owning a
There is no ambiguity here. The domain host.com exists. The domain
user.host.com currently does not exist. Calling some domains subdomains
has no relevance to this, or to our topic. Either a domain name exists
or it does not, on the Internet, according to domain name servers. And
this has little to do with robots.txt.
>>In that particular case, it simply depends on
>>http://www.host.com/.htaccess, which does not currently exist.
> I don't see how the robots.txt convention relates to Apache
> .htaccess files.
>> Again there is a risk of ambiguity here, http://www.user.host.com
>> should be labeled as a "sub-domain", it's not registered anywhere
>> and it's not portable, so you certainly can not call it "owning a
>There is no ambiguity here. The domain host.com exists. The domain
>user.host.com currently does not exist. Calling some domains subdomains
>has no relevance to this, or to our topic. Either a domain name exists
>or it does not, on the Internet, according to domain name servers. And
>this has little to do with robots.txt.
http://www.user.host.com resolves to host.com which then resolves the
prefix "user" locally. The relevance to this robots.txt thread is that
you are using incorrect terminology by referring to "servers". This
needs to be replaced by "(sub)domain"; the "sub" prefix is needed to
prevent ambiguity, as most people would (rightly) interpret "domain" as
"a registered domain". As demonstrated, the usage of robots.txt is not
restricted to registered domains.
You've failed to explain your claim of a relation between Apache
.htaccess config files and the robots.txt convention.
>> You have not provided any evidence that Atomz does not follow the
>> correct procedure for retrieving a robots.txt.
>It was you who wrote an objection, based on a claim on Atomz behavior,
>to my statement that said that robots.txt must reside on the server
Indeed, I was correct, and you accused Atomz of not following the
rules, which is incorrect.
>> (all my sites use
>> http://www.user.host.com urls).
>Too bad then. They do not work until you get that domain registered.
> Same thing since http://www.host.com/~user is the only format where a
> robots.txt cannot be used by the user.
Been working for me.
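
On the `http://www.host.com/~user` point, a quick Python sketch may help show why the URL format matters. It assumes the usual convention that a crawler requests `/robots.txt` from the root of whatever host name appears in the URL; the function name `robots_url` is just for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    """Return the robots.txt URL a crawler would request for this page."""
    parts = urlsplit(page_url)
    # Keep only scheme and host; the path is always /robots.txt at the root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))
```

Under that assumption, a page on `http://www.user.host.com/` maps to a robots.txt the user controls, while a page under `http://www.host.com/~user/` maps to `http://www.host.com/robots.txt`, which belongs to the host.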
If a turtle doesn't have a shell, is he/she homeless or naked?