Velocity Reviews - Computer Hardware Reviews

robots.txt

 
 
Paul Furman
      01-04-2007
I've had this web site up for years, and maybe a couple of years ago I
added a more advanced PHP-driven system under edgehill.net/1. Many of
the images are annotated, or at least the title gives an indication of
the location and date. I just did a test and was able to find a keyword
under the new /1 section, but Google Images doesn't see it at all. The
problem, I think, is that the galleries are set up as subfolders and the
PHP files format the gallery, so I've got all these indexless folders
which are basically garbage for a web browser:

What should look like this:
<http://www.edgehill.net/1/?SC=go.php&DIR=California/Bay-Area/Oakland/2005-11-05-pinehurst>
Is indexed like this:
<http://www.edgehill.net/1/California/Bay-Area/Oakland/2005-11-05-pinehurst/>

On my other site, baynatives, I added a line to the longest irrelevant
page to keep it from being indexed, like so:
<meta name='googlebot' content='noarchive, noindex'>
That works like a charm, but the edgehill.net site, as far as I know,
does not directly point to these nested content folders except in the
PHP address [?SC=go.php&DIR=].

I'm not opposed to people viewing raw directories; sometimes that's
useful for pointing to an image or file without all the formatting. But
I don't want them indexed in search engines.

Any advice?

--
Paul Furman
Bay Natives Nursery
http://www.baynatives.com
Photography
http://www.edgehill.net/1
(415) 722-6037
 
Paul Furman
      01-04-2007
Seems to be an .htaccess issue. I added this line:

Options -Indexes

I'd prefer to simply prevent search engines from indexing, though. I
probably have some folders in there which are now inaccessible.


 
Bergamot
      01-04-2007
Paul Furman wrote:
>
> On my other baynatives site I added a line in the longest irrelevant
> page to prevent indexing that like such:
> <meta name='googlebot' content='noarchive, noindex'>


Your subject line indicates you are asking about robots.txt, but you
haven't mentioned it in either of your messages.
http://www.robotstxt.org/wc/robots.html

--
Berg
 
Paul Furman
      01-05-2007
Bergamot wrote:

> Paul Furman wrote:
>
>>On my other baynatives site I added a line in the longest irrelevant
>>page to prevent indexing that like such:
>><meta name='googlebot' content='noarchive, noindex'>

>
>
> Your subject line indicates you are asking about robots.txt, but you
> haven't mentioned it in either of your messages.
> http://www.robotstxt.org/wc/robots.html
>


I used a googlebot meta tag in the one instance.

As best I can figure, robots.txt requires you to list each directory
that's forbidden, and I've got a huge list of directories that grows
weekly. I was hoping for a robots command that forbids indexing
indexless directories.

Do you know if robots.txt is affected by PHP calls? All the pages appear
to be located in edgehill.net/1/ but they are really in many deeply
nested subdirectories.
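
On both robots.txt questions, a small check may help. Under the original
robots.txt convention a Disallow line matches by URL-path prefix, so one
rule can cover a folder and everything nested beneath it, and the match
is made against the URL a robot requests rather than against where the
files sit on disk. The sketch below uses Python's standard
urllib.robotparser to test the two URLs from the first post. The rule
"Disallow: /1/California/" and the "Googlebot" agent string are
assumptions picked for illustration, not the site's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption, not the real file).
# A Disallow value is treated as a path prefix, so this single line
# covers every folder nested under /1/California/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /1/California/
"""

# The two URLs quoted in the first post of this thread.
RAW_DIR = ("http://www.edgehill.net/1/California/Bay-Area/Oakland/"
           "2005-11-05-pinehurst/")
PHP_PAGE = ("http://www.edgehill.net/1/?SC=go.php&DIR=California/"
            "Bay-Area/Oakland/2005-11-05-pinehurst")


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The raw directory URL starts with /1/California/, so it is blocked.
print(parser.can_fetch("Googlebot", RAW_DIR))   # False

# The PHP page's path is just /1/ plus a query string, so the rule
# does not match it, even though DIR= names the same folder.
print(parser.can_fetch("Googlebot", PHP_PAGE))  # True
```

One trade-off to keep in mind: a prefix rule like this blocks anything
else stored under that folder too, image files included, so it would
need narrowing if those files should still be crawled.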
 