Luigi Donatello Asero 08-13-2004 09:49 AM

Checking links and robots.
 
I tried to check the links of some pages of the website
http://www.scaiecat-spa-gigi.com and I got this message
http://validator.w3.org/checklink?ur...h=&check=Check
As far as I remember, I have not set up any robots.txt.
Is there a robots.txt on the validator?

--
Luigi (an Italian living in Sweden)
http://www.italymap.dk
http://www.scaiecat-spa-gigi.com/sv/boendeiitalien.html






Jukka K. Korpela 08-13-2004 10:25 AM

Re: Checking links and robots.
 
"Luigi Donatello Asero" <jaggillarfotboll@telia.com> wrote:

> I tried to check the links of some pages of the website
> http://www.scaiecat-spa-gigi.com and I got this message


I guess the relevant part of the message page you got is this:

"The link was not checked due to robots exclusion rules. Check the link
manually, and see also the link checker documentation on robots
exclusion."

for two URLs. It misleadingly appears under the heading "List of broken
links and redirects" - it means that the link checker _did not check_
those links, so it cannot know whether they are broken or redirected or
just fine.

> As far as I remember, I have not set up any robots.txt.


You haven't. The URL http://www.scaiecat-spa-gigi.com/robots.txt
does not refer to anything; and that's the URL that any well-behaving
robot checks first, before fetching anything from your site - if the
resource does not exist, the robot assumes it's welcome. (You would use
robots.txt to _exclude_ robots if you wanted to.)

> Is there a robots.txt on the validator?


Yes. And elsewhere.

The link checker is presumably a well-behaving robot. This means that
before checking links pointing to a site, it first checks for robots.txt
at the site pointed to. Thus, when you have a link with href value
<http://validator.w3.org/check?uri=http%3A%2F%2Fwww.scaiecat-spa-gigi.com%2Fit%2Fsvezia.html>
the checker first asks for
http://validator.w3.org/robots.txt
and when it gets it, it finds out that it says

User-agent: *
Disallow: /check

which means that all robots are forbidden to fetch anything with a URL
that begins with

http://validator.w3.org/check

Similar things happen to
<http://jigsaw.w3.org/css-validator/v...uri=http://www.scaiecat-spa-gigi.com/it/svezia.html>
because http://jigsaw.w3.org/robots.txt says "no" to all robots as regards
some parts of the site - including
Disallow: /css-validator/validator
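
In other words, what a polite robot does can be sketched roughly like this
(a minimal Python illustration using the standard robotparser module, not
the W3C checker's actual code; the page URL is just the example from this
thread):

from urllib.parse import quote
from urllib import robotparser

page = "http://www.scaiecat-spa-gigi.com/it/svezia.html"

# The badge link embeds the page's own address, percent-encoded, as the
# uri query parameter - that is where the %3A%2F%2F in the href comes from.
check_url = "http://validator.w3.org/check?uri=" + quote(page, safe="")

rp = robotparser.RobotFileParser()
rp.set_url("http://validator.w3.org/robots.txt")
rp.read()                      # fetch and parse the site's robots.txt

# "Disallow: /check" covers every URL that begins with /check, so a
# well-behaving robot refuses to fetch the badge link at all.
print(rp.can_fetch("*", check_url))   # False, as long as that rule is there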

For reasons unknown to me, the W3C thus wants to restrict link checking
(with W3C's tool) for "Valid HTML!" and "Valid CSS!" types of links that
the W3C recommends.

If you ask me, and even if you don't, this is yet more evidence that
"Valid HTML!" and "Valid CSS!" icons are worse than useless. (For other
evidence see
http://www.cs.tut.fi/~jkorpela/html/...tion.html#icon )

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html



Luigi Donatello Asero 08-13-2004 10:50 AM

Re: Checking links and robots.
 

"Jukka K. Korpela" <jkorpela@cs.tut.fi> skrev i meddelandet
news:Xns954487B798481jkorpelacstutfi@193.229.0.31. ..
> "Luigi Donatello Asero" <jaggillarfotboll@telia.com> wrote:
> For reasons unknown to me, the W3C thus wants to restrict link checking
> (with W3C's tool) for "Valid HTML!" and "Valid CSS!" types of links that
> the W3C recommends.
>
> If you ask me, and even if you don't, this is yet more evidence that
> "Valid HTML!" and "Valid CSS!" icons are worse than useless. (For other
> evidence see
> http://www.cs.tut.fi/~jkorpela/html/...tion.html#icon )


Well, maybe someone from W3C has something to say about the opinion
you have expressed.
I find it useful to have the icons because they let me check faster if the
page which I have updated is still valid or not.
As to my question, I was wondering whether the fact that the robots did not
look at those links means that they did not look at the whole code within
<div class="bottom"> and </div>.
I wrote when the page was last updated within <div class="bottom"> and
</div>, so I was afraid that the robots could miss, for example, that the
page http://www.scaiecat-spa-gigi.com/it/svezia.html has recently been
updated.

--
Luigi (an Italian living in Sweden)
http://www.italymap.dk
http://www.scaiecat-spa-gigi.com/sv/boendeiitalien.html






tm 08-13-2004 02:51 PM

Re: Checking links and robots.
 
Jukka K. Korpela wrote:

> If you ask me, and even if you don't, this is yet more evidence that
> "Valid HTML!" and "Valid CSS!" icons are worse than useless. (For other
> evidence see
> http://www.cs.tut.fi/~jkorpela/html/...tion.html#icon )


At the bottom of the above page you write-

"This page is intentionally not valid HTML. Not so much as a protest
to false or misleading claims on validity but as a simple measure
against DOCTYPE sniffing. The simplest way to promote more
standards-compliant processing of a document by browsers is to use an
HTML 4.01 Strict DOCTYPE, no matter what markup is actually used in
the document. It is moral to fool browsers that way, since they have
been intentionally designed to do the wrong thing with a DOCTYPE (and
unintentionally made to do the wrong thing in differing wrong ways)."

Could you explain? What is wrong with DOCTYPE sniffing?

Sam Hughes 08-13-2004 03:21 PM

Re: Checking links and robots.
 
tm <tm@tmoero.invalid> wrote in
news:tm-9C0620.23512813082004@newsflood.tokyo.att.ne.jp:

> Jukka K. Korpela wrote:
>
>> [...]

>
> At the bottom of the above page you write-
>
> "This page is intentionally not valid HTML. Not so much as a
> protest
> to false or misleading claims on validity but as a simple measure
> against DOCTYPE sniffing. The simplest way to promote more
> standards-compliant processing of a document by browsers is to use
> an HTML 4.01 Strict DOCTYPE, no matter what markup is actually used
> in the document. It is moral to fool browsers that way, since they
> have been intentionally designed to do the wrong thing with a
> DOCTYPE (and unintentionally made to do the wrong thing in differing
> wrong ways)."
>
> Could you explain? What is wrong with DOCTYPE sniffing?


First of all, Web browsers use this sniffing to justify rendering those
documents with a certain/missing document type declaration incorrectly.
Also, such behavior can prevent authors from using the appropriate DTD.
This is not what doctypes are for, and it is not how doctypes should be
treated.

--
How to make it so visitors can't resize your fonts:
<http://www.rpi.edu/~hughes/www/wise_guy/unresizable_text.html>

Jukka K. Korpela 08-13-2004 04:16 PM

Re: Checking links and robots.
 
"Luigi Donatello Asero" <jaggillarfotboll@telia.com> wrote:

> Well, maybe someone from W3C has something to say about the
> opinion you have expressed.


Perhaps. There's a rich supply of opinions in the world. But they lack
reasonable arguments.

> I find it useful to have the icons because they let me check faster
> if the page which I have updated is still valid or not.


If you have difficulties using a validator, then you should find some
convenient tools for the purpose, like bookmarks - _not_ pollute your
pages with obscure icons. If you had problems using a spelling checker,
would you consider adding an icon that _claims_ that your text has been
spell-checked, yet use it to _check_ whether the spelling is correct? If
your page is not valid _all the time_, it is dishonest to claim (with the
icon) that it is.

> As to my question, I was wondering whether the fact that the robots
> did not look at those links means that they did not look at the whole
> code within <div class="bottom"> and </div>.


I don't see how that could affect robots in the least.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
Pages about Web authoring: http://www.cs.tut.fi/~jkorpela/www.html



tm 08-13-2004 04:17 PM

Re: Checking links and robots.
 
Sam Hughes <hughes@rpi.edu> wrote:
> tm wrote
> > Jukka K. Korpela wrote:
> >
> >> [...]

> >
> > At the bottom of the above page you write-
> >
> > "This page is intentionally not valid HTML. Not so much as a
> > protest
> > to false or misleading claims on validity but as a simple measure
> > against DOCTYPE sniffing. The simplest way to promote more
> > standards-compliant processing of a document by browsers is to use
> > an HTML 4.01 Strict DOCTYPE, no matter what markup is actually used
> > in the document. It is moral to fool browsers that way, since they
> > have been intentionally designed to do the wrong thing with a
> > DOCTYPE (and unintentionally made to do the wrong thing in differing
> > wrong ways)."
> >
> > Could you explain? What is wrong with DOCTYPE sniffing?

>
> First of all, Web browsers use this sniffing to justify rendering those
> documents with a certain/missing document type declaration incorrectly.
> Also, such behavior can prevent authors from using the appropriate DTD.
> This is not what doctypes are for, and it is not how doctypes should be
> treated.


No offense Sam, I'm sure that makes sense to you since you know what
you are trying to say, but I'm still lost.
Web browsers use sniffing to render documents incorrectly?
This prevents authors from using the appropriate DTD?

How do browsers use sniffing to render documents incorrectly?

Steve Pugh 08-13-2004 04:46 PM

Re: Checking links and robots.
 
tm <tm@tmoero.invalid> wrote:

>How do browsers use sniffing to render documents incorrectly?


What do you think quirks mode is? It's when the browser decides to
render the document according to the bugs in previous generations of
browsers, i.e. incorrectly.

Steve

--
"My theories appal you, my heresies outrage you,
I never answer letters and you don't like my tie." - The Doctor

Steve Pugh <steve@pugh.net> <http://steve.pugh.net/>

tm 08-13-2004 05:35 PM

Re: Checking links and robots.
 
Steve Pugh <steve@pugh.net> wrote:
> tm <tm@tmoero.invalid> wrote:


> >How do browsers use sniffing to render documents incorrectly?

>
> What do you think quirks mode is? It's when the browser decides to
> render the document according to the bugs in previous generations of
> browsers, i.e. incorrectly.


Yeah yeah. That's not the question.

"The simplest way to promote more standards-compliant processing of a
document by browsers is to use an HTML 4.01 Strict DOCTYPE, no matter
what markup is actually used in the document. It is moral to fool
browsers that way, since they have been intentionally designed to do
the wrong thing with a DOCTYPE (and unintentionally made to do the
wrong thing in differing wrong ways)."
--Jukka K. Korpela

Why only HTML 4.01 Strict? What evil will befall if I use, say, XHTML
1.0 Transitional?

Steve Pugh 08-13-2004 06:02 PM

Re: Checking links and robots.
 
tm <tm@tmoero.invalid> wrote:
>Steve Pugh <steve@pugh.net> wrote:
>> tm <tm@tmoero.invalid> wrote:

>
>> >How do browsers use sniffing to render documents incorrectly?

>>
>> What do you think quirks mode is? It's when the browser decides to
>> render the document according to the bugs in previous generations of
>> browsers, i.e. incorrectly.

>
>Yeah yeah. That's not the question.


Pardon me for answering the question you asked. If you meant to ask a
different question...

>"The simplest way to promote more standards-compliant processing of a
>document by browsers is to use an HTML 4.01 Strict DOCTYPE, no matter
>what markup is actually used in the document. It is moral to fool
>browsers that way, since they have been intentionally designed to do
>the wrong thing with a DOCTYPE (and unintentionally made to do the
>wrong thing in differing wrong ways)." "
>--Jukka K. Korpela
>
>Why only HTML 4.01 Strict? What evil will befall if I use, say, XHTML
>1.0 Transitional?


Pick one doctype, it doesn't matter which one, that triggers Standards
mode and use that doctype regardless of the actual markup in the
document. That's what Jukka seems to be saying here. And HTML 4.01
Strict is as good a choice as any and better than some.
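
For illustration only (a rough Python sketch of that idea, not anything
from Jukka's page or from the W3C), "always emit a doctype that triggers
Standards mode" could look like this:

# The full HTML 4.01 Strict doctype, with both the public and the system
# identifier; this form triggers Standards mode in doctype-sniffing browsers.
STRICT_DOCTYPE = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"\n'
                  '  "http://www.w3.org/TR/html4/strict.dtd">')

def render_page(body_markup):
    # Emit the Strict doctype no matter what markup follows; leaving the
    # doctype out entirely would drop browsers into quirks mode.
    return STRICT_DOCTYPE + "\n" + body_markup

print(render_page("<html><head><title>Example</title></head>"
                  "<body><p>Hello</p></body></html>"))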

Steve

--
"My theories appal you, my heresies outrage you,
I never answer letters and you don't like my tie." - The Doctor

Steve Pugh <steve@pugh.net> <http://steve.pugh.net/>

