Checking links and robots.

Luigi Donatello Asero · Aug 13, 2004

I tried to check the links on some pages of the website
http://www.scaiecat-spa-gigi.com and I got this message:
http://validator.w3.org/checklink?u.../svezia.html&hide_type=all&depth=&check=Check
As far as I remember, I have not set up any robots.txt.
Is there a robots.txt on the validator?

Jukka K. Korpela · Aug 13, 2004

Luigi Donatello Asero said:
I tried to check the links on some pages of the website
http://www.scaiecat-spa-gigi.com and I got this message:

I guess the relevant part of the message page you got is this:

"The link was not checked due to robots exclusion rules. Check the link
manually, and see also the link checker documentation on robots
exclusion."

for two URLs. It misleadingly appears under the heading "List of broken
links and redirects" - it means that the link checker _did not check_
those links, so it cannot know whether they are broken or redirected or
just fine.

As far as I remember, I have not set up any robots.txt.

No, you haven't. The URL http://www.scaiecat-spa-gigi.com/robots.txt
does not refer to anything, and that's the URL that any well-behaved
robot checks first, before fetching anything from your site. If the
resource does not exist, the robot assumes it's welcome. (You would use
robots.txt to _exclude_ robots if you wanted to.)
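
The rule just described (no robots.txt means the robot is welcome) can be sketched roughly as follows. This is only an illustration under stated assumptions: `may_fetch` is a hypothetical helper, the HTTP status is passed in by hand instead of being fetched, and `ExampleBot` is a made-up user-agent name. Only `urllib.robotparser` is a real standard-library module.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_status, robots_lines, agent, url):
    """Hypothetical helper: decide whether `agent` may fetch `url`.

    robots_status is the HTTP status the robot got for /robots.txt and
    robots_lines are the lines of its body (ignored when the file is missing).
    """
    if robots_status == 404:
        # No robots.txt at all: a well-behaved robot assumes it is welcome.
        return True
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


page = "http://www.scaiecat-spa-gigi.com/it/svezia.html"

# The site has no robots.txt, so nothing is excluded.
print(may_fetch(404, [], "ExampleBot", page))  # True

# An assumed exclusion rule (not present on the real site) would block the page.
print(may_fetch(200, ["User-agent: *", "Disallow: /it/"], "ExampleBot", page))  # False
```

The second call uses an invented rule purely to show the contrast with the missing-file case.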

Is there a robots.txt on the validator?

Yes. And elsewhere.

The link checker is presumably a well-behaved robot. This means that
before checking links pointing to a site, it first checks for robots.txt
at the site pointed to. Thus, when you have a link with href value
<http://validator.w3.org/check?uri=http://www.scaiecat-spa-gigi.com%2Fit%2Fsvezia.html>
the checker first asks for
http://validator.w3.org/robots.txt
and when it gets it, it finds out that it says

User-agent: *
Disallow: /check

which means that all robots are forbidden to fetch anything with a URL
that begins with

http://validator.w3.org/check

Similar things happen to
<http://jigsaw.w3.org/css-validator/validator?uri=http://www.scaiecat-spa-gigi.com/it/svezia.html>
because http://jigsaw.w3.org/robots.txt says "no" to all robots as
regards some parts of the site, including
Disallow: /css-validator/validator
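
A minimal sketch of that prefix matching, using Python's standard `urllib.robotparser`. It feeds the parser only the two rules quoted above (the real robots.txt files may contain more), and `ExampleLinkChecker` is a made-up user-agent name, not the one the W3C tool actually sends.

```python
from urllib.robotparser import RobotFileParser


def parser_for(rules):
    """Build a parser from robots.txt lines given directly (no network access)."""
    parser = RobotFileParser()
    parser.parse(rules)
    return parser


# Only the rules quoted in the post; the real files may contain more.
validator = parser_for(["User-agent: *", "Disallow: /check"])
jigsaw = parser_for(["User-agent: *", "Disallow: /css-validator/validator"])

AGENT = "ExampleLinkChecker"  # assumption: any name matches "User-agent: *"
page = "http://www.scaiecat-spa-gigi.com/it/svezia.html"

html_link = "http://validator.w3.org/check?uri=" + page
css_link = "http://jigsaw.w3.org/css-validator/validator?uri=" + page

print(validator.can_fetch(AGENT, html_link))                    # False
print(jigsaw.can_fetch(AGENT, css_link))                        # False
# Disallow is a prefix match, so /checklink is covered by /check as well.
print(validator.can_fetch(AGENT, "http://validator.w3.org/checklink"))  # False
print(validator.can_fetch(AGENT, "http://validator.w3.org/"))   # True
```

So the "Valid HTML!" and "Valid CSS!" links are skipped because of rules on the W3C hosts, not because of anything on the linking site.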

For reasons unknown to me, the W3C thus wants to restrict link checking
(with W3C's tool) for "Valid HTML!" and "Valid CSS!" types of links that
the W3C recommends.

If you ask me, and even if you don't, this is yet another piece of
evidence that "Valid HTML!" and "Valid CSS!" icons are worse than
useless. (For other evidence see
http://www.cs.tut.fi/~jkorpela/html/validation.html#icon )

Luigi Donatello Asero · Aug 13, 2004

Jukka K. Korpela said:
For reasons unknown to me, the W3C thus wants to restrict link checking
(with W3C's tool) for "Valid HTML!" and "Valid CSS!" types of links that
the W3C recommends.

If you ask me, and even if you don't, this is yet another piece of
evidence that "Valid HTML!" and "Valid CSS!" icons are worse than
useless. (For other evidence see
http://www.cs.tut.fi/~jkorpela/html/validation.html#icon )

Well, maybe someone from the W3C has something to say about the opinion
you have expressed.
I find it useful to have the icons because they let me check faster if the
page which I have updated is still valid or not.
As to my question, I was wondering whether the fact that the robots did not
look at those links means that they did not look at any of the code
within
<div class="bottom">
and </div>
I put the date when the page was last updated within <div class="bottom">
and </div>, so I was afraid that the robots could miss, for example, that the
page http://www.scaiecat-spa-gigi.com/it/svezia.html has been recently
updated.

tm · Aug 13, 2004

Jukka said:
If you ask me, and even if you don't, this is yet another piece of
evidence that "Valid HTML!" and "Valid CSS!" icons are worse than
useless. (For other evidence see
http://www.cs.tut.fi/~jkorpela/html/validation.html#icon )

At the bottom of the above page you write:

"This page is intentionally not valid HTML. Not so much as a protest
to false or misleading claims on validity but as a simple measure
against DOCTYPE sniffing. The simplest way to promote more
standards-compliant processing of a document by browsers is to use an
HTML 4.01 Strict DOCTYPE, no matter what markup is actually used in
the document. It is moral to fool browsers that way, since they have
been intentionally designed to do the wrong thing with a DOCTYPE (and
unintentionally made to do the wrong thing in differing wrong ways)."

Could you explain? What is wrong with DOCTYPE sniffing?

Sam Hughes · Aug 13, 2004

tm said:
Jukka said:

[...]

At the bottom of the above page you write:

"This page is intentionally not valid HTML. Not so much as a
protest
to false or misleading claims on validity but as a simple measure
against DOCTYPE sniffing. The simplest way to promote more
standards-compliant processing of a document by browsers is to use
an HTML 4.01 Strict DOCTYPE, no matter what markup is actually used
in the document. It is moral to fool browsers that way, since they
have been intentionally designed to do the wrong thing with a
DOCTYPE (and unintentionally made to do the wrong thing in differing
wrong ways)."

Could you explain? What is wrong with DOCTYPE sniffing?

First of all, Web browsers use this sniffing to justify incorrectly
rendering documents that have a certain document type declaration, or
none at all. Also, such behavior can prevent authors from using the
appropriate DTD. This is not what doctypes are for, and it is not how
doctypes should be treated.

Jukka K. Korpela · Aug 13, 2004

Luigi Donatello Asero said:
Well, maybe someone from the W3C has something to say about the
opinion you have expressed.

Perhaps. There's a rich supply of opinions in the world. But they lack
reasonable arguments.

I find it useful to have the icons because they let me check faster
if the page which I have updated is still valid or not.

If you have difficulties in using a validator, then you should find some
convenient tools for the purpose, like bookmarks. _Not_ pollute your
pages with obscure icons. If you had problems with using a spelling
checker, would you consider adding an icon that _claims_ that your text
has been spelling checked, yet use it to _check_ whether its spelling is
correct? If your page is not valid _all the time_, it is dishonest to
claim (with the icon) that it is.

As to my question, I was wondering whether the fact that the robots
did not look at those links means that they did not look at any of the
code within
<div class="bottom">
and </div>

I don't see how that could affect robots the least.

tm · Aug 13, 2004

Sam Hughes said:
tm wrote:

Jukka said:

[...]

At the bottom of the above page you write:

"This page is intentionally not valid HTML. Not so much as a
protest
to false or misleading claims on validity but as a simple measure
against DOCTYPE sniffing. The simplest way to promote more
standards-compliant processing of a document by browsers is to use
an HTML 4.01 Strict DOCTYPE, no matter what markup is actually used
in the document. It is moral to fool browsers that way, since they
have been intentionally designed to do the wrong thing with a
DOCTYPE (and unintentionally made to do the wrong thing in differing
wrong ways)."

Could you explain? What is wrong with DOCTYPE sniffing?

First of all, Web browsers use this sniffing to justify incorrectly
rendering documents that have a certain document type declaration, or
none at all. Also, such behavior can prevent authors from using the
appropriate DTD. This is not what doctypes are for, and it is not how
doctypes should be treated.

No offense Sam, I'm sure that makes sense to you since you know what
you are trying to say, but I'm still lost.
Web browsers use sniffing to render documents incorrectly?
This prevents authors from using the appropriate DTD?

How do browsers use sniffing to render documents incorrectly?

Steve Pugh · Aug 13, 2004

tm said:
How do browsers use sniffing to render documents incorrectly?

What do you think quirks mode is? It's when the browser decides to
render the document according to the bugs in previous generations of
browsers, i.e. incorrectly.

Steve

tm · Aug 13, 2004

Steve Pugh said:
What do you think quirks mode is? It's when the browser decides to
render the document according to the bugs in previous generations of
browsers, i.e. incorrectly.

Yeah yeah. That's not the question.

"The simplest way to promote more standards-compliant processing of a
document by browsers is to use an HTML 4.01 Strict DOCTYPE, no matter
what markup is actually used in the document. It is moral to fool
browsers that way, since they have been intentionally designed to do
the wrong thing with a DOCTYPE (and unintentionally made to do the
wrong thing in differing wrong ways)." "
--Jukka K. Korpela

Why only HTML 4.01 Strict? What evil will befall if i use, say, XHTML
1.0 Transitional?

Steve Pugh · Aug 13, 2004

tm said:
Yeah yeah. That's not the question.

Pardon me for answering the question you asked. If you meant to ask a
different question...

"The simplest way to promote more standards-compliant processing of a
document by browsers is to use an HTML 4.01 Strict DOCTYPE, no matter
what markup is actually used in the document. It is moral to fool
browsers that way, since they have been intentionally designed to do
the wrong thing with a DOCTYPE (and unintentionally made to do the
wrong thing in differing wrong ways)." "
--Jukka K. Korpela

Why only HTML 4.01 Strict? What evil will befall if i use, say, XHTML
1.0 Transitional?

Pick one doctype, it doesn't matter which one, that triggers Standards
mode and use that doctype regardless of the actual markup in the
document. That's what Jukka seems to be saying here. And HTML 4.01
Strict is as good a choice as any and better than some.

Steve
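
As a rough illustration of the sniffing discussed above, here is a toy model of how a browser might choose a rendering mode from the DOCTYPE. It is a deliberate simplification and not any browser's actual algorithm: `rendering_mode` and `STANDARDS_TRIGGERS` are hypothetical names, and the table holds only the HTML 4.01 Strict public identifier, whereas real browsers use longer, browser-specific tables.

```python
# Toy model only: real browsers use longer tables that differ per browser.
STANDARDS_TRIGGERS = {
    "-//W3C//DTD HTML 4.01//EN",  # public identifier of HTML 4.01 Strict
}


def rendering_mode(public_id):
    """Return the mode this toy model picks for a DOCTYPE public identifier."""
    if public_id is None:
        return "quirks"  # no DOCTYPE at all
    if public_id in STANDARDS_TRIGGERS:
        return "standards"
    return "browser-specific"  # deliberately not modelled here


print(rendering_mode(None))                                      # quirks
print(rendering_mode("-//W3C//DTD HTML 4.01//EN"))               # standards
print(rendering_mode("-//W3C//DTD XHTML 1.0 Transitional//EN"))  # browser-specific
```

The point of the sketch is that the mode depends only on the declaration, not on the markup that follows it, which is why one doctype can be used "regardless of the actual markup".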

tm · Aug 13, 2004

Steve said:
Pick one doctype, it doesn't matter which one, that triggers Standards
mode and use that doctype regardless of the actual markup in the
document. That's what Jukka seems to be saying here. And HTML 4.01
Strict is as good a choice as any and better than some.

Ah. That makes sense.

Luigi Donatello Asero · Aug 14, 2004

Jukka K. Korpela said:
Perhaps. There's a rich supply of opinions in the world. But they lack
reasonable arguments.

And who should decide which opinions lack reasonable arguments?

If you have difficulties in using a validator, then you should find some
convenient tools for the purpose, like bookmarks. _Not_ pollute your
pages with obscure icons. If you had problems with using a spelling
checker, would you consider adding an icon that _claims_ that your text
has been spelling checked, yet use it to _check_ whether its spelling is
correct? If your page is not valid _all the time_, it is dishonest to
claim (with the icon) that it is.

I do not share your opinion, but you are entitled to yours. When you put an
icon on a page, you may happen to insert something that is wrong.
In fact, the code has a space before "uri":
http://validator.w3.org/check?
uri=http%3A%2F%2Fwww.scaiecat-spa-gigi.com%2Fit%2Fsvezia.html
which must be corrected:
http://validator.w3.org/check?uri=h...charset=(detect+automatically)&doctype=Inline
That does not mean though that the icon and the validator are useless.

Sam Hughes · Aug 14, 2004

Well, maybe someone from the W3C has something to say about the
opinion you have expressed.
I find it useful to have the icons because they let me check faster
if the page which I have updated is still valid or not.

You can create a bookmark or favorite with the following code. Be
careful about word wrap:
javascript:void(location='http://validator.w3.org/check?uri='+escape(location))

CSS Validator:
javascript:void(location='http://jigsaw.w3.org/css-validator/validator?uri='+escape(location)+'&warning=1&profile=css2')

Some info:
http://www.fjordaan.uklinux.net/moveabletype/fblog/archives/000059.html
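
What such bookmarks compute can also be sketched outside the browser. The Python below is a stand-in, not a translation of the JavaScript: `html_check_url` and `css_check_url` are hypothetical helper names, and `urllib.parse.quote` with `safe=""` encodes more characters than JavaScript's `escape` does (for instance "/"), which happens to reproduce the percent-encoded form seen in the validator links earlier in the thread.

```python
from urllib.parse import quote

HTML_VALIDATOR = "http://validator.w3.org/check?uri="
CSS_VALIDATOR = "http://jigsaw.w3.org/css-validator/validator?uri="


def html_check_url(page_url):
    # safe="" also encodes ":" and "/", so the page URL fits in one parameter.
    return HTML_VALIDATOR + quote(page_url, safe="")


def css_check_url(page_url):
    # The extra parameters are the ones used in the CSS bookmark above.
    return CSS_VALIDATOR + quote(page_url, safe="") + "&warning=1&profile=css2"


page = "http://www.scaiecat-spa-gigi.com/it/svezia.html"
print(html_check_url(page))
print(css_check_url(page))
```

Note that the generated link has no space or line break between "check?" and "uri=".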

Luigi Donatello Asero · Aug 14, 2004

Sam Hughes said:
You can create a bookmark or favorite with the following code. Be
careful about word wrap:
javascript:void(location='http://validator.w3.org/check?uri='+escape(location))

CSS Validator:
javascript:void(location='http://jigsaw.w3.org/css-validator/validator?uri='+escape(location)+'&warning=1&profile=css2')

Some info:
http://www.fjordaan.uklinux.net/moveabletype/fblog/archives/000059.html

I do not like using JavaScript because those who disable it cannot access
the page.
I often disable it myself when I browse the internet!
Besides, as I already tried to explain, the code I got for the link showing
that my page validates has a space before "uri".
For example
http://validator.w3.org/check?
uri=http%3A%2F%2Fwww.scaiecat-spa-gigi.com%2Fit%2Fsvezia.html
which I corrected to
http://validator.w3.org/check?uri=h...charset=(detect+automatically)&doctype=Inline

Also, as far as I understood, Bruce was in favour of HTML icons, wasn't he?
Bruce, did you change your mind about that?

Sam Hughes · Aug 14, 2004

I do not like using JavaScript because those who disable it cannot
access the page.

I am talking about a _Favorite_ or a _Bookmark_ that only gets placed
into _your_ browser's bookmarks! This has nothing to do with putting
JavaScript on your web page; it is an easy way to use the validator,
which eliminates a reason for the W3C icons.

Jukka K. Korpela · Aug 14, 2004

Luigi Donatello Asero said:
And who should decide which opinions lack reasonable arguments?

You. You put their icons on your page, or you don't. You can read their
arguments and see that they are bogus - my document just tries to help
you see that.

That does not mean though that the icon and the validator are useless.

I didn't say they are useless. They are worse than useless. There's a big
difference.

Luigi Donatello Asero · Aug 14, 2004

Jukka K. Korpela said:
You. You put their icons on your page, or you don't. You can read their
arguments and see that they are bogus - my document just tries to help
you see that.

I didn't say they are useless. They are worse than useless. There's a big
difference.

I think that they are useful although they are not perfect.

Luigi Donatello Asero · Aug 14, 2004

Sam Hughes said:
I am talking about a _Favorite_ or a _Bookmark_ that only gets placed
into _your_ browser's bookmarks! This has nothing to do with putting
JavaScript on your web page; it is an easy way to use the validator,
which eliminates a reason for the W3C icons.

And where should the code be placed?
Does the licence agreement for IE let users modify the browser?

Toby Inkster · Aug 14, 2004

Luigi said:
And where should the code be placed?
Does the licence agreement for IE let users modify the browser?

You don't need to modify the browser. The JavaScript is simply added as a
bookmark.

For example, you can add a bookmark that points to "http://www.google.com/",
right?

You can also add a bookmark that points to "javascript:resizeTo(100,100);".

Jukka K. Korpela · Aug 14, 2004

Luigi Donatello Asero said:
I think that they are useful although they are not perfect.

As you like it, but that does not make them useful. All claims about the
usefulness of the Valid HTML! icons have been proven incorrect. In fact,
all the purported uses can be proven to be _harmful_.

Strangely, after _every_ purported use has been disproved, people _still_
keep saying "well, they are maybe not perfect, but they are useful!".
It is hard to understand this as other than a strange form of
religiousness - iconolatry.

(The W3C is probably too proud to admit this, after a long period of
propaganda. But you need not be.)
