robot.txt

David Graham

PeterMcC said:
And, if you really want to be safe, you could always password protect the
directory with .htaccess - dead easy and the spiders don't get past the
password protect.

Thanks to everyone. I wonder if you could give me a few pointers on how to go
about password protecting a directory using .htaccess?

David
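
For anyone wanting a starting point, here is a minimal sketch of what such a file usually contains, written as a small Python helper that renders the text of an .htaccess for HTTP Basic authentication. The directives (AuthType, AuthName, AuthUserFile, Require) are standard Apache ones; the function name, the realm text and the password file path are made up for illustration, and the password file itself is assumed to have been created separately, for example with Apache's htpasswd utility. Whether .htaccess overrides are honoured at all depends on how the host has configured the server.

```python
def render_htaccess(password_file: str, realm: str = "Restricted Area") -> str:
    """Return the text of an .htaccess file that asks for a username and
    password (HTTP Basic auth) before serving anything in the directory.

    password_file is a hypothetical absolute path to an existing password
    file; it should live outside the publicly served directories.
    """
    lines = [
        "AuthType Basic",                  # use HTTP Basic authentication
        f'AuthName "{realm}"',             # realm text shown in the login prompt
        f"AuthUserFile {password_file}",   # where Apache looks up the users
        "Require valid-user",              # any user listed in the file may enter
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example path only; substitute the real location of the password file.
    print(render_htaccess("/home/example/.htpasswd"), end="")
```

The output would be saved as a file named .htaccess inside the directory to be protected.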
 
Jukka K. Korpela

Headless said:
Assuming that "The majority of Web authors" use
http://www.host.com/~user URLs is a very bold claim.

I made no such claim. I made the claim that most authors have no
control over what resides at http://www.foo.example/robots.txt
Again there is a risk of ambiguity here: http://www.user.host.com
should be labeled a "sub-domain". It's not registered anywhere
and it's not portable, so you certainly cannot call it "owning a
domain".

There is no ambiguity here. The domain host.com exists. The domain
user.host.com currently does not exist. Calling some domains subdomains
has no relevance to this, or to our topic. Either a domain name exists
or it does not, on the Internet, according to domain name servers. And
this has little to do with robots.txt.
I don't see how the robots.txt convention relates to Apache
.htaccess files.

Sorry, my typo. I meant http://www.host.com/robots.txt of course. The
point stands.
http://www.host.com/~user would resolve to
http://www.host.com/robots.txt for compliant clients looking for a
robots.txt file.
Indeed.
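
To make the lookup rule concrete, here is a minimal sketch of how a compliant client could derive the robots.txt location from any page URL: keep the scheme and host, discard the path. The function name is made up for illustration, and the host names are the placeholder ones already used in this thread.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the URL at which a compliant robot looks for robots.txt.

    Only the scheme and host (plus port, if any) of the page URL are kept;
    the path, query and fragment are replaced by "/robots.txt".
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # A ~user page shares the robots.txt of the whole host.
    print(robots_txt_url("http://www.host.com/~user/index.html"))
    # A page on its own host name gets a robots.txt of its own.
    print(robots_txt_url("http://www.user.host.com/page.html"))
```

The first call gives http://www.host.com/robots.txt, a file the ~user author normally cannot edit; the second gives http://www.user.host.com/robots.txt.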

You have not provided any evidence that Atomz does not follow the
correct procedure for retrieving a robots.txt.

It was you who wrote an objection, based on a claim about Atomz's behavior,
to my statement that robots.txt must reside on the server
root.
(all my sites use
http://www.user.host.com urls).

Too bad then. They do not work until you get that domain registered.
 
Headless

Jukka K. Korpela said:
I made no such claim. I made the claim that most authors have no
control over what resides at http://www.foo.example/robots.txt

Same thing since http://www.host.com/~user is the only format where a
robots.txt cannot be used by the user.
There is no ambiguity here. The domain host.com exists. The domain
user.host.com currently does not exist. Calling some domains subdomains
has no relevance to this, or to our topic. Either a domain name exists
or it does not, on the Internet, according to domain name servers. And
this has little to do with robots.txt.

http://www.user.host.com resolves to host.com, which then resolves the
prefix "user" locally. The relevance to this robots.txt thread is that
you are using incorrect terminology by referring to "servers". This
needs to be replaced by "(sub)domain". The "sub" prefix is needed to
prevent ambiguity, as most people would (rightly) interpret "domain" as
"a registered domain". As demonstrated, the use of robots.txt is not
restricted to registered domains.
Sorry, my typo. I meant http://www.host.com/robots.txt of course. The
point stands.

You've failed to explain your claim of a relation between Apache
.htaccess config files and the robots.txt convention.
It was you who wrote an objection, based on a claim about Atomz's behavior,
to my statement that robots.txt must reside on the server
root.

Indeed, I was correct, and you accused Atomz of not following the
rules, which is incorrect.
Too bad then. They do not work until you get that domain registered.

Don't follow.


Headless
 
