robot.txt

Denise Enck · Jun 30, 2003

William Tasso said:
and point the way for misbehaving crawlers at the same time.

Not necessarily. There are numerous ways to block 'bad' spiders.
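
For readers following the quoted point: robots.txt is only advisory, so it restrains just those crawlers that choose to consult it, and anyone can read the paths it lists. A minimal sketch of what a compliant crawler does, using Python's standard urllib.robotparser. The rules, the "ExampleBot" agent name, and the /private/ path are made up for illustration and do not come from this thread:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
RULES = """\
User-agent: *
Disallow: /private/
"""

def allowed(url: str, agent: str = "ExampleBot") -> bool:
    """Return True if a compliant crawler named `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(agent, url)

print(allowed("http://www.host.com/private/notes.html"))
print(allowed("http://www.host.com/index.html"))
```

A misbehaving crawler simply skips this check, which is why other blocking methods (such as password protection) come up later in the thread.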

David Graham · Jun 30, 2003

PeterMcC said:
And, if you really want to be safe, you could always password protect the
directory with .htaccess - dead easy and the spiders don't get past the
password protect.

Thanks to everyone. Could you give me a few pointers on how to go about
password protecting a directory using .htaccess?

David

Jukka K. Korpela · Jun 30, 2003

Headless said:
Assuming that "The majority of Web authors" use
http://www.host.com/~user URLs is a very bold claim.

I made no such claim. I made the claim that most authors have no
control over what resides at http://www.foo.example/robots.txt

Again there is a risk of ambiguity here: http://www.user.host.com
should be labeled a "sub-domain". It's not registered anywhere
and it's not portable, so you certainly cannot call it "owning a
domain".

There is no ambiguity here. The domain host.com exists. The domain
user.host.com currently does not exist. Calling some domains subdomains
has no relevance to this, or to our topic. Either a domain name exists
or it does not, on the Internet, according to domain name servers. And
this has little to do with robots.txt.

I don't see how the robots.txt convention relates to Apache
.htaccess files.

Sorry, my typo. I meant http://www.host.com/robots.txt of course. The
point stands.

http://www.host.com/~user would resolve to
http://www.host.com/robots.txt for compliant clients looking for a
robots.txt.

Indeed.
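
The resolution rule both posters agree on here can be sketched in a few lines: a compliant client keeps the scheme and host of the page URL and replaces everything else with /robots.txt. This is only an illustrative sketch; the function name robots_txt_url is hypothetical, and the URLs are the placeholder ones used in this thread:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL a compliant client would request for page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (plus port, if any); drop path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("http://www.host.com/~user/index.html"))
print(robots_txt_url("http://www.user.host.com/page.html"))
```

The first call yields the robots.txt at the root of www.host.com, outside the ~user directory, while the second yields one at the root of www.user.host.com, which is the distinction being argued over.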

You have not provided any evidence that Atomz does not follow the
correct procedure for retrieving a robots.txt.

It was you who wrote an objection, based on a claim about Atomz's behavior,
to my statement that robots.txt must reside on the server
root.

(all my sites use
http://www.user.host.com urls).

Too bad then. They do not work until you get that domain registered.

Headless · Jul 1, 2003

Jukka K. Korpela said:
I made no such claim. I made the claim that most authors have no
control over what resides at http://www.foo.example/robots.txt

Same thing since http://www.host.com/~user is the only format where a
robots.txt cannot be used by the user.

There is no ambiguity here. The domain host.com exists. The domain
user.host.com currently does not exist. Calling some domains subdomains
has no relevance to this, or to our topic. Either a domain name exists
or it does not, on the Internet, according to domain name servers. And
this has little to do with robots.txt.

http://www.user.host.com resolves to host.com, which then resolves the
prefix "user" locally. The relevance to this robots.txt thread is that
you are using incorrect terminology by referring to "servers". This
needs to be replaced by "(sub)domain". The "sub" prefix is needed to
prevent ambiguity, as most people would (rightly) interpret "domain" as
"a registered domain". As demonstrated, the usage of robots.txt is not
restricted to registered domains.

Sorry, my typo. I meant http://www.host.com/robots.txt of course. The
point stands.

You've failed to explain your claim of a relation between Apache
.htaccess config files and the robots.txt convention.

It was you who wrote an objection, based on a claim on Atomz behavior,
to my statement that said that robots.txt must reside on the server
root.

Indeed, I was correct, and you accused Atomz of not following the
rules, which is incorrect.

Too bad then. They do not work until you get that domain registered.

Don't follow.

Headless

Wipkip · Jul 1, 2003

Headless said:
Same thing since http://www.host.com/~user is the only format where a
robots.txt cannot be used by the user.

Been working for me.
