> I always arrange to not have to care. I'm simply curious
> and would like to have some numbers to show.
The desire to have numbers to show is completely understandable, but
numbers are not particularly useful in themselves. You need to know what
those numbers really mean, and be in a position to explain their meaning
to others.
> I would guess that about 95% of the users that I code
> for are using IE 6. Of the rest, most probably use NN 7.
> That's something I'm going to find out,
I don't think you will find that out, unless you are in a position to
tell me which 3 of the web browsers I have installed on this computer
send the User-Agent header:-
Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)
- by default (literally), and distinguish between them (and there is a
good chance that you have never even heard of the two that are not IE
6).
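Script-based detection fares no better, because browsers that send a
spoofed User-Agent header generally report the same spoofed string in
navigator.userAgent. As a sketch of the usual sort of test (a naive
pattern, not any particular site's code):

  // Naive browser sniffing: any browser sending (and reporting)
  // the header quoted above passes as IE 6, whether it is or not.
  var isIE6 = navigator.userAgent.indexOf("MSIE 6.0") != -1;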
> and I'd also just like to know how many of them
> have turned off JS.
The usual proposal is to have some JS execute in a way that will send
evidence of its action back to the server. But a client having JS
enabled does not guarantee the success of that action, as it will be at
least in part dependent on the DOM provided by the browser, and will be
subject to language implementation bugs (ECMA standardisation has gone
a long way towards removing language implementation bugs, but there are
still some). So any test script is virtually guaranteed to be incapable
of performing the required actions on at least some clients, and so
will result in a false indication of a javascript-incapable client.
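For concreteness, a minimal sketch of such a test script (the beacon
URL is hypothetical, and document.write is used as about the most
widely supported way of provoking the extra request):

  // If this executes, the browser requests a (hypothetical) tiny
  // image, and a log entry for that request is taken as evidence
  // of javascript. The timestamp is an attempt to defeat caching.
  document.write('<img src="/js-beacon.gif?t=' +
      new Date().getTime() + '" width="1" height="1" alt="">');

Even something that simple depends on document.write, string
concatenation and the Date object all behaving, and on the resulting
request not being blocked or filtered, so its absence from the logs
does not reliably mean "no javascript".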
But knowing a client's willingness to execute javascript remains
pointless, given that there are javascript-capable clients with DOMs so
limited that simple form validation is about the limit of their
capabilities.
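That is, such a client may happily run something like the following
(the form and field names are invented for the example) using nothing
beyond the oldest, most minimal DOM, while lacking nearly everything a
more ambitious script would need:

  // About the limit for a very restricted DOM: the forms-related
  // objects and alert date back to the earliest scriptable browsers.
  function validateForm(form) {
      if (form.elements["email"].value == "") {
          alert("Please enter an email address.");
          return false;
      }
      return true;
  }
  // Used as: <form onsubmit="return validateForm(this);" ... >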
> I'm guessing that percentage will be very low--in the
> neighborhood of 2%. I am also curious as to what about
> HTTP makes you think that statistics gathering is
> pointless.
Caching is the main factor. Clients cache and organisations like ISPs
operate large-scale caches. HTTP is designed to encourage caching so
many request-response transactions will never arrive at the server(s)
for any given site, instead being handled by an intervening cache.
Clients often have user (and administrator) configurable cache settings
and exhibit different behaviour even with superficially similar cache
settings. And the other caches mean that the customers of one ISP may
have considerably more requests served from a cache than the customers
of another. The exact behaviour and characteristics of these caches
are impossible to gauge.
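As a concrete illustration (the header values are invented), a page
sent with a response like:

  HTTP/1.1 200 OK
  Date: Mon, 05 May 2003 10:00:00 GMT
  Cache-Control: public, max-age=86400
  Content-Type: text/html

- invites any cache between the user and the server to answer repeat
requests for it over the next 24 hours, and none of those requests
will ever appear in the server's logs.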
It has been proposed that this caching is already essential, as the
extra load imposed on the network by having to route each and every
request/response to/from the actual server responsible would result in
traffic too heavy to be handled.
Obviously, with only a proportion of requests actually arriving at the
server, any statistics gathered on the server cannot be complete. And
the extent to which they may be representative of total requests cannot
be determined (or even estimated) because the information needed to do
that is never available.
This, and the many other factors that a well-worded Google search would
highlight, leave me thinking that any effort directed at the gathering
of web statistics would be wasted.
The consequences of individuals placing any trust in web statistics
(think: Hans Christian Andersen's "The Emperor's New Clothes") produce
a feedback effect that contributes to the meaninglessness of those
statistics. You often encounter people asserting that it is OK to build
IE-only, JS-dependent web sites because their server logs show only a
tiny percentage of visitors not using IE with JS enabled. But given an
IE-only, JS-dependent web site, is it surprising that users who do not
fall into that category do not hang around on such a site producing log
entries? It becomes a chicken-and-egg situation where the statistics
are being used to justify the design decisions, while the statistics
are heavily biased by the consequences of the decisions that they were
used to justify.
In the end I think that web statistics are used to reinforce prejudices
or justify decisions that were going to be made anyway, and so long as
those statistics broadly conform to the expectations of the interested
parties they will not be (sufficiently) questioned and their
(in)accuracy won't make much difference. In that case I can see no
reason for going to any effort to gather and analyse data; instead, it
would just be a matter of finding out what the expectations/prejudices
of the interested parties were and generating numbers that broadly
conform to them. A set of meaningless numbers derived from baseless
assumptions is not really different from a set of numbers directly
based on assumptions and prejudices.
Richard.