Lots of noise about user agent strings


Richard Cornford

RobG said:
Seems developers of mobile applications are pretty much devoted
to UA sniffing:

<URL: http://wurfl.sourceforge.net/vodafonerant/index.htm >

Yes, it appears to be someone who has made a serious mistake complaining
about the reality that made it a mistake. Quoting a bit of that page
shows that quite a few false beliefs were behind the mistake:-

| All the way since the inception of the web, HTTP clients have
| had unique User-Agent (UA) strings which let the server know
| who they were. This mechanism was taken over as-is by mobile
| browser manufacturers. While there have been a few exceptions
| due to device manufacturer's sloppiness, it is accurate to say
| that 99.99% of the devices out there have unique UA strings
| which can be associated to brand, model and a bunch of other
| info about the device properties.

HTTP clients have not had unique UA strings for well over a decade now,
and even UA strings that were distinct from others were explicitly
designed to prevent the server from knowing which client was being used
(hence the "Mozilla" bit at the front of IE's UA header, it was put
there only to mislead servers).

And it is absolutely not the case that 99.99% of devices have unique UA
strings. I may not get to look at mobile device browsers that often but
to date the vast majority of the scriptable browsers I have examined
have had UA strings that were more or less indistinguishable from those
of IE 6.

The bottom line remains that the HTTP specification defines the
User-Agent header as an arbitrary sequence of characters and does not
even require that the sequence of characters be the same for two
consecutive requests, let alone being in any sense unique. Any attempt
to treat something that is specified in that way as a source of
information is going to be a very obvious mistake.

Richard.
 

VK

RobG wrote:



A real expert:

"All the way since the inception of the web, HTTP clients have had
unique User-Agent (UA) strings which let the server know who they were."

What offense do you have to that?
The User-Agent string is intended to let the server know who is requesting
the page; this is why it was created and how it has been used since
Mosaic. Is your version that it is an NCSA-invented beautifying
ornament of the HTTP request? Sorry to say, then, that it is an utterly wrong
though semi-poetic version.

User-Agent strings are unique to each browser and each browser version unless
the browser code is manually reverse-engineered and altered by the end
user in violation of the EULA. Such a situation is definitely possible, but
serious solutions do not normally account for users surfing the Web with
hacked software, unless it is a special statistics collector.

The Vodafone problem is a problem causing money loss to the MMS / media
push providers involved. The problem is not deadly because the
agent info is not removed but only reformatted - yet such things are
not allowed anyway; intermediary servers must not be rewriting
HTTP requests. Otherwise one day someone typing example.com will end up
at example.net because some intermediary server's admin decided
that example.net is better as the target address. It is easy to start
and much harder to stop so, yes, such things should be squeezed out
right at the beginning.
 

Holger Jeromin

VK wrote on 27.05.2008 15:50:
| What offense do you have to that?
| The User-Agent string is intended to let the server know who is requesting
| the page; this is why it was created and how it has been used since
| Mosaic. Is your version that it is an NCSA-invented beautifying
| ornament of the HTTP request? Sorry to say, then, that it is an utterly wrong
| though semi-poetic version.
|
| User-Agent strings are unique to each browser and each browser version unless
| the browser code is manually reverse-engineered and altered by the end
| user in violation of the EULA. Such a situation is definitely possible, but

Opera shipped for a while with a default configuration that cloned IE6, to
avoid user problems.

| serious solutions do not normally account for users surfing the Web with
| hacked software, unless it is a special statistics collector.

I don't think all those Opera users hacked their software .-)
 

VK

Opera was shipped a while in the default config of cloning IE6 to avoid
user problems.

Opera was never shipped with a User-Agent string _cloning_ any existing IE
User-Agent string. In some older releases Opera's User-Agent string was
partially altered to contain some of the keywords most often used for
server-side or client-side IE detection, so it had "MSIE" and
"Microsoft Internet Explorer" in it. It was still vendor-specific
enough to detect Opera if one wanted to detect exactly Opera and
not IE.

Opera is neither alone nor the champion in this producer-humiliating
activity. The all-time best in my collection remain some
releases of Safari with the User-Agent string - and I'm not kidding -
"Mozilla/5.0 MSIE Microsoft Internet Explorer like Gecko; KHTML..."
with the part after KHTML giving the actual Safari-specific info.
Sometimes when I am depressed this "Microsoft Internet Explorer like
Gecko" cheers me up. :)

In any case the User-Agent spoofing business faded out a long time ago,
as the "brand putting down" impact proved to be much higher than any
possible immediate benefits.

For the client side there is also the navigator.vendor string, which is
easier to parse for the producer name. It is not relevant to the
original Vodafone topic.
 

Gregor Kofler

VK wrote:
| What offense do you have to that?
| The User-Agent string is intended to let the server know who is requesting
| the page; this is why it was created and how it has been used since
| Mosaic. Is your version that it is an NCSA-invented beautifying
| ornament of the HTTP request? Sorry to say, then, that it is an utterly wrong
| though semi-poetic version.

It isn't even "beautifying".

| User-Agent strings are unique to each browser and each browser version unless
| the browser code is manually reverse-engineered and altered by the end
| user in violation of the EULA.

A-ha. How come, that Firefox explicitly allows me to set my UA
identification string to whatever I want?

[fantasizing what is allowed and what not snipped]

Gregor
 

VK

A-ha. How come, that Firefox explicitly allows me to set my UA
identification string to whatever I want?

You are sounding like a child, really: "Look ma, I have just pulled
a wheel off my new toy car! Am I cool or what?" :)

I have no intention to stop you from your hacking exercises. After
Firefox is done
http://www.beatnikpad.com/archives/2006/10/28/how-to-change-the-user-agent-in-firefox
please feel free to start pulling out another wheel :)
http://www.pctools.com/guides/registry/detail/799/

As I said before, it is perfectly possible one way or another for
absolutely any browser on the market. It is another question that it has no
relation to real Web development. The number of users who go
through the User-Agent string change doesn't exceed the number in
other inadequately acting groups of Web surfers: Lynx users, self-made
browser users, officially registered psycho cases with Internet
access, and the other inevitable small shrinkage one has to expect in any
business involving a large number of people.

There is one puzzling point that bothers me when reading discussions like
this one. It is a bit OT for the OP but overall fits well into User-Agent
string questions.
Let us imagine that cell phone radio pollution did indeed have
some unexpected brain-cell-damaging effect, so every single user in
the world first of all finds the appropriate hack for her
preferred browser and changes the current User-Agent string to "There
is no Web, there is only Xul". Therefore server-side detection and
User-Agent detection as such become totally unreliable. The only hope
is client-side feature detection (and a few remaining sane doctors
searching for a remedy for the brain disease). So the dream of some has come
true. Cool. I just have one small question:
Look at the User-Agent hacks for Firefox or for IE. Now look at this
chunk of code:
document.getFoobar = new Function;
window.alert(typeof document.getFoobar);
or
document.getFoobar = new Object;
window.alert(typeof document.getFoobar);
So now the question: who decided, and why, that the much more labor- and
skill-intensive procedure will most probably be used, but easy-as-a-moo-cow
runtime feature spoofing never will be? Was it some
common "internal feeling", or was there at least once a reasonable
discussion on the subject? What were the arguments? Thank you in
advance.
 

Gregor Kofler

VK wrote:
| You are sounding like a child, really: "Look ma, I have just pulled
| a wheel off my new toy car! Am I cool or what?" :)
|
| I have no intention to stop you from your hacking exercises.
| As I said before, it is perfectly possible one way or another for
| absolutely any browser on the market.

In case you forgot, 4 hours 21 minutes earlier you stated:

"User-Agent strings are unique to each browser and each browser version unless
the browser code is manually reverse-engineered and altered by the end
user in violation of the EULA."
 

RobG

RobG wrote:



A real expert:

"All the way since the inception of the web, HTTP clients have had
unique User-Agent (UA) strings which let the server know who they were."

He should probably try to understand web technologies and not write 43kB
of rant.

The complaint is based on the belief that the UA string is the only
viable way to reliably deliver web content and downloads to mobile
devices. The owners of these sites want users to be able to download
games, software, ringtones, etc. to their mobile devices and have
confidence that they should work.

As a result, they have a database of thousands of UA strings that are
used to identify the browser and device and attempt to deliver
appropriate content.

It seems to me that they've ignored the simplest of solutions, which
might include delivering small test downloads so the user can discover
what works on their device (or not), or to just ask the user what
device they are using.

I don't have any experience in developing or using such sites, I was
wondering if anyone here has and can comment on the situation.
 

Richard Cornford

If that statement were true there would be precisely zero examples of
web browsers with User Agent strings that were not unique (i.e. that
could not be discriminated from all other user agent's UA strings by
examining the sequence of characters that they contain). Such a
statement is proved false by any single example of a browser with a
(default) UA string that is indistinguishable from that of another
browser. So when a 2003 release of IceBrowser 5 defaults to a UA string
of "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)" (observing that IE
5.0 installed on Windows 98 would use precisely the same sequence of
characters in its default UA string) we know the assertion to be
false.

The second browser with a non-unique UA string makes the point stronger,
and the third stronger still, and so on, but the fact that the belief is
false is revealed by just the first.
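
To make the IceBrowser case concrete, here is a minimal sketch of the kind of substring test a UA sniffer might run. The function name sniffBrowser and its "MSIE" rule are hypothetical, chosen only for illustration; the UA string is the one quoted above. The test has no way to tell the two browsers apart because the input is identical in both cases.

```javascript
// Hypothetical sniffing helper of the kind this thread argues against.
function sniffBrowser(uaString) {
  // Typical substring test: anything containing "MSIE x.y" is treated as IE.
  const match = /MSIE (\d+\.\d+)/.exec(uaString);
  return match ? "Internet Explorer " + match[1] : "unknown";
}

// Default UA string quoted above for a 2003 release of IceBrowser 5,
// which is also what IE 5.0 on Windows 98 sends by default.
const sharedUa = "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)";

// Same characters in, same answer out, whichever browser sent the request.
const guessForIe = sniffBrowser(sharedUa);
const guessForIceBrowser = sniffBrowser(sharedUa);

console.log(guessForIe === guessForIceBrowser);
```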

It is virtually inevitable (at least in a technical context) that acting
on a false belief will have more or less undesirable consequences.

| The complaint is based on the belief that the UA string is the
| only viable way to reliably deliver web content and downloads
| to mobile devices.

That would be a false belief, because even if UA string based browser
sniffing were the best alternative available that still would not make
it "reliable". And the article is about the author's realisation of one
of the many reasons why it is not reliable.

| The owners of these sites want users to be able to download
| games, software, ringtones, etc. to their mobile devices
| and have confidence that they should work.

So they must have information relating make/model of these devices
in order to know what 'should work' on them.

| As a result, they have a database of thousands of UA strings

Literally thousands? That is realistic if you are talking about all
models of device having an embedded web browser, but with thousands of
possible permutations the sanity of any test based on searching for any
short character sequence in a UA string looks extremely doubtful.

| that are used to identify the browser and device and attempt
| to deliver appropriate content.

But that also implies a database of makes/models relating to
capabilities.

| It seems to me that they've ignored the simplest of solutions,
| which might include delivering small test downloads so the user
| can discover what works on their device (or not), or to just ask
| the user what device they are using.

I would have thought a very simple solution would be to have the user
identify the make and model of the device they were using (as they are
probably in a very good position to know as it is usually written on the
casing) and then filter the available content based on that information.

I.E. First page; pick a manufacturer: Nokia, Sony, Motorola, etc. ->
Second page; pick a model range -> 0 - 1000, 1001 - 2000 ... (or
whatever) -> third page; pick the specific model -> fourth page; pick a
category: ring tones, games, etc. All either dynamically generated or
pre-processed from the database of model/capability/resource
relationships that must exist in order for the UA string nonsense to be
as functional as it is.
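
As a rough sketch of that page flow, assuming the model/capability/resource relationships are available as plain data: everything below (the model numbers, capability names, content titles and function names) is invented for illustration and says nothing about real devices or any real site's database.

```javascript
// Hypothetical model/capability data; a real site would load this from
// the database it already maintains. Models and capabilities are made up.
const devices = [
  { make: "Nokia", model: "1001", supports: ["ringtone-poly", "game-java"] },
  { make: "Nokia", model: "0500", supports: ["ringtone-mono"] },
  { make: "Motorola", model: "2001", supports: ["ringtone-poly"] },
];

// Hypothetical downloadable resources and the capability each one needs.
const resources = [
  { title: "Chime", category: "ring tones", requires: "ringtone-poly" },
  { title: "Beep", category: "ring tones", requires: "ringtone-mono" },
  { title: "Blocks", category: "games", requires: "game-java" },
];

// First page: the manufacturers to choose from.
function listMakes() {
  return [...new Set(devices.map((d) => d.make))];
}

// Second and third pages: the models for the chosen manufacturer.
function listModels(make) {
  return devices.filter((d) => d.make === make).map((d) => d.model);
}

// Fourth page: content in the chosen category that the chosen device supports.
function listContent(make, model, category) {
  const device = devices.find((d) => d.make === make && d.model === model);
  if (!device) return [];
  return resources
    .filter((r) => r.category === category && device.supports.includes(r.requires))
    .map((r) => r.title);
}

console.log(listContent("Nokia", "0500", "ring tones"));
```

Nothing here looks at the request headers, so it works the same when the user is browsing from a different device than the one they want content for.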

That would also deal with one of my pet complaints about this "too
clever for its own good" content delivery, which is that I (or any user)
might not want to download something for the device that I am using at
the time, but instead I might want to download something for a different
device. Apple have done a good job of illustrating this issue when you
attempt to download Safari updates. I have a number of PCs running
different operating systems and I have a Mac mini (that I use for
testing Mac browsers). The Mac mini does not have an Internet
connection, and in any event I much prefer downloading software at work
(where the bandwidth is extremely good and so the process very fast) and
putting it on a flash disc for transfer to the Mac. But Apple's web site
has started insisting on only presenting options to download Safari
versions for the OS it deduces you are using from the UA string of the
client visiting their site. Left to its own devices it will only give me
options to download Windows Safari. Obviously I get round Apple's
attempt to be 'clever' by changing IE's UA string, but I should not have
to, and I certainly don't like the implication that I am not capable of
deciding what I should be downloading for myself.

That illustrates why UA strings are not a viable means of identifying
web browsers; as soon as web site developers started using them to
discriminate about what they would serve to the client, while making
poor choices about what they were serving, it became necessary to
circumvent the actions of those developers and the available approach
was to change the UA string. And that all started over a decade ago, and
was formalised in 1999 in RFC 2616, so nobody has a good excuse for not
understanding the true nature of the UA string after all this time.

| I don't have any experience in developing or using such
| sites, I was wondering if anyone here has and can comment
| on the situation.

I don't have any experience in that particular area either, but I bet
things would go more smoothly for them if their developers were not
labouring under false beliefs, technically ignorant (about HTTP in this
case) and a little more willing to get on with correcting their mistakes
instead of blaming the external factors that expose them.

Richard.
 

Peter Michaux

On May 28, 3:37 pm, Richard Cornford wrote:

[snip]
That illustrates why UA strings are not a viable means of identifying
web browsers

If you were a system administrator and you wanted to send gzipped
JavaScript files to save bandwidth, how would you determine which
browsers could accept gzipped files and which could not? I have only
read explanations how to do this with the user agent string. I have
some ideas but have never tried any of them. For example, send a
gzipped file and then a non-gzipped file to see if the first file
worked. I'm curious what you would do or if you have any experience
with this area where user agent string is used.

[snip]

Peter
 

RobG

On May 28, 3:37 pm, Richard Cornford wrote:

[snip]
That illustrates why UA strings are not a viable means of identifying
web browsers

If you were a system administrator and you wanted to send gzipped
JavaScript files to save bandwidth,

Many people consider bandwidth to be bytes sent by the server, whereas
it should be measured as bits transmitted by the modem after the
addition of network stuff and compression.

I asked about this on the Apache news group (figuring they'd be a
suitably open-minded and knowledgeable lot) and got two responses, the
one I listened to suggested it is pointless zipping files because:

1. modems are optimised to compress text for transmission and so
likely compress files better (or at least no worse) than general
purpose compression programs

2. letting the modem do the work saves CPU effort at both ends

3. if the file is zipped before transmission, subsequent modem
compression may actually result in more data to transmit (though
likely not much more)

Anyhow, here's a link:

<URL:
http://groups.google.com.au/group/a...15c37/2655117c72c4b5e0?hl=en#2655117c72c4b5e0 >

| how would you determine which
| browsers could accept gzipped files and which could not? I have only
| read explanations how to do this with the user agent string. I have
| some ideas but have never tried any of them. For example, send a
| gzipped file and then a non-gzipped file to see if the first file
| worked. I'm curious what you would do or if you have any experience
| with this area where user agent string is used.

There is an Open Mobile Alliance (OMA) user agent profile
specification that UAs can use to transmit capability and preference
information:

<URL: http://www.openmobilealliance.org/tech/affiliates/wap/wap-248-uaprof-20011020-a.pdf >

It is supposed to be used by developers of WAP and other mobile-
specific sites to identify device characteristics and then serve
appropriate content. The fact that projects like WURFL seem to be
more popular for such things shows that developers don't trust the
profile information to do the job.
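
For anyone who has not met UAProf: as far as I understand it, a device that supports it sends a request header (commonly x-wap-profile) whose value is the URL of an RDF/XML document describing the device. The sketch below only extracts that URL from a plain object of lower-cased header names; the function name and the example URL are hypothetical, and fetching or parsing the profile itself is left out.

```javascript
// Hypothetical helper: pull the UAProf profile URL out of request headers.
// Assumes `headers` is a plain object keyed by lower-cased header names.
function getProfileUrl(headers) {
  const raw = headers["x-wap-profile"];
  if (typeof raw !== "string") return null;
  // The value is usually a quoted URL and may be followed by further
  // comma-separated items; only the first item is used here.
  const first = raw.split(",")[0].trim().replace(/^"|"$/g, "");
  return first || null;
}

// Example with a made-up profile location.
console.log(getProfileUrl({ "x-wap-profile": '"http://example.com/profiles/device.xml"' }));
```

Like the UA string, this is whatever the client or an intermediary chooses to send, so the same caveats about trusting it apply.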

Whether UA strings or agent profiles are used, it seems that feature
detection has been abandoned (perhaps it was never seen as a serious
contender) as a strategy for determining mobile UA capability.

Maybe feature detection is employed as a second strategy for minor
irregularities, after device capability has already been defined
(or assumed) fairly precisely (where "precisely" is used in its
mathematical sense, which is quite different to "accurately").
 

Gordon

Lots of rantage demonstrating a total lack of understanding of how browsers work snipped

You know, it's better to remain silent and be thought a fool than to
open your mouth and remove all doubt.
 

VK

A strange snip note, as I did not talk about "how browsers work",
whatever that would mean. I explained how server-side detection
works in commercial solutions and I hinted at why client-side browser
identification methods are more reliable than feature detection.
It is certain that both can be spoofed, but feature detection is
much easier to spoof, plus it has two other major faults:
1) it depends on client-side scripting
2) it often causes a round trip for the content (first load the feature
detecting page, see what the agent can handle, then get it from the
server).
Neither is acceptable for mobile device solutions, where
a noticeable share of devices still asks for WML and WMLScript
respectively, not Javascript, and where a noticeable share of devices
still works on legacy GPRS networks with a max speed of
9,200 baud, with the actual speed rarely going above 7,600 baud because
of the implied error correction mechanics.

| You know, it's better to remain silent and be thought a fool than to
| open your mouth and remove all doubt.

Well, you can open your own - pretty big, as I see - mouth to explain
the reliability advantages of feature detection over User-Agent string
study for the prominent desktop UAs.
And I have one humble request for everyone: anyone willing to post
about WAP devices, _please_ have at least one commercial WAP
project in your portfolio; otherwise this thread
looks more and more like a bunch of virgins discussing the secrets
of sex: funny to read but practically pointless. :)
 

Peter Michaux

On May 28, 3:37 pm, Richard Cornford wrote:
| | | That illustrates why UA strings are not a viable means of identifying
| | | web browsers
| | If you were a system administrator and you wanted to send gzipped
| | JavaScript files to save bandwidth,
|
| Many people consider bandwidth to be bytes sent by the server, whereas
| it should be measured as bits transmitted by the modem after the
| addition of network stuff and compression.
|
| I asked about this on the Apache news group (figuring they'd be a
| suitably open-minded and knowledgeable lot) and got two responses, the
| one I listened to suggested it is pointless zipping files because:
|
| 1. modems are optimised to compress text for transmission and so
| likely compress files better (or at least no worse) than general
| purpose compression programs
|
| 2. letting the modem do the work saves CPU effort at both ends

I didn't know modems do this. There must be a standard compression
algorithm to ensure the receiver knows how to decompress.

| 3. if the file is zipped before transmission, subsequent modem
| compression may actually result in more data to transmit (though
| likely not much more)
|
| Anyhow, here's a link:
|
| <URL:http://groups.google.com.au/group/alt.apache.configuration/browse_frm...

Hmm. This is quite contrary to the current popular thought about
gzipping JavaScript before sending it over the wire.

Steve Souders works for Yahoo!'s performance team and has made many
experiments. I believe he watches total page load time in Firebug and
so that would include modem decompression time. You can see in the
Editorial Review section of the following page there are 14 rules to
speed a page.

<URL: http://www.amazon.com/dp/0596529309>

One of the rules is to gzip components.

More confusion added to the pile.

Another issue is that files do not need to be compressed "on the fly"
by the server. They can be pre compressed and if the client can handle
the compressed version then that is the one sent.

[snip]

Peter
 

Richard Cornford

Peter said:
On May 28, 3:37 pm, Richard Cornford wrote:

| If you were a system administrator and you wanted to send
| gzipped JavaScript files to save bandwidth, how would you
| determine which browsers could accept gzipped files and
| which could not?

The HTTP Accept-Encoding header sent with the request would seem
like the obvious place to start (as that is precisely what it is
for).
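
A minimal sketch of reading that header, for illustration only: the function names are hypothetical, and treating a missing header as "no gzip" is a deliberately conservative assumption rather than something the specification requires. It honours q values, so "gzip;q=0" counts as a refusal.

```javascript
// Parse an Accept-Encoding header value into a Map of coding -> q value.
function parseAcceptEncoding(headerValue) {
  const codings = new Map();
  for (const part of (headerValue || "").split(",")) {
    const [name, ...params] = part.trim().split(";");
    if (!name) continue;
    let q = 1; // q defaults to 1 when it is not given
    for (const param of params) {
      const [key, value] = param.trim().split("=");
      if (key.trim().toLowerCase() === "q") q = parseFloat(value);
    }
    // An unparsable q value is treated as a refusal.
    codings.set(name.trim().toLowerCase(), Number.isNaN(q) ? 0 : q);
  }
  return codings;
}

// gzip is acceptable when it is listed, or matched by "*", with q above 0.
function acceptsGzip(headerValue) {
  const codings = parseAcceptEncoding(headerValue);
  if (codings.has("gzip")) return codings.get("gzip") > 0;
  if (codings.has("*")) return codings.get("*") > 0;
  return false; // header absent or gzip not mentioned: send identity
}

console.log(acceptsGzip("gzip, deflate"), acceptsGzip("gzip;q=0"));
```

The decision depends only on what the request says it can accept, whether that request came from the browser or from a proxy in between.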

| I have only read explanations how to do this with the user
| agent string.

Incredible, and incredibly foolish as HTTP very explicitly allows
proxies to change the encoding. That is, if a client cannot handle
gzip but the proxy can it can ask the server for gzip, decompress
it and send the identity encoded result to the client. It could
also do this the other way around, but it would be unlikely
that doing so would be seen as a good idea. And it could also
disregard any client preference for a compressed encoding and only
make identity requests to servers itself.

So a proxy may or may not send on the client's UA string or
substitute an alternative (which does not matter as the UA string
is arbitrary) and it may or may not impose the same encoding
limitations as the client. That would make looking at the UA
string at all in this context extremely foolish. Indeed more
foolish than ignoring q values in the Accept header when content
negotiating HTML/XHTML.

| I have some ideas but have never tried any of them. For
| example, send a gzipped file and then a non-gzipped file
| to see if the first file worked. I'm curious what you
| would do or if you have any experience with this area where
| user agent string is used.

I have probably just answered both of those questions(?).

Richard.
 

Thomas 'PointedEars' Lahn

Peter said:
If you were a system administrator and you wanted to send gzipped
JavaScript files to save bandwidth, how would you determine which
browsers could accept gzipped files and which could not?

One would scan the Accept-Encoding request header value for "gzip", and then
use gzip(1) or a gzip implementation to compress the message body. There
are libraries like cgi_buffer which are capable of that. Apache 2.0+ has
mod_deflate.
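
The scan-then-compress step can be sketched in a few lines, here using Node.js and its built-in zlib module rather than gzip(1), cgi_buffer or mod_deflate. The function names are hypothetical, and the header scan is the simple token match described above (it ignores q values).

```javascript
const zlib = require("zlib");

// Simple scan for a "gzip" token in the Accept-Encoding value.
// This deliberately ignores q values; it only checks that gzip is listed.
function mentionsGzip(acceptEncoding) {
  return (acceptEncoding || "")
    .split(",")
    .some((part) => part.split(";")[0].trim().toLowerCase() === "gzip");
}

// Build the response headers and body for a text resource.
function encodeBody(acceptEncoding, bodyText) {
  // Vary tells caches that the response depends on Accept-Encoding.
  const headers = { Vary: "Accept-Encoding" };
  const plain = Buffer.from(bodyText, "utf8");
  if (mentionsGzip(acceptEncoding)) {
    headers["Content-Encoding"] = "gzip";
    return { headers, body: zlib.gzipSync(plain) };
  }
  return { headers, body: plain };
}

const script = "function hello() { return 'hello'; }\n".repeat(20);
const compressed = encodeBody("gzip, deflate", script);
const identity = encodeBody("identity", script);
console.log(compressed.body.length, identity.body.length);
```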

| I have only read explanations how to do this with the user agent string.

That is a pity. Had you read RFCs 1945 and 2616 more thoroughly as I think
I recommended to you before, you would also have found the specification of
this header.


PointedEars
 

Dr J R Stockton

In comp.lang.javascript message <4f28e817-0351-4268-928e-73f7590011ee@j33g2000pri.googlegroups.com>,
Wed, 28 May 2008 21:27:09, RobG wrote:
| Many people consider bandwidth to be bytes sent by the server, whereas
| it should be measured as bits transmitted by the modem after the
| addition of network stuff and compression.

Not necessarily. My pages are (almost all) small enough that, with any
reasonably recent modem, the download time itself will not upset any
possibly-significant readers. I would support efficiency of transfer,
on general grounds. But what is most important to me is the transfer
count maintained, erratically, by the server system, because there is a
monthly limit.

For any in a similar situation : I put in a ROBOTS.TXT file, denying
access to all but the Home Page INDEX.HTM. Over the course of a month
(during which I re-enabled robot access to some categories, and restored
some hidden topics), access was down by three quarters and still
dropping. I disabled ROBOTS.TXT a fortnight ago, and access had doubled
(to half the original) and is still rising.
 
