The problem is that proper upcasing and downcasing of characters is
locale-dependent, not just encoding- or language-dependent.
As examples, he mentioned that the uppercase version of accented
characters varies from area to area in France.
No, not depending on jurisdiction in France. In French French, one
would capitalize être as Etre. In Canadian French, one would
capitalize it as Être.
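
To make the France/Canada difference concrete, here is a rough Python sketch. The function name capitalize_fr and the "FR"/"CA" region codes are made up for illustration, and the rule itself is only the one described above (drop the accent on the capital in France, keep it in Canada), not anything a standard library implements.

```python
import unicodedata

def capitalize_fr(word, region):
    """Capitalize a French word under the convention described above.

    Hypothetical helper: `region` is "FR" (accent dropped on the capital)
    or "CA" (accent kept). Real typographic practice is more nuanced.
    """
    cap = word[:1].upper() + word[1:]
    if region == "CA":
        return cap
    # Decompose the first letter and drop its combining marks (the accent).
    first = unicodedata.normalize("NFD", cap[:1])
    first = "".join(ch for ch in first if not unicodedata.combining(ch))
    return first + cap[1:]

print(capitalize_fr("être", "FR"))  # Etre
print(capitalize_fr("être", "CA"))  # Être
```

The point is only that the same input needs a region argument before it can be capitalized at all.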
Also, in Turkish, there are four different cases of 'i', not just two,
and which is correct depends on the jurisdiction.
Not quite. There are two different 'i' letters: one with a dot, one
without. One is capitalized with a dot and one is capitalized without
the dot.
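
The two Turkish letters can be sketched in Python like this. Python's own str.upper() and str.lower() are locale-independent, so upcase_tr and downcase_tr below are hypothetical helpers written for this example, not an existing API.

```python
# Dotted i pairs with dotted İ; dotless ı pairs with dotless I.
TR_UPPER = {ord("i"): "İ", ord("ı"): "I"}
TR_LOWER = {ord("İ"): "i", ord("I"): "ı"}

def upcase_tr(text):
    # Apply the Turkish pairs first, then the default mapping for the rest.
    return text.translate(TR_UPPER).upper()

def downcase_tr(text):
    return text.translate(TR_LOWER).lower()

print(upcase_tr("iyi"))    # İYİ
print(upcase_tr("ısı"))    # ISI
print(downcase_tr("ISI"))  # ısı
print("iyi".upper())       # IYI  (default mapping loses the dot)
```

With the default mapping both lowercase letters collapse to the same 'I', which is why the locale has to be known before the string is touched.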
Also, the German eszett (ß, as in Schloß) would be capitalized as
SCHLOSS, but downcasing that would be schloss, not necessarily schloß.
(Actually, and the Germans here will correct me on this I'm sure, I
think it would always be Schloss or Schloß because the leading S would
not be lowercased in proper German. Looking at some German webpages
suggests so.)
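
The lossy round trip is easy to see in Python, whose default Unicode case mapping turns ß into SS on upcasing. The helper name round_trip is made up for this sketch.

```python
def round_trip(word):
    """Return (upcased, downcased-again) using Python's default case mapping."""
    upper = word.upper()
    return upper, upper.lower()

upper, lower = round_trip("Schloß")
print(upper)  # SCHLOSS
print(lower)  # schloss
```

Nothing in SCHLOSS records whether the original had ß or ss, so downcasing cannot restore it without a dictionary or some other outside knowledge.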
Determining the locale in a correct way is really, really hard. Tim
Bray says it's basically impossible. Also, all of these rules make
any decent upcase/downcase function ruinously slow.
Not impossible, just fraught with errors and performance issues. One
would not only have to have the locale lookup stuff, but one would
have to do statistical analysis to get better than mostly wrong with
anything but English.
-austin
-- 
Austin Ziegler
http://www.halostatue.ca/
http://www.halostatue.ca/feed/