J
jallan
A: No. The UnicodeData.txt file includes all of the 1:1 case mappings,
but doesn't include 1:many mappings such as the one needed for
uppercasing ß. Since many parsers now expect this file to have at most
single characters in the case mapping fields, an additional file
(SpecialCasing.txt) was added to provide the 1:many mappings. For more
information, see UTR #21- Case Mappings [MD]
Python specifications make an implied claim of full support for
Unicode and an implied claim that the function upper() uppercases a
string properly.
This is a contradiction: SpecialCasing contains 1:n mappings, whereas
.upper() can only return a single result. So how do you think
SpecialCasing should be considered in the implementation of .upper()?
I am not aware that it is philosophically a *necessary* feature of
..upper() that a single character not be replaced by a string of two or
more characters.
One should fix the contradition by either changing the behavior of
..upper() so that it will properly case all strings or documenting
clearly that .upper() does not handle particular kinds of casing. Of
course users often don't read the documentation. :-(
Things are more difficult than they appear to be.
Yes.
Again and again one thinks one has a solution for a problem and then
exceptions turn up.
Again and again one finds things that one's code doesn't handle, often
from failure to analyze fully in the intitial stages and adopting
algorithms that prove insufficient to handle the data found in
reality.
Jim Allan
Jim Allan