Just for reference, here is the starting para of Chris' original mail
that started this thread.
ie it mentions numbers, strings, PEP 393 *AND jmf.* So while it is
true that jmf has been butting in with trollish behavior into
completely unrelated threads with his unicode rants, that cannot be
said for this thread.
-----
That's because you did not understand the analogy, int/long <-> FSR.
Here is another illustration:
.... if 0 < i <= 100:
.... return i + 10 + 10 + 10 - 10 - 10 - 10 + 1
.... elif 100 < i <= 1000:
.... return i + 100 + 100 + 100 + 100 - 100 - 100 - 100 - 100 + 1
.... else:
.... return i + 1
....
Does it work? Yes.
Is it "correct"? That can be discussed.
Now replace i with a char, a representative of each "subset"
of the FSR, select a method where the FSR behaves badly,
and take a look at what happens.
timeit.repeat("'a' * 1000 + 'z'")
[0.6532032148133153, 0.6407248807756699, 0.6407264561239894]
timeit.repeat("'a' * 1000 + '9'")
[0.6429508479509245, 0.6242782443215589, 0.6240490311410927]
timeit.repeat("'a' * 1000 + '€'")
[1.095694927496563, 1.0696347279235603, 1.0687741939041082]
timeit.repeat("'a' * 1000 + 'ẞ'")
[1.0796421281222877, 1.0348612767961853, 1.035325216876231]
timeit.repeat("'a' * 1000 + '\u2345'")
[1.0855414137412112, 1.0694677410017164, 1.0688096392412945]
timeit.repeat("'œ' * 1000 + '\U00010001'")
[1.237314015362017, 1.2226262553064657, 1.21994619397816]
timeit.repeat("'œ' * 1000 + '\U00010002'")
[1.245773635836997, 1.2303978424029651, 1.2258257877430765]
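For anyone who wants to see the "subsets" directly rather than through timings, here is a minimal sketch. It assumes CPython 3.3 or later (where PEP 393 is implemented). The helper name bytes_per_char is made up for this illustration, and sys.getsizeof is an implementation detail, so other interpreters may report something else.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate storage per code point by differencing two string sizes.

    Subtracting the sizes of two strings built from the same character
    cancels the fixed object header and leaves only the per-char cost.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) / n

# One representative of each FSR "subset": ASCII/latin-1, BMP, non-BMP.
for ch in ("a", "\u20ac", "\U00010001"):
    print(f"U+{ord(ch):04X}: {bytes_per_char(ch):.0f} byte(s) per char")

# Appending a single wider char forces the whole result into the wider form.
narrow = "a" * 1000 + "z"
mixed = "a" * 1000 + "\u20ac"
print(sys.getsizeof(narrow), sys.getsizeof(mixed))
```

On CPython this should report 1, 2 and 4 bytes per char, and the mixed string should come out noticeably larger than the pure ASCII one of the same length, which is consistent with the extra work seen in the timings above.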
Where does it come from? Simple: the FSR breaks the
simple rules used in all coding schemes (unicode or not):
1) a unique set of chars
2) the "same" algorithm for all chars.
And again, that's why utf-8 works so smoothly.
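As a small illustration of the "same algorithm for all chars" point, the sketch below pushes the characters used in the timings through str.encode("utf-8") and reports how many bytes each one takes. The helper name utf8_length is made up for this example. It only shows encoded length and says nothing about speed.

```python
def utf8_length(ch):
    """Number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))

# Same call for every char; only the resulting length differs.
for ch in ("a", "\u0153", "\u20ac", "\u1e9e", "\u2345", "\U00010001"):
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s)")
```

The lengths range from 1 byte for 'a' to 4 bytes for '\U00010001', but every character goes through the same encoding rule.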
The "corporates", which understood this very well and
wanted to incorporate, let's say, the characters used
in the French language, had no choice but to
create new coding schemes (eg mac-roman, cp1252).
In unicode, the "latin-1" range is a real plague.
After years of experience, I'm still fascinated to see that
the corporates have solved this issue easily while "free
software" is still relying on latin-1.
I have never managed to find an explanation.
Even the TeX folks, when they shifted to the Cork
encoding in 199?, were aware of this and consequently
provided special package(s).
No offense, this is in my mind why "corporate software"
will always be "corporate software" and "hobbyist software"
will always stay at the level of "hobbyist software".
A French Windows user who understands nothing about the
coding of characters, assuming he is even aware of its
existence (!), certainly has no problem.
Fascinating how it is possible to use Python to teach,
to illustrate, to explain the coding of characters. No?
jmf