wxjmfauth
On Saturday, 25 August 2012 11:46:34 UTC+2, Frank Millman wrote:
Here's what I think he is saying. I am posting this to test the water. I
am also confused, and if I have got it wrong hopefully someone will
correct me.
In Python 3.3, Unicode strings are now stored as follows:
if all characters can be represented by 1 byte (code points up to U+00FF,
i.e. Latin-1), the entire string is composed of 1-byte characters
else if all characters can be represented by 1 or 2 bytes (code points up
to U+FFFF), the entire string is composed of 2-byte characters
else the entire string is composed of 4-byte characters
There is an overhead in making this choice: the characters have to be
examined to detect the lowest number of bytes required.
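The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming the PEP 393 thresholds (up to U+00FF in 1 byte, up to U+FFFF in 2 bytes, anything higher in 4 bytes); the function name `storage_width` is hypothetical, and CPython makes the real choice in C. The sketch also shows where the overhead comes from: when a string is built from raw data, every character has to be looked at to find the widest one.

```python
def storage_width(s: str) -> int:
    """Return the bytes per character a PEP 393 style string would use for s.

    Sketch only: mirrors the 1/2/4-byte rule, not CPython's actual C code.
    """
    # One pass over the whole string to find the largest code point;
    # this scan is the cost paid to pick the narrowest representation.
    widest = max(map(ord, s), default=0)
    if widest <= 0xFF:      # ASCII and Latin-1 range
        return 1
    if widest <= 0xFFFF:    # rest of the Basic Multilingual Plane
        return 2
    return 4                # astral planes (emoji, historic scripts, ...)


for sample in ["abc", "café", "cœur", "a\U0001F600"]:
    print(repr(sample), storage_width(sample))
```

With these samples the widths are 1, 1, 2 and 4: a single "œ" (U+0153, outside Latin-1) is enough to move an otherwise French word to 2 bytes per character.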
jmfauth believes that this only benefits 'English-speaking' users, as
the rest of the world will tend to have strings where at least one
character requires 2 or 4 bytes. So they incur the overhead without
getting any benefit.
Therefore, I think he is saying that he would have preferred that Python
standardise on 4-byte characters, on the grounds that the saving in
memory does not justify the performance overhead.
Frank Millman
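To put numbers on the memory side of the 4-byte proposal, here is a minimal sketch, assuming CPython 3.3 or later, where `sys.getsizeof` reflects the flexible layout. The helper `bytes_per_char` and the constant `FIXED_WIDTH` are hypothetical names. Because the absolute sizes include an object header that differs between Python versions, only the difference between two string lengths is used.

```python
import sys

# What a design that "standardises on 4-byte characters" would spend per character.
FIXED_WIDTH = 4


def bytes_per_char(ch: str, n: int = 1000) -> int:
    """Measure how many bytes one more copy of ch adds to a CPython str.

    Subtracting two sizes cancels the fixed object header, whose size
    differs between Python versions.
    """
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


# ASCII, Latin-1, two BMP characters outside Latin-1, and an emoji.
for ch in ["a", "é", "œ", "€", "\U0001F600"]:
    flexible = bytes_per_char(ch)
    print(f"U+{ord(ch):04X}: {flexible} byte(s) per character, "
          f"versus {FIXED_WIDTH} with fixed-width storage")
```

On CPython this reports 1 byte for "a" and "é", 2 for "œ" and "€", and 4 for the emoji. Fixed 4-byte storage would therefore use four times the memory in the first two cases and twice as much in the next two, while a string holding an astral character pays 4 bytes per character under either scheme.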
Very well explained. Thanks.
More precisely, those affected are not only the 'English-speaking'
users, but all users who use non-latin-1 characters.
(See the title of this topic, ... typography).
Being latin-1 and Unicode compliant at the same time is
a plain absurdity in the mathematical sense.
---
For those who do not know, the Go language has introduced
the rune type. As far as I know, nobody is complaining; I
have not even seen a discussion related to this subject.
100% Unicode compliant from day 0. Congratulations.
jmf