Guest
I am currently writing a Python interface to a C++ library. Some of the
functions in this library take Unicode strings (mostly UTF-8) as arguments.
However, when reading this data I run into a problem on Python 2.2
(RHEL3): while the data is clean UCS4 in 2.3, in 2.2 it appears to be
UTF-8 layered on top of UCS4. That is, each UCS4 character holds a single
byte of the UTF-8 encoded string, and its other three bytes are 0.
Is there a trick to get Python 2.2 to produce clean UCS4?
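
To make the symptom concrete, here is a minimal sketch of what I think is
happening and one possible workaround. It assumes the bad strings really are
UTF-8 bytes widened one per code point, which is the same result as decoding
the bytes as Latin-1. The helper name repair_widened_utf8 is made up for this
example and is not part of any library. It is written to run on a current
Python; I have not verified it on 2.2.

```python
def repair_widened_utf8(text):
    """Undo 'one UTF-8 byte per code point' widening (assumed cause)."""
    # Every code point is below 0x100, so Latin-1 maps each one back
    # to the single byte it was widened from.
    raw = text.encode("latin-1")
    # Those bytes are the original UTF-8 sequence; decode them properly.
    return raw.decode("utf-8")


# Reproduce the symptom: U+00E9 and U+20AC take 2 + 3 bytes in UTF-8.
original = u"\u00e9\u20ac"
widened = original.encode("utf-8").decode("latin-1")

# Five code points, one per UTF-8 byte, instead of the expected two.
print(len(widened))

repaired = repair_widened_utf8(widened)
print(repaired == original)
```

If a string contains any code point of 0x100 or above, the encode step raises
UnicodeEncodeError, which is a useful sign that the string was not widened
this way in the first place.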