wanghz
Hello, everyone.
I have a problem processing Unicode strings. Is it possible to get
the 8-bit string representation of any Unicode string?
Suppose I get a Unicode string:
a = u'\xc8\xce\xcf\xcd\xc6\xeb'
Then, by calling
a.encode('latin-1')
I get its 8-bit string representation, that is, the physical
storage format of the string.
But for another kind of Unicode string, say:
b = u'\u4efb\u8d24\u9f50'
I have to call
b.encode('utf-8')
to get its 8-bit string form.
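To make the difference concrete, here is a minimal sketch of what the two calls do with the strings above. The names a_latin1, a_utf8, b_utf8 and b_latin1_ok are only illustrative, and nothing beyond the built-in codecs is assumed. As far as I understand, latin-1 only maps code points up to U+00FF (one byte each), while UTF-8 covers the whole Unicode range, so it accepts both strings but produces different bytes for a:

```python
# Sketch only: compares the two codecs on the strings from this post.
a = u'\xc8\xce\xcf\xcd\xc6\xeb'
b = u'\u4efb\u8d24\u9f50'

# latin-1: one byte per character, only for code points <= U+00FF.
a_latin1 = a.encode('latin-1')

# utf-8: works for both strings, but uses a variable number of bytes.
a_utf8 = a.encode('utf-8')   # 2 bytes per character here
b_utf8 = b.encode('utf-8')   # 3 bytes per character here

# latin-1 cannot represent the characters in b.
try:
    b.encode('latin-1')
    b_latin1_ok = True
except UnicodeEncodeError:
    b_latin1_ok = False
```

So a.encode('utf-8') does succeed; it just does not give back the same six bytes as the latin-1 call.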
Since these Unicode strings come from an external library function,
I don't know which kind a given string is until I get it at
runtime. So I wonder: is there a uniform way to get the 8-bit string
representation, byte by byte, of any Unicode string?
Thank you very much.