Unicode Newbie

Manuel Huesser

The unicode function implies that you can only use 2**16 characters
(unichr supports only this range), but with a given encoding, e.g.
unicode(",,,", "utf-8"), I should be able to encode
up to 2**31 characters.

"\xfc\x12\x12\x12\x12\x12\x12" is an example of a 7-byte
UTF-8 string. But on encoding I get the following
error:

UTF-8 decoding error: unsupported Unicode code range

Is there any way to do the job?

Manuel
 
Gerhard Häring

Manuel said:
The unicode function implies that you can only use 2**16 characters
(unichr supports only this range), but with a given encoding, e.g.
unicode(",,,", "utf-8"), I should be able to encode
up to 2**31 characters.

"\xfc\x12\x12\x12\x12\x12\x12" is an example of a 7-byte
UTF-8 string. But on encoding I get the following
error:

UTF-8 decoding error: unsupported Unicode code range

Is there any way to do the job?

You can try compiling Python with --enable-unicode=ucs4.

But just because all characters map to a 0 .. 2^32 interval doesn't mean
that there is a defined character for every number in the interval. So
you'll still get encoding errors when you try to throw random
bytestrings at the encode function.

-- Gerhard
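
Gerhard's point that a number inside the range need not be a character can be checked with a small sketch. It assumes a modern Python 3 (3.3 or later), not the Python 2 build discussed in the thread; the helper name can_encode_utf8 is made up for illustration.

```python
import sys
import unicodedata

# Assumption: Python 3.3+, where every build covers the full Unicode range.
HIGHEST = sys.maxunicode  # 0x10FFFF, i.e. 17 * 2**16 - 1


def can_encode_utf8(code_point):
    """Return True if Python can encode this code point as UTF-8."""
    try:
        chr(code_point).encode("utf-8")
        return True
    except (UnicodeEncodeError, ValueError):
        # UnicodeEncodeError: lone surrogate such as U+D800.
        # ValueError: chr() rejects values above sys.maxunicode.
        return False


ordinary = can_encode_utf8(0x20AC)        # EURO SIGN, a normal character
surrogate = can_encode_utf8(0xD800)       # inside the range, but not a character
beyond_range = can_encode_utf8(0x110000)  # outside the range entirely
surrogate_category = unicodedata.category(chr(0xD800))
```

Here only the first call succeeds: U+D800 lies well inside the range, yet it is a surrogate code point and cannot be encoded on its own.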
 
Fredrik Lundh

Manuel said:
The unicode function implies that you can only use 2**16 characters
(unichr supports only this range), but with a given encoding, e.g.
unicode(",,,", "utf-8"), I should be able to encode
up to 2**31 characters.

Nope. Read on.

Manuel said:
"\xfc\x12\x12\x12\x12\x12\x12" is an example of a 7-byte
UTF-8 string. But on encoding I get the following
error:

UTF-8 decoding error: unsupported Unicode code range

Unicode supports ~2**20 code points (17*64k), not 2**31 characters.
Your example is not a valid UTF-8 string.

Manuel said:
Is there any way to do the job?

Not if you're using a conforming Unicode implementation.

</F>
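
Both of Fredrik's statements can be reproduced in a few lines. This is only a sketch and assumes Python 3, where bytes.decode replaces the unicode() call from the original question; try_decode is a hypothetical helper name.

```python
# Assumption: Python 3; the thread itself used Python 2's unicode().
data = b"\xfc\x12\x12\x12\x12\x12\x12"  # the byte string from the question


def try_decode(raw):
    """Return the decoded text, or None if raw is not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


# 17 planes of 64k code points each; the last one is U+10FFFF.
MAX_CODE_POINT = 17 * 2**16 - 1

decoded = try_decode(data)                      # None: not valid UTF-8
highest = chr(MAX_CODE_POINT).encode("utf-8")   # four bytes, never more
```

The highest code point needs only four UTF-8 bytes, so a conforming decoder has no use for a lead byte such as 0xFC. The 0x12 bytes are not continuation bytes either, since those must lie in the range 0x80 to 0xBF.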
 
Manuel Huesser

Fredrik Lundh said:
Unicode supports ~2**20 code points (17*64k), not 2**31 characters.
Your example is not a valid UTF-8 string.

Yep, Unicode supports fewer characters than are possible with
UTF-8 (UCS range = 2**31).

So there is no possibility of supporting the full range of the UCS
character set with Python?

Manuel
 
Martin v. Löwis

Manuel Huesser said:
Yep, Unicode supports fewer characters than are possible with
UTF-8 (UCS range = 2**31).

So there is no possibility of supporting the full range of the UCS
character set with Python?

The ucs range (for UCS-4) is *not* 2**31; it is 17*2**16. It was 2**32
in ISO/IEC 10646:1993 (I believe), but it got constrained in 10646:2000.

It is certainly possible to represent 2**32 different values in a
Python Unicode character - but you will have to change the Python
interpreter source code for that.

Regards,
Martin
 
François Pinard

[Martin von Löwis]
The ucs range (for UCS-4) is *not* 2**31; it is 17*2**16. It was 2**32
in ISO/IEC 10646:1993 (I believe), but it got constrained in 10646:2000.

I think UCS-4 is (or at least was) defined for 2**31 code points only. I
do not know why the sign bit was excluded (maybe to avoid problems with
negative values for code points?), but if you consider the logic of
UTF-8, you will see that one full extra byte would be needed to support the
32nd bit. This does not mean it was the reason; I do not know.

UTF-16 has 17*2**16 code points. I have not studied the legal
texts recently, but my overall impression is that UTF-16 has been more or less
integrated into UCS-2 in more recent Unicode versions, and made official.
I do not know exactly what UCS-2 means nowadays, as it does not really
exist anymore as originally defined (with the intent of being fixed
width). Unless UCS-2 is 2**16 - 2**11 code points? The surrogate areas
cannot sensibly be part of it, at least nowadays. Hmph! I should
really read recent legal texts when I get to dive into such areas... :)
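
The arithmetic behind these figures can be sketched as follows. The sketch assumes the original UTF-8 layout with sequences of up to six bytes, as it stood before RFC 3629 limited it to four; utf8_payload_bits is an illustrative name, not a library function.

```python
def utf8_payload_bits(n_bytes):
    """Payload bits of an n-byte sequence in the original UTF-8 layout."""
    if n_bytes == 1:
        return 7  # 0xxxxxxx
    # The lead byte spends n one-bits and a zero on the length marker,
    # leaving 7 - n payload bits; each continuation byte (10xxxxxx) adds 6.
    lead_bits = 7 - n_bytes
    return lead_bits + 6 * (n_bytes - 1)


# Sequence length in bytes -> number of code point bits it can carry.
bits_by_length = {n: utf8_payload_bits(n) for n in range(1, 7)}

# UTF-16: the 16-bit plane plus every high/low surrogate pair combination.
utf16_code_points = 2**16 + 2**10 * 2**10

# The 16-bit plane without the 2**11 surrogate code points.
bmp_without_surrogates = 2**16 - 2**11
```

A six-byte sequence carries exactly 31 bits, so a 32nd bit would indeed require a seventh byte. The UTF-16 total comes to 17*2**16 = 1,114,112, and removing the surrogates from the 16-bit plane leaves 63,488 code points.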
 
