Wrong unichr docstring in 2.7

jmfauth

I think there is a small point here.
sys.version 2.7 (r27:82525, Jul 4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)]
print unichr.__doc__
unichr(i) -> Unicode character

Return a Unicode string of one character with ordinal i; 0 <= i <= 0x10ffff.

Traceback (most recent call last):
File "<psi last command>", line 1, in <module>
ValueError: unichr() arg not in range(0x10000) (narrow Python build)

Note:

I find
0x0 <= i <= 0xffff
more logical than
0 <= i <= 0xffff

(an apples-and-oranges comparison)

Ditto, for Python 2.6.5

Regards,
jmf
 
Thomas Jollans

I think there is a small point here.

2.7 (r27:82525, Jul 4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)]

unichr(i) -> Unicode character

Return a Unicode string of one character with ordinal i; 0 <= i <=
0x10ffff.

Traceback (most recent call last):
File "<psi last command>", line 1, in <module>
ValueError: unichr() arg not in range(0x10000) (narrow Python
build)

This is very tricky ground. I consider the behaviour of unichr() to be wrong
here. The user shouldn't have to care much about UTF-16 and the difference
between wide and narrow Py_UNICODE builds. In fact, in Python 3.1, this
behaviour has changed:
on a narrow Python 3 build, chr(0x10fff) == '\ud803\udfff' == '\U00010fff'.
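
For anyone wondering where '\ud803\udfff' comes from, here is a minimal sketch of the UTF-16 surrogate arithmetic. The helper name to_surrogate_pair is made up for illustration and is not part of Python; the code is written for Python 3.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point into UTF-16 (high, low) surrogates.

    Hypothetical helper, for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %#x" % code_point)
    offset = code_point - 0x10000      # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits go into the low surrogate
    return high, low


high, low = to_surrogate_pair(0x10FFF)
print("%04x %04x" % (high, low))  # d803 dfff
```

The result matches what the utf-16-be codec produces for chr(0x10fff), which is one way to cross-check the arithmetic.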

Now, the Python 2 behaviour can't be fixed [1] -- it was specified in PEP 261
[2], which means it was pretty much set in stone. At the time, it was deemed more
important for unichr() to always return a length-one string than for it to
work with wide characters, with some pretty half-arsed UTF-16 support added on top...

The docstring could be changed for narrow Python builds. I myself don't think
docstrings should change depending on build options like this -- instead, it
could be amended to document both behaviours. Note that the docs [3]
already include this information.

If you want to, feel free to report a bug at http://bugs.python.org/
Note:

I find
0x0 <= i <= 0xffff
more logical than
0 <= i <= 0xffff

(an apples-and-oranges comparison)

Would a zero by any other name not look as small? Honestly, I myself find it
nonsensical to qualify 0 by specifying a base, unless you go all the way and
represent the full uint16_t by saying 0x0000 <= i <= 0xffff

- Thomas

[1] http://bugs.python.org/issue1057588
[2] http://www.python.org/dev/peps/pep-0261/
[3] http://docs.python.org/library/functions.html#unichr
 
Dave Angel

There are two variants that CPython can be compiled for: 16-bit Unicode
and 32-bit. By default, the Windows implementation uses 16 bits, and
the Linux one uses 32. I believe you can rebuild your version if you
have access to an appropriate version of the MSC compiler, but I haven't
any direct experience.

At any rate, the bug here is that the docstring doesn't get patched to
match the compile switches for your particular build of CPython.
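
If you just want to know which variant you are running, sys.maxunicode tells you. The following is only a sketch: the function name build_kind is invented for illustration, and on Python 3.3 and later sys.maxunicode is always 0x10FFFF, so the narrow case only shows up on older interpreters.

```python
import sys


def build_kind(maxunicode=sys.maxunicode):
    """Classify a CPython build by its largest supported code point.

    Hypothetical helper, for illustration only.
    """
    if maxunicode == 0xFFFF:
        return "narrow"    # 16-bit build: unichr() stops at 0xffff
    if maxunicode == 0x10FFFF:
        return "wide"      # 32-bit build: full Unicode range
    return "unknown"


print(build_kind())
```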

DaveA
 
jmfauth

Short comments:

1) I'm aware Python can be built in "ucs2" or "ucs4" mode. It remains
that the unichr docstring does not seem correct.

2) 0x0 versus 0
Do not take this too seriously. Sure, the values of 0x0 and 0 are equal,
but the "unit" sounds strange.
E.g. if a is a length, I would not express a as being
0 mm <= a <= 999 m (or 0 in <= a <= 999 ft) but 0 m <= a <= 999 m.
I agree a notation like 0x0000 <= i <= 0xffff is the best of all.

3) Of course, the Python 3 behaviour (chr() instead of unichr()) is
correct.

jmf
 
