str() should convert ANY object to a string without EXCEPTIONS !

est

From python manual

str( [object])

Return a string containing a nicely printable representation of an
object. For strings, this returns the string itself. The difference
with repr(object) is that str(object) does not always attempt to
return a string that is acceptable to eval(); its goal is to return a
printable string. If no argument is given, returns the empty string,
''.


now we try this under windows:
>>> str(u'\ue863')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in
position 0: ordinal not in range(128)

FAIL.

also almighty Linux

Python 2.3.4 (#1, Feb 6 2006, 10:38:46)
[GCC 3.4.5 20051201 (Red Hat 3.4.5-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> str(u'\ue863')
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in
position 0: ordinal not in range(128)

Python 2.4.4 (#2, Apr 5 2007, 20:11:18)
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> str(u'\ue863')
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in
position 0: ordinal not in range(128)

Python 2.5 (release25-maint, Jul 20 2008, 20:47:25)
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> str(u'\ue863')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in
position 0: ordinal not in range(128)


The problem is, why the f**k set ASCII encoding to range(128) ????????
while str() is internally byte array it should be handled in
range(256) !!!!!!!!!!

http://bugs.python.org/issue3648

One possible solution (Windows only):
>>> u'\ue863'.encode('mbcs')
'\xfe\x9f'


I am now spending 60% of my development time dealing with ASCII range(128)
errors. It is a PAIN!!!!!!

Please fix this issue.

http://bugs.python.org/issue3648

Please.
 
Lawrence D'Oliveiro

est said:
The problem is, why the f**k set ASCII encoding to range(128) ????????

Because that's how ASCII is defined.

while str() is internally byte array it should be handled in
range(256) !!!!!!!!!!

But that's for random bytes. How would you convert an arbitrary object to
random bytes?
 
Marc 'BlackJack' Rintsch

The problem is, why the f**k set ASCII encoding to range(128) ????????

Because that's how ASCII is defined. ASCII is a 7-bit code.

while str() is internally byte array it should be handled in range(256)
!!!!!!!!!!

Yes `str` can handle that, but that's not the point. The point is how to
translate the contents of a `unicode` object into that range. There are
many different possibilities and Python refuses to guess and tries the
lowest common denominator -- ASCII -- instead.

I am now spending 60% of my development time dealing with ASCII range(128)
errors. It is a PAIN!!!!!!

Please fix this issue.

http://bugs.python.org/issue3648

Please.

The issue was closed as 'invalid'. Dealing with Unicode can be a pain
and frustrating, but that's not a Python problem, it's the subject itself
that needs some thoughts. If you think this through, the relationship
between characters, encodings, and bytes, and stop dreaming of a magic
solution that works without dealing with this stuff explicitly, the pain
will go away -- or ease at least.

Ciao,
Marc 'BlackJack' Rintsch
 
Terry Reedy

est said:
From python manual

str( [object])

Return a string containing a nicely printable representation of an
object. For strings, this returns the string itself. The difference
with repr(object) is that str(object) does not always attempt to
return a string that is acceptable to eval(); its goal is to return a
printable string. If no argument is given, returns the empty string,
''.


now we try this under windows:
>>> str(u'\ue863')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in
position 0: ordinal not in range(128)

In 3.0 this is fixed:
>>> str('\ue863')
'\ue863'
>>> str(b'123')
"b'123'"

Problems like this at least partly motivated the change to unicode
instead of bytes as the string type.

tjr
 
Steven D'Aprano

Lawrence D'Oliveiro said:
Because that's how ASCII is defined.


But that's for random bytes. How would you convert an arbitrary object
to random bytes?

from random import randint
''.join(chr(randint(0, 255)) for i in xrange(len(input)))

of course. How else should you get random bytes? :)
 
Steven D'Aprano

est said:
From python manual

str( [object])

Return a string containing a nicely printable representation of an
object. For strings, this returns the string itself. The difference
with repr(object) is that str(object) does not always attempt to return
a string that is acceptable to eval(); its goal is to return a
printable string. If no argument is given, returns the empty string,
''.


now we try this under windows:
str(u'\ue863')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in
position 0
: ordinal not in range(128)

In 3.0 this is fixed:
>>> str('\ue863')
'\ue863'
>>> str(b'123')
"b'123'"

Problems like this at least partly motivated the change to unicode
instead of bytes as the string type.


I'm not sure that "fixed" is the right word. Isn't that more or less the
same as telling the OP to use unicode() instead of str()? It merely
avoids the problem of converting Unicode to ASCII by leaving your string
as Unicode, rather than fixing it. Perhaps that's the right thing to do,
but it's a bit like the old joke:

"Doctor, it hurts when I do this."
"Then don't do it!"



As for the second example you give:

>>> str(b'123')
"b'123'"


Perhaps I'm misinterpreting it, but from here it looks to me that str()
is doing what repr() used to do, and I'm really not sure that's a good
thing. I would have expected that str(b'123') in Python 3 should do the
same thing as unicode('123') does now:

>>> unicode('123')
u'123'

(except without the u prefix).
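
For comparison, here is a minimal sketch of what Python 3 does with bytes, assuming Python 3 syntax throughout (the thread itself is about Python 2). The variable names are illustrative only. Without an encoding argument, str() on bytes gives the repr-style form; with one, it decodes, which is the behaviour expected above.

```python
# Python 3 sketch: str() on bytes with and without an encoding.
data = b'123'

# With no encoding argument, str() falls back to the repr-style form.
shown = str(data)                 # "b'123'"

# Passing an encoding asks for an actual decode, like unicode('123') in 2.x.
decoded = str(data, 'ascii')      # '123'
also_decoded = data.decode('ascii')

# Text that stays text never passes through a codec at all.
private_use = str('\ue863')

print(shown, decoded, also_decoded, len(private_use))
```

So the decode is still available in Python 3; it just has to be asked for by naming the encoding.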
 
est

Because that's how ASCII is defined.
Because that's how ASCII is defined. ASCII is a 7-bit code.

Then why can't python use another default encoding internally
range(256)?

Python refuses to guess and tries the lowest common denominator -- ASCII -- instead.

That's the problem. ASCII is INCOMPLETE!

If Python choose another default encoding which handles range(256),
80% of python unicode encoding problems are gone.

It's not HARD to process unicode, it's just python & python community
refuse to correct it.

stop dreaming of a magic solution

It's not 'magic' it's a BUG. Just print 0x7F to 0xFF to console,
what's wrong????

Isn't that more or less the same as telling the OP to use unicode() instead of str()?

sockets could handle str() only. If you throw unicode objects to a
socket, it will automatically call str() and cause an error.
 
Steven D'Aprano

>>> str(u'\ue863')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in
position 0: ordinal not in range(128)

FAIL.

What result did you expect?


[...]
The problem is, why the f**k set ASCII encoding to range(128) ????????
while str() is internally byte array it should be handled in range(256)
!!!!!!!!!!


To quote Terry Pratchett:

"What sort of person," said Salzella patiently, "sits down and
*writes* a maniacal laugh? And all those exclamation marks, you
notice? Five? A sure sign of someone who wears his underpants
on his head." -- (Terry Pratchett, Maskerade)



In any case, even if the ASCII encoding used all 256 possible bytes, you
still have a problem. Your unicode string is a single character with
ordinal value 59491:
>>> ord(u'\ue863')
59491

You can't fit 59491 (or more) characters into 256, so obviously some
unicode chars aren't going to fit into ASCII without some sort of
encoding. You show that yourself:

u'\ue863'.encode('mbcs') # Windows only

But of course 'mbcs' is only one possible encoding. There are others.
Python refuses to guess which encoding you want. Here's another:

u'\ue863'.encode('utf-8')
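
To make the arithmetic concrete, here is a minimal sketch in Python 3 syntax (an assumption; in the Python 2 of this thread the literal would be u'\ue863'). Latin-1 stands in for "an encoding that handles range(256)", and the names `code_point`, `latin1_failed` and `utf8_bytes` are illustrative only.

```python
# Python 3 sketch: a one-byte-per-character codec cannot hold U+E863.
ch = '\ue863'

# The code point is far outside what a single byte can represent.
code_point = ord(ch)              # 59491, i.e. 0xE863

# Latin-1 maps bytes 0..255 to code points 0..255 and nothing else,
# so even a codec that uses all of range(256) has to refuse this character.
try:
    ch.encode('latin-1')
    latin1_failed = False
except UnicodeEncodeError:
    latin1_failed = True

# A multi-byte encoding such as UTF-8 spreads the code point over several bytes.
utf8_bytes = ch.encode('utf-8')   # three bytes: 0xEE 0xA1 0xA3

print(code_point, latin1_failed, list(utf8_bytes))
```

Widening the default from 128 to 256 values would therefore change which characters fail, not whether some do.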
 
est

>>> str(u'\ue863')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in
position 0: ordinal not in range(128)

What result did you expect?

[...]
The problem is, why the f**k set ASCII encoding to range(128) ????????
while str() is internally byte array it should be handled in range(256)
!!!!!!!!!!

To quote Terry Pratchett:

    "What sort of person," said Salzella patiently, "sits down and
    *writes* a maniacal laugh? And all those exclamation marks, you
    notice? Five? A sure sign of someone who wears his underpants
    on his head." -- (Terry Pratchett, Maskerade)

In any case, even if the ASCII encoding used all 256 possible bytes, you
still have a problem. Your unicode string is a single character with
ordinal value 59491:

>>> ord(u'\ue863')
59491

You can't fit 59491 (or more) characters into 256, so obviously some
unicode chars aren't going to fit into ASCII without some sort of
encoding. You show that yourself:

u'\ue863'.encode('mbcs')  # Windows only

But of course 'mbcs' is only one possible encoding. There are others.
Python refuses to guess which encoding you want. Here's another:

u'\ue863'.encode('utf-8')

OK, I am tired of arguing these things since python 3.0 fixed it
somehow.

Can anyone tell me how to customize a default encoding, let's say
'ansi' which handles range(256) ?
 
Lie

From python manual

str( [object])

Return a string containing a nicely printable representation of an
object. For strings, this returns the string itself. The difference
with repr(object) is that str(object) does not always attempt to
return a string that is acceptable to eval(); its goal is to return a
printable string. If no argument is given, returns the empty string,
''.

now we try this under windows:

>>> str(u'\ue863')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in
position 0: ordinal not in range(128)

FAIL.

And it is correct to fail: ASCII is only defined within range(128);
the rest (i.e. range(128, 256)) is not defined in ASCII. The
range(128, 256) values are extension slots, with many conflicting meanings.

also almighty Linux

Python 2.3.4 (#1, Feb  6 2006, 10:38:46)
[GCC 3.4.5 20051201 (Red Hat 3.4.5-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> str(u'\ue863')

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in
position 0: ordinal not in range(128)

Python 2.4.4 (#2, Apr  5 2007, 20:11:18)
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> str(u'\ue863')

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in
position 0: ordinal not in range(128)

Python 2.5 (release25-maint, Jul 20 2008, 20:47:25)
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> str(u'\ue863')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in
position 0: ordinal not in range(128)

If str() had returned anything but an error on this, I'd
file a bug report.

The problem is, why the f**k set ASCII encoding to range(128) ????????
while str() is internally byte array it should be handled in
range(256) !!!!!!!!!!

A str is a byte array, but Unicode and ASCII are NOT. A unicode string
is a character array whose code points run from 0 up to 0x10FFFF; how
many bytes each character takes depends on the encoding. An ASCII string
is a character array defined only for range(128). Other than the Unicode
encodings (utf-8, utf-16, and utf-32) and ASCII, there are many other
encodings ('EBCDIC', 'iso-8859-1', ..., 'iso-8859-16', 'KOI8', 'GB18030',
'Shift-JIS', etc., etc., etc.), each with conflicting byte-to-character
mappings. Fortunately, most of these encodings do share a common ground:
ASCII.
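
The conflict can be seen by decoding one and the same byte with several codecs. This is a small sketch in Python 3 syntax (an assumption; the thread uses Python 2). The codec names are standard Python codecs, but the selection and the variable names are arbitrary choices for illustration.

```python
# Python 3 sketch: one byte value, several meanings, depending on the codec.
raw = b'\xe9'

# The selection of codecs is arbitrary; each one defines byte 0xE9.
codecs_to_try = ['latin-1', 'cp1252', 'koi8_r', 'cp437', 'iso8859_7']
meanings = {name: raw.decode(name) for name in codecs_to_try}

# Bytes below 128 are where these particular codecs agree with ASCII.
ascii_part = b'str()'
agree = {ascii_part.decode(name) for name in codecs_to_try}

for name, char in meanings.items():
    print(name, repr(char))
print(agree)
```

The high byte comes out as different characters under different codecs, while the pure ASCII bytes decode to the same text under all of them.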

Actually, when a strictly stupid str() receives a Unicode string (i.e. a
character array), it should return something like <unicode s at
0x423549af813e4954>. It doesn't; str() is smarter than that and
tries to convert whatever fits into ASCII, i.e. characters lower than
128. Why ASCII? Because the meaning of the bytes in range(128, 256)
varies widely and str() doesn't know which encoding you want to use, so
if you don't tell it which encoding to use, it will not guess (Python
Zen: In the face of ambiguity, refuse the temptation to guess).

If you're trying to convert a character array (Unicode) into a byte
string, it's done by specifying which codec you want to use. str()
tries to convert your character array (Unicode) to byte string using
ASCII codec. s.encode(codec) would convert a given character array
into byte string using codec.
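
A short sketch of explicit encoding, written in Python 3 syntax (an assumption; in Python 2 the text literal would carry a u prefix). It also shows the optional errors argument of encode(), which the post above does not mention; the variable names are illustrative.

```python
# Python 3 sketch: the codec and the error policy are both explicit choices.
text = 'caf\xe9'                  # "café" written with an escape

utf8 = text.encode('utf-8')       # every character is representable
latin1 = text.encode('latin-1')   # also fine: é is U+00E9, below 256

# ASCII cannot represent é; the errors argument decides what happens
# instead of raising UnicodeEncodeError.
replaced = text.encode('ascii', 'replace')
escaped = text.encode('ascii', 'backslashreplace')
as_xml = text.encode('ascii', 'xmlcharrefreplace')

# Decoding with the same codec restores the original text.
round_trip = utf8.decode('utf-8')

print(utf8, latin1, replaced, escaped, as_xml, round_trip == text)
```

The lossy handlers ('replace' in particular) throw information away, so they suit display or logging better than data that must round-trip.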
http://bugs.python.org/issue3648

One possible solution(Windows Only)

'\xfe\x9f'

actually str() is not needed, you need only: u'\ue863'.encode('mbcs')
䶮

I am now spending 60% of my development time dealing with ASCII range(128)
errors. It is a PAIN!!!!!!

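Despair not, there is a quick hack:
# but only use it as a temporary solution, FIX YOUR CODE PROPERLY
# (Python 2 on Windows only: 'mbcs' is the Windows ANSI code page codec)
str_ = str
# encode unicode objects with 'mbcs'; pass everything else to the real str()
str = lambda s='': s.encode('mbcs') if isinstance(s, unicode) else str_(s)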
 
Olivier Lauzanne

On Sep 28, 4:38 pm, Steven D'Aprano <st...@REMOVE-THIS-
Can anyone tell me how to customize a default encoding, let's say
'ansi' which handles range(256) ?

I assume you are using python2.5
Edit the file /usr/lib/python2.5/site.py

There is a function called
def setencoding():
    [...]
    encoding = "ascii"
    [...]

Change encoding = "ascii" to encoding = "utf-8"

On Windows you may have to use "mbcs" or something like that. I have
no idea what Windows uses as its encoding.

As long as all systems don't use the same encoding (let's say utf-8
since it is becoming the standard on unixes and on the web) using
ascii as a default encoding makes sense.
 
Marc 'BlackJack' Rintsch

Then why can't python use another default encoding internally
range(256)?

Because that doesn't suffice. Unicode code points can be >255.

If Python choose another default encoding which handles range(256), 80%
of python unicode encoding problems are gone.

80% of *your* problems with it *seem* to be gone then.

It's not HARD to process unicode, it's just python & python community
refuse to correct it.

It is somewhat hard to deal with unicode because many don't want to think
about it or don't grasp the relationship between encodings, byte values,
and characters. Including you.

It's not 'magic' it's a BUG. Just print 0x7F to 0xFF to console, what's
wrong????

What do you mean by "just print 0x7F to 0xFF"? For example if I have ``s
= u'Smørebrød™'`` what bytes should ``str(s)`` produce and why those and
not others?
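
The question can be made concrete with a short sketch in Python 3 syntax (an assumption; the thread is about Python 2). The list of codecs is an arbitrary selection for illustration, and `results` is just a scratch dictionary.

```python
# Python 3 sketch: the same text yields different bytes under different codecs.
s = 'Sm\xf8rebr\xf8d\u2122'       # 'Smørebrød™' written with escapes

results = {}
for codec in ('utf-8', 'cp1252', 'utf-16-le', 'latin-1', 'ascii'):
    try:
        results[codec] = s.encode(codec)
    except UnicodeEncodeError:
        # The codec has no byte sequence for at least one character.
        results[codec] = None

for codec, encoded in results.items():
    print(codec, encoded)
```

Three of the codecs succeed with three different byte strings and two refuse outright, so there is no single obvious answer for str(s) to return.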
sockets could handle str() only. If you throw unicode objects to a
socket, it will automatically call str() and cause an error.

Because *you* have to tell explicitly how the unicode object should be
encoded as bytes. Python can't do this automatically because it has *no
idea* what the process at the other end of the socket expects.
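
As a sketch of what "telling explicitly" can look like, the following Python 3 example (an assumption; the thread is about Python 2) sends text over a local socket pair. `WIRE_ENCODING` is a made-up convention for this example, not part of the socket API; in real code it would be whatever the protocol at the other end specifies.

```python
import socket

# Python 3 sketch: sockets carry bytes, so both ends must agree on a codec.
# WIRE_ENCODING is an assumed convention for this example, not a socket API.
WIRE_ENCODING = 'utf-8'

sender, receiver = socket.socketpair()
try:
    message = 'Sm\xf8rebr\xf8d \ue863'
    # Encode explicitly before sending; the socket never guesses.
    sender.sendall(message.encode(WIRE_ENCODING))
    sender.shutdown(socket.SHUT_WR)   # signal end of data to the reader

    chunks = []
    while True:
        chunk = receiver.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    # Decode only after all bytes have arrived, so no character is split.
    received = b''.join(chunks).decode(WIRE_ENCODING)
finally:
    sender.close()
    receiver.close()

print(received == message)
```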

Now you are complaining that Python chooses ASCII. If it is changed to
something else, like MBCS, others start complaining why it is MBCS and
not something different. See: No fix, just moving the problem to someone
else.

Ciao,
Marc 'BlackJack' Rintsch
 
Lie

What result did you expect?
The problem is, why the f**k set ASCII encoding to range(128) ????????
while str() is internally byte array it should be handled in range(256)
!!!!!!!!!!
To quote Terry Pratchett:
    "What sort of person," said Salzella patiently, "sits down and
    *writes* a maniacal laugh? And all those exclamation marks, you
    notice? Five? A sure sign of someone who wears his underpants
    on his head." -- (Terry Pratchett, Maskerade)
In any case, even if the ASCII encoding used all 256 possible bytes, you
still have a problem. Your unicode string is a single character with
ordinal value 59491:

>>> ord(u'\ue863')
59491

You can't fit 59491 (or more) characters into 256, so obviously some
unicode chars aren't going to fit into ASCII without some sort of
encoding. You show that yourself:

u'\ue863'.encode('mbcs')  # Windows only

But of course 'mbcs' is only one possible encoding. There are others.
Python refuses to guess which encoding you want. Here's another:

u'\ue863'.encode('utf-8')
OK, I am tired of arguing these things since python 3.0 fixed it
somehow.

I'm against saying that python 3.0 fixed it; python 3.0's string type is
Unicode and its default encoding is utf-8, and that is why your problem
magically disappears.

Can anyone tell me how to customize a default encoding, let's say
'ansi' which handles range(256) ?

Python used to have sys.setdefaultencoding, but that feature was an
accident. sys.setdefaultencoding was intended to be used for testing
purposes while the developers hadn't yet decided what to use as the default
encoding (what use is a default when you can change it?).
sys.setdefaultencoding has been removed (site.py deletes it from the sys
module at startup); programmers should encode characters manually if they
want to use something other than the default encoding (ASCII).
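
One way to "encode manually" is to name the encoding at the I/O boundary. The sketch below uses Python 3 syntax (an assumption; in the Python 2.5 of this thread, codecs.open() takes an encoding argument in a similar way). The temporary file and the variable names exist only for the example.

```python
import os
import tempfile

# Python 3 sketch: name the encoding at the I/O boundary instead of changing
# the interpreter-wide default.
text = 'na\xefve \ue863'

fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
try:
    # The encoding argument controls how characters become bytes on disk.
    with open(path, 'w', encoding='utf-8') as handle:
        handle.write(text)

    # Reading the raw bytes shows what was actually stored.
    with open(path, 'rb') as handle:
        stored = handle.read()

    # Reading back with the same encoding restores the text.
    with open(path, 'r', encoding='utf-8') as handle:
        restored = handle.read()
finally:
    os.remove(path)

print(stored, restored == text)
```

Because the choice lives next to the I/O call, the program behaves the same on any installation, whatever its default encoding is.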
 
est

Because that doesn't suffice.  Unicode code points can be >255.


80% of *your* problems with it *seem* to be gone then.


It is somewhat hard to deal with unicode because many don't want to think
about it or don't grasp the relationship between encodings, byte values,
and characters.  Including you.



What do you mean by "just print 0x7F to 0xFF"?  For example if I have ``s
= u'Smørebrød™'`` what bytes should ``str(s)`` produce and why those and
not others?



Because *you* have to tell explicitly how the unicode object should be
encoded as bytes.  Python can't do this automatically because it has *no
idea* what the process at the other end of the socket expects.

Now you are complaining that Python chooses ASCII.  If it is changed to
something else, like MBCS, others start complaining why it is MBCS and
not something different.  See: No fix, just moving the problem to someone
else.

Ciao,
        Marc 'BlackJack' Rintsch

Well, you succeeded in putting all the blame on me alone. Great.

When you guys are dealing with CJK characters in the future, you'll
find out what I mean.

In fact, Boa Constructor keeps throwing ASCII range(128) errors on
my Windows machine. That's pretty cool.
 
Lie

Then why can't python use another default encoding internally
range(256)?


That's the problem. ASCII is INCOMPLETE!

What do you propose? Use mbcs and knock out Linux computers? Use KOI
and drive non-Russians to suicide? Use GB and shoot non-Chinese dead? Use
latin-1 and make email servers scream?

If Python choose another default encoding which handles range(256),
80% of python unicode encoding problems are gone.

It's not HARD to process unicode, it's just python & python community
refuse to correct it.

Python's unicode support is already correct. Only your brainwaves have
not been tuned to it yet.
 
est

What do you propose? Use mbcs and knock out Linux computers? Use KOI
and drive non-Russians to suicide? Use GB and shoot non-Chinese dead? Use
latin-1 and make email servers scream?



Python's unicode support is already correct. Only your brainwaves have
not been tuned to it yet.

Have you ever programmed with CJK characters before?
 
Roy Smith

Steven D'Aprano said:
from random import randint
''.join(chr(randint(0, 255)) for i in xrange(len(input)))


of course. How else should you get random bytes? :)

That's a UUOL (Useless Use Of Len; by analogy to UUOC). This works just as
well:

''.join(chr(randint(0, 255)) for i in input)
 
Lawrence D'Oliveiro

est said:
Well, you succeeded in putting all the blame on me alone. Great.

Take it as a hint.

When you guys are dealing with CJK characters in the future, you'll
find out what I mean.

Speaking as somebody who HAS dealt with CJK characters in the past--see
above.
 
Gabriel Genellina

On Sun, 28 Sep 2008 07:01:12 -0300, Olivier Lauzanne said:
Can anyone tell me how to customize a default encoding, let's say
'ansi' which handles range(256) ?

I assume you are using python2.5
Edit the file /usr/lib/python2.5/site.py

There is a function called
def setencoding():
    [...]
    encoding = "ascii"
    [...]

Change encoding = "ascii" to encoding = "utf-8"

On Windows you may have to use "mbcs" or something like that. I have
no idea what Windows uses as its encoding.

*Not* a good idea at all.
You're just masking errors, and making your programs incompatible with all
other Pythons installed around the world.
 
