XML_RPC and unicode problems

Thomas

I am currently passing email messages over XML_RPC as the payload for
a certain function call. On some of these messages, XML_RPC blows up
on the server side and says something to the effect of:

exceptions.UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in
position 1599: unexpected code byte

Using the native Python codec for doing conversions gives me a similar
error ('utf8' codec can't decode byte 0x93 in position 1328:
unexpected code byte). That gives me the feeling that these specific
messages are just funky. (The locations in the file where they choke
seem to be random characters.)
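
The codec error can be reproduced without XML-RPC at all. A minimal sketch, assuming the offending bytes come from a single-byte encoding such as Windows-1252 (the thread never says which encoding the messages use; the sample bytes are made up for illustration):

```python
# Hypothetical message fragment: bytes 0x93, 0x94 and 0xa0 as a Windows-1252
# mail client might emit them (an assumption, not taken from the thread).
sample = b"He said \x93hello\x94\xa0again"

try:
    sample.decode("utf-8")
    failed_at = None
except UnicodeDecodeError as exc:
    # exc.start is the offset of the first byte the codec rejected.
    failed_at = exc.start
    print("utf-8 rejected byte %#x at offset %d" % (sample[exc.start], exc.start))

# The same bytes are valid text once a matching single-byte codec is named.
text = sample.decode("cp1252")
print(repr(text))
```

If the messages really are in one known encoding, decoding them before the call is one option; if the encoding is unknown or mixed, they are better treated as raw bytes.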

What I've come to believe is that XML_RPC automatically assumes any
strings it transfers are unicode and thusly tries to do conversions on
these strings. Therefore, is there any way to keep XML_RPC from doing
unicode conversions, or is there some way for me to just pass raw data
over XML_RPC without having to worry about it?
 
Greg Hamilton

Thomas said:
I am currently passing email messages over XML_RPC as the payload for
a certain function call. On some of these messages, XML_RPC blows up
on the server side and says something to the effect of:

exceptions.UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in
position 1599: unexpected code byte

Using the native Python codec for doing conversions gives me a similar
error ('utf8' codec can't decode byte 0x93 in position 1328:
unexpected code byte). That gives me the feeling that these specific
messages are just funky. (The locations in the file where they choke
seem to be random characters.)

What I've come to believe is that XML_RPC automatically assumes any
strings it transfers are unicode and thusly tries to do conversions on
these strings. Therefore, is there any way to keep XML_RPC from doing
unicode conversions, or is there some way for me to just pass raw data
over XML_RPC without having to worry about it?
http://www.xmlrpc.com/spec

Have a look at the <base64> data type.
 
phansen

Greg said:
http://www.xmlrpc.com/spec

Have a look at the <base64> data type.

Disclaimer: I haven't used XML-RPC yet.

I looked at the above site, and noted this particular text in the
explanatory section below the very lightweight "spec":

"""Q. What characters are allowed in strings? Non-printable characters?
Null characters? Can a "string" be used to hold an arbitrary chunk of
binary data?

A. Any characters are allowed in a string except < and &, which are
encoded as &lt; and &amp;. A string can be used to encode binary
data.
"""

Seems to me that description is inadequate, if one has to revert
to <base64> to pass through a string with an \xa0 in it.

I did a search and found this page from Fredrik Lundh, which seems
to be more clear on the whole thing, clearer even than the updated
spec which simply removed a previous reference to ASCII:
http://effbot.org/zone/xmlrpc-errata.htm

-Peter
 
Martin v. Löwis

phansen said:
Seems to me that description is inadequate, if one has to revert
to <base64> to pass through a string with an \xa0 in it.

No. \xa0 just is not a character. In XML, all bytes must denote
characters, and \xa0 does not denote any character when the
encoding is UTF-8.

To transmit binary data, use the base64 element, available through
xmlrpclib.Binary in Python.
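
A minimal sketch of the Binary wrapper, written against Python 3's xmlrpc.client (the module that replaced xmlrpclib). The method name store_message and the message bytes are hypothetical; dumps/loads stand in for a real client and server so the example runs without a network:

```python
import xmlrpc.client

# Hypothetical raw email with bytes that are not valid UTF-8.
raw_message = b"Subject: hi\r\n\r\n\x93quoted\x94 text\xa0here"

# Client side: wrap the bytes so they travel as a <base64> element.
payload = xmlrpc.client.dumps(
    (xmlrpc.client.Binary(raw_message),),
    methodname="store_message",  # hypothetical method name
)

# Server side: the parameter arrives as a Binary object; .data holds the bytes.
params, method = xmlrpc.client.loads(payload)
received = params[0].data

print(method, received == raw_message)
```

With a real ServerProxy the same Binary object would simply be passed as the argument of the remote call.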

Regards,
Martin
 
phansen

Martin said:
No. \xa0 just is not a character. In XML, all bytes must denote
characters, and \xa0 does not denote any character when the
encoding is UTF-8.

To transmit binary data, use the base64 element, available through
xmlrpclib.Binary in Python.

That's what I said, isn't it? Just checking, because you started
your response with "No", as though I had said something incorrect.

-Peter
 
Martin v. Löwis

phansen said:
That's what I said, isn't it? Just checking, because you started
your response with "No", as though I had said something incorrect.

Ah, I only read the answer of the fragment you quoted, not the
question :-(

Yes, that answer is confusing, as it doesn't really answer it
(although the answer itself is correct). Binary data and XML-RPC
has a long and confusing history.

Regards,
Martin
 
Ivan Voras

Martin said:
Binary data and XML-RPC
has a long and confusing history.

Why is that? There's <base64> for data that's expected to be binary[*],
and <string> for everything else that's valid under chosen encoding.


[*] obviously, aside from truly binary data, also for anything not valid
in current encoding.
 
Guest

Ivan said:
Martin said:
Binary data and XML-RPC
has a long and confusing history.


Why is that? There's <base64> for data that's expected to be binary[*],
and <string> for everything else that's valid under chosen encoding.

base64 originally wasn't part of the XML-RPC spec; it was added on
1/21/99. Before, the spec simultaneously claimed that the string
element contains ASCII, that "full XML" is allowed, and that the
string element can carry arbitrary binary data.

These were all mutually contradicting: If you were to put arbitrary
bytes into a string element, it would neither be well-formed XML
(at least not if you choose us-ascii or utf-8 as the encoding), nor
would the strings be pure ASCII.
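
The well-formedness point can be checked with an XML parser directly. A sketch using Python 3's xmlrpc.client.loads, which parses with expat; the helper names wrap and parses_cleanly are made up for this example:

```python
import xmlrpc.client
from xml.parsers.expat import ExpatError


def wrap(payload: bytes) -> bytes:
    """Build a minimal XML-RPC params document around raw string bytes."""
    return (
        b"<?xml version='1.0' encoding='utf-8'?>"
        b"<params><param><value><string>"
        + payload
        + b"</string></value></param></params>"
    )


def parses_cleanly(document: bytes) -> bool:
    """Return True if the document is accepted by the XML parser."""
    try:
        xmlrpc.client.loads(document)
    except ExpatError:
        return False
    return True


# A lone 0xa0 byte is not valid UTF-8, so the document is not well-formed.
raw_ok = parses_cleanly(wrap(b"nbsp:\xa0"))

# The same character, properly encoded as UTF-8 (0xc2 0xa0), is fine.
encoded_ok = parses_cleanly(wrap("nbsp:\u00a0".encode("utf-8")))

print(raw_ok, encoded_ok)
```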

Also, if the string can only carry ASCII, how can it
simultaneously allow for arbitrary XML?

People have asked all these questions, and Dave Winer always
said "read the spec, it says it all", when it really didn't.

I believe that Dave's understanding was the following: With
"ASCII", he didn't really mean "American Standard Code for
Information Interchange". He meant that all bytes in the
document must have ordinals < 128. He was fine with people
putting character references (such as &#220;) into string
elements. He clarified that aspect on 6/30/03, by removing
"ASCII" from the description of string.

Wrt. binary data, I think he meant that you could use
base64, uuencode, hex, whatever, in a string element, and
thus represent arbitrary bytes. Of course, this would not
be very interoperable, so he added base64.
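
For illustration, the ad hoc approach might look like the sketch below: the bytes are turned into base64 text by hand and sent as an ordinary string. The choice of base64 here is arbitrary (hex or uuencode would work the same way), and both ends have to agree on it out of band, which is the interoperability problem:

```python
import base64
import xmlrpc.client

raw = bytes(range(256))  # every possible byte value

# Sender: encode by private convention and ship it as a plain <string>.
as_text = base64.b64encode(raw).decode("ascii")
payload = xmlrpc.client.dumps((as_text,))

# Receiver: sees only a string and must already know it is base64.
(received_text,), _ = xmlrpc.client.loads(payload)
restored = base64.b64decode(received_text)

print(restored == raw)
```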

Regards,
Martin
 
Fredrik Lundh

Ivan said:
Why is that? There's <base64> for data that's expected to be binary[*], and <string> for
everything else that's valid under chosen encoding.

"that's valid in XML", that is. no matter what encoding you use, you
can still use character references to insert other characters.
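
A small sketch of that point, assuming Python 3's xmlrpc.client: the document below declares us-ascii and contains only ASCII bytes, yet the parsed string carries a non-ASCII character because it was written as a character reference. The hand-built document is for illustration only:

```python
import xmlrpc.client

text = "\u00dcber"  # U-umlaut followed by "ber"

# Replace anything outside ASCII with a numeric character reference.
as_ascii = text.encode("ascii", "xmlcharrefreplace")
print(as_ascii)

document = (
    b"<?xml version='1.0' encoding='us-ascii'?>"
    b"<params><param><value><string>"
    + as_ascii
    + b"</string></value></param></params>"
)

# The parser resolves the reference back to the original character.
params, _ = xmlrpc.client.loads(document)
print(params[0] == text)
```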

</F>
 
