Joakim Hove
Dear newsgroup,
I have written a cgi script in Python, and it has worked fine for some
time. Now the installed Python version has been upgraded to 2.4.1 and
I am having problems with non ascii characters.
The core of the problem I have is as follows:
1. The webpage contains a text field where the user enters her/his
name.
2. My cgi script uses the 'cgi' module to extract the name the user
has entered.
3. The cgi script writes the name the user has given to a file.
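The three steps above can be sketched roughly as follows. This is only a minimal illustration in modern Python 3, not the original script: it uses `urllib.parse.parse_qs` as a stand-in for the 'cgi' module so that it runs outside a web server, and the field name `name`, the helper names, the temporary file path and the choice of UTF-8 are all assumptions made for the example.

```python
import io
import os
import tempfile
from urllib.parse import parse_qs


def extract_name(form_body, field="name"):
    """Return the first value of `field` from a URL-encoded form body.

    `field` is a hypothetical form field name; UTF-8 is assumed.
    """
    values = parse_qs(form_body, encoding="utf-8").get(field, [""])
    return values[0]


def append_name(path, name):
    """Append the name to the file, stating the encoding explicitly."""
    with io.open(path, "a", encoding="utf-8") as handle:
        handle.write(name + u"\n")


# "%C3%85" is the UTF-8 percent-encoding of the letter "Å".
form_body = "name=%C3%85se"
name = extract_name(form_body)

# A temporary directory stands in for /tmp/namelist.txt.
path = os.path.join(tempfile.mkdtemp(), "namelist.txt")
append_name(path, name)
```

The point of the sketch is that the encoding is named both when the form data is decoded and when the file is written, so no implicit ASCII conversion takes place.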
Now, the webpage in question is in Norway, and many Norwegian names
are not 7-bit clean, i.e. they contain characters which cannot be
represented in 7-bit ASCII. As a consequence I get
*either* a UnicodeEncodeError *or* just blanks when writing the name to
file.
Simplest case:
--------------
name = "Åse"
fileH = open("/tmp/namelist.txt","w")
fileH.write(name)
In this case the first character in the name variable is not in the
plain 7-bit ASCII range. The code written above runs without errors
or warnings, but the problematic character is simply replaced by a
space in the file '/tmp/namelist.txt'.
More complicated case:
----------------------
The application uses the SOAP protocol via the ZSI module to
communicate with some other site. The SOAP call returns a variable,
and when this variable is combined with the name variable above I get
a UnicodeEncodeError:
name = "ÅSE"
ref = SOAP_return_value()
fileH = open("/tmp/namelist.txt","a")
fileH.write("name:%s ref:%s \n" % (name,ref))
fileH.close()
This bombs with:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc5' in
position 45: ordinal not in range(128)
The variable 'ref' returned from the SOAP interaction is (seemingly
....) pure 7bit ascii.
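
For reference, the same kind of exception can be reproduced in isolation. In Python 2, writing a unicode object to a plain file encodes it implicitly with the 'ascii' codec; the sketch below makes that step explicit. It assumes that both values are unicode objects, and the value of `ref` is a made-up placeholder, not real SOAP output. It needs Python 2.6 or later (or Python 3) because of the `except ... as` syntax.

```python
# Both values are assumed to be unicode objects; `ref` is a placeholder.
name = u"\xc5SE"
ref = u"abc123"
line = u"name:%s ref:%s \n" % (name, ref)

# Encoding with the 'ascii' codec fails on the first non-ASCII character.
try:
    line.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = exc

# Encoding explicitly with a codec that covers the character succeeds.
encoded = line.encode("utf-8")
```

Here `error.start` is 5, the index of the first character after `name:`, whereas the traceback above reports position 45, so the string that actually failed was presumably longer than the one shown.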
Any suggestions greatly appreciated.
Joakim Hove
--
Joakim Hove
hove AT ntnu.no /
Tlf: +47 (73 5)9 34 27 / Stabburveien 18
Fax: ................. / N-5231 Paradis
http://www.ift.uib.no/~hove/ / 55 91 28 18 / 92 68 57 04