Arian Kuschki
Hi all
This has been bugging me for a long time, and I cannot work out what to do.
I always have problems when dealing with input text that contains umlauts.
Consider the following:
In [1]: import urllib
In [2]: f = urllib.urlopen("http://www.google.de/ig/api?weather=Muenchen")
In [3]: xml = f.read()
In [4]: f.close()
In [5]: print xml
------> print(xml)
<forecast_information><city data="Munich, BY"/><postal_code data="Muenchen"/><latitude_e6
data=""/><longitude_e6 data=""/><forecast_date
data="2009-10-17"/><current_date_time data="2009-10
-17 14:20:00 +0000"/><unit_system
data="SI"/></forecast_information><current_conditions><condition data="Meistens
bew�kt"/><temp_f data="43"/><temp_c data="6"/><h
umidity data="Feuchtigkeit: 87�%"/><icon
data="/ig/images/weather/mostly_cloudy.gif"/><wind_condition data="Wind: W mit
Windgeschwindigkeiten von 13 km/h"/></curr
ent_conditions><forecast_conditions><day_of_week data="Sa."/><low
data="1"/><high data="7"/><icon
data="/ig/images/weather/chance_of_rain.gif"/><condition data="V
ereinzelt Regen"/></forecast_conditions><forecast_conditions><day_of_week
data="So."/><low data="-1"/><high data="8"/><icon
data="/ig/images/weather/chance_of_sno
w.gif"/><condition data="Vereinzelt
Schnee"/></forecast_conditions><forecast_conditions><day_of_week
data="Mo."/><low data="-4"/><high data="8"/><icon data="/ig/i
mages/weather/mostly_sunny.gif"/><condition data="Teils
sonnig"/></forecast_conditions><forecast_conditions><day_of_week
data="Di."/><low data="0"/><high data="8"
/><icon data="/ig/images/weather/sunny.gif"/><condition
data="Klar"/></forecast_conditions></weather></xml_api_reply>
As you can see, the umlauts in the XML are not displayed properly. When I want
to process this text (for example with xml.sax), I get error messages because
the parser can't read it.
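The replacement characters in the output are consistent with bytes in a
single-byte encoding such as ISO-8859-1 being displayed on a UTF-8 terminal.
A minimal sketch of that effect, written for Python 3 rather than the Python 2
session above, and assuming (the output does not confirm it) that the server
sent ISO-8859-1. The sample string is a hypothetical stand-in for the real
response:

```python
# Assumption: the service sent ISO-8859-1 bytes. The string below is a
# hypothetical stand-in for the real response body ("\u00f6" is "ö").
raw = 'data="Meistens bew\u00f6lkt"'.encode("iso-8859-1")

# Interpreting those bytes as UTF-8 fails: 0xF6 ("ö" in ISO-8859-1)
# is not a valid start byte of a UTF-8 sequence.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# A lenient decoder substitutes U+FFFD for the bad byte, which is roughly
# what a UTF-8 terminal does when it displays such bytes.
shown = raw.decode("utf-8", errors="replace")

# Decoding with the encoding that was actually used recovers the text.
text = raw.decode("iso-8859-1")

print(utf8_ok)
print(shown)
print(text)
```

If this assumption holds, the data itself is intact and only the interpretation
of the bytes is wrong.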
I've tried to read up on this, and there is a lot of information on the web, but
nothing seems to work for me. For example, setting the source encoding to UTF-8
like this: # -*- coding: utf-8 -*-, or using the decode() string method.
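For context, the coding line only declares the encoding of the source file
itself; it has no effect on bytes read at runtime. decode() does help, but only
when it is given the encoding the server actually used. A hedged sketch in
Python 3 of decoding first and then handing UTF-8 to xml.sax. The names
ConditionHandler and parse_conditions, the sample document, and the choice of
ISO-8859-1 are all assumptions made for illustration:

```python
import xml.sax


class ConditionHandler(xml.sax.ContentHandler):
    """Hypothetical handler: collects the data attribute of <condition> elements."""

    def __init__(self):
        super().__init__()
        self.conditions = []

    def startElement(self, name, attrs):
        if name == "condition":
            self.conditions.append(attrs.get("data"))


def parse_conditions(raw_bytes, source_encoding):
    # Decode with the encoding the bytes were actually written in ...
    text = raw_bytes.decode(source_encoding)
    handler = ConditionHandler()
    # ... then pass UTF-8 bytes to the parser, which assumes UTF-8 when the
    # document carries no XML declaration naming another encoding.
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.conditions


# Assumed sample: a tiny document encoded as ISO-8859-1, with no XML declaration.
sample = '<weather><condition data="Meistens bew\u00f6lkt"/></weather>'.encode("iso-8859-1")
conditions = parse_conditions(sample, "iso-8859-1")
print(conditions)
```

With a real HTTP response, the charset parameter of the Content-Type header
would be the first place to look for the right value of source_encoding. If the
document starts with an XML declaration that names an encoding, re-encoding to
UTF-8 as above would contradict it, so that case needs separate handling.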
I always have this kind of problem when input contains umlauts, not just in
this case. My locale (on Ubuntu) is en_GB.UTF-8.
Cheers
Arian