trouble w/ unicode file

Guilherme Salgado

Hi there,

I have a python source file encoded in unicode (utf-8) with some
iso8859-1 strings. I've encoded this file as utf-8 in the hope that
python will understand these strings as unicode (<type 'unicode'>)
strings without the need to use unicode() or u"" on these strings. But
this didn't happen.

Am I expecting something that really shouldn't happen, or do we have a bug?

This is the test I've made:
$ cat bar.py
#-*- coding: utf-8 -*-
x = 'ééééáááááííí'
print x, type(x)

$python
Python 2.3.3 (#2, Jan 4 2004, 12:24:16)
[...]ééééáááááííí <type 'str'>

Thanks in advance,
[]'s
Guilherme Salgado
 
Serge Orlov

Guilherme Salgado said:
Hi there,

I have a python source file encoded in unicode (utf-8) with some
iso8859-1 strings. I've encoded this file as utf-8 in the hope that
python will understand these strings as unicode (<type 'unicode'>)
strings without the need to use unicode() or u"" on these strings. But
this didn't happen.

You hoped, but you forgot to pray <wink> Why do you think Python
should behave this way? There is (an experimental?) option -U that
forces all string literals to be unicode. Obviously, if you use this option,
your sources won't be easily distributable to other people.

C:\Python23>python -U
Python 2.3.3 (#51, Dec 18 2003, 20:22:39) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Am I expecting something that really shouldn't happen, or do we have a bug?

We have a bug here as well. But in your code. The coding must
be the same as the coding of your source file. bar.py must be:
#-*- coding: latin-1 -*-
x = 'ééééáááááííí'
print x, type(x)

-- Serge.
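
The rule that the declared coding must match the bytes actually on disk can be illustrated with a small sketch. It is not part of the original post, it uses explicit encode/decode calls rather than real files, and it is written so that it also runs on Python 3. The variable names are only for illustration.

```python
# -*- coding: utf-8 -*-
# Sketch: what a mismatch between declared and actual encoding looks like.
text = u'ééééáááááííí'

utf8_bytes = text.encode('utf-8')      # what a UTF-8 editor writes to disk
latin1_bytes = text.encode('latin-1')  # what a Latin-1 editor writes to disk

# Each accented letter here takes 2 bytes in UTF-8 but 1 byte in Latin-1.
print(len(utf8_bytes), len(latin1_bytes))

# Reading UTF-8 bytes as Latin-1 never raises, but the result is garbage.
misread = utf8_bytes.decode('latin-1')
print(misread == text)

# Reading Latin-1 bytes as UTF-8 is rejected outright.
try:
    latin1_bytes.decode('utf-8')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print(strict_ok)
```

So a file saved as UTF-8 should carry a utf-8 coding line, and a file saved as Latin-1 should carry a latin-1 one.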
 
Guilherme Salgado

You hoped, but you forgot to pray <wink> Why do you think Python
should behave this way? There is (an experimental?) option -U that

Ok, ok. I'll remember to pray next time. :)
I need to store unicode strings (declared in files) in ZODB, but I don't
want to use u"" around all my strings (because most of them are latin-1),
so I thought storing the file as unicode would work. Is there a better way
of doing this?
forces all string literals to be unicode. Obviously, if you use this option,
your sources won't be easily distributable to other people.
[...]
Am I expecting something that really shouldn't happen, or do we have a bug?

We have a bug here as well. But in your code. The coding must
be the same as the coding of your source file. bar.py must be:
#-*- coding: latin-1 -*-
x = 'ééééáááááííí'
print x, type(x)

I didn't understand this (even after some praying) :)
My file is encoded in utf-8, look:
$ file bar.py
bar.py: UTF-8 Unicode text

Why should I declare it as latin-1 encoded, though?

[]'s
Guilherme Salgado
 
Serge Orlov

Guilherme Salgado said:
Ok, ok. I'll remember to pray next time. :)
I need to store unicode strings (declared in files) in ZODB, but I don't
want to use u"" around all my strings (because most of them are latin-1),
so I thought storing the file as unicode would work. Is there a better way
of doing this?

Not that I'm aware of.
I didn't understand this (even after some praying) :)
My file is encoded in utf-8, look:
$ file bar.py
bar.py: UTF-8 Unicode text

Why should I declare it as latin-1 encoded, though?

Sorry, I was confused by your words "with some iso8859-1 strings".
I thought you were using a simple (encoding-unaware) editor and had
just added #-*- coding: utf-8 -*- in the hope that it would work. You're
right, the coding should stay utf-8. After that you have two options:
either use the -U option or put u before every string.

-- Serge.
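
A minimal sketch of the "put u before every string" option, added here for illustration and not taken from the original post. The coding line only tells Python how to read the bytes in the file; on Python 2 it is the u prefix that makes a literal a unicode object. The name text_type is a hypothetical helper so that the same file also runs on Python 3.

```python
# -*- coding: utf-8 -*-
plain = 'ééééáááááííí'      # byte string on Python 2 (without -U), text on Python 3
prefixed = u'ééééáááááííí'  # unicode text on both

text_type = type(u'')       # 'unicode' on Python 2, 'str' on Python 3

# The prefixed literal is always text and counts characters, not bytes.
print(isinstance(prefixed, text_type))
print(len(prefixed))

# Whether the plain literal is text depends on the interpreter version.
print(isinstance(plain, text_type))
```

On Python 2.6 and later, `from __future__ import unicode_literals` at the top of a file has a similar per-file effect to -U, but that was not available in the Python 2.3 used in this thread.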
 
Guest

Serge said:
Sorry, I was confused by your words "with some iso8859-1 strings".
I thought you were using a simple (encoding-unaware) editor and had
just added #-*- coding: utf-8 -*- in the hope that it would work. You're
right, the coding should stay utf-8. After that you have two options:
either use the -U option or put u before every string.

There is a third option: Programmatically convert the strings to
Unicode, e.g.

# -*- coding: utf-8 -*-
s = "ééééáááááííí"
s = unicode(s, 'utf-8')

This assumes that you know the source encoding at the point of
conversion.

Regards,
Martin
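
When many strings need converting, the programmatic approach above could be wrapped in a small helper. The following is only a sketch under stated assumptions: to_unicode and SOURCE_ENCODING are hypothetical names, not part of any library, and the `bytes` name requires Python 2.6 or later (on Python 2.3 one would test against `str` instead).

```python
# -*- coding: utf-8 -*-
SOURCE_ENCODING = 'utf-8'  # assumption: must match the coding line above


def to_unicode(value, encoding=SOURCE_ENCODING):
    """Return value as text, decoding it first if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already text, leave it untouched


# Stand-in for a plain Python 2 'str' literal read from a UTF-8 source file.
raw = u'ééééáááááííí'.encode(SOURCE_ENCODING)

s = to_unicode(raw)
print(type(s), len(s))
```

Passing a value that is already unicode returns it unchanged, so the helper can be applied uniformly before storing strings.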
 
