string u'hyv\xe4' to file as 'hyvä'

gintare

Could you please help me with saving special characters to a file?

I need to write the string u'hyv\xe4' to a file.
When I open the file, I would like to see the line 'hyvä'.

import codecs
word= u'hyv\xe4'
F=codecs.open(/opt/finnish.txt, 'w+','Latin-1')

F.writelines(item.encode('Latin-1'))
F.writelines(item.encode('utf8'))
F.writelines(item)

F.close()

All three writelines calls give the same result in finnish.txt: hyv\xe4
I would like to find 'hyvä'.

regards,
gintare
 
MRAB

> Could you please help me with saving special characters to a file?
>
> I need to write the string u'hyv\xe4' to a file.
> When I open the file, I would like to see the line 'hyvä'.
>
> import codecs
> word= u'hyv\xe4'
> F=codecs.open(/opt/finnish.txt, 'w+','Latin-1')

This opens the file using the Latin-1 encoding (although only if you
put the filename in quotes).

> F.writelines(item.encode('Latin-1'))

This encodes the Unicode item (did you mean 'word'?) to a bytestring
using the Latin-1 encoding. You opened the file using Latin-1 encoding,
so this is pointless. You should pass a Unicode string; it will encode
it for you.

You're also passing a bytestring to the .writelines method, which
expects a list of strings.

What you should be doing is this:

F.write(word)

> F.writelines(item.encode('utf8'))

This encodes the Unicode item to a bytestring using the UTF-8 encoding.
This is also pointless. You shouldn't be encoding to UTF-8 and then
trying to write it to a file which was opened using Latin-1 encoding!
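
To make the point about encodings concrete, here is a minimal sketch that writes the same Unicode string through codecs.open with two different encodings and then inspects the raw bytes on disk. The helper name write_and_read_bytes and the temporary file (standing in for /opt/finnish.txt) are assumptions made for illustration only.

```python
import codecs
import os
import tempfile

word = u'hyv\xe4'  # four characters; the last one is a-umlaut


def write_and_read_bytes(text, encoding):
    """Write text through codecs.open, then return the raw bytes on disk."""
    # Hypothetical stand-in for /opt/finnish.txt: a throwaway temp file.
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    try:
        f = codecs.open(path, 'w', encoding)
        f.write(text)  # pass the Unicode string; the codec encodes it
        f.close()
        with open(path, 'rb') as raw:
            return raw.read()
    finally:
        os.remove(path)


latin1_bytes = write_and_read_bytes(word, 'latin-1')  # one byte for the a-umlaut
utf8_bytes = write_and_read_bytes(word, 'utf-8')      # two bytes for the a-umlaut
```

The string is never encoded by hand here; the only thing that changes between the two calls is the encoding passed to codecs.open, and that alone decides which bytes end up in the file.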
 
gintare

Hello,
it STILL does not work. WHAT is to be done?

import codecs
item=u'hyv\xe4'
F=codecs.open('/opt/finnish.txt', 'w+', 'utf8')
F.writelines(item.encode('utf8'))
F.close()

In the file I find 'hyv\xe4' instead of hyvä.

(Sorry for the mistyping in my previous letter about 'latin-1'. I was trying
all possible combinations, when the normal example syntax did not work,
before writing to this forum.)

regards,
gintare
 
Mark Tolonen

gintare said:
> Hello,
> it STILL does not work. WHAT is to be done?
>
> import codecs
> item=u'hyv\xe4'
> F=codecs.open('/opt/finnish.txt', 'w+', 'utf8')
> F.writelines(item.encode('utf8'))
> F.close()
>
> In the file I find 'hyv\xe4' instead of hyvä.

When you open a file with codecs.open(), it expects Unicode strings to be
written to the file. Don't encode them again. Also, .writelines() expects
a list of strings. Use .write():

import codecs
item=u'hyv\xe4'
F=codecs.open('/opt/finnish.txt', 'w+', 'utf8')
F.write(item)
F.close()
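
For completeness, a small sketch of what .writelines() is meant for: it takes a list of Unicode strings and writes them one after another without adding newlines. The sample lines and the temporary file are assumptions made for illustration, not part of the original question.

```python
import codecs
import os
import tempfile

# Hypothetical sample data: each entry carries its own newline,
# because writelines() does not add line endings.
lines = [u'hyv\xe4\n', u'p\xe4iv\xe4\n']

# A throwaway temp file stands in for /opt/finnish.txt.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

f = codecs.open(path, 'w', 'utf-8')
f.writelines(lines)  # a list of Unicode strings, not an encoded bytestring
f.close()

# Read the file back through the same codec to check the round trip.
f = codecs.open(path, 'r', 'utf-8')
read_back = f.read()
f.close()
os.remove(path)
```

For a single string such as item, .write() remains the simpler call.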

An additional comment: if you save the script in UTF-8, you can inform Python
of that fact with a special comment, and actually use the correct characters
in your string constants (ä instead of \xe4). Make sure to use a text
editor that can save in UTF-8, or use the correct coding comment for whatever
encoding you save the file in.

# coding: utf8
import codecs
item=u'hyvä'
F=codecs.open('finnish.txt', 'w+', 'utf8')
F.write(item)
F.close()

-Mark
 
Alex Willmer

> When you open a file with codecs.open(), it expects Unicode strings to be
> written to the file. Don't encode them again. Also, .writelines() expects
> a list of strings. Use .write():
>
>     import codecs
>     item=u'hyv\xe4'
>     F=codecs.open('/opt/finnish.txt', 'w+', 'utf8')
>     F.write(item)
>     F.close()

Gintare, Mark's code is correct. When you read the file back (say, into
item2) and inspect it in the interactive interpreter, make sure you
understand what you are seeing:
u'hyv\xe4'

That might look as though item2 is 7 characters long and contains
a backslash followed by x, e, 4. However, item2 is identical to item;
they both contain 4 characters, the final one being a-umlaut. Python
has shown the string using a backslash escape, because printing a
non-ascii character might fail. You can see this directly by printing
the string, if your Python session is running in a terminal (or GUI)
that can handle non-ascii characters:
hyvä
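
The difference between the string and its escaped display can be checked without relying on the terminal. The sketch below writes item, reads it back as item2, and compares lengths. The temporary file is an assumption standing in for /opt/finnish.txt, and the unicode_escape codec is used to reproduce the escaped form because Python 3's repr() shows ä directly.

```python
import codecs
import os
import tempfile

item = u'hyv\xe4'

# A throwaway temp file stands in for /opt/finnish.txt.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

f = codecs.open(path, 'w', 'utf-8')
f.write(item)
f.close()

f = codecs.open(path, 'r', 'utf-8')
item2 = f.read()  # decoded back into a Unicode string
f.close()
os.remove(path)

same = (item2 == item)                      # the round trip preserves the string
char_count = len(item2)                     # 4 characters: h, y, v, a-umlaut
escaped = item2.encode('unicode_escape')    # the backslash form used for display
escaped_length = len(escaped)               # 7: h, y, v, backslash, x, e, 4
```

The seven-character form exists only in the escaped display; the string read back from the file still holds four characters.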
 
MRAB

> Hello,
> it STILL does not work. WHAT is to be done?
>
> import codecs
> item=u'hyv\xe4'
> F=codecs.open('/opt/finnish.txt', 'w+', 'utf8')
> F.writelines(item.encode('utf8'))
> F.close()

As I said in my previous post, you shouldn't be using .writelines, and
you shouldn't encode the string when writing it to the file, because
codecs.open will do that for you; that's its purpose:

import codecs
item = u'hyv\xe4'
F = codecs.open('/opt/finnish.txt', 'w+', 'utf8')
F.write(item)
F.close()
 
