Ubuntu - Linux - Unicode - encoding

  • Thread starter Franz Steinhaeusler

Franz Steinhaeusler

Hello NG, a little longer question,

I'm working on our project DrPython and trying to fix bugs on Linux
(on Windows, it now works very well with latin-1 encoding).

On Windows, it works well now, using setappdefaultencoding and opening
the files in the styled text control with the right encoding.
(I see the German umlauts äöü and the "sharp s" ß.)

The case:
I have a file on a Windows XP partition whose contents include German
umlauts, and the filename itself has umlauts, like iÜüäßk.txt

If I want to append this file to a list, I somehow get: latin-1, cannot
decode 'utf-8'.

sys.setappdefaultencoding(self.prefs.defaultencoding) would be the
easiest solution, and it should be the same as sys.setdefaultencoding on
Linux.

Why is there a setappdefaultencoding on Windows and
sys.setdefaultencoding on Linux?

I googled and found a strange solution (sys.setdefaultencoding is not
available otherwise):

import sys
reload(sys)

Only then is this function available.
Why does setdefaultencoding otherwise not work on Linux?

(Also, file managers like Nautilus or Krusader cannot display the files
correctly.)

Is there a system-wide Linux language setting (encoding) which I have
to install and adjust?

I know there are the methods encode, unicode and decode, but how do I
know when they are needed? I don't want to change all the source to add
encode, ... for every string access.
So setappdefaultencoding would be the easiest way.

Should I also (or instead) use wx.SetDefaultPyEncoding in DrPython?

This would be the easiest solution, setappdefaultencoding (getting it
from preferences), but it doesn't work.

Besides, I tried other editors like spe, pype, boa, ulipad, but none of
them correctly displayed files that have German umlauts in their
filenames.

Thank you verrrry much in advance for a possible solution.
 

Alan Franzoni

On Thu, 01 Feb 2007 16:02:52 +0100, Franz Steinhaeusler wrote:
The case:
I have a file on a WindowsXP partition which has as contents german
umlauts and the filename itself has umlauts like iÜüäßk.txt

Could you please tell us a) which filesystem is that partition using (winxp
may be installed on fat32 or ntfs partitions) and b) which driver are you
using to read that partition (may be vfat, ntfs or fuse/ntfs-3g) and, last
but not least, c) which options are passed to that driver?

--
Alan Franzoni
-
Remove .xyz from my address in order to contact me.
-
GPG Key Fingerprint (Key ID = FE068F3E):
5C77 9DC3 BD5B 3A28 E7BC 921A 0255 42AA FE06 8F3E
 

Paul Boddie

The case:
I have a file on a WindowsXP partition which has as contents german
umlauts and the filename itself has umlauts like iÜüäßk.txt

If I want to append this file to a list, I get somehow latin-1, cannot
decode 'utf-8'.

You mean that you expect the filename in UTF-8, but it arrives as
ISO-8859-1 (Latin1)? How do you get the filename? Via Python standard
library functions or through a GUI toolkit? What does
sys.getfilesystemencoding report?
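
One way to collect the values asked about here is to print them from Python. This is only a minimal sketch, written for Python 3 rather than the Python 2 of this thread; the helper name `report_encodings` is made up for illustration and is not part of DrPython or wxPython.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings Python has picked up from the environment."""
    return {
        # Encoding used to convert filenames between bytes and str.
        "filesystem": sys.getfilesystemencoding(),
        # Default for implicit conversions (always 'utf-8' on Python 3).
        "default": sys.getdefaultencoding(),
        # Encoding implied by the current locale settings.
        "locale": locale.getpreferredencoding(False),
        # Encoding of standard input; None if stdin is missing.
        "stdin": getattr(sys.stdin, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in report_encodings().items():
        print(f"{name}: {value}")
```

If "filesystem" and "locale" disagree with the encoding the filenames were actually written in, that would point to the locale or mount options rather than to the application.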

[...]
Why is this setdefaultencoding otherwise not working on linux?

My impression was that you absolutely should not change the default
encoding. Instead, you should react to encoding information provided
by your sources of data. For example, sys.stdin.encoding tells you
about the data from standard input.
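
As a rough illustration of reacting to the data instead of changing the default encoding, the sketch below decodes a filename that arrives as bytes by trying a list of candidate encodings. It is written for Python 3; the function name `decode_filename` and the candidate order are assumptions for this example, not something DrPython does.

```python
def decode_filename(raw, encodings=("utf-8", "latin-1")):
    """Decode a filename given as bytes, trying each encoding in order."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            # This candidate does not fit the bytes; try the next one.
            continue
    # Only reached if no candidate fits (latin-1 itself accepts any bytes).
    return raw.decode("ascii", errors="replace")


if __name__ == "__main__":
    # The same name as it would be stored by a Latin-1 and a UTF-8 system.
    print(decode_filename("iÜüäßk.txt".encode("latin-1")))
    print(decode_filename("iÜüäßk.txt".encode("utf-8")))
```

This is a heuristic: some Latin-1 byte sequences happen to be valid UTF-8 as well, so knowing the real encoding of the source is still preferable to guessing.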
(Also Filemanagers like Nautilus or Krusader cannot display the files
correctly).

This sounds like a locale issue...
Is there a system wide linux language setting (encoding), which I have
to install and adjust?

I keep running into this problem when installing various
distributions. Generally, the locale needs to agree with the encoding
of the filenames in your filesystem, so that if you've written files
with UTF-8 filenames, you'll only see them with their proper names if
the locale you're using is based on UTF-8 - things like en_GB.utf8 and
de_AT.utf8 would be appropriate. Such locales are often optional
packages, as I found out very recently, and you may wish to look at
the language-pack-XX and language-pack-XX-base packages for Ubuntu
(substituting XX for your chosen language). Once they are installed,
typing "locale -a" will let you see available locales, and I believe
that changing /etc/environment and setting the LANG variable there to
one of the available locales may offer some kind of a solution.
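
To check from a program whether the environment names a UTF-8 locale, something like the following sketch could be used. It assumes the usual POSIX precedence (LC_ALL, then LC_CTYPE, then LANG) and locale names of the form `de_AT.utf8`; both function names are invented for this example.

```python
import os


def locale_from_environment(environ=None):
    """Return the locale name that applies to character handling."""
    env = os.environ if environ is None else environ
    # The first of these variables that is set and non-empty wins.
    for variable in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(variable)
        if value:
            return value
    # Nothing set: the minimal "C" locale applies.
    return "C"


def is_utf8_locale(name):
    """True if a locale name such as 'en_GB.UTF-8' names a UTF-8 codeset."""
    # Take the part between '.' and an optional '@modifier'.
    codeset = name.partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"


if __name__ == "__main__":
    name = locale_from_environment()
    print(name, is_utf8_locale(name))
```

This only inspects the variables; whether the named locale is actually installed still has to be checked with "locale -a".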

Another thing I also discovered very recently, after doing a
debootstrap installation of Ubuntu, was that various terminals
wouldn't reproduce non-ASCII characters without an appropriate (UTF-8)
locale being set up, even though other desktop applications were happy
to accept and display the characters. I thought this was a keyboard
issue, compounded by the exotic nested X server plus User Mode Linux
solution I was experimenting with, but I think locales were the main
problem.

Paul
 

Franz Steinhäusler

On Thu, 01 Feb 2007 16:02:52 +0100, Franz Steinhaeusler wrote:


Could you please tell us a) which filesystem is that partition using (winxp
may be installed on fat32 or ntfs partitions) and b) which driver are you
using to read that partition (may be vfat, ntfs or fuse/ntfs-3g) and, last
but not least, c) which options are passed to that driver?

Hello Alan, thank you for answering.

a) FAT32
b) Hm, I don't know; I mounted it in fstab as FAT32.
c) In fstab there is:
/dev/hda1 /winxp auto rw,user,auto 0

One problem remains; it is perhaps a little OT here, but
nevertheless:

If I copy files with German umlauts (äöü and sharp s, ß), these
filenames are not copied properly, and those characters are replaced
by little square symbols.

Is there anything to set up on Ubuntu to copy these properly?
I have only installed the English language; maybe that is the problem.
 

Franz Steinhäusler

You mean that you expect the filename in UTF-8, but it arrives as
ISO-8859-1 (Latin1)? How do you get the filename? Via Python standard
library functions or through a GUI toolkit? What does
sys.getfilesystemencoding report?

Hello Paul,

I already set the sys encoding to 'latin-1', but apparently the value
is ignored and 'utf-8' is used (?)

I get the filenames with
thelist = os.listdir(directory)
and the directory is a string, not unicode.
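
The type of the argument matters here: in Python 2, os.listdir returns byte strings for a byte-string path and unicode strings for a unicode path. The sketch below shows the equivalent behaviour in Python 3 (bytes in, bytes out; str in, str out). The helper name `list_both_ways` is hypothetical, and the demo creates its own temporary directory rather than touching any real partition.

```python
import os
import tempfile


def list_both_ways(directory):
    """Return (names as bytes, names as str) for the same directory."""
    # A bytes path makes os.listdir return the raw bytes stored on disk.
    raw_names = sorted(os.listdir(os.fsencode(directory)))
    # A str path makes os.listdir decode names with the filesystem encoding.
    text_names = sorted(os.listdir(os.fsdecode(directory)))
    return raw_names, text_names


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Create a file whose name is stored as UTF-8 bytes.
        name = "iÜüäßk.txt".encode("utf-8")
        open(os.path.join(os.fsencode(tmp), name), "wb").close()
        raw_names, text_names = list_both_ways(tmp)
        print(raw_names)
        # ascii() keeps the output printable whatever the terminal encoding.
        print(ascii(text_names))
```

Working from the raw bytes keeps the decision about the encoding in the application's hands, at the cost of having to decode every name explicitly.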
[...]
Why is this setdefaultencoding otherwise not working on linux?

My impression was that you absolutely should not change the default
encoding.
Aha.


Instead, you should react to encoding information provided
by your sources of data. For example, sys.stdin.encoding tells you
about the data from standard input.
(Also Filemanagers like Nautilus or Krusader cannot display the files
correctly).

This sounds like a locale issue...

Hm, a setting in linux.
I keep running into this problem when installing various
distributions. Generally, the locale needs to agree with the encoding
of the filenames in your filesystem, so that if you've written files
with UTF-8 filenames, you'll only see them with their proper names if
the locale you're using is based on UTF-8 - things like en_GB.utf8 and
de_AT.utf8 would be appropriate. Such locales are often optional
packages, as I found out very recently, and you may wish to look at
the language-pack-XX and language-pack-XX-base packages for Ubuntu
(substituting XX for your chosen language). Once they are installed,
typing "locale -a" will let you see available locales, and I believe
that changing /etc/environment and setting the LANG variable there to
one of the available locales may offer some kind of a solution.

Ah, thank you very much for that enlightenment!
Another thing I also discovered very recently, after doing a
debootstrap installation of Ubuntu, was that various terminals
wouldn't reproduce non-ASCII characters without an appropriate (UTF-8)
locale being set up, even though other desktop applications were happy
to accept and display the characters.

That sounds familiar to me! ;)
I thought this was a keyboard
issue, compounded by the exotic nested X server plus User Mode Linux
solution I was experimenting with, but I think locales were the main
problem.

Paul

So that is not exactly simple. :)

Thank you very much for this precise answer!
 

Alan Franzoni

On Thu, 01 Feb 2007 20:57:53 +0100, Franz Steinhäusler wrote:
If I copy files with German umlauts (äöü and sharp s, ß), these
filenames are not copied properly, and those characters are replaced
by little square symbols.

Yes... I'm Italian myself, and I have had no problems using accented
letters (òèàìù). Since you say there's a problem in Nautilus and
other Ubuntu software as well, I suppose there's something wrong with
your Linux setup, not with Python.

Or, at least, you should try solving that problem first, then check what
happens with Python.

Try appending this option to the hda1 mount options in your fstab:

iocharset=iso8859-15

Unmount and remount, then check what happens.

 

Franz Steinhäusler

On Thu, 01 Feb 2007 20:57:53 +0100, Franz Steinhäusler wrote:


Yes... I'm Italian myself, and I have had no problems using accented
letters (òèàìù). Since you say there's a problem in Nautilus and
other Ubuntu software as well, I suppose there's something wrong with
your Linux setup, not with Python.

Or, at least, you should try solving that problem first, then check what
happens with Python.

Try appending this option to the hda1 mount options in your fstab:

iocharset=iso8859-15

Unmount and remount, then check what happens.


Thank you again, I will give it a try!
 
