sorting slovak utf


Stano Paska

Hi,

I have one problem.
In file aaa.txt I have slovak letters in utf-8.

zcaron
scaron
aacute
ocircumflex
tcaron
yacute
ccaron
eacute
lcaron
iacute
dcaron
uacute
adiaeresis
oacute
lacute
ncaron
racute

with this script (output is redirected to file bbb.txt):

import fileinput
riadky = []
a = fileinput.input("aaa.txt")
for i in a:
    riadky.append(i.strip())
a.close()
riadky.sort()
for i in riadky:
    print i

I have this result:

aacute
adiaeresis
eacute
iacute
oacute
ocircumflex
uacute
yacute
ccaron
dcaron
lacute
lcaron
ncaron
racute
scaron
tcaron
zcaron

and the correct result would be:

aacute
adiaeresis
ccaron
dcaron
eacute
iacute
lacute
lcaron
ncaron
oacute
ocircumflex
racute
scaron
tcaron
uacute
yacute
zcaron

I have set utf-8 in sitecustomize.py

I tried:
import locale
locale.setlocale(locale.LC_CTYPE, 'sk_SK.utf-8')
and
locale.setlocale(locale.LC_CTYPE, ('sk_SK', 'utf-8'))
but I got an "unsupported locale" error

What must I do to get the correct sorting result?

Stano.

P.S. lower and upper work correctly.
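
For reference, the order reported above is what a plain code point sort produces. The following is a minimal sketch in Python 3 syntax (the thread itself uses Python 2.3), assuming the names in the listing stand for the actual letters (aacute = á, ccaron = č, and so on). Sorting the UTF-8 encoded bytes gives the same order, because UTF-8 preserves code point order.

```python
# Assumption: these are the letters behind the glyph names listed in aaa.txt,
# in the same order as in the file.
letters = ["ž", "š", "á", "ô", "ť", "ý", "č", "é", "ľ",
           "í", "ď", "ú", "ä", "ó", "ĺ", "ň", "ŕ"]

# Default sort compares Unicode code points, not Slovak alphabet positions.
by_code_point = sorted(letters)

# Sorting the raw UTF-8 byte strings yields the same sequence.
by_utf8_bytes = [b.decode("utf-8")
                 for b in sorted(ch.encode("utf-8") for ch in letters)]

print(" ".join(by_code_point))
```

The vowels with acute accents come first only because they sit in the Latin-1 range (below U+0100), while the caron letters sit in Latin Extended-A.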
 

Radovan Garabik

Stano Paska said:
Hi,

I have one problem.
In file aaa.txt I have slovak letters in utf-8.

....


I tried:
import locale
locale.setlocale(locale.LC_CTYPE, 'sk_SK.utf-8')
and
locale.setlocale(locale.LC_CTYPE, ('sk_SK', 'utf-8'))
but I got an "unsupported locale" error

What must I do to get the correct sorting result?

You probably do not have the sk_SK.UTF-8 locale generated.
Which OS and version are you using?
What is the output of locale -a?
In some Linux distributions, e.g. Debian, you have to
generate the locale beforehand with locale-gen
(according to the /etc/locale.gen file).


--
-----------------------------------------------------------
| Radovan Garabík http://melkor.dnp.fmph.uniba.sk/~garabik/ |
| __..--^^^--..__ garabik @ kassiopeia.juls.savba.sk |
-----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!
 

Martin v. Löwis

Stano Paska said:
import locale
locale.setlocale(locale.LC_CTYPE, 'sk_SK.utf-8')
and
locale.setlocale(locale.LC_CTYPE, ('sk_SK', 'utf-8'))
but I got an "unsupported locale" error

What must I do to get the correct sorting result?

You don't need to operate in a UTF-8 locale. Instead, any Slovak
locale will do, provided your system offers locale.strcoll for Unicode
objects (try locale.strcoll(u"", u"")).

In this case, you can convert all strings to Unicode, and then collate
using locale.strcoll.

Alternatively, you could set the locale to any Slovak locale, and use
locale.getpreferredencoding() to find the locale's encoding. Then you
could convert all input strings to that encoding, and use
locale.strcoll to collate them as byte strings.

Regards,
Martin
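
The first approach described above can be sketched roughly as follows. This is Python 3 syntax, not the Python 2.3 used in the thread, and the helper name sort_by_collation is hypothetical. Note that collation is governed by the LC_COLLATE category rather than LC_CTYPE, so that is the category set here. Which locale names are accepted depends on the system.

```python
import functools
import locale


def sort_by_collation(words, locale_name):
    """Return words sorted with locale.strcoll under locale_name.

    The previous LC_COLLATE setting is restored afterwards.
    Raises locale.Error if the system does not support locale_name.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query only, no change
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strcoll is a two-argument comparison, so wrap it as a sort key.
        return sorted(words, key=functools.cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)
```

With a Slovak locale installed, a call such as sort_by_collation(riadky, "sk_SK.UTF-8") would be expected to give the alphabetical order asked for; the locale name is an assumption and may differ on your system. Passing key=locale.strxfrm to sorted is an alternative that avoids repeated pairwise comparisons.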
 

Serge Orlov

Stano Paska said:
I have Windows XP, Python 2.3.2

In this case you need to pass 'slovak' instead of 'sk' to
locale.setlocale(). It's not written in the docs, but the locale name is
system dependent. I wonder, maybe it's a bug?
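
Since the accepted locale name differs between systems, one option is to try several names in turn and use the first one that works. Below is a minimal sketch in Python 3 syntax. The function name pick_locale and the candidate list are assumptions for illustration; the list simply combines the POSIX-style names and the Windows-style name mentioned in this thread.

```python
import locale

# Assumed candidates: POSIX-style names first, then the Windows-style name.
SLOVAK_CANDIDATES = ["sk_SK.UTF-8", "sk_SK.utf8", "sk_SK", "slovak"]


def pick_locale(candidates, category=locale.LC_COLLATE):
    """Set the first locale the system accepts and return its name.

    Returns None (leaving the locale unchanged) if every candidate fails.
    """
    for name in candidates:
        try:
            # setlocale returns the name of the locale now in effect.
            return locale.setlocale(category, name)
        except locale.Error:
            continue  # "unsupported locale setting": try the next name
    return None


chosen = pick_locale(SLOVAK_CANDIDATES)
print("Slovak collation locale:", chosen)
```

If chosen is None, no Slovak locale is available under any of these names, and on Linux it may first have to be generated as described earlier in the thread.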
 
