peter pilsl
Sorry, but I can't get this Unicode thing right. I don't even know if it is
a Perl problem, because I have three applications interacting.
The task is to enter text via a web interface, store it in an SQL database
(Postgres) and print it out to a web page again. The text can be anything:
Russian, German, English, Spanish ...
Everything seems to run OK: the text entered in the web interface is
processed correctly in Perl (the Unicode chars appear as two bytes in the
string) and stored correctly in the database (as two bytes again). Even the
way back works like a charm and all the text is printed out correctly.
The problem starts when doing the sorting.
If I let the SQL database do the sorting, I get all "exotic" chars sorted
wrong (German umlaut-O ends up between A and B ...). I'll ask the
Postgres people about this.
If I let Perl's sort do the job, I get all the "exotics" at the end
(umlaut-O comes after Z).
I expect German umlaut-O to occur right between O and P.
Of course I could implement my own sorting algorithm that deals with these
special cases, but this would slow things down and I don't think I should
have to, because Perl should be able to do it.
I think the problem (with both the SQL sort and the Perl sort) is that
apparently only the first byte of the two-byte char is taken into account
when sorting. Is this because I am doing something wrong, or is there
something I should be doing, or is this a very common problem?
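
The byte-order effect can be reproduced without the database. This is
only a minimal sketch: the word list is made up for illustration, and it
assumes the strings are raw UTF-8 bytes, as they would be when nothing
decodes them. Perl's default sort compares strings byte by byte, and the
first byte of umlaut-O in UTF-8 (0xC3) is larger than any ASCII letter, so
the word lands after "Zebra".

```perl
use strict;
use warnings;

# Hypothetical sample data: raw UTF-8 byte strings, not decoded characters.
# "\xC3\x96" is the two-byte UTF-8 encoding of umlaut-O (U+00D6).
my @words = ("Zebra", "\xC3\x96l", "Apfel", "Opa", "Pause");

# Default sort: plain string comparison, byte by byte.
my @sorted = sort @words;

# 0xC3 is greater than "Z" (0x5A), so the umlaut word sorts last.
print join(", ", @sorted), "\n";
```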
To illustrate my problem, I put a small sample-script online:
http://www.goldfisch.at/cgi-bin/unicodetest4.pl
If you enter some text, it will be inserted into the database and all
existing entries will be printed out, sorted by Perl.
See the source at http://www.goldfisch.at/test/unicodetest4.txt
I've read about Unicode::Collate to change the collating/sorting behaviour,
but I couldn't figure out how to use it to get "default" Latin sorting ..
Any help is appreciated ..
thnx,
peter
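
For reference, a minimal sketch of what sorting with Unicode::Collate might
look like. It assumes the strings arrive as UTF-8 bytes and are decoded to
character strings first (the Encode step and the sample words are my
assumptions, not part of the script linked above). With the default
collation, umlaut-O is treated as a variant of O, so it sorts among the O
words and before P rather than after Z.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Unicode::Collate;

# Hypothetical sample data: raw UTF-8 byte strings, e.g. from a form or DB.
my @bytes = ("Zebra", "\xC3\x96l", "Apfel", "Opa", "Pause");

# Decode the bytes into Perl character strings before collating.
my @chars = map { decode('UTF-8', $_) } @bytes;

# Default collator: uses the Unicode Collation Algorithm's default table.
my $collator = Unicode::Collate->new;
my @sorted   = $collator->sort(@chars);

# Encode back to UTF-8 bytes for output.
print encode('UTF-8', join(", ", @sorted)), "\n";
```

The same collator object also offers a cmp method, so it can be used inside
an ordinary sort block when the sort key is a field of a larger record.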