Perl: Read French

micropentium · Feb 3, 2010

Hi,

I have a Perl script that needs to read plain text from a database that
may contain French. My script fails to interpret the French characters,
but those characters look OK in the database.

My question is: How does Perl handle Unicode?

Many thanks

Jürgen Exner · Feb 3, 2010

micropentium said:
I have a Perl script that needs to read plain text from a database that
may contain French. My script fails to interpret the French characters,
but those characters look OK in the database.

My question is: How does Perl handle Unicode?

Just fine, no problems.

Maybe that database is not in Unicode but in some other character set?
Have you tried ISO-8859-1 or -15 or Windows-1252?
Those are the most likely candidates but there are others, too.

Or maybe you are simply using the wrong encoding, e.g. UTF-16 when the
database returns UTF-8?

jue
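
To make the suggestion above concrete, here is a minimal sketch of what "using the wrong encoding" looks like in practice. It assumes the database hands back raw bytes and uses the core Encode module; the sample bytes and the variable names are made up for illustration and do not come from the original poster's script.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: the letter "e with acute accent" as it would
# arrive if the database sends UTF-8. In UTF-8 it is the two bytes C3 A9.
my $bytes = "\xC3\xA9";

# Decoded with the matching encoding: one character, U+00E9.
my $as_utf8 = decode('UTF-8', $bytes);

# Decoded with a non-matching encoding: two characters, U+00C3 and U+00A9,
# which is the classic garbled look of mis-decoded French text.
my $as_latin1 = decode('ISO-8859-1', $bytes);

printf "UTF-8:      %d character(s), first is U+%04X\n",
    length($as_utf8), ord($as_utf8);
printf "ISO-8859-1: %d character(s), first is U+%04X\n",
    length($as_latin1), ord($as_latin1);
```

If the second form is what the script shows, the data is probably UTF-8 being read as a single-byte character set; the reverse mistake tends to show up as replacement characters or decode warnings instead.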

micropentium · Feb 3, 2010

Just fine, no problems.

Maybe that database is not in Unicode but in some other character set?
Have you tried ISO-8859-1 or -15 or Windows-1252?
Those are the most likely candidates but there are others, too.

Or maybe you are simply using the wrong encoding, e.g. UTF-16 when the
database returns UTF-8?

jue

Hi JE,

I am actually a newbie to Perl and not familiar with Perl's Unicode
processing. Would you mind providing a small piece of code on
Unicode handling, so I can take it as a starting point?

Cordially,

Helmut Richter · Feb 3, 2010

That's exaggerated.

It is the user who has to keep track of which of his strings are meant as
bytes and which are meant as text characters. The details are explained
in http://perldoc.perl.org/perlunitut.html .

Problems may arise when subroutines of unknown modules are used and it is not
specified which kind of strings are expected.

This should be seriously considered as a possible source of problems.
The code used in the data must be known; it cannot be inferred from the
contents read.

(That's the theory. In practice, it is highly improbable that a string of
bytes is meant as anything other than UTF-8 if it is a correct UTF-8 string.)
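
The practical rule above can be sketched roughly as follows: try a strict UTF-8 decode first, and only if the bytes are not well-formed UTF-8 fall back to a single-byte character set. The helper name `decode_guess` and the choice of Windows-1252 as the fallback are assumptions made for this illustration; the real fallback has to be whatever the data source actually uses.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return decoded text for a byte string of
# unknown encoding, preferring UTF-8 when the bytes are valid UTF-8.
sub decode_guess {
    my ($bytes) = @_;

    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps
    # decode() from modifying $bytes in place when a check is set.
    my $text = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return $text if defined $text;

    # Assumed fallback: Windows-1252. This is a guess, not a detection.
    return decode('cp1252', $bytes);
}

# The same letter supplied in two different encodings.
my $from_utf8   = decode_guess("\xC3\xA9");   # valid UTF-8
my $from_cp1252 = decode_guess("\xE9");       # not valid UTF-8 on its own

printf "U+%04X U+%04X\n", ord($from_utf8), ord($from_cp1252);
```

This is a heuristic only. Short byte strings in a legacy encoding can happen to be valid UTF-8, so knowing the encoding from the source remains the reliable approach.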

I am actually a newbie to Perl and not familiar with Perl's Unicode
processing. Would you mind providing a small piece of code on
Unicode handling, so I can take it as a starting point?

You should start with thoroughly understanding the tutorial cited above and
then understand other people's code.

Jim Gibson · Feb 3, 2010

micropentium said:
I am actually a newbie to Perl and not familiar with Perl's Unicode
processing. Would you mind providing a small piece of code on
Unicode handling, so I can take it as a starting point?

Check out the documentation that comes with Perl:

perldoc perlunicode

Jürgen Exner · Feb 3, 2010

Helmut Richter said:
That's exaggerated.

Well, each and every Perl text is in Unicode already. So there really
_is_ no problem. The problems appear when you start mucking around and
interfacing with other character sets and encodings. Then you really
need to keep track of whether you have (Perl) text (in Unicode) or some
binary data in some other format, and when and how to convert between
those. Not to mention using the right encoding settings when reading
from such files, as was discussed very recently here.
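
As a rough illustration of "the right encoding settings when reading", the sketch below decodes on input and encodes on output using PerlIO layers. The temporary file stands in for real input and is assumed to be UTF-8; the file contents and variable names are invented for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for real input: write the word "cafe" with an accented
# final letter as raw UTF-8 bytes (the accent is the bytes C3 A9).
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} "caf\xC3\xA9\n";
close $fh or die "close $path: $!";

# Input: the :encoding layer turns bytes into characters while reading.
open my $in, '<:encoding(UTF-8)', $path or die "open $path: $!";
chomp(my $line = <$in>);
close $in;

# Output: encode characters back into bytes when printing.
binmode STDOUT, ':encoding(UTF-8)';
print "$line has ", length($line), " characters\n";
```

Without the input layer, `length` would report 5, because Perl would be counting the bytes of the file rather than the characters of the text.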

On the plus side there are some really great conversion tools and years
ago it was Perl that helped me to save a very large software product by
being able to automatically convert text into numerous local email
encodings.

It is the user who has to keep track of which of his strings are meant as
bytes and which are meant as text characters. The details are explained
in http://perldoc.perl.org/perlunitut.html .

Yikes! The term "string" usually implies text, so may I rephrase
that as "... has to keep track of which of his scalars are meant to contain
binary data (e.g. pictures, hex dumps, file images, yEnc-encoded data,
Shift-JIS encoded email, ...) and which are meant as text"? This way
you can avoid the awkward "byte string".

Problems may arise when subroutines of unknown modules are used and it is not
specified which kind of strings are expected.

It should (emphasis being on should) be clear if they expect binary data
or text.

You should start with thoroughly understanding the tutorial cited above and
then understand other people's code.

Thanks, should have mentioned that myself.

jue
