Perl's use of Unicode

roddy

Context: Perl 5.008 running on Win98

I wonder if anyone can help:
I have a script which opens a large text file (iso-latin 1), does some
processing and writes the results to another large text file (which, on
examination, looks like a double byte code and has 'FF FE' marker bytes at
the beginning of the file, which is the marker for utf-16 (little endian)
unicode encoding).

The conversion is causing problems with the international characters used
(in this case umlauts), and I would like the output file to be iso-latin 1.

All I can find out about switching off unicode is the 'no utf8' pragma. I've
tried this but it doesn't seem to make any difference. Any pointers would
be much appreciated - thanks.

Rod Digges
Wiener Library, London
 
Alan J. Flavell

> Context: Perl 5.008 running on Win98
>
> I wonder if anyone can help: I have a script which opens a large
> text file (iso-latin 1), does some processing and writes the results
> to another large text file (which, on examination, looks like a double
> byte code and has 'FF FE' marker bytes at the beginning of the file,
> which is the marker for utf-16 (little endian) unicode encoding).

I think we need a bit more detail than this. Perl would not, by
itself, unilaterally decide to write a file in utf-16LE format, but
that -is- the native Unicode format in Windows(NT). Are you saying
that you are trying to write to a pre-existing file that happens to be
in this format, or are you saying that your Perl program creates a
new file and then you find to your surprise that utf-16LE has been
written to it? If the latter, then I can't help feeling that the
reason is to be found somewhere in your code.
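
One way to settle what is actually in the output file is to read its first two bytes in raw mode and compare them with the UTF-16LE byte-order mark (FF FE). This is only a rough sketch; the helper name has_utf16le_bom is hypothetical and not part of any module:

```perl
use strict;
use warnings;

# Hypothetical helper: true if the file starts with the bytes FF FE,
# the UTF-16LE byte-order mark (a UTF-32LE file also starts this way).
sub has_utf16le_bom {
    my ($path) = @_;

    # :raw avoids any CRLF translation or decoding of the bytes.
    open my $fh, '<:raw', $path or die "Cannot read $path: $!";
    my $got = read($fh, my $head, 2);
    close $fh;

    return defined $got && $got == 2 && $head eq "\xFF\xFE";
}
```

Running this on the output file before and after the script has run would show whether the script itself is writing the marker or the file already had it.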

Have you perhaps got a locale defined in your environment which is
having the effect of telling Perl to use utf-16LE?

> The conversion is causing problems with international characters (in
> this case umlauts)

If you're writing iso-8859-1 -encoded data into a file that already
contains utf-16LE -encoded data then indeed the result will make
little sense. But I still don't have a clear picture of why you're
doing that.

> All I can find out about switching off unicode is the 'no utf8'
> pragma

Have you read (at least) perldoc perluniintro (if not also
perlunicode)?

> - I've tried this but it doesn't seem to make any difference - any
> pointers would be much appreciated - thanks.

I'm sure someone here will have the answer (even if it doesn't turn
out to be me), but I'd rather understand the question better before
suggesting what the answer might be.

Can you show us a bare-bones stripped-down but complete script which
demonstrates the behaviour?
 
Peter Michael

Roddy,

roddy said:
> Context: Perl 5.008 running on Win98
> [snip]
>
> All I can find out about switching off unicode is the 'no utf8' pragma - I've
> tried this but it doesn't seem to make any difference -

This is certainly not what you want, since the utf8
pragma only affects the interpretation of your source
code, not the encoding of the data you read or write.

You should have a look at PerlIO::encoding. Maybe you are
looking for something like

open my $fh, ">:encoding(latin1)", "file" or die $!;
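
A slightly fuller sketch of the same idea, assuming the input really is ISO-8859-1 and the output should be too. The helper name copy_latin1 and its arguments are hypothetical, and the processing step is only a placeholder:

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out, decoding and encoding as ISO-8859-1.
sub copy_latin1 {
    my ($in, $out) = @_;

    # Declare the encoding on both handles so Perl never has to guess.
    open my $src, '<:encoding(iso-8859-1)', $in  or die "Cannot read $in: $!";
    open my $dst, '>:encoding(iso-8859-1)', $out or die "Cannot write $out: $!";

    while (my $line = <$src>) {
        # ... placeholder for the real processing ...
        print {$dst} $line;
    }

    close $src or die "Cannot close $in: $!";
    close $dst or die "Cannot close $out: $!";
    return;
}
```

With both layers stated explicitly, an umlaut stored as a single byte in the input should come out as the same single byte in the output.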

HTH,

Peter