Problem opening a file to read UTF-16

alex

I have a (little endian) UTF-16 Unicode file which I want
to read. I use code looking like:


open(F, "<:encoding(utf16)", $file)
    or die "can't open $file: $!\n";
while (<F>) {
    print;
}

This works fine for the first few lines of the file,
before it throws an exception:

UTF-16:Unrecognised BOM 2400

What appears to be happening is that chunks of 1024 bytes
are being passed to Encode::Unicode::decode to break into
characters, and that on the second chunk there isn't (of
course!) a BOM and so it throws an exception.
The same also happens with

open(F, "<$file")
    or die "can't open $file: $!\n";
binmode(F, ":encoding(utf16)");


So, what is the correct incantation for opening UTF-16 files?

TIA
 
Ben Morrow

Hmmmm.... try ':encoding(utf16le)': you'll have to strip the BOM
yourself.
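
Something along these lines might do as a starting point. It is only a sketch: the helper name read_utf16le_lines is made up for illustration, and it assumes the file really is little endian and small enough to slurp into memory. With an explicit byte order the layer does not consume the BOM, so it shows up as U+FEFF at the start of the first line and has to be removed by hand.

```perl
use strict;
use warnings;

# Hypothetical helper: read a little-endian UTF-16 file and return its
# lines as decoded strings, with a leading BOM (if any) removed.
sub read_utf16le_lines {
    my ($file) = @_;

    # Explicit byte order, so the layer never goes looking for a BOM.
    open(my $fh, '<:encoding(UTF-16LE)', $file)
        or die "can't open $file: $!\n";
    my @lines = <$fh>;
    close($fh);

    # The BOM is decoded like any other character: U+FEFF at the very start.
    $lines[0] =~ s/^\x{FEFF}// if @lines;

    return @lines;
}
```

Calling print for each element of read_utf16le_lines($file) should then behave like the original loop, minus the exception.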

Ben
 
alex

Ben Morrow said:
Hmmmm.... try ':encoding(utf16le)': you'll have to strip the BOM
yourself.

Ben

Thanks. This works as I had hoped.

One question remains: what is the point of the plain 'utf16'
encoding when opening files if it expects a BOM at the start
of every chunk? Surely it should remember the endianness from
the initial BOM. Feels like a bug to me.
 
Ben Morrow

It does indeed... I may take a look at it later.
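
In the meantime, a possible workaround is to do the remembering yourself: read the first two bytes, work out the byte order from them, and only then push an explicit-endian layer onto the handle. This is a rough sketch, not a fix for the layer itself. The name open_utf16 is invented here, and treating a missing BOM as a fatal error is just an assumption to keep the example short.

```perl
use strict;
use warnings;

# Hypothetical helper: open a UTF-16 file, use its BOM to choose the byte
# order, and return a handle positioned just after the BOM.
sub open_utf16 {
    my ($file) = @_;

    open(my $fh, '<', $file) or die "can't open $file: $!\n";
    binmode($fh);    # raw bytes while we look at the BOM

    my $got = read($fh, my $bom, 2);
    die "can't read $file: $!\n" unless defined $got;

    my $enc = $bom eq "\xFF\xFE" ? 'UTF-16LE'
            : $bom eq "\xFE\xFF" ? 'UTF-16BE'
            :                      undef;
    # Assumption for this sketch: no BOM means we give up rather than guess.
    die "$file: no UTF-16 BOM found\n" unless defined $enc;

    # Everything after the BOM is decoded with the byte order it announced.
    binmode($fh, ":encoding($enc)");
    return $fh;
}
```

The returned handle can be read line by line with the usual while (<$fh>) loop, and the BOM has already been consumed, so there is nothing left to strip.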

Ben
 
