Unicode string conversion

  • Thread starter Alexandre Rosenfeld

Alexandre Rosenfeld

I'm reading a binary file in my program. It contains strings in the
Windows Unicode format, which the specification says is stored as
little-endian. I'm loading it and trying to convert it using Iconv, but I'm
getting an invalid character exception on every string. For now I'm just
stripping the \000 characters from it, which works, but I know it's not
an ideal solution and it only works in some cases.
So, how can I get the string into a format Ruby can understand? By the
way, I'll load these strings in GTK (with the Ruby bindings). Does anyone
know if it can show Unicode strings?
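
To illustrate why stripping \000 only works in some cases, here is a small sketch. It assumes Ruby 2.0 or later (for String#b and String#encode), and the helper name strip_nuls and the sample strings are made up for the illustration, not taken from the file in question.

```ruby
# Drop every NUL byte, which is the workaround described above.
# strip_nuls is a hypothetical helper name used only for this sketch.
def strip_nuls(utf16le_bytes)
  utf16le_bytes.b.delete("\x00".b)
end

# In UTF-16LE, ASCII characters are stored as the ASCII byte followed by 00.
ascii_only = "Hi".encode("UTF-16LE")            # bytes 48 00 69 00
# Characters outside ASCII do not follow that pattern.
non_ascii  = "\u00E9\u20AC".encode("UTF-16LE")  # bytes E9 00 AC 20

stripped_ascii     = strip_nuls(ascii_only).force_encoding("UTF-8")
stripped_non_ascii = strip_nuls(non_ascii).force_encoding("UTF-8")

puts stripped_ascii                      # the ASCII text survives
puts stripped_non_ascii.valid_encoding?  # the leftover bytes are not valid UTF-8
```

The ASCII case only works because the remaining bytes happen to be the ASCII codes. For anything else the leftover bytes are neither the original text nor valid UTF-8, so a real conversion is needed.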
 

John Joyce

Stripping the BOM (byte order mark)?
That should be fine. Unicode works just as well with no BOM, actually better
with no BOM.
The first thing you should check for, though, is the presence of a
BOM, and read it if it is there.
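
A minimal sketch of that BOM check, assuming Ruby 2.0 or later and assuming the raw bytes are already in a string. The helper name split_bom and the constants are hypothetical, chosen for this illustration only.

```ruby
# The UTF-16 BOM is U+FEFF; its byte order on disk reveals the endianness.
UTF16LE_BOM = "\xFF\xFE".b
UTF16BE_BOM = "\xFE\xFF".b

# Returns [byte_order, payload]. byte_order is nil when no BOM is present,
# in which case the byte order has to come from the file format's spec.
def split_bom(raw)
  bytes = raw.b
  if bytes.start_with?(UTF16LE_BOM)
    [:little, bytes[2..-1]]
  elsif bytes.start_with?(UTF16BE_BOM)
    [:big, bytes[2..-1]]
  else
    [nil, bytes]
  end
end

order, payload = split_bom("\xFF\xFEH\x00i\x00".b)
puts order.inspect     # :little
puts payload.bytesize  # 4 bytes of text remain once the BOM is removed
```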
 

Alexandre Rosenfeld

John said:
Stripping the BOM (byte order mark)?
That should be fine. Unicode works just as well with no BOM, actually better
with no BOM.
The first thing you should check for, though, is the presence of a
BOM, and read it if it is there.

There is no BOM. The specification clearly states that it "uses UTF-16,
little endian, and the Byte-Order Marker (BOM) character is not present".

What confuses me is why Iconv couldn't convert it. Does Iconv expect
a BOM even when I specify UTF16LE, which should make the byte order
explicit?
 

Nobuyoshi Nakada

Hi,

At Sun, 6 May 2007 23:05:37 +0900,
Alexandre Rosenfeld wrote in [ruby-talk:250503]:
What confuses me is why Iconv couldn't convert it. Does Iconv expect
a BOM even when I specify UTF16LE, which should make the byte order
explicit?

A BOM is a "ZERO WIDTH NO-BREAK SPACE" (U+FEFF) at the beginning of
a text. Almost any iconv(3) implementation should be able to deal with it.
Can you show minimal data that reproduces the error?
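
For reference, a minimal reproduction might look like the sketch below. It is not the Iconv call from this thread: it assumes Ruby 1.9 or later and uses String#encode, since Iconv was removed from the standard library in Ruby 2.0. The sample bytes and the helper name utf16le_to_utf8 are invented for the illustration.

```ruby
# Hypothetical sample data: "Héllo" as UTF-16LE bytes with no BOM.
raw = [0x48, 0x00, 0xE9, 0x00, 0x6C, 0x00, 0x6C, 0x00, 0x6F, 0x00].pack("C*")

# Label the bytes as UTF-16LE, then transcode them to UTF-8.
# dup keeps the caller's string untouched, because force_encoding
# changes the encoding label in place.
def utf16le_to_utf8(raw)
  raw.dup.force_encoding("UTF-16LE").encode("UTF-8")
end

utf8 = utf16le_to_utf8(raw)
puts utf8
puts utf8.encoding
```

The result is UTF-8, which is also the encoding GTK uses for text. One thing worth checking in the real data is the byte count: a string cut off in the middle of a 2-byte unit makes the conversion raise an error. Whether that is what happened here is only a guess.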
 
