UTF-8 and Spreadsheet::ParseExcel

roberto0 · Aug 17, 2005

Hello,

I'm trying to parse a large number of multilingual Excel sheets such
that I can load much of the data into an Oracle database. The problem
is that there are a number of UTF-8 characters that are not recognized
as "chars" by the DB and we need those fields to be searchable. The DB
requirement is for my script to generate ASCII characters and/or
transliterations from those UTF-8 characters. In other words, the DB
people want "alpha" to replace the UTF-8 {GREEK SMALL LETTER ALPHA}.
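To make the requirement concrete, here is a minimal sketch of that kind of transliteration, written in Python purely for illustration (it is not the script mentioned in this post). The `transliterate` function, the `OVERRIDES` table, and the rule of stripping the "GREEK SMALL LETTER" prefix from the Unicode name are all assumptions made for the example.

```python
import unicodedata

# Hypothetical hand-written overrides for characters whose Unicode name
# does not yield a useful word (MICRO SIGN would otherwise be unmatched).
OVERRIDES = {"\u00b5": "micro"}

GREEK_PREFIXES = ("GREEK SMALL LETTER ", "GREEK CAPITAL LETTER ")

def transliterate(text, unknown="?"):
    """Replace non-ASCII characters with an ASCII word where one is known."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)                      # plain ASCII passes through
        elif ch in OVERRIDES:
            out.append(OVERRIDES[ch])
        else:
            name = unicodedata.name(ch, "")
            for prefix in GREEK_PREFIXES:
                if name.startswith(prefix):
                    # "GREEK SMALL LETTER ALPHA" becomes "alpha"
                    out.append(name[len(prefix):].lower())
                    break
            else:
                out.append(unknown)             # no rule applies
    return "".join(out)

print(transliterate("\u03b1-helix"))
```

This only works if the input has already been decoded to the correct code points, which is exactly the step that fails with the Excel files.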

This is all fine and good and I have scripts that do this rather well
for Unicode or other UTF-8 files. The problem arises when I use
Spreadsheet::ParseExcel to read MS Excel files. It seems that the
parser only picks up the last half of the character (the last 4 bytes
of the 8-byte character, I think). It then becomes impossible to
differentiate between certain UTF-8 characters since many have the same
second half.

For example, the UTF-8 symbols for {MICRO SIGN} and {GREEK SMALL LETTER
EPSILON} are both gleaned from ParseExcel as <B5>. When I parse the same
symbols from a plain Unicode text file, the characters are reported as
<A3><B5> and <21><B5> respectively.

I know ParseExcel uses OLE::Storage as its interface. Could the
problem lie there?

roberto0 · Aug 17, 2005

Actually, the MICRO SIGN is just <B5> and GREEK SMALL LETTER
EPSILON is <CE><B5>.
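As a rough illustration of why the two characters collide, the sketch below (Python, for illustration only) prints the bytes of each character in two encodings and the low byte of its code point. The helper names `hex_bytes` and `low_byte` are made up for the example, and whether ParseExcel really keeps only the low byte is an assumption that nothing in this thread confirms.

```python
import unicodedata

def hex_bytes(ch, encoding):
    """Return the encoded bytes of ch as space-separated hex pairs."""
    return " ".join(f"{b:02X}" for b in ch.encode(encoding))

def low_byte(ch):
    """Return the low 8 bits of the code point as a hex pair."""
    return f"{ord(ch) & 0xFF:02X}"

for ch in ("\u00b5", "\u03b5"):   # MICRO SIGN, GREEK SMALL LETTER EPSILON
    print(
        unicodedata.name(ch),
        "| UTF-8:", hex_bytes(ch, "utf-8"),
        "| UTF-16BE:", hex_bytes(ch, "utf-16-be"),
        "| low byte:", low_byte(ch),
    )
```

Both code points (U+00B5 and U+03B5) end in B5, so any step that discards the high byte would leave them indistinguishable.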

Someone suggested that the context of the files I'm parsing may be the
key to determining the answer to my problem. However, the files I'm
parsing aren't perfect, and the less I rely on the context, the better.

Thanks in advance for any tips or advice,

roberto0
