Hemant Shah
Folks,
I am having a problem writing Japanese characters.
I am parsing an XML document that is in UTF-8; it is actually a
content.xml file from OpenOffice. It contains Japanese text along
with English text (English text and its Japanese translation).
I want to write the English and Japanese text into individual
files.
Another process will read these individual files and insert the text
into a DB2 database, which is also in UTF-8.
I am having a problem writing the Japanese text to a file.
I am running Perl 5.8.3 on AIX 5.2.
Here are the code fragments from my script:
use Encode;
use encoding 'utf8', STDOUT => "utf8", STDIN => "utf8";
use XML::Parser;
# Register the callbacks the parser invokes for each kind of XML event.
$ContentParser = XML::Parser->new(Handlers => {Start   => \&HandleContentStart,
                                               End     => \&HandleContentEnd,
                                               Default => \&DefaultContentHandler,
                                               Char    => \&HandleContentChar});
$ContentParser->parsefile("content.xml", ProtocolEncoding => 'UTF-8');
# In HandleContentChar() subroutine
open(TEMPFILE, ">:encoding(utf8)", $TmpFile)
    or die "Cannot open temporary file for write $TmpFile. $!";
# Code to print XML tags
print TEMPFILE $JapaneseText;
# Code to print XML tags
close(TEMPFILE);
When I compare hex dumps of the Japanese text in content.xml and in $TmpFile,
they are different.
Also, is there a way to split the Japanese text at a Unicode character
boundary? I would like to store lines of 100 (single-byte) characters or
less per line. I do not have any problem with English and Spanish text,
but Japanese characters are double-byte, so I would like to split the
line at 50 Japanese characters.
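To make the split being asked about concrete, here is a minimal sketch, not a tested solution for this script. It assumes the text is already a decoded character string (as XML::Parser hands to the Char handler) and that the limit is counted in characters. The helper name split_chars and the sample string are made up for illustration. Note that in UTF-8 most Japanese characters occupy three bytes rather than two, so a limit meant in bytes would need a different count.

```perl
use strict;
use warnings;

# Hypothetical helper: break a decoded character string into chunks of at
# most $max characters. \X matches a base character together with any
# combining marks that follow it, so a chunk never ends between the two.
sub split_chars {
    my ($text, $max) = @_;
    return $text =~ /(\X{1,$max})/g;
}

# Write UTF-8 on output so the sample characters print without warnings.
binmode(STDOUT, ":encoding(UTF-8)");

# Sample data (an assumption): three kanji repeated 40 times, 120 characters.
my $japanese = "\x{65E5}\x{672C}\x{8A9E}" x 40;

# Print one chunk of up to 50 characters per line.
print "$_\n" for split_chars($japanese, 50);
```

Because the regex works on characters rather than bytes, the same call with a limit of 100 would serve for the English and Spanish text.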
Thanks in advance.
--
Hemant Shah /"\ ASCII ribbon campaign
E-mail: (e-mail address removed) \ / ---------------------
X against HTML mail
TO REPLY, REMOVE NoJunkMail / \ and postings
FROM MY E-MAIL ADDRESS.
-----------------[DO NOT SEND UNSOLICITED BULK E-MAIL]------------------
I haven't lost my mind, Above opinions are mine only.
it's backed up on tape somewhere. Others can have their own.