Dear wizards,
I use XML::Simple to parse an XML file and
also to write it back out. The problem lies in
the UTF-8 character data in the XML source.
XMLin() seems to read it correctly, but
XMLout() replaces the UTF-8 characters with
garbled multibyte sequences.
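"Garbled multibyte sequences" is the typical look of double encoding: text that is already UTF-8 bytes gets encoded to UTF-8 a second time, so "ä" comes out as "Ã¤". Whether that is exactly what happens here depends on the installed XML::Simple and parser versions, so treat it as an assumption rather than a diagnosis. The sketch below uses the core Encode module (encode, decode) to show the difference between encoding decoded characters once and encoding raw bytes again.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# UTF-8 bytes for "farhûn", as they appear in the XML source file
my $bytes = "farh\xC3\xBBn";

# What an XML parser should hand back: decoded characters (û is one character)
my $chars = decode('UTF-8', $bytes);
printf "%d bytes, %d characters\n", length $bytes, length $chars;   # 7 bytes, 6 characters

# Encoding the characters once restores the original bytes
my $once = encode('UTF-8', $chars);
printf "single encoding round-trips: %s\n", $once eq $bytes ? 'yes' : 'no';

# Encoding the raw bytes again treats each byte as a character;
# the result, viewed as UTF-8 text, reads "farhÃ»n"
my $twice = encode('UTF-8', $bytes);
printf "double encoding: %d bytes\n", length $twice;                 # 9 bytes
```

The rule of thumb: decode once on input, encode once on output, and never encode data that is still in byte form.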
Below is my minimal example, run under Perl 5.8.5
on a Fedora Core 3 box. Just compare the script's
output (in w.xml) with its input (in the DATA section).
Please advise on how to fix the broken UTF-8 output.
Thanks in advance,
Oliver.
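#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;

# Parse the XML in the __DATA__ section below
print "Reading data from XML source...\n";
my $data = XMLin(\*DATA,
    ForceArray => ['manju', 'hauer'],   # always store these as arrays
    ContentKey => '-content',
    KeyAttr    => ['name'],             # fold <lemma> elements on their name attribute
);

print "Retrieve and display data example:\n";
my $k = '0004.1';
print "$k: $data->{lemma}{$k}{manju}[0]\n";

# Write the structure back out; this is where the UTF-8 text gets mangled
print "Writing data to XML file...\n";
XMLout($data,
    NumericEscape => 0,
    RootName      => 'wuti',
    XMLDecl       => 1,
    OutputFile    => 'w.xml',
);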
__DATA__
<?xml version='1.0' encoding='utf-8' standalone='yes'?>
<wuti>
<lemma name="0004.1">
<hauer>in der Morgendämmerung (H).</hauer>
<manju>farhûn suwaliyame</manju>
</lemma>
<lemma name="0004.2">
<hauer>Morgendämmerung.</hauer>
<manju>gersi fersi</manju>
</lemma>
</wuti>
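A minimal sketch of one possible fix, assuming XMLin() returns decoded character strings (as it does with XML::Parser): leave out OutputFile so that XMLout() returns the XML as a string, then write that string through a filehandle opened with an explicit :encoding(UTF-8) layer. To keep the sketch self-contained, $data is a hand-built stand-in for what XMLin() returns for the sample input (an assumed shape), with the non-ASCII characters written as \x{...} escapes so the script's own file encoding does not matter. In the original script, only the final XMLout() call would need to change.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;

# Hand-built stand-in for the XMLin() result (assumed shape).
# "\x{e4}" is ä and "\x{fb}" is û, as Perl characters rather than UTF-8 bytes.
my $data = {
    lemma => {
        '0004.1' => {
            hauer => ["in der Morgend\x{e4}mmerung (H)."],
            manju => ["farh\x{fb}n suwaliyame"],
        },
        '0004.2' => {
            hauer => ["Morgend\x{e4}mmerung."],
            manju => ['gersi fersi'],
        },
    },
};

# Without OutputFile, XMLout() returns the document as a string of characters;
# the default KeyAttr (which includes 'name') unfolds the lemma hash again
my $xml = XMLout($data,
    NumericEscape => 0,
    RootName      => 'wuti',
    XMLDecl       => 1,
);

# Encode exactly once, on the way out, via the output layer
open my $out, '>:encoding(UTF-8)', 'w.xml' or die "Cannot open w.xml: $!";
print {$out} $xml;
close $out or die "Cannot close w.xml: $!";
```

The XML::Simple documentation also lets OutputFile take an already-opened filehandle, which is another way to attach the layer. For the console line in the original script, `binmode STDOUT, ':encoding(UTF-8)';` serves the same purpose. If w.xml still looks double-encoded after this change, the strings coming out of XMLin() are probably bytes rather than characters, and they would need decoding before output (or the layer dropped).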