Vlajko Knezic
I am not sure what is going on here, but it has something to do with the way
UTF-8 is handled in Perl and/or LibXML.
The script below:
- accepts a value from a form text field;
- builds an XML document around it;
- serializes the document to a string using toString();
- parses that string back into an XML document using parse_string();
- transforms the XML document into an HTML document using an XSL
transformation.
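For reference, the first four steps can be sketched in isolation as follows. This is only a sketch under stated assumptions: `$raw_field` is a hypothetical stand-in for `$mCGI->param('test')`, and the browser is assumed to have submitted the field as ISO-8859-1, so the bytes are decoded into Perl characters before they reach LibXML. Whether that assumption holds depends on the page and form encoding in use.

```perl
use strict;
use warnings;
use Encode qw(decode);
use XML::LibXML;

# Hypothetical stand-in for $mCGI->param('test'): "abc" followed by byte 0xE9.
my $raw_field = "abc\xE9";

# Assumption: the field arrived as ISO-8859-1 bytes.
# decode() turns those bytes into a Perl character string.
my $text = decode('ISO-8859-1', $raw_field);

# Build the document, declaring UTF-8 as its encoding.
my $doc  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root = $doc->createElement('test');
my $node = $doc->createElement('test_text');
$node->appendTextNode($text);
$root->appendChild($node);
$doc->setDocumentElement($root);

# On a document node, toString() returns bytes in the document encoding.
my $xml = $doc->toString();

# Parse the serialized bytes back into a document and read the value.
my $reparsed = XML::LibXML->new->parse_string($xml);
my $value    = $reparsed->findvalue('/test/test_text');
```

Here `$value` is again a character string, so it has to be encoded (or the output handle given a UTF-8 layer) before it is printed to the browser.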
Everything works well until a non-ASCII character (for example é) is entered
in the text field. In that case the call to parse_string() dies with the
message:
=====================================================================
:2: parser error : Input is not proper UTF-8, indicate encoding !
<test><test_text>abcé</test_text></test>
^
:2: error: Bytes: 0xE9 0x3C 0x2F 0x74
<test><test_text>abcé</test_text></test>
^
 at C:/_work/vsurvey/site/test1.cgi line 24
=====================================================================
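The bytes in the message are worth reading closely: 0xE9 is "é" in ISO-8859-1 (and Windows-1252), whereas UTF-8 encodes "é" as the two bytes 0xC3 0xA9. In UTF-8, 0xE9 can only start a three-byte sequence, so 0xE9 followed by 0x3C ("<") is malformed. The sketch below uses only the core Encode module to show this. The helper names `is_valid_utf8` and `latin1_to_utf8` are made up for illustration, and ISO-8859-1 as the source encoding is an assumption about how the form data arrived.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The offending bytes, as reported in the error message: "abc", 0xE9, "</t".
my $raw = "abc\xE9</t";

# Hypothetical helper: true if the byte string is well-formed UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK value may modify its input
    my $ok = eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 };
    return $ok ? 1 : 0;
}

# Hypothetical helper: reinterpret ISO-8859-1 bytes and re-encode as UTF-8.
# Assumption: the input really is ISO-8859-1.
sub latin1_to_utf8 {
    my ($bytes) = @_;
    return encode('UTF-8', decode('ISO-8859-1', $bytes));
}

my $fixed = latin1_to_utf8($raw);

printf "raw is valid UTF-8:   %s\n", is_valid_utf8($raw)   ? 'yes' : 'no';
printf "fixed is valid UTF-8: %s\n", is_valid_utf8($fixed) ? 'yes' : 'no';
printf "fixed bytes: %s\n",
    join ' ', map { sprintf '0x%02X', ord } split //, $fixed;
```

A check like this can help confirm which encoding the form value actually arrives in before deciding where to decode it.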
I know that the code below does not make much sense, but it is an
abstraction of much more complex code. The environment is Perl 5.8, Apache,
and Windows XP.
Hints and/or an explanation of what was coded wrong and how it should be
fixed are very much appreciated.
Vlajko Knezic,
Toronto, Ontario
---------------------------------------------------------------------------------------------------------------------
test.cgi
#! c:/Perl/bin/Perl.exe
use CGI;
use XML::LibXML;
use XML::LibXSLT;
use CGI::Carp qw( fatalsToBrowser );
use Encode;
my $mDocument = XML::LibXML::Document->new();
my $parser = XML::LibXML->new();
$mDocument->setEncoding("UTF8");
my $mCGI = new CGI;
print $mCGI->header;
my $mTest_text = $mCGI->param('test');
my $mTest = $mDocument-> createElement("test");
my $mTestText = $mDocument-> createElement("test_text");
$mTestText->appendTextNode($mTest_text);
$mTest->appendChild($mTestText);
$mDocument->setDocumentElement( $mTest );
$mDocument->setEncoding("UTF8");
my $mTestXML = $mDocument->toString();
my $mParsedTestXML = $parser->parse_string($mTestXML);
my $mParsedXMLXSL = $parser->parse_file('test.xsl');
my $mParserXSL = XML::LibXSLT->new();
my $mParsedXSL = $mParserXSL->parse_stylesheet($mParsedXMLXSL);
my $mPageHTML = $mParsedXSL->transform($mParsedTestXML);
my $mPrintPageHTML = $mParsedXSL->output_string($mPageHTML);
print $mPrintPageHTML;
test.xsl
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="html" encoding="UTF-8" indent="yes"
omit-xml-declaration="yes"/>
<xsl:template match="//test">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
</head>
<html>
<body>
<xsl:value-of select="test_text"/>
<form name="test" type="post" target="_self">
<input type="text" name="test" /><input type="submit" name="button"/>
</form>
</body>
</html>
</xsl:template>
</xsl:stylesheet>