I have finally found a solution for my long-standing problem
with Xslt-transformation under Windows ActiveState Perl and
I thought that other people might have the same problem so I
would like to share my solution with the group. I hope you
don't mind this long post, here is the story:
I had read an article by Shawn Ribordy on
http://www.perl.com/pub/a/2001/04/17/msxml.html
('MSXML, It's Not Just for VB Programmers Anymore')
in which he described how to do Xslt-transform on XML-files
using the "transformNodeToObject" method of a Win32::OLE
object.
The following lines are copied straight from his article:
"Great...", I thought, "...let's try this at home".
So I sat down at my Windows XP computer (with Activestate
v5.8.7 and the latest Msxml2.DOMDocument.4.0/SP2 installed),
fired up notepad.exe and pasted Shawn's example straight
into my perl program, and his example worked -- but that
was as far as it got!
When I started to use my own xslt-stylesheet, things went
seriously wrong. Well, I knew that my own xslt-stylesheets
had some problems, but I hoped (and expected) that the
transformNodeToObject() method would throw something useful
at me (which unfortunately it did not!). The problem was
that Shawn's example did not have any error handling
whatsoever.
I googled every possible combination of (perl, xslt, msxml,
win32, errorhandling) under the sun and I searched CPAN to
destruction, but to no avail.
Finally, after months of "pulling out my hair", I
stumbled upon the following variables/functions which
allowed me to correctly and reliably test for (almost)
every possible error condition:
- Win32::OLE::LastError()
- $doc->{parseError}->{reason}
- $doc->{parseError}->{line}
- $doc->{parseError}->{linePos}
- $doc->{parseError}->{srcText}
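A minimal sketch of how the four parseError properties could be collected into one message. The helper name format_parse_error is made up for illustration, and a plain hash stands in for $doc->{parseError} so that the snippet runs without MSXML installed:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Win32::OLE or MSXML): builds one line
# of text from anything that supports hash-style access to the
# properties reason, line, linePos and srcText.
sub format_parse_error {
    my ($err) = @_;
    my $reason = defined $err->{reason}  ? $err->{reason}  : '';
    my $line   = defined $err->{line}    ? $err->{line}    : '?';
    my $pos    = defined $err->{linePos} ? $err->{linePos} : '?';
    my $text   = defined $err->{srcText} ? $err->{srcText} : '';
    $reason =~ s/[\r\n]+/ /g;   # fold line breaks into spaces
    $reason =~ s/\s+$//;        # drop trailing whitespace
    return "line $line, pos $pos, reason: $reason, text: '$text'";
}

# Stand-in data (invented values) instead of a real parseError object.
my %fake_error = (
    reason  => "Tags do not match.\r\n",
    line    => 3,
    linePos => 15,
    srcText => '<data>aaaa</data1>',
);
print format_parse_error(\%fake_error), "\n";
```

With a real document object, the call would presumably be format_parse_error($doc->{parseError}) right after a failed Load().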
With the improved error-handling, I was now able to
experiment with different situations in my xslt-stylesheets.
Here is what I experienced:
XML-input-files: Use <?xml version='1.0' encoding='...'?>
=========================================================
In your XML-Input-Files, always specify the encoding in the
first line <?xml version='1.0' encoding='...'?>. This is
typically 'ISO-8859-1' for Western European (Latin-1) text,
which includes plain old ASCII, but could also be 'UTF-8'
or 'UTF-16' if your XML-Input-File is set up this way.
If the declared encoding does not match the actual encoding
of the file, you will end up with an error ("An invalid
character was found in text content").
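To check files before handing them to MSXML, something along these lines could be used. It is only a sketch: the helper name declared_encoding is invented, and it only looks at text that has already been read into a string:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the encoding named in the XML declaration
# at the very start of $text, or undef if no encoding is declared there.
sub declared_encoding {
    my ($text) = @_;
    if ($text =~ /\A<\?xml\s[^>]*?\bencoding\s*=\s*["']([^"']+)["']/) {
        return $1;
    }
    return;
}

for my $first_line (q{<?xml version='1.0' encoding='ISO-8859-1'?>},
                    q{<?xml version='1.0'?>}) {
    my $enc = declared_encoding($first_line);
    print defined $enc ? "encoding: $enc\n" : "no encoding declared\n";
}
```

The check only tells you what the file claims. It cannot tell whether the bytes in the file really are in that encoding, and it will not match a UTF-16 file read as raw bytes.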
XSLT-files: Use <?xml version='1.0' encoding='...'?>
====================================================
In your XSLT-Files, always specify the encoding in the first
line <?xml version='1.0' encoding='...'?>.
Strictly speaking, it is not necessary to specify the encoding
in the first line of the XSLT-file; a simple
<?xml version='1.0'?> is enough. But by doing so, you let
Microsoft guess the encoding, which in my experience it does
correctly in about 95% of the cases. In the remaining 5% of
the cases, Microsoft gets it wrong and you end up with an error
("Switch from current encoding to specified encoding not
supported"). Consequently, I suggest always specifying the
actual encoding directly in the first line of the XSLT-file.
XSLT-files: Use <xsl:output encoding='ISO-8859-1'/>
===================================================
It is more convenient to use
<xsl:output encoding='ISO-8859-1'/> in your XSLT-file. This
works very well, even with accented characters and umlauts.
You can use other encodings (such as
<xsl:output encoding='UTF-8'/>), and the XML-Output-File
will be displayed correctly in Internet Explorer, but then
you will find it inconvenient that Notepad does not display
the XML-Output-file correctly any more.
XSLT-files: Use <xsl:output method='xml'/>
==========================================
If you want to generate Html, you can do so easily by
generating an XML file with its tags in Html-syntax
(such as <p>, <table>, <hr/>, etc...). However, do not
attempt to use <xsl:output method='html'/> in your XSLT-file,
use <xsl:output method='xml'/> instead (even if you want to
generate 'Html', think of 'XHtml' and use
<xsl:output method='xml'/>). You may end up in tears when you
discover that by using <xsl:output method='html'/>, your
encoding does not work the way you want it to. And you might
even discover that '&nbsp;' and/or '&#160;' will cause an error
after having erased your output-file! - Why is that so? - I
don't know.
The ultimate rule is: Never use 'html' as your method in
<xsl:output method='...'/>, you must use
<xsl:output method='xml'/> at all times.
XSLT-files: Use <xsl:output indent='yes'/>
==========================================
This advice is more for convenience than anything else. If
you specify <xsl:output indent='yes'/> and you look at your
XML-Output-file with Notepad, you will find that its
linebreaks are more conveniently located than they would
have been without <xsl:output indent='yes'/>. It is still
not perfect, but it is better. So finally, the
<xsl:output... /> line in your XSLT-file should look like
this:
<xsl:output method='xml' indent='yes' encoding='ISO-8859-1'/>
In XSLT-files: Use '&#160;' instead of '&nbsp;'
===============================================
The instruction '&nbsp;' does not work with MSXML. If you
want your XSLT-file to generate a non-breaking space, use
'&#160;' instead.
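The likely background, assuming the two forms meant here are the named entity &nbsp; and the numeric character reference &#160;: XML itself only predefines lt, gt, amp, apos and quot, so &nbsp; is undefined unless a DTD declares it, whereas &#160; needs no declaration. A small sketch of cleaning up an existing stylesheet (the helper name fix_nbsp is invented for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: replaces every named entity &nbsp; with the
# numeric character reference &#160; in the given stylesheet text.
sub fix_nbsp {
    my ($xslt_text) = @_;
    $xslt_text =~ s/&nbsp;/&#160;/g;
    return $xslt_text;
}

print fix_nbsp('<p>non&nbsp;breaking&nbsp;space</p>'), "\n";
```

This is a plain text substitution, so it would also touch &nbsp; inside comments or CDATA sections, which is usually harmless in a stylesheet.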
....that's the end of my list.
For those of you who want to try, here is a test program:
use strict;
use warnings;
use Win32::OLE;
my $MxErr;
testcase(1, 'transformation succeeds');
testcase(2, 'unbalanced tags in *.xml');
testcase(3, 'unbalanced tags in *.xsl');
testcase(4, 'syntax error in *.xsl');
testcase(5, 'output method=html fails');
sub testcase {
  my ($Case, $Description) = @_;
  makefiles($Case);
  system('cls');
  print "Testcase no $Case: $Description\n";
  print "\n\nThis is the xml file 'test$Case.xml':\n";
  print "=============================================\n";
  system("type test$Case.xml");
  print "=============================================\n";
  system('pause');
  print "\n\nThis is the xsl file 'trf$Case.xsl':\n";
  print "=============================================\n";
  system("type trf$Case.xsl");
  print "=============================================\n";
  system('pause');
  my $success = TransformXslt(xml  => "test$Case.xml",
                              xslt => "trf$Case.xsl",
                              out  => "output$Case.html");
  if ($success) {
    print "\n\nTransformXslt succeeded, result:\n";
    print "=========================================\n";
    system("type output$Case.html");
    print "=========================================\n";
  }
  else {
    print "\n\nProblem with TransformXslt:\n";
    print "=========================================\n";
    print "$MxErr\n";
    print "=========================================\n";
  }
  system('pause');
  print "\n";
}
sub makefiles {
  my ($Case) = @_;
  # Each test case deliberately breaks exactly one thing
  my $XData   = ($Case == 2 ? 'data1'  : 'data');   # unbalanced tag in *.xml
  my $XTitle  = ($Case == 3 ? 'title1' : 'title');  # unbalanced tag in *.xsl
  my $XFunc   = ($Case == 4 ? 'r([?'   : '.');      # invalid XPath expression
  my $XMethod = ($Case == 5 ? 'html'   : 'xml');    # output method
  open OFL, '>', "test$Case.xml"
    or die "err write test$Case.xml: $!";
  print OFL qq{<?xml version="1.0"}.
            qq{ encoding="ISO-8859-1"?>\n};
  print OFL qq{<index>\n};
  print OFL qq{ <data>aaaa</$XData>\n};
  print OFL qq{ <data>bbbb</data>\n};
  print OFL qq{</index>\n};
  close OFL;
  open OFL, '>', "trf$Case.xsl"
    or die "err write trf$Case.xsl: $!";
  print OFL qq{<?xml version="1.0"}.
            qq{ encoding="ISO-8859-1"?>\n};
  print OFL qq{<xsl:stylesheet version="1.0"\n};
  print OFL qq{xmlns:xsl="http://www.w3.org/1999}.
            qq{/XSL/Transform">\n};
  print OFL qq{ <xsl:output method="$XMethod" indent=}.
            qq{"yes" encoding="ISO-8859-1"/>\n};
  print OFL qq{ <xsl:template match="/">\n};
  print OFL qq{ <html>\n};
  print OFL qq{ <body>\n};
  print OFL qq{ <title>Test</$XTitle>\n};
  print OFL qq{ <p>nonbreaking space</p>\n};
  print OFL qq{ <hr/>\n};
  print OFL qq{ <xsl:for-each select="index/data">\n};
  print OFL qq{ <p>Test: *** <xsl:value-of}.
            qq{ select="$XFunc"/> ***</p>\n};
  print OFL qq{ </xsl:for-each>\n};
  print OFL qq{ </body>\n};
  print OFL qq{ </html>\n};
  print OFL qq{ </xsl:template>\n};
  print OFL qq{</xsl:stylesheet>\n};
  close OFL;
}
sub TransformXslt {
  # Named arguments: xml => input file, xslt => stylesheet,
  # out => output file
  my %arg = @_;
  my ($xml_input_file, $xslt_file, $xml_output_file)
    = @arg{qw(xml xslt out)};
  $MxErr = '';
  my $DomDocument = 'Msxml2.DOMDocument.4.0';
  # Load the document (Xml-Input-File)
  my $xml_input_doc = Win32::OLE->new($DomDocument);
  unless ($xml_input_doc) {
    $MxErr = qq{Mx-0040: Couldn't create Win32::OLE}.
             qq{ $DomDocument for XML-Input-File}.
             qq{ "$xml_input_file"};
    return undef;
  }
  # 0/1 instead of the strings 'False'/'True', so that nothing
  # depends on how OLE converts a string to a boolean
  $xml_input_doc->{async} = 0;
  $xml_input_doc->{validateOnParse} = 1;
  if (!$xml_input_doc->Load($xml_input_file)) {
    my $Rs = $xml_input_doc->{parseError}->{reason};
    $Rs =~ s/\r//g; chomp $Rs;
    my $Ln = $xml_input_doc->{parseError}->{line};
    my $Ps = $xml_input_doc->{parseError}->{linePos};
    my $Tx = $xml_input_doc->{parseError}->{srcText};
    $MxErr = qq{Mx-0060: XML-Input-File}.
             qq{ "$xml_input_file"}.
             qq{ did not load for $DomDocument at line}.
             qq{ $Ln, pos $Ps, reason: $Rs, text: '$Tx'};
    return undef;
  }
  # create Output-object
  my $xml_output_doc = Win32::OLE->new($DomDocument);
  unless ($xml_output_doc) {
    $MxErr = qq{Mx-0055: Couldn't create Win32::OLE}.
             qq{ $DomDocument for XML-Output-File}.
             qq{ "$xml_output_file"};
    return undef;
  }
  # Load the Stylesheet (Xsl-File)
  my $xslt_doc = Win32::OLE->new($DomDocument);
  unless ($xslt_doc) {
    $MxErr = qq{Mx-0050: Couldn't create Win32::OLE}.
             qq{ $DomDocument for XSLT-File "$xslt_file"};
    return undef;
  }
  $xslt_doc->{async} = 0;
  $xslt_doc->{validateOnParse} = 1;
  if (!$xslt_doc->Load($xslt_file)) {
    my $Rs = $xslt_doc->{parseError}->{reason};
    $Rs =~ s/\r//g; chomp $Rs;
    my $Ln = $xslt_doc->{parseError}->{line};
    my $Ps = $xslt_doc->{parseError}->{linePos};
    my $Tx = $xslt_doc->{parseError}->{srcText};
    $MxErr = qq{Mx-0070: XSLT-file "$xslt_file" did not}.
             qq{ load for $DomDocument at line}.
             qq{ $Ln, pos $Ps, reason: $Rs, text: '$Tx'};
    return undef;
  }
  # Do the work: transform xml using an xslt stylesheet
  $xml_input_doc->transformNodeToObject($xslt_doc,
                                        $xml_output_doc);
  if (Win32::OLE::LastError()) {
    my $Rs = Win32::OLE::LastError(); $Rs =~ s/\s+/ /g;
    $MxErr = qq{Mx-0080: XSLT-file "$xslt_file" has}.
             qq{ syntax-errors for $DomDocument, }.
             qq{reason: $Rs};
    return undef;
  }
  # Save the done work to the output-file
  $xml_output_doc->save($xml_output_file);
  if (Win32::OLE::LastError()) {
    my $Rs = Win32::OLE::LastError(); $Rs =~ s/\s+/ /g;
    $MxErr = qq{Mx-0090: Can't save to output-file}.
             qq{ "$xml_output_file" for $DomDocument, }.
             qq{reason: $Rs};
    return undef;
  }
  # "-z" tests for empty file, which is considered to be
  # a fatal error
  if (-z $xml_output_file) {
    $MxErr = qq{Mx-0100: A fatal error occurred in either}.
             qq{ your XSLT-file "$xslt_file", or in}.
             qq{ your XML-input-file "$xml_input_file",}.
             qq{ the output-file "$xml_output_file" will}.
             qq{ be empty.};
    return undef;
  }
  return 1;
}