XML::Xerces questions

Arvin Portlock · Apr 20, 2004

I'm using the XML::Xerces module to validate batches of
XML documents against a schema. The module is still under
development so there is little documentation that I can
find, but I'm still finding it incredibly useful. I have
4 questions that would enhance my Xerces experience greatly.

1. The way to get validation errors seems incredibly odd
to me:

eval {$parser->parse ($file)};
print $@;

Is this the only way to get at error messages? Via $@?
Does this wrapper provide a more direct method? Does this
seem odd to anybody else in the perl community or is
it just me?

2. Is there any way to use local copies of the schemas
rather than have Xerces fetch them from the web? In my
XML documents the referenced schemas have the form:

xsi:schemaLocation="http://www.loc.gov/standards/mets/mets.xsd"

I.e., they are all URLs. I think this is why Xerces is so
slow. As I'd like to use this module to validate batches
of thousands of documents, it would be nice if Xerces didn't
have to go out and fetch the schemas for every single document.

3. Xerces stops validating after the first error encountered.
Is there any way to get it to report all the errors in the
documents? I understand what the standard says about parsers
and errors, but every other validator I know about has an option
to continue validation after an error. Is there a similar option
for Xerces?

4. Lastly, $@ reports errors in this form:

ERROR:
FILE: D:\sgml\mets\tei2mets/test.mets.xml
LINE: 34
COLUMN: 27
MESSAGE: Unknown element 'mods:namePPart'
at validate.pl line 13

So I need to parse out the various pieces using regular expressions
to compose messages in the form I want. So I guess this is a repeat
of the first question: is there a way to get direct access to the
pieces of the error message?
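
For reference, the regular-expression approach described above might look roughly like the sketch below. It assumes the error text always uses the "LABEL: value" layout shown in the sample; parse_error_text() is a hypothetical helper name, not something XML::Xerces provides, and the sample string stands in for the stringified $@.

```perl
use strict;
use warnings;

# Sample text copied from the error shown above; in a real script this
# would be the stringified contents of $@ after the eval.
my $error_text = <<'END';
ERROR:
FILE: D:\sgml\mets\tei2mets/test.mets.xml
LINE: 34
COLUMN: 27
MESSAGE: Unknown element 'mods:namePPart'
at validate.pl line 13
END

# parse_error_text() is a hypothetical helper, not part of XML::Xerces.
# It collects each "LABEL: value" line into a hash keyed by the
# lowercased label and returns a reference to that hash.
sub parse_error_text {
    my ($text) = @_;
    my %field;
    while ($text =~ /^(FILE|LINE|COLUMN|MESSAGE):[ \t]*(.*?)[ \t\r]*$/mg) {
        $field{ lc $1 } = $2;
    }
    return \%field;
}

my $err = parse_error_text($error_text);

# Recompose the pieces in whatever form is wanted.
printf "%s:%d:%d: %s\n", @{$err}{qw(file line column message)};
```

This only works as long as the message layout stays the same, which is why direct access to the individual fields would be preferable.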

I'd prefer to not change or extend Xerces.pm itself.

Thanks!

Tad McClellan · Apr 20, 2004

Arvin Portlock said:
1. The way to get validation errors seems incredibly odd
to me:

eval {$parser->parse ($file)};
print $@;

Is this the only way to get at error messages? Via $@?
Does this wrapper provide a more direct method? Does this
seem odd to anybody else in the perl community or is
it just me?

perldoc -f eval

...
It is also Perl's exception trapping mechanism

"eval BLOCK" and "if $@" is Perl's "try" and "catch" mechanism.

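A minimal sketch of that try/catch idiom, using only core Perl. The risky() sub is a hypothetical stand-in for any call that may die, such as a parser's parse() method.

```perl
use strict;
use warnings;

# risky() is a hypothetical stand-in for any call that may die.
sub risky {
    my ($n) = @_;
    die "negative input: $n\n" if $n < 0;
    return sqrt $n;
}

for my $n (4, -1) {
    # "try": eval returns the block's value, or undef if the block died.
    my $result = eval { risky($n) };
    if ($@) {
        # "catch": $@ holds whatever was passed to die.
        print "caught: $@";
    }
    else {
        print "ok: $result\n";
    }
}
```
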
pkent · Apr 20, 2004

Arvin Portlock said:
1. The way to get validation errors seems incredibly odd
to me:

eval {$parser->parse ($file)};
print $@;

It looks like parse() throws a fatal error, i.e. a die(), when it hits
an error. A die() will basically exit the program unless you catch the
exception in an eval block. And the thing that was caught in the eval
block is held in the special variable $@.

Now, sometimes the sensible thing to do when you encounter an
unrecoverable error is to throw a fatal exception... sometimes it's
sensible to return 'undef' and allow the caller to interrogate the
object using a method such as lastError() or something... or maybe some
other approach.

Sometimes the user and the module-writer have different ideas, and you
end up thinking "this is a stupid way to detect an error in an XML
document".

One underused (IME) feature of perl >5.005 is exception objects. This
is where you call die() with an object, not a string. The object then
ends up in $@ and you can call methods on it to examine the error. While
this doesn't have Java's stricter exception model, it can still help
out in cases like yours where currently you're just getting an error
_string_ and you want to parse that string in some way or get other
information.
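
A rough sketch of the exception-object idea in core Perl. ValidationError, its accessors, and validate() are hypothetical names made up for illustration; this is not the class or interface that XML::Xerces itself uses.

```perl
use strict;
use warnings;

# ValidationError is a hypothetical exception class for illustration;
# it is not the class XML::Xerces actually throws.
package ValidationError;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}
sub file    { $_[0]{file} }
sub line    { $_[0]{line} }
sub column  { $_[0]{column} }
sub message { $_[0]{message} }

package main;

use Scalar::Util qw(blessed);

# Hypothetical validator that dies with an object instead of a string.
sub validate {
    my ($file) = @_;
    die ValidationError->new(
        file    => $file,
        line    => 34,
        column  => 27,
        message => "Unknown element 'mods:namePPart'",
    );
}

eval { validate('test.mets.xml'); 1 } or do {
    my $err = $@;
    if (blessed($err) && $err->isa('ValidationError')) {
        # The object is in $@, so its pieces are available via methods.
        printf "%s line %d, column %d: %s\n",
            $err->file, $err->line, $err->column, $err->message;
    }
    else {
        die $err;    # not one of ours: rethrow
    }
};
```

The caller gets the file, line, column and message as separate values, with no string parsing involved.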

Some discussion at
http://www.perl.com/pub/a/2002/11/14/exception.html

P

Arvin Portlock · Apr 20, 2004

Oh I know what eval {} and $@ are all about. I'm just used
to seeing it as a way to catch runtime errors, not as a built-in
interface within a module to record messages. In fact
Xerces itself will experience runtime errors for certain
conditions. So it's still important to trap them in an eval.
But reporting validation errors doesn't seem the best use
for this, especially in a program which has as one of its
main functions the ability to validate a document. I was
hoping for something more along the lines of:

my $status = $parser->parse ($file);
if ($status->errors) {
    until ($status->errors->EOF) {
        print $status->errors->error;
        $status->errors->move_next();
    }
}

or something along those lines.

Steven N. Hirsch · Apr 22, 2004

pkent said:
One underused (IME) feature of perl >5.005 is exception objects. This
is where you call die() with an object, not a string. The object then
ends up in $@ and you can call methods on it to examine the error. While
this doesn't have Java's stricter exception model, it can still help
out in cases like yours where currently you're just getting an error
_string_ and you want to parse that string in some way or get other
information.

XML::Xerces uses exception objects.

Arvin Portlock · Apr 22, 2004

and you can call methods on it to examine the error.

Thanks! This makes things conceptually clearer for me.
