Arvin Portlock
I'm using the XML::Xerces module to validate batches of
XML documents against a schema. The module is still under
development so there is little documentation that I can
find, but I'm still finding it incredibly useful. I have
4 questions that would enhance my Xerces experience greatly.
1. The way to get validation errors seems incredibly odd
to me:
    eval { $parser->parse($file) };
    print $@;
Is this the only way to get at error messages? Via $@?
Does this wrapper provide a more direct method? Does this
seem odd to anybody else in the perl community or is
it just me?
2. Is there any way to use local copies of the schemas
rather than have Xerces fetch them from the web? In my
XML documents the referenced schemas have the form:
xsi:schemaLocation="http://www.loc.gov/standards/mets/mets.xsd"
I.e., they are all URLs. I think this is why Xerces is so
slow. As I'd like to use this module to validate batches
of thousands of documents, it would be nice if Xerces didn't
have to go out and fetch the schemas for every single document.
3. Xerces stops validating after the first error encountered.
Is there any way to get it to report all the errors in the
documents? I understand what the standard says about parsers
and errors, but every other validator I know about has an option
to continue validation after an error. Is there a similar option
for Xerces?
4. Lastly, $@ reports errors in this form:
    ERROR:
    FILE: D:\sgml\mets\tei2mets/test.mets.xml
    LINE: 34
    COLUMN: 27
    MESSAGE: Unknown element 'mods:namePPart'
    at validate.pl line 13
So I need to parse out the various pieces using regular expressions
to compose messages in the form I want. So I guess this is a repeat
of the first question: is there a way to get direct access to the
pieces of the error message?
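For reference, here is a minimal sketch of the regular-expression workaround described above. It assumes $@ always has exactly the layout shown (one "NAME: value" pair per line). The helper name parse_xerces_error is hypothetical and is not part of XML::Xerces; the sample text is copied from the message above and stands in for $@.

```perl
use strict;
use warnings;

# Sample text copied from the error format shown above; in real use this
# would be the contents of $@ after the eval.
my $sample = <<'END';
ERROR:
FILE: D:\sgml\mets\tei2mets/test.mets.xml
LINE: 34
COLUMN: 27
MESSAGE: Unknown element 'mods:namePPart'
at validate.pl line 13
END

# parse_xerces_error is a hypothetical helper, not part of XML::Xerces.
# It returns a hash ref with file, line, column and message keys,
# or undef if any of the four fields is missing from the text.
sub parse_xerces_error {
    my ($text) = @_;
    return unless defined $text;
    my %err;
    for my $field (qw(FILE LINE COLUMN MESSAGE)) {
        # Each field is assumed to sit on its own line as "NAME: value".
        return unless $text =~ /^[ \t]*${field}:[ \t]*(.*?)[ \t\r]*$/m;
        $err{ lc $field } = $1;
    }
    return \%err;
}

# Recompose the pieces into a compact file:line:column: message form.
my $err = parse_xerces_error($sample);
printf "%s:%d:%d: %s\n", @{$err}{qw(file line column message)} if $err;
```

This only works as long as the message layout stays the same, which is why direct access to the individual fields would be preferable.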
I'd prefer to not change or extend Xerces.pm itself.
Thanks!