marek jedlinski
I'm using xml.sax to extract certain content from XML files. (Background:
my job is software localization; these are bilingual XML files, from which
I need to extract translated text, e.g. for spellchecking.)
It works fine, unless a particular file has a doctype directive that
specifies a DTD. The parser then bails out, because the DTD is not
available (IOError, file not found). Since I don't have the DTDs, I need to
tell the SAX parser to ignore the doctype directive. Is this possible,
please?
I've noticed that I can eliminate the error if I create 0-byte DTD files
and put them where the parser expects to find them, but this is a little
tedious, since there are plenty of different DTDs expected at different
locations.
Or is there another SAX parser for Python I could use instead?
Kind thanks for any suggestions,
..marek
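
For reference, one approach that may work here is to switch off external general entities on the parser. This is a minimal sketch, not a confirmed fix for these particular files. With the expat-based reader that xml.sax uses by default, this should stop the parser from trying to open the DTD named in the doctype. The names TextCollector, extract_text and SAMPLE are hypothetical, made up for this illustration, and the sample document stands in for one of the bilingual files. Python 3.7.1 and later already leave this feature off by default, so the explicit call matters mainly on older versions.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class TextCollector(ContentHandler):
    """Hypothetical handler that gathers all character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def extract_text(xml_bytes):
    """Parse xml_bytes without fetching the DTD named in its doctype."""
    parser = xml.sax.make_parser()
    # Ask the parser not to load external general entities,
    # which includes the external DTD subset.
    parser.setFeature(feature_external_ges, False)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(handler.chunks)


# Made-up sample: the doctype points at a DTD that does not exist.
SAMPLE = (
    b'<?xml version="1.0"?>\n'
    b'<!DOCTYPE doc SYSTEM "no/such/file.dtd">\n'
    b"<doc><target>Hello</target></doc>"
)

if __name__ == "__main__":
    print(extract_text(SAMPLE))
```

One caveat: if the files rely on entities that are declared only in the missing DTD, those references cannot be expanded and the parser will skip them, so the extracted text may have gaps at those points.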