xml.minidom and user defined entities

  • Thread starter Nick Craig-Wood

Nick Craig-Wood

I'm using xml.minidom to parse some of our XML files. Some of these
have entities like "&deg;" in them, which aren't understood by xml.minidom.
These give this error.

xml.parsers.expat.ExpatError: undefined entity: line 12, column 1

Does anyone know how to add entities when using xml.minidom?

I've spent some time searching the docs/code/google but I haven't
found the answer to this question!

Thanks
 

Fredrik Lundh

Nick said:
I'm using xml.minidom to parse some of our XML files. Some of these
have entities like "&deg;" in them, which aren't understood by xml.minidom.

&deg; is not a standard entity in XML (see below).

These give this error.

xml.parsers.expat.ExpatError: undefined entity: line 12, column 1

Does anyone know how to add entities when using xml.minidom?

the document is supposed to contain the necessary entity declarations
as an inline DTD, or contain a reference to an external DTD. (iirc,
minidom only supports inline DTDs, but that may have been fixed in recent
versions).
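
for illustration, a minimal sketch of what an inline DTD looks like to the parser. the document, the root element name "doc" and the helper element_text are made up for this example; the module path xml.dom.minidom is the standard library one.

```python
from xml.dom import minidom

# Hypothetical document: the root element name "doc" and its content are
# made up for illustration. The internal DTD subset declares &deg; as the
# degree sign (character reference 176).
XML_WITH_INLINE_DTD = """<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ENTITY deg "&#176;">
]>
<doc>Temperature: 20&deg;C</doc>
"""


def element_text(element):
    """Concatenate the data of all text nodes directly under element."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType == child.TEXT_NODE
    )


# With the declaration present, the parser expands &deg; instead of
# raising "undefined entity".
dom = minidom.parseString(XML_WITH_INLINE_DTD)
text = element_text(dom.documentElement)
print(text)
```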

if you don't have a DTD, your document is broken (if so, and the set of
entities is known, you can use re.sub to replace unknown entities with
the corresponding characters before parsing. let me know if you want
sample code).
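
one possible shape for the re.sub approach, as a rough sketch rather than a definitive implementation. the entity table, the sample document and the function names replace_entities and parse_with_entities are assumptions for illustration; only re.sub and xml.dom.minidom.parseString are real library calls.

```python
import re
from xml.dom import minidom

# Assumed entity table: extend it with whatever the documents really use.
ENTITIES = {
    "deg": "\u00b0",   # degree sign
    "nbsp": "\u00a0",  # hypothetical extra entry
}

# Matches named references such as &deg;. The five entities predefined by
# XML (&amp; &lt; &gt; &quot; &apos;) are not in the table, so they are
# passed through unchanged for the parser to handle.
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_entities(text):
    """Replace known named entities; leave everything else untouched."""
    def substitute(match):
        return ENTITIES.get(match.group(1), match.group(0))
    return ENTITY_RE.sub(substitute, text)


def parse_with_entities(text):
    """Fix up the entities, then hand the result to minidom."""
    return minidom.parseString(replace_entities(text))


dom = parse_with_entities("<doc>20&deg;C &amp; rising</doc>")
text = "".join(node.data for node in dom.documentElement.childNodes)
print(text)
```

note that this rewrites text inside CDATA sections too, and if a replacement character is itself significant in markup (< or &), a numeric character reference is the safer substitute.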

</F>
 

Nick Craig-Wood

Fredrik Lundh said:
&deg; is not a standard entity in XML (see below).

No, probably not...

the document is supposed to contain the necessary entity declarations
as an inline DTD, or contain a reference to an external DTD. (iirc,
minidom only supports inline DTDs, but that may have been fixed in recent
versions).

The document doesn't define the entities either internally or
externally. I don't fancy adding an inline definition either as there
are 100s of documents I need to process!

if you don't have a DTD, your document is broken (if so, and the set of
entities is known, you can use re.sub to replace unknown entities with
the corresponding characters before parsing. let me know if you want
sample code).

I was kind of hoping I could poke my extra entities into some dict or
other in the guts of xml.minidom...

However the job demands quick and nasty rather than elegant so I'll go
for the regexp solution I think, as the list of entities is well
defined.

Thanks for your help

Nick
 
