xml bug?

Imbaud Pierre · Dec 28, 2006

I am using the standard xml library to build another library able to
read, and maybe write, XMP files.
Then an xml library bug popped up:
xml.dom.minidom was unable to parse an XML file that came from an
example provided by an official organization (http://www.iptc.org/IPTC4XMP).
The parsed file was somewhat hairy, but I have been able to reproduce
the bug with a simplified version, which goes:

<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>
<x:xmpmeta xmlns:x='adobe:ns:meta/' x:xmptk='XMP toolkit 3.0-28,
framework 1.6'>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
xmlns:iX='http://ns.adobe.com/iX/1.0/'>

<rdf:Description rdf:about='uuid:f5b64178-9394-11d9-bb8e-a67e6693b6e9'
xmlns:xmpPLUS='XMP Photographic Licensing Universal System (xmpPLUS,
http://ns.adobe.com/xap/1.0/PLUS/)'>
<xmpPLUS:CreditLineReq>False</xmpPLUS:CreditLineReq>
<xmpPLUS:ReuseAllowed>False</xmpPLUS:ReuseAllowed>
</rdf:Description>

</rdf:RDF>
</x:xmpmeta>
<?xpacket end='w'?>

The offending part is the one that goes: xmpPLUS='....'
it triggers an exception: ValueError: too many values to unpack,
in _parse_ns_name. Some debugging showed an obvious mistake
in the scanning of the name argument, that goes beyond the closing
" ' ".

I'm aware I don't give enough material here to allow a full understanding
of the bug. But this is not the place for that, and that's not my point.
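
For readers who want to reproduce the failure, here is a minimal sketch. It assumes the trigger is the whitespace inside the xmpPLUS namespace value, which fits the later finding in this thread that the file itself was at fault. GOOD_NS is a hypothetical replacement chosen only for comparison, and the exception type may differ between Python/expat versions (a ValueError from minidom's internals, or an ExpatError from the parser), so both are caught.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Namespace value copied from the sample above: free text containing spaces.
BAD_NS = ("XMP Photographic Licensing Universal System "
          "(xmpPLUS, http://ns.adobe.com/xap/1.0/PLUS/)")
# Hypothetical space-free replacement, used only for comparison.
GOOD_NS = "http://ns.adobe.com/xap/1.0/PLUS/"

TEMPLATE = (
    "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>"
    "<rdf:Description xmlns:xmpPLUS='{ns}'>"
    "<xmpPLUS:ReuseAllowed>False</xmpPLUS:ReuseAllowed>"
    "</rdf:Description>"
    "</rdf:RDF>"
)


def try_parse(ns):
    """Return None if minidom parses the document, else the exception raised."""
    try:
        minidom.parseString(TEMPLATE.format(ns=ns)).unlink()
    except (ValueError, ExpatError) as exc:
        return exc
    return None


if __name__ == "__main__":
    for ns in (GOOD_NS, BAD_NS):
        print(repr(ns), "->", repr(try_parse(ns)))
```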

Now my points are:
- how do I spot the version of a given library? There is a __version__
attribute of the module, is that it?
- How do I access to a given library buglist? Maybe this one is known,
about to be fixed, it would then be useless to report it.
- How do I report bugs, on a standard lib?
- I tried to copy the lib somewhere, put it BEFORE the official lib in
"the path" (that is:sys.path), the stack shown by the traceback
still shows the original files being used. Is there a special
mechanism bypassing the sys.path search, for standard libs? (I may
be wrong on this, it seems hard to believe...)

- does someone know a good tool to validate an xml file?

btw, my code:

from xml.dom import minidom
....
class whatever:
    def __init__(self, inStream):
        xmldoc = minidom.parse(inStream)

Thanks for any help...

Erik Johnson · Dec 28, 2006

Imbaud Pierre said:
Now my points are:
- how do I spot the version of a given library? There is a __version__
attribute of the module, is that it?

Yes, the module maintainer should be incrementing this version for each new
release and so it should properly correspond to the actual revision of code.

- How do I access to a given library buglist? Maybe this one is known,
about to be fixed, it would then be useless to report it.

Not exactly sure, but this is probably a good place to start:
http://docs.python.org/modindex.html

- How do I report bugs, on a standard lib?

I found this link:

http://sourceforge.net/tracker/?group_id=5470&atid=105470

by looking under the "help" item at www.python.org (an excellent starting
place for all sorts of things).

- I tried to copy the lib somewhere, put it BEFORE the official lib in
"the path" (that is:sys.path), the stack shown by the traceback
still shows the original files being used. Is there a special
mechanism bypassing the sys.path search, for standard libs? (I may
be wrong on this, it seems hard to believe...)

My understanding is sys.path is searched in order. The first entry is
usually the empty string, interpreted to mean the current directory. If you
modify sys.path to put the directory containing your modified code in front
of where the standard library is found, your code should be the one used.
That is not the case?

- does someone know a good tool to validate an xml file?

Typing "XML validator" into google returns a bunch. I think I would start
with the one at w3.org: http://validator.w3.org/

Imbaud Pierre · Dec 28, 2006

Erik Johnson wrote:

Yes, the module maintainer should be incrementing this version for each new
release and so it should properly correspond to the actual revision of code.

Not exactly sure, but this is probably a good place to start:
http://docs.python.org/modindex.html

This indexes the modules, not the bug list!
But python.org was the right entry point, it sent me to the bug
tracker: http://sourceforge.net/tracker/?group_id=5470&atid=105470
It's a bit short on explanations... And I found unsolved issues,
3 years old!

I found this link:

http://sourceforge.net/tracker/?group_id=5470&atid=105470

Right! Same place to fetch and to submit. Fair.

by looking under the "help" item at www.python.org (an excellent starting
place for all sorts of things).

My understanding is sys.path is searched in order. The first entry is
usually the empty string, interpreted to mean the current directory. If you
modify sys.path to put the directory containing your modified code in front
of where the standard library is found, your code should be the one used.
That is not the case?

I put it in front, as for the unix PATH...

Typing "XML validator" into google returns a bunch. I think I would start
with the one at w3.org: http://validator.w3.org/

I'll try this. Thanks a lot, my friend!

Gabriel Genellina · Dec 29, 2006

The offending part is the one that goes: xmpPLUS='....'
it triggers an exception: ValueError: too many values to unpack,
in _parse_ns_name. Some debugging showed an obvious mistake
in the scanning of the name argument, that goes beyond the closing
" ' ".

Now my points are:
- how do I spot the version of a given library? There is a __version__
attribute of the module, is that it?

Usually, yes. But it's not required at all, and may have another
name. Look at the offending module.

- I tried to copy the lib somewhere, put it BEFORE the official lib in
"the path" (that is:sys.path), the stack shown by the traceback
still shows the original files being used. Is there a special
mechanism bypassing the sys.path search, for standard libs? (I may
be wrong on this, it seems hard to believe...)

When the module is inside a package -as in this case- it's a bit
harder. Code says `import xml.dom.modulename`, not `import
modulename`. So even if you put modulename.py earlier in the path, it
won't be found.
Some alternatives:
- modify the library in-place. It's the easiest way if you don't
redistribute your code.
- same as above but using an installer (checking version numbers, of course)
- "monkey patching". That is, in a new module of your own, imported
early on your application, write the corrected version of the offending method:

def _parse_ns_name(...):
    ...doing the right thing...

from xml.dom import modulename
modulename._parse_ns_name = _parse_ns_name

(maybe checking version numbers too)
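
The "checking version numbers" part could look roughly like the sketch below. The helper name patch_if_older and its fixed_in parameter are hypothetical, and the thread does not say which release (if any) would contain a fix, so the caller has to supply that.

```python
import sys


def patch_if_older(module, attr, replacement, fixed_in):
    """Replace module.attr with replacement unless the running Python is
    already at or past fixed_in (a tuple such as (2, 6)).

    Returns the original object when the patch was applied, else None.
    """
    if sys.version_info[:len(fixed_in)] >= tuple(fixed_in):
        return None  # this interpreter is assumed to contain the fix already
    original = getattr(module, attr, None)
    if original is None:
        return None  # nothing to patch: the module's internals have changed
    setattr(module, attr, replacement)
    return original
```

Keeping the returned original around makes it possible to call through to it from the replacement, or to restore it later.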

--
Gabriel Genellina
Softlab SRL


Martin v. Löwis · Dec 30, 2006

Imbaud said:
- how do I spot the version of a given library? There is a __version__
attribute of the module, is that it?

Contrary to what others have said: for modules included in the standard
library (and if using these modules, rather than using PyXML), you
should use sys.version_info to identify a version.
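
A small sketch of both approaches mentioned in this thread: sys.version_info for standard-library modules, and a __version__ attribute where a module happens to define one. The helper names are made up for illustration.

```python
import sys


def stdlib_version():
    """Version of the interpreter, and therefore of its bundled stdlib."""
    return tuple(sys.version_info[:3])


def module_version(module):
    """A module's own __version__ if it defines one, else None."""
    return getattr(module, "__version__", None)


if __name__ == "__main__":
    import xml.dom.minidom
    print("Python / stdlib:", ".".join(map(str, stdlib_version())))
    print("minidom __version__:", module_version(xml.dom.minidom))
```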

- How do I access to a given library buglist? Maybe this one is known,
about to be fixed, it would then be useless to report it.

Others have already pointed you to SF.

- How do I report bugs, on a standard lib?
Likewise.

- I tried to copy the lib somewhere, put it BEFORE the official lib in
"the path" (that is:sys.path), the stack shown by the traceback
still shows the original files being used. Is there a special
mechanism bypassing the sys.path search, for standard libs? (I may
be wrong on this, it seems hard to believe...)

Which lib? "minidom.py"? Well, you are likely importing
"xml.dom.minidom", not "minidom". So adding another minidom.py
to a directory in sys.path won't help.
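
To see which file actually gets imported, something like the following sketch can help (loaded_from is a hypothetical helper name). A replacement is only picked up if a whole xml/ package directory, with dom/minidom.py inside it, sits in an earlier sys.path entry than the standard library.

```python
import importlib
import sys


def loaded_from(dotted_name):
    """Import dotted_name and return the file it was loaded from, if any."""
    module = importlib.import_module(dotted_name)
    return getattr(module, "__file__", None)


if __name__ == "__main__":
    print("xml             ->", loaded_from("xml"))
    print("xml.dom.minidom ->", loaded_from("xml.dom.minidom"))
    # Entries are searched in order; the first match for the top-level
    # package `xml` decides where xml.dom.minidom comes from.
    for entry in sys.path[:5]:
        print("sys.path entry:", repr(entry))
```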

Regards,
Martin

Martin v. Löwis · Dec 30, 2006

Imbaud said:
But python.org was the right entry point, it sent me to the bug
tracker: http://sourceforge.net/tracker/?group_id=5470&atid=105470
It's a bit short on explanations... And I found unsolved issues,
3 years old!

That's true, and likely to grow. Contributions are welcome!

Regards,
Martin

Imbaud Pierre · Dec 30, 2006

Martin v. Löwis wrote:

Contrary to what others have said: for modules included in the standard
library (and if using these modules, rather than using PyXML), you
should use sys.version_info to identify a version.

Others have already pointed you to SF.

Which lib? "minidom.py"? Well, you are likely importing
"xml.dom.minidom", not "minidom". So adding another minidom.py
to a directory in sys.path won't help.

Regards,
Martin

I did import xml!
Maybe my mistake came from copying the whole tree from the standard
lib, including the .pyc and .pyo files: maybe the .pyc files contained
references to the previous sources?
I got rid of these, reloaded ALL the modules, then exited and re-entered
the interpreter (ipython, btw...), and it eventually picked up the new
modules...

Btw, I pushed debugging further; the bug seems to stem from C code,
hence nothing easy to fix... I'll indeed submit a bug.
Thanks for your help! I obviously screamed for help before being
helpless, apologies...

Imbaud Pierre · Jan 8, 2007

I submitted a bug to SourceForge. The answer came pretty fast: the file
I was dealing with was the buggy part. I then reported the bug to the
file's author, who agreed and fixed it. End of the story.
All I could complain about, with the xml.dom library, is how obscure
the exception context was: I did violate SOME XML rule; ideally the
exception should show the rule and the faulty piece of data. But I
know this has a cost, both runtime cost and developer time cost.
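
When the parser does raise an ExpatError, some of that context can be recovered by hand. The sketch below (check_well_formed is a hypothetical helper) reports the line, column, and offending source line. It only checks well-formedness, and it would not have helped with the ValueError described at the top of this thread, which carried no position information.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def check_well_formed(xml_text):
    """Return None if xml_text parses, else (line, column, message, source_line)."""
    try:
        minidom.parseString(xml_text).unlink()
    except ExpatError as exc:
        lines = xml_text.splitlines()
        # exc.lineno is 1-based; exc.offset is the 0-based column in that line.
        source = lines[exc.lineno - 1] if 0 < exc.lineno <= len(lines) else ""
        return exc.lineno, exc.offset, str(exc), source
    return None


if __name__ == "__main__":
    print(check_well_formed("<a><b/></a>"))
    print(check_well_formed("<a>\n<b>\n</a>"))
```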
