xml bug?

Imbaud Pierre

I am using the standard xml library to build another library that can
read, and maybe write, XMP files.
Then an xml library bug popped up:
xml.dom.minidom was unable to parse an XML file that came from an
example provided by an official organization (http://www.iptc.org/IPTC4XMP).
The parsed file was somewhat hairy, but I have been able to reproduce
the bug with a simplified version, which goes:

<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>
<x:xmpmeta xmlns:x='adobe:ns:meta/' x:xmptk='XMP toolkit 3.0-28,
framework 1.6'>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
xmlns:iX='http://ns.adobe.com/iX/1.0/'>

<rdf:Description rdf:about='uuid:f5b64178-9394-11d9-bb8e-a67e6693b6e9'
xmlns:xmpPLUS='XMP Photographic Licensing Universal System (xmpPLUS,
http://ns.adobe.com/xap/1.0/PLUS/)'>
<xmpPLUS:CreditLineReq>False</xmpPLUS:CreditLineReq>
<xmpPLUS:ReuseAllowed>False</xmpPLUS:ReuseAllowed>
</rdf:Description>

</rdf:RDF>
</x:xmpmeta>
<?xpacket end='w'?>

The offending part is the one that goes: xmpPLUS='....'
It triggers an exception: ValueError: too many values to unpack,
in _parse_ns_name. Some debugging showed an obvious mistake
in the scanning of the name argument, which goes beyond the closing
" ' ".

I'm aware I don't give enough material here to allow a full understanding
of the bug. But this is not the place for that, and that's not my point.

Now my points are:
- how do I spot the version of a given library? There is a __version__
attribute of the module, is that it?
- How do I access a given library's bug list? Maybe this one is known and
about to be fixed, in which case it would be useless to report it.
- How do I report bugs, on a standard lib?
- I tried to copy the lib somewhere, put it BEFORE the official lib in
"the path" (that is:sys.path), the stack shown by the traceback
still shows the original files being used. Is there a special
mechanism bypassing the sys.path search, for standard libs? (I may
be wrong on this, it seems hard to believe...)

- does someone know a good tool to validate an xml file?


btw, my code:

from xml.dom import minidom
....
class whatever:
    def __init__(self, inStream):
        xmldoc = minidom.parse(inStream)



Thanks for any help...
 
Erik Johnson

Imbaud Pierre said:
Now my points are:
- how do I spot the version of a given library? There is a __version__
attribute of the module, is that it?

Yes, the module maintainer should be incrementing this version for each new
release and so it should properly correspond to the actual revision of code.
- How do I access to a given library buglist? Maybe this one is known,
about to be fixed, it would then be useless to report it.

Not exactly sure, but this is probably a good place to start:
http://docs.python.org/modindex.html
- How do I report bugs, on a standard lib?

I found this link:

http://sourceforge.net/tracker/?group_id=5470&atid=105470

by looking under the "help" item at www.python.org (an excellent starting
place for all sorts of things).
- I tried to copy the lib somewhere, put it BEFORE the official lib in
"the path" (that is:sys.path), the stack shown by the traceback
still shows the original files being used. Is there a special
mechanism bypassing the sys.path search, for standard libs? (I may
be wrong on this, it seems hard to believe...)

My understanding is sys.path is searched in order. The first entry is
usually the empty string, interpreted to mean the current directory. If you
modify sys.path to put the directory containing your modified code in front
of where the standard library is found, your code should be the one used.
That is not the case?
- does someone know a good tool to validate an xml file?

Typing "XML validator" into google returns a bunch. I think I would start
with the one at w3.org: http://validator.w3.org/
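
As a rough local alternative, the standard library can at least tell whether a file is well-formed. The sketch below is only that: it checks well-formedness, not validity against a DTD or schema, and the helper name check_well_formed is made up for illustration.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError


def check_well_formed(xml_text):
    """Return None if xml_text parses, else a short description of the problem."""
    try:
        xml.dom.minidom.parseString(xml_text)
    except ExpatError as exc:
        # ExpatError records where the underlying expat parser gave up.
        return "line %d, column %d: %s" % (exc.lineno, exc.offset, exc)
    except ValueError as exc:
        # minidom's own builder may raise ValueError for input it cannot handle.
        return "rejected by minidom: %s" % exc
    return None


print(check_well_formed("<a><b></b></a>"))  # well-formed input
print(check_well_formed("<a><b></a>"))      # mismatched closing tag
```

A file that passes this check can still break rules that only a real validator would catch.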
 
Imbaud Pierre

Erik Johnson wrote:
Yes, the module maintainer should be incrementing this version for each new
release and so it should properly correspond to the actual revision of code.




Not exactly sure, but this is probably a good place to start:
http://docs.python.org/modindex.html

This indexes the modules, not the bug list! But python.org was the right
entry point: it sent me to the bug tracker,
http://sourceforge.net/tracker/?group_id=5470&atid=105470
It's a bit short on explanations... And I found unsolved issues that are
3 years old!

I found this link:
http://sourceforge.net/tracker/?group_id=5470&atid=105470
by looking under the "help" item at www.python.org (an excellent starting
place for all sorts of things).

Right! Same place to fetch and to submit. Fair.




My understanding is sys.path is searched in order. The first entry is
usually the empty string, interpreted to mean the current directory. If you
modify sys.path to put the directory containing your modified code in front
of where the standard library is found, your code should be the one used.
That is not the case?
I put it in front, as for the unix PATH...
Typing "XML validator" into google returns a bunch. I think I would start
with the one at w3.org: http://validator.w3.org/
I'll try this. Thanks a lot, my friend!
 
Gabriel Genellina

The offending part is the one that goes: xmpPLUS='....'
it triggers an exception: ValueError: too many values to unpack,
in _parse_ns_name. Some debugging showed an obvious mistake
in the scanning of the name argument, that goes beyond the closing
" ' ".

Now my points are:
- how do I spot the version of a given library? There is a __version__
attribute of the module, is that it?

Usually, yes. But it's not required at all, and may have another
name. Look at the offending module.
- I tried to copy the lib somewhere, put it BEFORE the official lib in
"the path" (that is:sys.path), the stack shown by the traceback
still shows the original files being used. Is there a special
mechanism bypassing the sys.path search, for standard libs? (I may
be wrong on this, it seems hard to believe...)

When the module is inside a package -as in this case- it's a bit
harder. Code says `import xml.dom.modulename`, not `import
modulename`. So even if you put modulename.py earlier in the path, it
won't be found.
Some alternatives:
- modify the library in-place. It's the easiest way if you don't
redistribute your code.
- same as above but using an installer (checking version numbers, of course)
- "monkey patching". That is, in a new module of your own, imported
early on your application, write the corrected version of the offending method:

def _parse_ns_name(...):
    ...doing the right thing...

from xml.dom import modulename
modulename._parse_ns_name = _parse_ns_name

(maybe checking version numbers too)


--
Gabriel Genellina
Softlab SRL






 
Martin v. Löwis

Imbaud said:
- how do I spot the version of a given library? There is a __version__
attribute of the module, is that it?

Contrary to what others have said: for modules included in the standard
library (and if using these modules, rather than using PyXML), you
should use sys.version_info to identify a version.
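
A small sketch of both approaches side by side. The helper name describe_version is made up for illustration; getattr is used because a module is not required to define __version__ at all.

```python
import sys
import xml.dom.minidom


def describe_version(module):
    """Return the interpreter version and the module's own version, if any."""
    return {
        # For standard library modules, the interpreter version is the reference.
        "python": tuple(sys.version_info[:3]),
        # Third-party modules often define __version__, but nothing requires it.
        "module": getattr(module, "__version__", None),
    }


print(describe_version(xml.dom.minidom))
```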
- How do I access to a given library buglist? Maybe this one is known,
about to be fixed, it would then be useless to report it.

Others have already pointed you to SF.
- How do I report bugs, on a standard lib?
Likewise.

- I tried to copy the lib somewhere, put it BEFORE the official lib in
"the path" (that is:sys.path), the stack shown by the traceback
still shows the original files being used. Is there a special
mechanism bypassing the sys.path search, for standard libs? (I may
be wrong on this, it seems hard to believe...)

Which lib? "minidom.py"? Well, you are likely importing
"xml.dom.minidom", not "minidom". So adding another minidom.py
to a directory in sys.path won't help.
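
One way to see which file actually gets used is to ask the import machinery. This is a sketch using importlib, which postdates this thread. Only the top-level package xml is located through sys.path; its submodules are then searched in the package's own __path__.

```python
import importlib.util
import sys

# Resolve the module the same way an import statement would.
spec = importlib.util.find_spec("xml.dom.minidom")
print(spec.origin)

import xml
import xml.dom.minidom

# The file that was really loaded, and the directories searched for submodules.
print(xml.dom.minidom.__file__)
print(list(xml.__path__))

# Modules already imported are cached here, so changing sys.path afterwards
# has no effect on them until the interpreter is restarted.
print("xml.dom.minidom" in sys.modules)
```

So a lone minidom.py placed early in sys.path is never consulted, while a full copy of the xml package placed there would be, provided it is in place before the first import.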

Regards,
Martin
 
Imbaud Pierre

Martin v. Löwis wrote:
Contrary to what others have said: for modules included in the standard
library (and if using these modules, rather than using PyXML), you
should use sys.version_info to identify a version.




Others have already pointed you to SF.




Which lib? "minidom.py"? Well, you are likely importing
"xml.dom.minidom", not "minidom". So adding another minidom.py
to a directory in sys.path won't help.

Regards,
Martin
I did import xml!
Maybe my mistake came from copying the whole tree from the standard
lib, including the .pyc and .pyo files... maybe the .pyc files contained
references to the previous sources?
I got rid of these, reloaded ALL the modules, then exited and re-entered
the interpreter (ipython, btw...), and it eventually picked up the new
modules...

Btw, I pushed the debugging further: the bug seems to stem from C code,
hence nothing easy to fix... I'll indeed submit a bug.
Thanks for your help! I obviously screamed for help before being
helpless, apologies...
 
Imbaud Pierre

I submitted a bug to SourceForge. I was told (pretty fast) that the file
I was dealing with was the buggy part. I then reported the bug to the
file's author, who agreed and fixed it. End of the story.
All I could complain about, with the xml.dom library, is how obscure
the exception context was: I did violate SOME XML rule; ideally the
exception should show the rule and the faulty piece of data. But I
know this has a cost, both in runtime and in developer time.

 
