Peter Maas
Hi,
I have a problem parsing HTML text with PyXML's xml.dom. The following code
runs fine:
--------------------------------------------
from xml.dom.ext.reader import HtmlLib
from xml.dom.ext import PrettyPrint
r = HtmlLib.Reader()
doc = r.fromString(
'''
<html>
<head>
</head>
<body>
<p>hallo welt
</body>
</html>
''')
PrettyPrint(doc)
--------------------------------------------
But if I replace <html> with <html xmlns="http://www.w3.org/1999/xhtml">,
I get this error:
Traceback (most recent call last):
  File "xhtml.py", line 5, in ?
    doc = r.fromString(
  File "C:\PROGRA~1\Python23\lib\site-packages\_xmlplus\dom\ext\reader\HtmlLib.py", line 69, in fromString
    return self.fromStream(stream, ownerDoc, charset)
  File "C:\PROGRA~1\Python23\lib\site-packages\_xmlplus\dom\ext\reader\HtmlLib.py", line 27, in fromStream
    self.parser.parse(stream)
  File "C:\PROGRA~1\Python23\lib\site-packages\_xmlplus\dom\ext\reader\Sgmlop.py", line 57, in parse
    self._parser.parse(stream.read())
  File "C:\PROGRA~1\Python23\lib\site-packages\_xmlplus\dom\ext\reader\Sgmlop.py", line 160, in finish_starttag
    unicode(value, self._charset))
  File "C:\PROGRA~1\Python23\lib\site-packages\_xmlplus\dom\Element.py", line 177, in setAttributeNS
    attr = self.ownerDocument.createAttributeNS(namespaceURI, qualifiedName)
  File "C:\PROGRA~1\Python23\lib\site-packages\_xmlplus\dom\Document.py", line 139, in createAttributeNS
    raise NamespaceErr()
xml.dom.NamespaceErr: Invalid or illegal namespace operation
A lot of HTML documents on the Internet have this xmlns=.... attribute. Are
they wrong, or is this a PyXML bug?
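One possible stopgap, sketched here without settling whether the documents or PyXML are at fault, is to drop the default-namespace declaration from the opening <html> tag before handing the text to the reader. The helper name strip_default_xmlns and both regular expressions are assumptions made for illustration, not part of PyXML, and a regex like this only suits simple, well-behaved root tags.

```python
import re

# Hypothetical helper, not part of PyXML: removes a default-namespace
# declaration (xmlns="...") from the first <html ...> start tag only.
# Prefixed declarations such as xmlns:foo="..." are left untouched.
_HTML_TAG = re.compile(r'<html\b[^>]*>', re.IGNORECASE)
_XMLNS_ATTR = re.compile(r'''\s+xmlns\s*=\s*(?:"[^"]*"|'[^']*')''',
                         re.IGNORECASE)


def strip_default_xmlns(markup):
    """Return markup with xmlns="..." removed from the opening <html> tag."""
    def clean(match):
        # Only the text of the matched start tag is rewritten.
        return _XMLNS_ATTR.sub('', match.group(0))
    return _HTML_TAG.sub(clean, markup, count=1)


source = '''
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
</head>
<body>
<p>hallo welt
</body>
</html>
'''

cleaned = strip_default_xmlns(source)
```

For this input, cleaned is the same text as the version that parses, so it could be passed on as r.fromString(cleaned). The namespace information is lost in the resulting DOM, which may or may not matter for a given use.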
Kind regards,
Peter Maas