kyosohma
Hi,
I am trying to extract some XML from an HTML document that is returned
by a form-based web page, and I cannot figure out how to do it. I
thought I could use the minidom module, but all I get is this traceback:
Traceback (most recent call last):
  File "\\mcisnt1\repl$\Scripts\PythonPackages\Development\clippy\xml_parser.py", line 69, in ?
    inst = ApptParser(url)
  File "\\mcisnt1\repl$\Scripts\PythonPackages\Development\clippy\xml_parser.py", line 19, in __init__
    xml = self.getXml(url)
  File "\\mcisnt1\repl$\Scripts\PythonPackages\Development\clippy\xml_parser.py", line 30, in getXml
    doc = xml.dom.minidom.parse(f)
  File "C:\Python24\lib\xml\dom\minidom.py", line 1915, in parse
    return expatbuilder.parse(file)
  File "C:\Python24\lib\xml\dom\expatbuilder.py", line 928, in parse
    result = builder.parseFile(file)
  File "C:\Python24\lib\xml\dom\expatbuilder.py", line 207, in parseFile
    parser.Parse(buffer, 0)
ExpatError: mismatched tag: line 1, column 357
Here's a sample of the HTML:
<html>
<body>
lots of screwy text including divs and spans
<Row status="o">
<RecordNum>1126264</RecordNum>
<Make>Mitsubishi</Make>
<Model>Mirage DE</Model>
</Row>
</body>
</html>
What's the best way to get at the XML? Do I need to somehow parse it
using the HTMLParser and then parse that with minidom or what?
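
One possible approach, as a rough sketch only: minidom expects the whole input to be well-formed XML, which an ordinary HTML page usually is not, so the idea is to cut out just the <Row>...</Row> fragments and hand only those to minidom. This assumes each Row fragment is well-formed on its own and that rows are never nested. The names SAMPLE_HTML, ROW_RE and extract_rows are made up for illustration, and the unclosed <br> in the sample is an assumption standing in for whatever actually trips up expat in the real page.

```python
import re
import xml.dom.minidom

# Sample page modeled on the snippet in the post; the unclosed <br> is an
# assumption, added to stand in for HTML that is not well-formed XML.
SAMPLE_HTML = """<html>
<body>
<div>lots of screwy text<br><span>including divs and spans</span></div>
<Row status="o">
<RecordNum>1126264</RecordNum>
<Make>Mitsubishi</Make>
<Model>Mirage DE</Model>
</Row>
</body>
</html>"""

# Non-greedy match of each <Row ...>...</Row> fragment (assumes no nested rows).
ROW_RE = re.compile(r"<Row\b.*?</Row>", re.DOTALL)


def extract_rows(html):
    """Return one dict per <Row> fragment: its status attribute plus child text."""
    rows = []
    for fragment in ROW_RE.findall(html):
        # Only the well-formed fragment goes to minidom, not the whole page.
        row = xml.dom.minidom.parseString(fragment).documentElement
        record = {"status": row.getAttribute("status")}
        for child in row.childNodes:
            # Skip the whitespace text nodes between the child elements.
            if child.nodeType == child.ELEMENT_NODE and child.firstChild is not None:
                record[child.tagName] = child.firstChild.data
        rows.append(record)
    return rows


rows = extract_rows(SAMPLE_HTML)
print(rows)
```

A regular expression is fragile if the embedded XML ever changes shape, so an HTML-tolerant parser might be the sturdier choice; the sketch is only meant to show why feeding the fragment rather than the whole page avoids the mismatched-tag error.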
Thanks a lot!
Mike