Alessio Pace
Hi, I need to build some kind of DOM from an HTML page that is declared as
XHTML but unfortunately is *not* valid XHTML. If I try to parse it with
xml.dom.minidom I get an error from expat (as I expected), so I was told to
try it this way, with a "forgiving" HTML parser:
from xml.dom.ext.reader import HtmlLib
reader = HtmlLib.Reader()
dom = reader.fromUri(url)  # 'url' is the address of the web page
FIRST ISSUE:
From reading the source code in
$MY_PYTHON_INSTALLATION_DIR/site-packages/_xmlplus/dom/ext/reader/ ,
these appear to be 4DOM APIs. As far as I know, that makes them an extra
package rather than part of the standard Python distribution. Is that right?
I would like to use *only* libraries that ship with Python 2.2, no extras.
SECOND ISSUE:
If the above libraries were included in Python (so that I could keep using
them), how do I print a string representation of a (sub)tree of the DOM? I
tried .toxml() as in the XML tutorial, but that method does not exist
on the FtNode objects involved here. Any ideas?
Thanks very much to anyone who can help.
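To make the goal concrete, here is a rough sketch of the kind of thing being asked for, using only standard-library modules. It is written for a current Python 3 (html.parser and xml.dom.minidom), not tested on Python 2.2, and the names MiniDomBuilder, parse_html and the "parsed" wrapper element are made up for illustration. It does not apply HTML's implicit-close rules (an unclosed <p> simply nests), so treat it as a sketch under those assumptions, not a replacement for HtmlLib:

```python
from html.parser import HTMLParser
from xml.dom import minidom

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta"}


class MiniDomBuilder(HTMLParser):
    """Hypothetical helper: builds a minidom tree from sloppy HTML."""

    def __init__(self):
        super().__init__()
        self.document = minidom.Document()
        # Wrapper element (made-up name) so that stray top-level text or
        # several top-level elements still fit into one tree.
        self.root = self.document.createElement("parsed")
        self.document.appendChild(self.root)
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = self.document.createElement(tag)
        for name, value in attrs:
            node.setAttribute(name, value if value is not None else "")
        self.stack[-1].appendChild(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name and everything
        # opened inside it; end tags with no matching open are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tagName == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].appendChild(self.document.createTextNode(data))


def parse_html(text):
    """Parse an HTML string and return the wrapper element of the tree."""
    builder = MiniDomBuilder()
    builder.feed(text)
    builder.close()  # flush any text still buffered by the parser
    return builder.root


# The <p> and <b> elements are never closed in this input.
page = "<html><body><p>Hello <b>world</body></html>"
body = parse_html(page).getElementsByTagName("body")[0]
print(body.toxml())  # string form of just the <body> subtree
```

Because the nodes here are ordinary minidom nodes, .toxml() works on any subtree. That says nothing about the FtNode objects returned by HtmlLib, which is a separate question.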