How to get the DOM from an XML page

novostik · Nov 27, 2006

Hello guys,
I want to get the DOM of an XML page. For example, an XML
page converted from HTML using Tidy:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<meta name="generator" content=
"HTML Tidy for Windows (vers 14 February 2006), see www.w3.org">
<title></title>
</head>
<body>
</body>
</html>

For this page the script should print out html---head---meta---title.

I have used the following Perl code:
-------------------------------------------------------------------------------------------------------------------------------------
use XML::DOM;
my $parser = new XML::DOM::Parser;
my $doc = $parser->parsefile("ig.xml");
my $root = $doc->getDocumentElement();
print "\n";
print $root->getNodeName();
print "--";
find($root->getChildNodes());

# Print the name of every element below the given nodes, depth first
sub find
{
    my (@nodes) = @_;
    foreach my $node (@nodes)
    {
        if ($node->getNodeType == ELEMENT_NODE)
        {
            print $node->getNodeName();
            print "--";
        }
        find($node->getChildNodes());
    }
}

# Avoid memory leaks - cleanup circular references for garbage collection
$doc->dispose;
---------------------------------------------------------------------------------------------------------------------------------------------

The problem is that it produces output for some files but gives an
error message for others, like the Google and Yahoo homepages.
Could you please help me out with this, as I was not able to fix
it? Why does it work for some pages and not for others?
Could you please provide a solution?

John Bokma · Nov 27, 2006

novostik said:
The problem is that it produces output for some files but gives an
error message for others, like the Google and Yahoo homepages.
Could you please help me out with this, as I was not able to fix
it? Why does it work for some pages and not for others?
Could you please provide a solution?

I am guessing here, but XHTML is widely used, and widely used wrongly.
Most people using it have no clue what XHTML means, so they write it like
HTML and end up with documents that are not well-formed. If you want to
parse stuff that's out on the web, use something like HTML::TreeBuilder.

If you make your own XHTML pages, you might want to think again, twice
even.
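
The HTML::TreeBuilder suggestion could look roughly like the sketch below. It is only an illustration, not code from this thread: the helper name element_names and the inline sample markup are made up for the example, and it assumes the HTML-Tree distribution is installed.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: return the tag names of $node and of every element
# below it, in document order.
sub element_names {
    my ($node) = @_;
    my @names = ($node->tag);
    for my $child ($node->content_list) {
        # content_list returns text as plain strings and elements as objects,
        # so only recurse into references
        push @names, element_names($child) if ref $child;
    }
    return @names;
}

# Sample markup modelled on the page in the question; <meta> is left unclosed
my $html = <<'HTML';
<html><head><meta name="generator" content="HTML Tidy"><title></title></head>
<body></body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);
print join('--', element_names($tree)), "\n";
$tree->delete;    # release the tree's circular references
```

For a page saved on disk, HTML::TreeBuilder->new_from_file("ig.html") can replace the new_from_content call (the file name is again just a placeholder). Because the parser is built for HTML, unclosed tags such as <meta> do not stop it the way they stop an XML parser.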

Brian McCauley · Nov 28, 2006

Hello guys,
I want to get the DOM of an XML page. For example, an XML
page converted from HTML using Tidy:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<meta name="generator" content=
"HTML Tidy for Windows (vers 14 February 2006), see www.w3.org">
<title></title>
</head>
<body>
</body>
</html>

Excuse me for stating the obvious, but that's not XML, it's HTML. It's
tidy HTML, but still HTML. IIRC it's possible to instruct "tidy" to emit
XHTML (which is XML).
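
To see the difference between HTML and XML from the parser's side, a small check like the following may help. It is a sketch under assumptions: the helper name root_or_error and both sample strings are invented for illustration, and the only XML::DOM calls used are parse, getDocumentElement, getNodeName and dispose.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical helper: try to parse $text as XML.
# Returns (root element name, undef) on success, (undef, error text) on failure.
sub root_or_error {
    my ($text) = @_;
    # The parser dies on input that is not well-formed, so trap that in eval
    my $doc = eval { XML::DOM::Parser->new->parse($text) };
    return (undef, $@) unless $doc;
    my $name = $doc->getDocumentElement->getNodeName;
    $doc->dispose;    # break circular references
    return ($name, undef);
}

# HTML style: <meta> is never closed, which is fine in HTML but not in XML
my $html  = '<html><head><meta name="generator" content="x"><title></title></head><body></body></html>';
# XHTML style: <meta ... /> closes itself, so the document is well-formed
my $xhtml = '<html><head><meta name="generator" content="x" /><title></title></head><body></body></html>';

for my $sample ($html, $xhtml) {
    my ($name, $error) = root_or_error($sample);
    print defined $name ? "parsed, root is $name\n" : "not well-formed: $error\n";
}
```

The first sample should be rejected and the second accepted, which is the same split the original script runs into: it only works on pages that happen to be well-formed XML.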
