process XMLHTTP response returning poorly formed html

dandiebolt

Using xmlhttp I am accessing a document from the web that is not xml
and is in fact not even proper html even though it is supposed to be
(unbalanced tags). Here is the type of code I am using:

var url = "http://www.domain.com/page.html";
var xmlhttp = new ActiveXObject("Msxml2.XMLHTTP");
xmlhttp.open("GET", url, false); // false = synchronous request
xmlhttp.send();
var xmlResp = xmlhttp.responseXML;

I want to create an array that holds the contents of every paragraph
<P> tag. The paragraphs are well formed with both opening and closing
tags <P> and </P> but the document as a whole is not valid xml or html.
How can I process the so-called xml response to extract the contents of
each paragraph into a unique element of the array? I would like to use
DOM methods to extract the paragraphs rather than parse up the text
with xmlhttp.responseText. Any help would be appreciated.
 
dandiebolt

strout said:
It will be easy to parse by regular expression.

How? The only thing I know about the document is that the information I
need is between successive <P> and </P> tags. I was reluctant to
use a regexp because I don't have any more structure than what is
described, and I don't control the format of the page source.
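
For reference, a regular-expression pass over xmlhttp.responseText could look roughly like the sketch below. It relies only on what is stated above: every paragraph has both its <P> and </P> tags, and paragraphs are not nested. The function name extractParagraphs is made up for illustration, and the approach would need rethinking if the page ever omits closing tags.

```javascript
// Hypothetical helper: collect the contents of every <P>...</P> pair.
// Assumes paragraphs are explicitly closed and never nested.
function extractParagraphs(html) {
  // <p\b      matches <p or <P but not <pre, <param, etc.
  // [^>]*     tolerates attributes on the opening tag
  // [\s\S]*?  lazily matches the contents, including line breaks
  var re = /<p\b[^>]*>([\s\S]*?)<\/p\s*>/gi;
  var paragraphs = [];
  var match;
  while ((match = re.exec(html)) !== null) {
    paragraphs.push(match[1]); // one array element per paragraph
  }
  return paragraphs;
}
```

With the request code from the first post, this would be called as `var paragraphs = extractParagraphs(xmlhttp.responseText);`. The rest of the document can be as malformed as it likes, since nothing outside the matched pairs is inspected.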
 
Tim Williams

Why not automate IE to load the page and then grab your content once
IE has done its job? Presumably that will "fix" any irregularities
in the source.

That depends on where you want to do this, but it is an option. Or just
use responseText and MSHTML?

Tim.
 
Jim Ley

dandiebolt said:
Using xmlhttp I am accessing a document from the web that is not xml
and is in fact not even proper html even though it is supposed to be
(unbalanced tags).

There's nothing inherent in unbalanced tags that would make something
invalid HTML; HTML fully allows lots of closing tags to be omitted.

dandiebolt said:
How can I process the so-called xml response to extract the contents of
each paragraph into a unique element of the array? I would like to use
DOM methods to extract the paragraphs rather than parse up the text
with xmlhttp.responseText. Any help would be appreciated.

Using the browser to parse the responseText as HTML will give you a
DOM of it; there's no other reasonable solution.

Jim.
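
One way the "let the browser parse responseText" suggestion might be sketched in JScript is shown below. It assumes a Windows Script Host or Internet Explorer environment, where the "htmlfile" ProgID creates an in-memory MSHTML document. The function names parseHtmlWithMshtml and collectParagraphs are invented for illustration and are not part of any library.

```javascript
// Sketch for Windows Script Host / Internet Explorer only.
// "htmlfile" creates an in-memory MSHTML document; writing the raw
// markup into it lets MSHTML build a DOM from tag soup.
function parseHtmlWithMshtml(html) {
  var doc = new ActiveXObject("htmlfile");
  doc.open();
  doc.write(html);
  doc.close();
  return doc;
}

// Hypothetical helper: copy the contents of every <P> element into an
// array. Works on any document object exposing getElementsByTagName.
function collectParagraphs(doc) {
  var nodes = doc.getElementsByTagName("p");
  var paragraphs = [];
  for (var i = 0; i < nodes.length; i++) {
    paragraphs.push(nodes[i].innerHTML); // one array element per paragraph
  }
  return paragraphs;
}
```

Combined with the original request code, the call would be `var paragraphs = collectParagraphs(parseHtmlWithMshtml(xmlhttp.responseText));`. Keeping the DOM walk separate from the ActiveX call means the same loop also works on a document obtained by automating IE.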
 
