Grab Data Displayed On Other Web Page Using document.write

'69 Camaro

Perhaps I'm Googling for the wrong terms. Does anyone have links to
examples of the syntax necessary to read the HTML on another Web page when
that HTML is produced from JavaScript using the document.write( ) method?

For a simplified example, I have two Web pages. Page 1 uses JavaScript with
the following:

htmlData = "<B>This is bold text.</B>";
document.write(htmlData);

Page 1 displays in bold text:

This is bold text.

Page 2 needs to get the markup for page 1, i.e., just "<B>This is bold
text.</B>" (which includes the tags), not the JavaScript code listed above.
I've tried using the responseText property of the MSXML2.XMLHTTP.3.0 object,
but it gives me the JavaScript used for rendering page 1, not the markup
(that's stored in the htmlData variable).

The ultimate goal is to grab the data displayed on a Web page and display
only the items needed on another Web page. I can parse the HTML based upon
the tags to target exactly the data I want. That's why I need to read the
tags.

Suggestions for other approaches are welcome. I'm the author of both Web
pages, so I have some leeway. Preference is for client-side JavaScript. I
have some experience in JavaScript, but I'm a C/Java programmer.
Cross-platform compatibility is preferred, and the majority of browsers will
be IE6. I can also run Perl scripts on the Web server, but I have little
experience with Perl, so this would be an opportunity to learn more. In
case it matters, the Web server is Apache on Linux.

Thanks.

Gunny
 
Martin Honnen

'69 Camaro wrote:

For a simplified example, I have two Web pages. Page 1 uses JavaScript with
the following:

htmlData = "<B>This is bold text.</B>";
document.write(htmlData);

Page 1 displays in bold text:

This is bold text.

Page 2 needs to get the markup for page 1, i.e., just "<B>This is bold
text.</B>" (which includes the tags), not the JavaScript code listed above.
I've tried using the responseText property of the MSXML2.XMLHTTP.3.0 object,
but it gives me the JavaScript used for rendering page 1, not the markup
(that's stored in the htmlData variable).

That script code is executed by a browser (user agent) when it renders
the HTML document, so one way to access the generated content with script
is to automate a browser. On Windows, Microsoft IE can be automated: a
Windows Script Host script can create an IE instance, load a URL, and
then read the innerHTML or outerHTML of an element in the browser's
document object model. Of course, if you have the complete document tree
object model, I am not sure you need the serialized markup just to
reparse it yourself. You should also be aware that the browser object
model will usually include both the content created by script and the
script element itself.
That way you have a script application that runs on a system where IE is
installed, and IE loads the remote URL.

Another approach might be HTTP Unit
<http://www.httpunit.org/>
although I don't know how good their script support is and whether they
allow access to elements and/or serialized markup.

None of that, however, allows browser-side script in one HTML document to
access the DOM created by script in another HTML document. Under the same
origin policy that is only possible if the two documents are on the same
server; then you can use frames or windows together with cross-frame or
cross-window script techniques.
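
As a rough illustration of the same origin rule described above, the following sketch compares two URLs by scheme, host and port, which is what a browser checks before it allows cross-frame script access. The function name isSameOrigin and the example.com URLs are made up for illustration. The URL constructor is a modern addition that IE6 does not have, so this is a way to reason about the rule, not code to deploy on those browsers.

```javascript
// Hypothetical helper: true when both URLs share scheme, host and port,
// the three parts the same origin policy compares.
function isSameOrigin(urlA, urlB) {
  var a = new URL(urlA);
  var b = new URL(urlB);
  return a.protocol === b.protocol &&
         a.hostname === b.hostname &&
         a.port === b.port;
}

// Placeholder URLs, not the poster's real pages.
var sameHost = isSameOrigin("http://www.example.com/page1.html",
                            "http://www.example.com/stats/page2.html");
var otherHost = isSameOrigin("http://www.example.com/page1.html",
                             "http://stats.example.com/page2.html");
```

Here sameHost is true because only the path differs, while otherHost is false because the host names differ.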
 
Thomas 'PointedEars' Lahn

Martin said:
None of that however allows you to have browser-side script in one HTML
document access the DOM created by script in another HTML document. With
the same origin policy that is only possible if you have two documents
on the same server, [...]

It only has to be the same second-level domain (with document.domain set
accordingly), not the same server. Will you recognize that?


PointedEars
 
'69 Camaro

Martin Honnen wrote:
On Windows Microsoft IE can be automated so you use a Windows Script Host
script to create an IE instance, load a URL and then read out the
innerHTML or outerHTML of an element in the browser's document object
model.

Thanks for that info. I now have an option for visitors with IE browsers.

And you should be aware that the browser object model will usually include
both the contents created by script and the script element itself.

So the contents created by the script on Web page 1 can be accessed using
the innerHTML property of the document.body element of Web page 1? If so,
does this apply just to IE or does it also apply to other common browsers,
such as Firefox, Netscape, and Opera?

Another approach might be HTTP Unit
<http://www.httpunit.org/>
although I don't know how good their script support is and whether they
allow access to elements and/or serialized markup.

Thanks for that info. It can help with the automated testing of Web
applications which I'd recently been wondering about, so thanks for
answering another question of mine!

With the same origin policy that is only possible if you have two
documents on the same server, then you can use frames or windows and use
cross frame or cross window script techniques.

Both HTML documents are in the same subdomain on the same Web server, so I
think this satisfies the same origin policy. I've been Googling on
cross-window scripting, looking for examples of cross-window scripting
syntax, but all I've seen so far are examples that spawn a new window (which
I'd rather avoid unless I can make it invisible to the user -- which I don't
know how to do), and then write to the new window, but I need to read its
contents instead.

Do you have any links to example syntax for accessing a Web page's DOM
without spawning a new window (if this is even possible), or for hiding a
spawned window, or for reading the window's contents if my guess on the
innerHTML property of the document.body element of Web page 1 is incorrect?

Thanks.

Gunny
 
Randy Webb

'69 Camaro said the following on 1/11/2006 3:45 PM:

Do you have any links to example syntax for accessing a Web page's DOM
without spawning a new window (if this is even possible), or for hiding a
spawned window, or for reading the window's contents if my guess on the
innerHTML property of the document.body element of Web page 1 is incorrect?

If all you are wanting to do is get the contents of a page, load it in a
hidden IFrame and access it from there using the Frames collection.

window.frames['IFrameNAMEnotID'].property
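
A minimal sketch of that lookup follows. The helper name readFrameMarkup, the frame name statsFrame and the stand-in window object are all assumptions made so the snippet can run outside a browser; in a real page you would pass the global window object and the name attribute of your own IFrame. It only works when the framed page comes from the same origin and has finished loading.

```javascript
// Hypothetical helper: return the generated markup of a named frame,
// or null when the frame or its document body is not available yet.
function readFrameMarkup(win, frameName) {
  var frame = win.frames[frameName];
  if (!frame || !frame.document || !frame.document.body) {
    return null;
  }
  return frame.document.body.innerHTML;
}

// Stand-in for a browser window (assumption, for illustration only).
var fakeWindow = {
  frames: {
    statsFrame: {
      document: { body: { innerHTML: "<B>This is bold text.</B>" } }
    }
  }
};

var markup = readFrameMarkup(fakeWindow, "statsFrame");
var missing = readFrameMarkup(fakeWindow, "noSuchFrame");
```

With the stand-in above, markup holds the bold-text markup and missing is null. In a browser, call the helper from the IFrame's load handler (or later) so the framed document has been parsed first.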
 
'69 Camaro

If all you are wanting to do is get the contents of a page, load it in a
hidden IFrame and access it from there using the Frames collection.

window.frames['IFrameNAMEnotID'].property

Thanks. I'm not familiar with Frames or IFrames, but I figured out how to
create an IFrame and make it invisible. Now I'm stuck on the syntax for
trying to read the HTML produced by the document.write( ) method of the Web
page loaded into that IFrame. To give a very simplified example, say the
following Web page was loaded into the IFrame:

<HTML>
<BODY>
<SCRIPT Language = "JavaScript" TYPE = "text/javascript">
<!--
var htmlData;
htmlData = "<B>This is bold text.</B>";
document.write(htmlData);
//-->
</SCRIPT>
</BODY>
</HTML>

... and I want to read the HTML produced by the document.write( ) method.
In the main window, I'd like to use something like the following:

html = window.frames['StatsFrame'].whateverGoesHere;

... where the variable, html, receives "<B>This is bold text.</B>".

Any suggestions on the correct syntax for "whateverGoesHere"?

Thanks.

Gunny
 
Randy Webb

'69 Camaro said the following on 1/11/2006 7:12 PM:
If all you are wanting to do is get the contents of a page, load it in a
hidden IFrame and access it from there using the Frames collection.

window.frames['IFrameNAMEnotID'].property


Thanks. I'm not familiar with Frames or IFrames, but I figured out how to
create an IFrame and make it invisible. Now I'm stuck on the syntax for
trying to read the HTML produced by the document.write( ) method of the Web
page loaded into that IFrame. To give a very simplified example, say the
following Web page was loaded into the IFrame:

<HTML>
<BODY>
<SCRIPT Language = "JavaScript" TYPE = "text/javascript">
<!--
var htmlData;
htmlData = "<B>This is bold text.</B>";
document.write(htmlData);
//-->
</SCRIPT>
</BODY>
</HTML>

... and I want to read the HTML produced by the document.write( ) method.
In the main window, I'd like to use something like the following:

html = window.frames['StatsFrame'].whateverGoesHere;

... where the variable, html, receives "<B>This is bold text.</B>".

Any suggestions on the correct syntax for "whateverGoesHere"?


IFrame Code:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN"
"http://www.w3.org/TR/REC-html40/strict.dtd">
<html>
<head>
<title>Form Test Page</title>
<script type="text/javascript">
var htmlData;
htmlData = "<B>This is bold text.</B>";
document.write(htmlData);
</script>
</head>
<body>
</body>
</html>

Main Page Code:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<title>Frame test</title>
</head>
<body>
<iframe src="blank.html" name="myIFrame">Test text here</iframe>
<button type="button"
onclick="alert(window.frames['myIFrame'].document.body.innerHTML)">Show
Markup</button>
</body>
</html>

Opera 8, Firefox and IE all simply give me the HTML that was generated,
namely "<B>This is bold text.</B>"

So it appears the above should be close to what you want or at least
close enough to get you started on it.
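
Once the markup is available as a string, picking out the pieces you need by tag can be sketched as below. The helper name extractTagContents is hypothetical, and a regular expression is only a rough tool for simple, non-nested markup like the example in this thread. Browsers may report tag names in different letter case, so the match is case-insensitive.

```javascript
// Hypothetical helper: collect the text inside every <tagName>...</tagName>
// pair found in a markup string. Nested tags of the same name are not handled.
function extractTagContents(markup, tagName) {
  var pattern = new RegExp(
    "<" + tagName + "\\b[^>]*>([\\s\\S]*?)</" + tagName + ">", "gi");
  var results = [];
  var match;
  while ((match = pattern.exec(markup)) !== null) {
    results.push(match[1]); // text between the opening and closing tag
  }
  return results;
}

var items = extractTagContents("<B>This is bold text.</B><b>More</b>", "b");
```

Here items ends up as a two-element array holding "This is bold text." and "More". Since the framed document is already parsed, calling getElementsByTagName on window.frames['myIFrame'].document is an alternative that avoids string parsing altogether.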
 
'69 Camaro

Randy Webb wrote:
So it appears the above should be close to what you want or at least close
enough to get you started on it.

Thank you! That gets me much closer to where I want to be. However, I'm
still trying to get over the hurdle of determining from the second Web page
what's in the string variable used for the document.write( ) method on
the first Web page. While I gave a simplified example of the JavaScript on
my Web page, the value passed to document.write( ) is actually the value of
the property of an object, not a string literal. The object is returned
from a function that I pass variables to, and the function dynamically
creates the HTML needed to display a Web page with all of the data. For
example:

stats = getStats(uid, mid, gid, lg, crit, sUrl);
htmlData = stats.Text;
document.write(htmlData);

... and the value in the stats.Text property would be the string
containing the HTML with the data and the tags I need to parse, like
"<B>This is bold text.</B>". This markup never actually appears on Web page
1, so I don't know how to read it from Web page 2 and extract only the data
I need.

Any suggestions on how to read from Web page 2 either the value stored in
htmlData or the Web page contents created by my script on Web Page 1? I've
tried using getElementByName( ) and getElementByID( ) to retrieve the
htmlData variable (or value) in the frame object, but I don't have the
correct syntax because it bombs.

Thanks.

Gunny
 
Randy Webb

'69 Camaro said the following on 1/12/2006 10:02 AM:

Thank you! That gets me much closer to where I want to be. However, I'm
still trying to get over the hurdle of determining from the second Web page
what's in the string variable used for the document.write( ) method on
the first Web page. While I gave a simplified example of the JavaScript on
my Web page, the value passed to document.write( ) is actually the value of
the property of an object, not a string literal. The object is returned
from a function that I pass variables to, and the function dynamically
creates the HTML needed to display a Web page with all of the data. For
example:

stats = getStats(uid, mid, gid, lg, crit, sUrl);
htmlData = stats.Text;
document.write(htmlData);

If the name of that variable is always htmlData, then you can access it
by window.frames['IFrameNAMEnotID'].htmlData
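
A small sketch of that idea follows. It relies on the fact that a variable declared with var at the top level of the framed page's script becomes a property of that frame's window object. The helper name readFrameGlobal and the stand-in parent object are assumptions so the snippet runs outside a browser; StatsFrame and htmlData are the names used earlier in this thread.

```javascript
// Hypothetical helper: read a global variable from a named frame,
// or return null when the frame or the variable does not exist (yet).
function readFrameGlobal(win, frameName, varName) {
  var frame = win.frames[frameName];
  if (!frame || typeof frame[varName] === "undefined") {
    return null;
  }
  return frame[varName];
}

// Stand-in for the parent window (assumption, for illustration only).
var fakeParent = {
  frames: {
    StatsFrame: { htmlData: "<B>This is bold text.</B>" }
  }
};

var html = readFrameGlobal(fakeParent, "StatsFrame", "htmlData");
var notThere = readFrameGlobal(fakeParent, "StatsFrame", "stats");
```

With the stand-in, html receives the markup string and notThere is null. In a real page the variable is only visible this way if it is declared outside any function in the framed page, and only after that page's script has run.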
 
