Suppose I download an HTML file with JavaScript in it, for example, ....
<script language="JavaScript">
document.write("Hello");
</script> ....
How can I translate that to pure html, ....
Hello
....
In this simple case it is fairly easy: remove the script tags and
replace each call to document.write with the text it writes. But that
only works when the script element contains nothing but calls to
document.write with constant arguments.
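For what it's worth, the simple case can be sketched in a few lines. This
is only a rough illustration, assuming the argument is always a single
string literal; the names inlineSimpleWrites and decodeLiteral are made
up for this example, and a regular expression is not a real HTML parser
(a "</script>" inside a string would confuse it).

```javascript
// Matches a whole script element and captures its body.
const SCRIPT_RE = /<script\b[^>]*>([\s\S]*?)<\/script>/gi;

// Turn a quoted JS string literal into its value. Only a few common
// escapes are handled; \x.. and \u.... are deliberately not covered.
function decodeLiteral(literal) {
  const escapes = { n: "\n", t: "\t", r: "\r" };
  return literal.slice(1, -1).replace(/\\(.)/g, (_, ch) => escapes[ch] ?? ch);
}

// Hypothetical helper: replace each script element that contains nothing
// but document.write("literal") calls with the text those calls write.
// Any script containing anything else is left untouched.
function inlineSimpleWrites(html) {
  return html.replace(SCRIPT_RE, (element, body) => {
    // Sticky regex: one document.write call with one string literal.
    const call =
      /\s*document\.write\(\s*("(?:[^"\\\n]|\\.)*"|'(?:[^'\\\n]|\\.)*')\s*\)\s*;?\s*/y;
    let output = "";
    let calls = 0;
    let pos = 0;
    while (pos < body.length) {
      call.lastIndex = pos;
      const match = call.exec(body);
      if (match === null) return element; // not the simple case
      output += decodeLiteral(match[1]);
      pos = call.lastIndex;
      calls += 1;
    }
    return calls > 0 ? output : element;
  });
}
```

Applied to the example from the question, the script element is replaced
by the text Hello, while a script that declares a variable or computes
anything is returned unchanged.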
If the JavaScript is more complicated, you will have to execute it to
know what it does. The simplest solution would be to load the page into
a browser and then extract the generated HTML.
Otherwise, you will need an implementation of JavaScript and a runtime
environment similar to that provided by a browser in order to execute
the code correctly.
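As a rough sketch of that second route, and assuming Node.js as the
JavaScript implementation (my choice for illustration, not something the
question mentions), one can run each script against a stub "document"
whose write method just collects the output. The function name
renderWithStubDocument is hypothetical, the stub provides nothing except
document.write, and Node's vm module is not a security boundary, so this
is not suitable for untrusted pages as it stands.

```javascript
const vm = require("node:vm");

// Matches a whole script element and captures its body.
const SCRIPT_RE = /<script\b[^>]*>([\s\S]*?)<\/script>/gi;

// Hypothetical helper: execute every inline script in one shared context
// and replace each script element with whatever it wrote.
function renderWithStubDocument(html) {
  let written = "";
  // The only "browser" feature provided is document.write. Anything else
  // a script touches (window, DOM methods, ...) will throw.
  const context = vm.createContext({
    document: {
      write(...parts) {
        written += parts.join("");
      },
    },
  });
  return html.replace(SCRIPT_RE, (_element, body) => {
    written = "";
    vm.runInContext(body, context); // globals persist between scripts
    return written;
  });
}
```

Because all scripts share one context, a variable declared with var in
one script element is visible to a later one, which is roughly what a
browser does. The further a page strays from document.write, the more of
the browser environment you would have to imitate.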
If you are aiming for the Windows platform, you can probably remotely
control IE, or create an instance of its HTML rendering widget, and load
the page into it. When (or if) it finishes parsing the page, including
the embedded JavaScript, you can use
document.documentElement.innerHTML to get the contents of the
resulting page's html element.
If you are using this for anything critical, and the user supplies the
page, then they can make pages where the JavaScript never finishes
executing. A simple "while(true){}" can do it, and although you can
check in advance for obvious infinite loops, a loop can always be
disguised to fool your check. You will have to time out the browser
if it runs for too long.
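The time-out idea looks roughly like this when the script runs in an
engine you control rather than in a browser. This sketch assumes Node.js
and the timeout option of its vm module; runWithTimeout is a name
invented for the example. It only limits synchronous execution, and vm
is still not a sandbox against hostile code.

```javascript
const vm = require("node:vm");

// Hypothetical helper: run a script in a fresh, empty context and give
// up if it has not finished after timeoutMs milliseconds.
function runWithTimeout(source, timeoutMs) {
  try {
    const value = vm.runInNewContext(source, {}, { timeout: timeoutMs });
    return { finished: true, value };
  } catch (error) {
    // Reached on a time-out, and also when the script itself throws.
    return { finished: false, error };
  }
}

// The obvious loop and a lightly disguised one are both cut off.
const obvious = runWithTimeout("while(true){}", 50);
const disguised = runWithTimeout(
  "for (var i = 0; i >= 0; i = (i + 1) % 10) {}",
  50
);
```

No attempt is made to detect the loop in advance; the script is simply
abandoned once it has used up its allowance.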
You will also be open to any security bugs in IE.
<URL:http://www.pivx.com/larholm/unpatched/>
/L