JavaScript web scraping test cases?


John J. Lee

I've put together a Python package for scraping / testing pages that
depend on embedded JavaScript code (without depending on IE, Mozilla
or Konqueror, and with the DOM etc. all implemented in pure Python --
mostly a hacked 4DOM, with some bits from pxdom; the JavaScript
interpreter I'm using ATM is spidermonkey). It's still missing a lot
and is pre-alpha, but it works, just barely.

Anyway, the point of this post is that I'm looking for pages to test
it on, so if you have a page that you'd like scraped (one that uses
JavaScript in some non-trivial way, of course! -- for dynamically
modifying forms, setting cookies, or whatever), mail me the details:
better that than some randomly-selected site from the Internet.
Obviously, it should be something that doesn't violate any terms &
conditions of use or otherwise cause people trouble, and preferably
that doesn't require any signup.


[In fact, TBH, my completely ad-hoc methodology with this is to write
some web scraping code, discover that the JavaScript breaks things,
often by depending on some nonstandard DOM feature, hack the DOM a
bit, etc. Hopefully I'll reach a point in understanding where I can
rewrite the DOM from scratch ('scratch' here being 4DOM), properly, to
match some approximation of 'HTML DOM as deployed'...]


John
 

John J. Lee

> Anyway, the point of this post is that I'm looking for pages to test
> it on, so if you have a page that you'd like scraped (one that uses
> JavaScript in some non-trivial way, of course! -- for dynamically
> modifying forms, setting cookies, or whatever), mail me the details:
> better that than some randomly-selected site from the Internet.
> Obviously, it should be something that doesn't violate any terms &
> conditions of use or otherwise cause people trouble, and preferably
> that doesn't require any signup.
> [...]

Nobody?

I'll get my coat. ;-)


John
 

John J Lee

John> Nobody?

Sorry, I couldn't think of anything off the top of my head. In my own pages
[...]

Oh, I'm sure I'll have no trouble finding test cases -- I just thought
that, rather than some random sites that are of no use to anyone, there is
bound to be somebody out there who actually wanted to scrape a particular
page in the past, and had not bothered previously thanks to the
inconvenience of having to read & reproduce the effect of the JS code
(particularly code that messes about with forms). It would be nice to be
doing something useful at the same time as writing tests!
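
For anyone wondering what "read & reproduce the effect of the JS code" looks like in practice, here is a minimal sketch. It is not the package discussed in this thread: the page, the field names and the `fill()` function are all made up for illustration, and it uses only the standard library's `html.parser`. A plain HTML parser sees the hidden field as empty, so the scraper has to re-implement by hand whatever the script would have done to the form.

```python
from html.parser import HTMLParser

# Hypothetical page: the hidden "token" field is empty in the HTML and is
# only filled in by script when the form is submitted.
PAGE = """
<form name="login" action="/login" onsubmit="fill()">
  <input type="text" name="user" value="guest">
  <input type="hidden" name="token" value="">
</form>
<script>
function fill() {
  var f = document.forms.login;
  f.token.value = f.user.value.split("").reverse().join("");
}
</script>
"""


class FormFields(HTMLParser):
    """Collect name/value pairs from <input> tags; scripts are never run."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            a = dict(attrs)
            if "name" in a:
                self.fields[a["name"]] = a.get("value") or ""


def static_fields(html):
    """Return the form fields exactly as they appear in the markup."""
    parser = FormFields()
    parser.feed(html)
    return parser.fields


def reproduce_fill(fields):
    """Hand-written Python equivalent of the page's (made-up) fill()."""
    out = dict(fields)
    out["token"] = out["user"][::-1]
    return out


fields = static_fields(PAGE)
print(fields)                  # what a script-unaware scraper would submit
print(reproduce_fill(fields))  # what a browser would actually submit
```

Every page needs its own hand-written `reproduce_fill()`, which is exactly the chore that running the page's script against a DOM is meant to remove.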

Of course, I already have those sites that gave rise to the 'itch' to do
this in the first place, but I'm sure there's lots of the browser object
model that they don't exercise...


John
 

John J. Lee

Cousin Stanley said:
> I'm not sure what types of applications
> you're looking for,

The kind that people actually want to use <wink>.

As I said, there's no problem finding test cases, I just thought that
while I was about this, somebody might happen to be reading who was
actually trying to scrape a JS page.

> but I have some JavaScript plots
> that might be interesting to test ...
>
> http://fastq.com/~sckitching/JS/Circle_MH.htm
> [...]

Konqueror 3.1 didn't show anything, Mozilla 1.4 printed some pretty
circles, then froze!


John
 

Cousin Stanley

John ...

Although it's been a while since I tested these scripts
I thought I remembered testing successfully in both
Mozilla 0.95 and IE 5.1 at the time ...

I tested this morning using Moz 1.3.1 and 2 out of 3 failed,
but all 3 worked in IE 6 ...

The JS used in these scripts, although a bit hackish,
doesn't use any particular IE magic ...

I zipped up all 3 scripts for convenience,
if you want to look at the sources ...

http://fastq.com/~sckitching/JS/JS_Plots.zip

Differences in JS/DOM implementations from browser to browser
hurt my head and seem to be an endless source of problems
for web developers ...
 
