Parsing a web page

HMS Surprise

Greetings,

I am using a tool called MaxQ (maxq.tigris.org) that acts as a proxy and
is used for web page testing. It generates Jython scripts for playback
of web sessions and testing. It lacks a tool for parsing a web page;
it only provides the return code. I need to parse the page to get some
timestamp information. Since the tool has access to Java routines, I
was wondering if there is one I could use for this purpose. Searching
the archive, I found a thread with the following:

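import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
URL url = new URL("http://.......");
URLConnection URLconnect = url.openConnection();
BufferedReader br = new BufferedReader(
        new InputStreamReader(URLconnect.getInputStream()));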

This gives me hope, but I do not know what the mechanism is for
importing Java libraries.
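
For reference, Jython imports Java classes with ordinary Python import statements (for example, from java.net import URL), so the JDK classes in the snippet above are reachable from a generated script. The snippet stops once the reader is open; the rest of the job might look like the minimal Java sketch below. It reads the page text and pulls out the first timestamp. The class name PageTimestamp, the helper names, and the yyyy-MM-dd HH:mm:ss timestamp layout are all assumptions made for illustration, since the thread does not say what the real page contains.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PageTimestamp {
    // Assumed timestamp layout (yyyy-MM-dd HH:mm:ss); adjust to the real page.
    static final Pattern TIMESTAMP =
            Pattern.compile("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}");

    // Read everything from the reader into one string, line by line.
    static String readAll(Reader in) throws IOException {
        BufferedReader br = new BufferedReader(in);
        StringBuilder page = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            page.append(line).append('\n');
        }
        return page.toString();
    }

    // Return the first timestamp found in the page text, or null if none.
    static String findTimestamp(String page) {
        Matcher m = TIMESTAMP.matcher(page);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) throws IOException {
        Reader in;
        if (args.length > 0) {
            // Same calls as the archive snippet, with the URL taken from the command line.
            URLConnection conn = new URL(args[0]).openConnection();
            in = new InputStreamReader(conn.getInputStream());
        } else {
            // Hypothetical sample page so the sketch runs without a network.
            in = new StringReader("<html><body>Updated: 2007-03-14 09:26:53</body></html>");
        }
        System.out.println(findTimestamp(readAll(in)));
    }
}
```

Keeping the regex match separate from the network read means the parsing step can be checked against a saved copy of the page before it is wired into a playback script.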

Thanks,

jh
 
NathanIEI

If I understand your problem correctly, there is a library I use for
this called HttpUnit (http://httpunit.sourceforge.net/). It is made
specifically for unit testing, but can be used for much more.

It lets you create a 'WebConversation', which maintains session state
and returns WebResponse objects. These have methods to access forms
(e.g. getFirstMatchingForm(WebForm.MATCH_NAME, "fooForm")), methods to
get elements in the form or page (e.g.
getElementWithID("fooElement").getText()), and so on.

I've used HttpUnit to build a simple web crawler (quickly) that I'm
quite happy with.

Nate.
 
HMS Surprise

Many thanks Nate.

I have already bookmarked it, and I will certainly look into this in
depth as soon as the current fire drill is over.

jh
 
