Java HTML Parser

Adam

Hello all,

I would like to get some advice from someone who knows a lot more than
I do. I need a special-purpose Java HTML parser. I have seen several out there,
but none meets my needs. What I need to do is GET a web page, find some items
with check boxes, set the appropriate selections and POST the data back.
Most of the parsers let one search for HTML tags, links, etc., but not items
like check boxes. I need the name and value of the check boxes in question
so I can POST the desired values.

TIA,
Adam

P.S. As you can probably tell from my E-Mail, I am fairly new to web
programming.
 
Chris Smith

Pretty much any HTML parser will do. Two that come to mind are JTidy
(which was historically a formatter, but includes a parser with an API
that can be used stand-alone) or Xerces with NekoHTML (Xerces by itself
is an XML parser, Neko extends it via XNI to cover HTML as well).
There's even a parser in javax.swing.text.html, although the output
format is horrid.
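
For what it's worth, here is a minimal sketch of the javax.swing.text.html route, since it needs nothing outside the JDK. It assumes the check boxes are plain INPUT tags in the served HTML; the class name CheckboxFinder, the method findCheckboxes and the sample markup are made up for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects "name=value" for each checkbox INPUT.
public class CheckboxFinder {

    public static List<String> findCheckboxes(Reader html) throws IOException {
        final List<String> found = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // INPUT has no end tag, so the parser reports it as a "simple" tag.
                if (tag != HTML.Tag.INPUT) {
                    return;
                }
                Object type = attrs.getAttribute(HTML.Attribute.TYPE);
                if (type != null && "checkbox".equalsIgnoreCase(type.toString())) {
                    Object name = attrs.getAttribute(HTML.Attribute.NAME);
                    Object value = attrs.getAttribute(HTML.Attribute.VALUE);
                    found.add(name + "=" + value);
                }
            }
        };
        // "true" tells the parser to ignore any charset declared in the page.
        new ParserDelegator().parse(html, callback, true);
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Sample markup invented for this sketch.
        String page = "<html><body><form action=\"/vote\" method=\"post\">"
                + "<input type=\"checkbox\" name=\"opt\" value=\"a\">"
                + "<input type=\"checkbox\" name=\"opt\" value=\"b\">"
                + "<input type=\"text\" name=\"q\" value=\"x\">"
                + "</form></body></html>";
        for (String box : findCheckboxes(new StringReader(page))) {
            System.out.println(box);
        }
    }
}
```

The callback style is part of why the output feels awkward: you get a stream of tag events rather than a tree, so anything that depends on nesting (which FORM an INPUT belongs to, say) has to be tracked by hand.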

Any of the above will be sufficient for your task. The posting part is
pretty trivial with Jakarta Commons HttpClient, or could be done with
URLConnection (although that's a little messy for anything beyond just
retrieving a file).
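
A rough sketch of the URLConnection variant, in case you want to avoid an extra dependency. The class name FormPoster and its methods are hypothetical, and the target URL is whatever the form's action attribute points at. One thing to keep in mind: a browser only submits check boxes that are ticked, so the list you pass in should contain just the name/value pairs of the boxes you want selected (plus any other fields the form expects).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper: posts name/value pairs the way a browser form would.
public class FormPoster {

    /** Encodes {name, value} pairs as application/x-www-form-urlencoded. */
    public static String encode(List<String[]> fields) throws IOException {
        StringBuilder body = new StringBuilder();
        for (String[] field : fields) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(URLEncoder.encode(field[0], "UTF-8"))
                .append('=')
                .append(URLEncoder.encode(field[1], "UTF-8"));
        }
        return body.toString();
    }

    /** Sends the encoded pairs as a POST body and returns the HTTP status code. */
    public static int post(String url, List<String[]> fields) throws IOException {
        // The encoded body is pure ASCII, so this conversion is lossless.
        byte[] body = encode(fields).getBytes(StandardCharsets.US_ASCII);
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true); // we are going to write a request body
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        conn.setFixedLengthStreamingMode(body.length);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        return conn.getResponseCode();
    }
}
```

This leaves out cookies, redirects and reading the response body, which is roughly where URLConnection starts to get messy and HttpClient earns its keep.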

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
Missaka Wijekoon

Adam,

You might want to take a look at
http://sourceforge.net/projects/jtidy
The code that lets you look at HTML as a DOM is probably what you need.

-Misk
 
Chris Uppal

Adam said:
What I need to do is GET a web page,
find some items with check boxes, set the appropriate selections and post
the data back.

If you need to be able to do this with arbitrary web pages then you are
probably hosed. People use JavaScript to generate all sorts of stuff
dynamically on the client-side which makes the problem of recovering forms by
parsing the HTML a little... tricky.

Of course, you may not have a requirement to parse /arbitrary/ web pages, in
which case you can probably[*] use one of the parsers already mentioned to look
for INPUT fields inside FORM elements.

-- chris

([*] I say "probably" because I haven't looked at any of the suggested parsers
myself.)
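
To illustrate the "INPUT fields inside FORM elements" idea, here is a small sketch that walks an org.w3c.dom.Document. It assumes the chosen parser hands back a W3C DOM tree; to stay runnable without extra jars, the sample uses the JDK's XML parser on well-formed markup as a stand-in, which real-world HTML usually is not. The class name FormCheckboxes and both method names are invented, and the tag-name case ("form", "input") is an assumption that depends on the parser and how it is configured.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: lists checkbox INPUTs that sit inside a FORM.
public class FormCheckboxes {

    /** Returns {name, value} for each checkbox input nested in a form element. */
    public static List<String[]> checkboxes(Document doc) {
        List<String[]> result = new ArrayList<>();
        // Assumes lower-case element names; adjust if the parser upper-cases them.
        NodeList forms = doc.getElementsByTagName("form");
        for (int i = 0; i < forms.getLength(); i++) {
            Element form = (Element) forms.item(i);
            NodeList inputs = form.getElementsByTagName("input");
            for (int j = 0; j < inputs.getLength(); j++) {
                Element input = (Element) inputs.item(j);
                if ("checkbox".equalsIgnoreCase(input.getAttribute("type"))) {
                    result.add(new String[] {
                        input.getAttribute("name"), input.getAttribute("value")
                    });
                }
            }
        }
        return result;
    }

    /** Stand-in for a real HTML parser: only accepts well-formed markup. */
    public static Document parseWellFormed(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        // Sample markup invented for this sketch.
        Document doc = parseWellFormed(
                "<html><body><form action=\"/vote\">"
                + "<input type=\"checkbox\" name=\"opt\" value=\"a\"/>"
                + "<input type=\"text\" name=\"q\" value=\"x\"/>"
                + "</form></body></html>");
        for (String[] box : checkboxes(doc)) {
            System.out.println(box[0] + "=" + box[1]);
        }
    }
}
```

None of this helps with forms that are built by script after the page loads; it only sees what is in the markup as served.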
 
zero

If you're a brave man with a lot of time you could do the world a favour
and create a good Java HTML parser. I have yet to find one that can
compete with top(*) products like the parsers of IE, Opera or Mozilla.

Of course, if you just want a quick answer with basic functionality, look
at what the others posted :)

(*) by top I mean they work in most cases, not that they're actually that
great. Each of those products has its failings. But they still do a lot
better than any Java product I've seen.
 
