Regular Expression Validator (screen-scrape)

Sparky Arbuckle · Apr 21, 2005

I'm trying to scrape the news from a page on my university's server.
The HTML below is what I need to get. The only problem is that I'm not
very good with regular expressions, so I was wondering if someone could
help me, either by telling me how to go about figuring it out or by
suggesting a good tutorial site that makes sense of this.
Thanks in advance!

<td width="70">4/12/2005</td><td> </td><td><a
href="/ucomm_news/articles/822.asp">Fairhaven College Hosts Human
Rights Film Festival April 13-17; April
20-24</a></td></tr><tr><td> </td><td> </td><td>BELLINGHAM
- Western Washington University's Fairhaven College will host the
Human Rights Film Festival on April 13-17 and April 20-24.
</td></tr><tr><td colspan="3">

Sparky Arbuckle · Apr 21, 2005

I've simplified the code above.

I need everything from:

<td width="70"> to <td colspan="3">

Wouldn't it look something like:

<td width="70">(.\n)*?td colspan="3">

??
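A quick check of the guess above, sketched in Python's `re` module for convenience (the thread itself uses ASP/VB, and the sample string is abbreviated from the HTML in the first post). As written, `(.\n)*?` requires every character to be followed by a newline, so it cannot match ordinary markup. The alternation `(.|\n)*?` is the form that matches any character, line breaks included.

```python
import re

# Abbreviated sample based on the HTML in the first post (line breaks kept).
html = (
    '<td width="70">4/12/2005</td><td> </td><td><a\n'
    'href="/ucomm_news/articles/822.asp">Fairhaven College Hosts Human\n'
    'Rights Film Festival</a></td></tr><tr><td colspan="3">'
)

# As posted: each "." must be followed by "\n", so this finds nothing here.
as_posted = re.search(r'<td width="70">(.\n)*?td colspan="3">', html)

# With alternation: "." or "\n", repeated lazily up to the end marker.
with_alternation = re.search(r'<td width="70">(.|\n)*?td colspan="3">', html)

print(as_posted)                     # None
print(with_alternation is not None)  # True
```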

Juan T. Llibre · Apr 21, 2005

Hi, Sparky,

Try this :

<td width="70"[^>]*>(.*?)<td colspan="3">
or this:
<td width="70">[^>]*>(.*?)<td colspan="3">
( not exactly sure which of the 2... )

Note that the *first* <td colspan="3"> tag will end the match.

The general rule is :

<tag[^>]*>(.*?)<endtag>
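To compare the two patterns above, here is a minimal sketch in Python's `re` module (not the ASP/VB code used in this thread). The two-row `html` string is a hypothetical sample modeled on the page in the first post. `re.DOTALL` lets `.` cross line breaks; in .NET the corresponding option is `RegexOptions.Singleline`. On this sample the first pattern captures each row starting at the date, while the second consumes text up to the next `>`, so the date falls outside the captured group.

```python
import re

# Hypothetical two-row sample modeled on the page in the first post.
html = (
    '<td width="70">4/12/2005</td><td><a\n'
    'href="/ucomm_news/articles/822.asp">Film Festival</a></td>'
    '</tr><tr><td colspan="3"></td></tr>\n'
    '<tr><td width="70">4/20/2005</td><td><a\n'
    'href="/ucomm_news/articles/834.asp">World Issues Forums</a></td>'
    '</tr><tr><td colspan="3"></td></tr>'
)

# [^>]* tolerates extra attributes inside the opening tag; (.*?) is lazy,
# so each match ends at the first <td colspan="3"> that follows it.
first = re.compile(r'<td width="70"[^>]*>(.*?)<td colspan="3">', re.DOTALL)

# Second variant from the post: [^>]*> runs past the opening tag to the
# next ">", which here is the end of the date cell.
second = re.compile(r'<td width="70">[^>]*>(.*?)<td colspan="3">', re.DOTALL)

blocks = first.findall(html)
print(len(blocks))                    # 2
print([b[:9] for b in blocks])        # ['4/12/2005', '4/20/2005']
print(second.findall(html)[0][:8])    # <td><a
```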

Sparky Arbuckle · Apr 21, 2005

Thanks Juan! I got it to work by using:

lblOutput.text = funScrape(strHTML, "<td width=(.)*?<td colspan=")

Now I am trying to take it a step further. I have
created a FOR NEXT in my code to try and edit the following HTML so
that I can remove everything from the to tags.

<td width="70">4/20/2005</td><td> </td><td><a
href="/ucomm_news/articles/834.asp">Fairhaven College to Host World
Issues Forums April 25,
27</a></td></tr><tr><td> </td><td> </td><td>BELLINGHAM
- Western Washington University's Fairhaven College will host two World
Issues Forums on April 25 and April 27. The forums are free and open to
the public.
</td>

Ultimately I want to display only the Date <td width="70">DATE</TD> and
the hyperlink. I'm trying to use this FOR NEXT Loop:

IF objMatchCollection.Count > 0 THEN
    FOR EACH objMatch in objMatchCollection
        iStart = inStr(objMatch.Value,">")
        iEnd = inStr(iStart,objMatch.Value,"")
        iStart2 = inStr(objMatch.Value, "")
        iEnd2 = len(objMatch.Value)
        Response.write(objMatch.Index & objMatch.Value)
    NEXT
ELSE
    Response.write("No matches for " & strPattern)
END IF

Do I need to clarify or does this make sense?
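One alternative to slicing each match with inStr is to let the pattern capture the two pieces directly. The following is only a sketch in Python's `re` module, not the ASP/VB code above. The group names `date` and `link` are hypothetical, and the sample string is abbreviated from the HTML in this post. It assumes every row keeps the layout shown, with a `<td width="70">` date cell followed by a single `<a ...>` link.

```python
import re

# Abbreviated sample based on the HTML in this post (line breaks kept).
html = (
    '<td width="70">4/20/2005</td><td> </td><td><a\n'
    'href="/ucomm_news/articles/834.asp">Fairhaven College to Host World\n'
    'Issues Forums April 25,\n'
    '27</a></td></tr><tr><td> </td><td> </td><td>BELLINGHAM</td>'
)

pattern = re.compile(
    r'<td width="70">(?P<date>[^<]*)</td>'  # date cell contents
    r'.*?'                                  # spacer cells, matched lazily
    r'(?P<link><a\s[^>]*>.*?</a>)',         # the whole hyperlink element
    re.DOTALL,                              # let "." cross line breaks
)

results = []
for m in pattern.finditer(html):
    # Collapse the line breaks inside the link into single spaces.
    link = " ".join(m.group("link").split())
    results.append((m.group("date"), link))

for date, link in results:
    print(date, link)
```

Everything after the closing `</a>` (the summary paragraph) is simply never captured, so nothing has to be removed afterwards. Patterns like this are tied to the page's current markup and would need adjusting if the layout changes.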

Sparky Arbuckle · Apr 22, 2005

still no luck ;-(
