Gathering Links


joey__

Hello,
I am looking for some help with a regular expression. I would like a regexp
that matches HTML links. I have tried, but I can't seem to get
anything working. I would appreciate any help.

Thanks
Joey
 

Eero Saynatkari

joey__ said:
Hello,
I am looking for some help with a regular expression. I would like a regexp
that matches HTML links. I have tried, but I can't seem to get
anything working. I would appreciate any help.

You might just want to run the HTML through htmltidy
to generate an XML document and parse that, or use
the htree library for the same purpose. Either would
probably be the more robust solution.
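
As a rough sketch of the "parse the XML" route: the snippet below assumes the page has already been tidied into well-formed XHTML with no default namespace, and it uses REXML (bundled with Ruby) as a stand-in parser rather than htree. The sample markup and the `links` variable are made up for illustration.

```ruby
require "rexml/document"

# Assumed input: XHTML that a tidying step has already made well-formed.
xhtml = <<~XML
  <html><body>
    <p><a href="http://example.com/">Example</a></p>
    <p><a href="/docs" class="nav">Docs</a></p>
  </body></html>
XML

doc = REXML::Document.new(xhtml)

# Collect the href attribute and the text of every <a> element.
links = REXML::XPath.match(doc, "//a").map do |a|
  { href: a.attributes["href"], text: a.text }
end

links.each { |link| puts "#{link[:text]} -> #{link[:href]}" }
```

Because the parser understands the document structure, attributes in any order, odd spacing, and tags nested inside the link no longer need special handling in a pattern.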

On the other hand, if you want to use regexps,
something like this should work (untested).

First you have to match the start of the opening tag
(there might be some whitespace). The \b stops tags
such as <abbr> from matching:

/<\s*a\b

Next, gather any attributes in the opening tag:

([^>]*)>

The link text comes next:

(.*?)

The text section is ended by the closing anchor
tag (no other tags are appropriate):

<\s*\/\s*a[^>]*>

Finally, we want to match case-insensitively
(A vs. a) and over multiple lines:

/im

So, $1 will be the attributes and $2 the link text.
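
Put together, the pieces above can be tried out like this. It is only a sketch: the pattern includes a `\b` after the `a` so that tags such as `<abbr>` are not picked up, and the sample HTML, the `LINK_RE` name, and the follow-up `href` extraction are assumptions added for illustration. `String#scan` returns one `[attributes, text]` pair per match, which corresponds to $1 and $2.

```ruby
# The fragments joined into one pattern: opening tag, attributes,
# link text, closing tag, with the i and m flags.
LINK_RE = /<\s*a\b([^>]*)>(.*?)<\s*\/\s*a[^>]*>/im

html = <<~HTML
  <p>See <a href="http://example.com/">Example</a> and
  <A HREF="/docs"
     class="nav">the
  docs</A>.</p>
HTML

links = html.scan(LINK_RE).map do |attrs, text|
  # Pull a quoted href value out of the attribute string (nil if absent).
  href = attrs[/href\s*=\s*["']([^"']*)["']/i, 1]
  { href: href, text: text.strip }
end

links.each { |link| puts "#{link[:text].inspect} -> #{link[:href]}" }
```

In Ruby the m flag lets `.` match newlines, which is why the second link is found even though its text spans two lines. Unquoted attribute values and an `</a>` inside a comment or script would still trip this up, which is the robustness argument for parsing instead.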

