Regexp: Negation with backreference?

j.vimal · May 30, 2006

Hi
I would like to extract the anchors from a page. This is the simple
pattern I wrote:
/(<[aA]\\s[^>]*>[^<]*<\/a>)/

Note that it is to be used from a programming language, say PHP, but
the syntax is (almost) the same as Perl's, except for escape sequences.
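A minimal sketch of what this pattern actually picks up. It uses Python's `re` module purely for illustration (the thread is about Perl and PHP, but the constructs here behave the same way), writes the pattern as a raw string with single backslashes, and runs it on a made-up HTML snippet. Note that the closing `</a>` is matched case-sensitively, so an anchor closed with `</A>` is skipped.

```python
import re

# The anchor pattern from the post, as a Python raw string so each
# backslash reaches the regex engine once. [^<]* means the link text
# may not contain "<", i.e. no nested tags inside the <a> element.
ANCHOR_RE = re.compile(r'(<[aA]\s[^>]*>[^<]*</a>)')

def extract_anchors(html):
    """Return every <a ...>text</a> span the simple pattern can see."""
    return ANCHOR_RE.findall(html)

page = '<p>See <a href="/wiki/Perl" title="Perl">Perl</a> and <A HREF=x>X</a>.</p>'
for anchor in extract_anchors(page):
    print(anchor)
# <a href="/wiki/Perl" title="Perl">Perl</a>
# <A HREF=x>X</a>
```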

Now, once I have all the anchors, I want to parse them to get the
href and title attributes.
For the href, I wrote

\\bhref\\s*=\\s*(["'])([^\\1])\\1

I search for href at a word boundary, skip any whitespace, match the
equals sign, skip whitespace again, and then capture the quote
character. This is backreference 1. Next, I want to continue until I
encounter the same character as backreference 1. Finally, the last
character is backreference 1 again.

So, is this syntax right? It doesn't seem to work for me...
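The intent described above, "keep going until the character captured in group 1 shows up again", can be written with a negative lookahead, because a backreference cannot be used inside a character class. A minimal sketch in Python's `re` (chosen only for illustration; Perl and PCRE support the same construct), with hypothetical sample tags:

```python
import re

# Group 1 captures the opening quote. (?:(?!\1).)* then consumes one
# character at a time, each only if it is NOT the captured quote, and
# the final \1 requires that same quote to close the value.
HREF_RE = re.compile(r"""\bhref\s*=\s*(["'])((?:(?!\1).)*)\1""")

def find_href(tag):
    """Return the quoted href value of a tag, or None if there is none."""
    m = HREF_RE.search(tag)
    return m.group(2) if m else None

print(find_href('<a href="it\'s.html">'))          # it's.html
print(find_href("<a href='say \"hi\".html'>"))     # say "hi".html
print(find_href('<a name=top>'))                   # None
```

The quote that did not open the value is allowed inside it, which a fixed class such as `[^"]` would also give you, but the lookahead form works for whichever quote was captured.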

And, of course, the quotes need not be there; I will change the pattern for that.

Thanks!

Paul Lalli · May 30, 2006

j.vimal said:
Hi
I would like to extract the anchors from a page. This is the simple
pattern I wrote:
/(<[aA]\\s[^>]*>[^<]*<\/a>)/

Wrong approach. Use an HTML parsing module to parse HTML.
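To show what the parser-based approach looks like, here is a sketch using Python's standard-library `html.parser.HTMLParser`, swapped in as an analogue of the Perl modules the advice refers to. The class and function names are hypothetical. The parser lower-cases tag and attribute names, strips quotes, and decodes entities such as `&amp;` in attribute values, none of which the regex does.

```python
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Collect the href and title attributes of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.links = []  # list of (href, title) tuples

    def handle_starttag(self, tag, attrs):
        # tag arrives lower-cased; attrs is a list of (name, value) pairs
        if tag == "a":
            attributes = dict(attrs)
            self.links.append((attributes.get("href"), attributes.get("title")))

def collect_links(html):
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return parser.links

html = '<A HREF="/wiki/Perl" TITLE="Perl &amp; regex">Perl</A> <a href="x.html">x</a>'
print(collect_links(html))
# [('/wiki/Perl', 'Perl & regex'), ('x.html', None)]
```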

Note that it is to be used from a programming language, say PHP, but
the syntax is (almost) the same as Perl's, except for escape sequences.

Wow, coincidentally, this is almost a group that deals with languages
other than Perl!

comp.lang.php is over there---->

Paul Lalli

j.vimal · May 30, 2006

OK... But say I really want to do it this way, to learn regexps.

Then what?

But why do you say that this is the wrong way? Are there performance
issues?

Paul Lalli · May 30, 2006

j.vimal said:
OK... But say I really want to do it this way, to learn regexps.

There is no such thing. Regexps are not a universal concept. You
cannot take one regular expression written for Perl and just assume it
will work the same way in any other language.

Then what?

But why do you say that this is the wrong way? Are there performance
issues?

No, there are ability issues. Regular expressions cannot (correctly)
parse HTML.
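To make "cannot (correctly) parse HTML" concrete, the sketch below feeds the original anchor pattern (in Python's `re`, with capture groups added around the attribute text and the link text) three inputs that are all legal HTML. The sample strings are invented for illustration.

```python
import re

# The anchor pattern from the first post, with groups added so we can
# see where the match boundaries fall.
ANCHOR_RE = re.compile(r'<[aA]\s([^>]*)>([^<]*)</a>')

def anchor_parts(html):
    """Return (attributes, link_text) pairs as the regex sees them."""
    return ANCHOR_RE.findall(html)

samples = [
    '<a href="x.html" title="a > b">X</a>',   # ">" inside a quoted value
    '<a href="y.html"><b>Y</b></a>',           # markup nested inside the link
    '<!-- <a href="old.html">old</a> -->',     # a link that is commented out
]
for html in samples:
    print(anchor_parts(html))
# [('href="x.html" title="a ', ' b">X')]
# []
# [('href="old.html"', 'old')]
```

The first anchor is split in the wrong place, the second is missed entirely, and the third is reported even though a browser would never render it.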

Paul Lalli

j.vimal · May 30, 2006

OK. Then I think it suits my purpose.
My purpose is just to visualize the various links in a given Wikipedia
article. Since Wikipedia addresses its links in a consistent way,
regular expressions would serve my purpose without the overhead of an
HTML parser.
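A sketch of what a narrowly targeted regex for this purpose might look like, in Python for illustration. It rests on assumptions that are not stated in the thread: that Wikipedia's rendered HTML writes internal article links as `href="/wiki/Some_Title"` in double quotes, and that titles containing a colon belong to other namespaces and can be skipped. The function name and the sample HTML are hypothetical.

```python
import re

# Assumption: internal article links look like href="/wiki/Some_Title".
# Titles containing ":" (File:, Help:, Special:, ...) are skipped as a
# heuristic, and an optional "#section" fragment is dropped.
WIKI_LINK_RE = re.compile(r'href="/wiki/([^"#:]+)(?:#[^"]*)?"')

def article_links(html):
    """Return distinct linked article titles in order of first appearance."""
    return list(dict.fromkeys(WIKI_LINK_RE.findall(html)))

sample = (
    '<a href="/wiki/Larry_Wall" title="Larry Wall">Larry Wall</a> '
    '<a href="/wiki/File:Camel.png">camel</a> '
    '<a href="/wiki/Perl#History">history</a> '
    '<a href="/wiki/Larry_Wall">again</a>'
)
print(article_links(sample))  # ['Larry_Wall', 'Perl']
```

Because the pattern only has to recognize one site's consistent link format, it sidesteps most of the general-HTML problems, which is the trade-off being argued for here.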

Thanks
Vimal

Gunnar Hjalmarsson · May 30, 2006

Paul said:
No, there are ability issues. Regular expressions cannot (correctly)
parse HTML.

Whether a regex is sufficient or not is reasonably up to the programmer
to evaluate, given the circumstances of the particular task.

http://faq.perl.org/perlfaq9.html#How_do_I_extract_URL

Xicheng Jia · May 30, 2006

j.vimal said:
Hi
I would like to extract the anchors from a page. This is the simple
pattern I wrote:
/(<[aA]\\s[^>]*>[^<]*<\/a>)/

Note that it is to be used from a programming language, say PHP, but
the syntax is (almost) the same as Perl's, except for escape sequences.

Now, once I have all the anchors, I want to parse them to get the
href and title attributes.
For the href, I wrote

\\bhref\\s*=\\s*(["'])([^\\1])\\1

This pattern matches only one character between the two quotes, i.e.
in $2.
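A quick way to see this is to run the posted pattern (with one backslash per escape) in Python's `re`, used here only for illustration. Inside `[...]`, `\1` is treated as a character escape rather than a backreference, so `[^\1]` does not exclude the quote: the third example shows a bare `"` accepted as the "value". The helper name is hypothetical.

```python
import re

# The posted href pattern. ([^\1]) is exactly ONE character, and the
# class excludes only the character with code 1, not the captured quote.
POSTED = re.compile(r"""\bhref\s*=\s*(["'])([^\1])\1""")

def posted_value(tag):
    m = POSTED.search(tag)
    return m.group(2) if m else None

print(posted_value('<a href="x">'))          # x
print(posted_value('<a href="page.html">'))  # None
print(posted_value('<a href=""">'))          # "
```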

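And I guess [^\\1] does not work the way you thought, as if it were
[^"] or [^']: inside a character class, \1 is not a backreference at
all (Perl reads it as an octal escape, the character with code 1). You
can try the non-greedy form of .* instead, which takes one more
character each time \1 fails to match, so it stops at the first \1: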

\bhref\s*=\s*(["'])(.*?)\1
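A sketch comparing the non-greedy pattern with a greedy `(.*)` version, using Python's `re` for illustration and a hypothetical tag with two quoted attributes. The greedy form runs to the end of the line and backs up to the last matching quote, swallowing the `title` attribute.

```python
import re

# Non-greedy: the value ends at the first occurrence of the opening quote.
HREF_RE = re.compile(r"""\bhref\s*=\s*(["'])(.*?)\1""")
# Greedy, for contrast: the value ends at the LAST occurrence of that quote.
GREEDY_RE = re.compile(r"""\bhref\s*=\s*(["'])(.*)\1""")

def value(pattern, tag):
    m = pattern.search(tag)
    return m.group(2) if m else None

tag = '<a href="page.html" title="Page">'
print(value(HREF_RE, tag))                     # page.html
print(value(GREEDY_RE, tag))                   # page.html" title="Page
print(value(HREF_RE, "<a href='it\"s.html'>"))  # it"s.html
```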

or you can use a conditional construct if the quotes are optional:

\bhref\s*=\s*(["'])?(.*?)(?(1)\1|\s)
(untested)
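The conditional pattern can be tried in Python's `re`, which supports `(?(1)yes|no)` (used here only for illustration). It works for quoted values and for unquoted values followed by whitespace, but an unquoted value that runs straight into `>` keeps going until the next whitespace. The `VARIANT` below is a hypothetical adjustment, not something from the thread: it also lets an unquoted value end just before `>`, using a lookahead so the `>` is not consumed.

```python
import re

# The conditional pattern as posted: if group 1 (an opening quote)
# matched, close with the same quote; otherwise end at whitespace.
POSTED = re.compile(r"""\bhref\s*=\s*(["'])?(.*?)(?(1)\1|\s)""")
# Hypothetical variant: an unquoted value may also end before ">".
VARIANT = re.compile(r"""\bhref\s*=\s*(["'])?(.*?)(?(1)\1|(?=[\s>]))""")

def value(pattern, tag):
    m = pattern.search(tag)
    return m.group(2) if m else None

for tag in ['<a href="x.html" title="t">',
            '<a href=x.html title=t>',
            '<a href=x.html>Next page</a>']:
    print(value(POSTED, tag), value(VARIANT, tag))
# x.html x.html
# x.html x.html
# x.html>Next x.html
```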

BTW, why would you use double backslashes to escape those special
characters?
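Double backslashes come from a second layer of escaping: when a pattern is written as a string literal (as with PHP's preg functions), the language's string rules run before the regex engine ever sees it, whereas a Perl `/.../` literal needs only single backslashes. The sketch below illustrates the two layers in Python, chosen for illustration because its ordinary strings and raw strings make the difference easy to see.

```python
import re

# Layer 1 is the string literal, layer 2 is the regex engine.
doubled = "\\bhref\\s*=\\s*"   # ordinary string: each \\ becomes one \
raw = r"\bhref\s*=\s*"          # raw string: backslashes are kept as typed
assert doubled == raw           # both hand the engine \bhref\s*=\s*

not_escaped = "\bhref"          # in an ordinary string "\b" is a backspace
print(repr(not_escaped))                              # '\x08href'
print(bool(re.search(raw, '<a href="x">')))           # True
print(bool(re.search(not_escaped, '<a href="x">')))   # False
```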

Xicheng

Similar Threads

help with regexp	5	Feb 7, 2013
regexp help - substring of a backreference	4	Aug 7, 2010
Pater matching and backreference	4	Aug 22, 2005
Replace an occurrence of a regexp with a function call on a substringof the match, multiple times on	4	Sep 16, 2013
Help with Visual Lightbox: Scripts	2	May 3, 2023
understanding regexp, Text::ParseWords	2	Nov 5, 2010
small regexp help	1	Oct 30, 2013
regexp	9	May 9, 2006
