Regex help please

Tim Nash (aka TMN)

Hi

Can anyone help me match this div below - my regex does not work - if
you could tell me why I would appreciate it.

var aStr = "<div class='feedflare'>dfgdg dg</div>";
var reg = new RegExp("<div class='feedflare'.*?</div>'","gim");


thanks
Tim
 
pr

Tim said:
Can anyone help me match this div below - my regex does not work - if
you could tell me why I would appreciate it.

var aStr = "<div class='feedflare'>dfgdg dg</div>";
var reg = new RegExp("<div class='feedflare'.*?</div>'","gim");

-------------------------------------------------------^
That apostrophe shouldn't be there.

The 'm' flag is unnecessary.
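
For illustration, here is a minimal sketch of the pattern before and after both changes, run against the sample string. The names `broken`, `fixed`, `brokenMatch` and `fixedMatch` are made up for this example, and the 'g' flag is dropped as well so that exec() keeps no state between calls:

```javascript
var aStr = "<div class='feedflare'>dfgdg dg</div>";

// Original pattern: the stray apostrophe after </div> is part of the
// pattern, and the string has no apostrophe there, so it can never match.
var broken = new RegExp("<div class='feedflare'.*?</div>'", "i");

// Apostrophe removed. 'm' only changes what ^ and $ mean, and the pattern
// uses neither, so it is dropped too.
var fixed = new RegExp("<div class='feedflare'.*?</div>", "i");

var brokenMatch = broken.exec(aStr); // null
var fixedMatch = fixed.exec(aStr);   // array; fixedMatch[0] is the whole div
```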
 
Tim Nash (aka TMN)

After a fresh start this morning I got this to work taking into
account white spaces around 'class' and '=' etc and also
allow ' or " to be used

var reg = new RegExp("<div[^>]class\\s*=\\s*[\'\"]feedflare[\'\"]*>(.*?)</div>", 'gi');

Tim
 
Thomas 'PointedEars' Lahn

Tim said:
After a fresh start this morning I got this to work taking into
account white spaces around 'class' and '=' etc and also
allow ' or " to be used

var reg = new RegExp("<div[^>]class\\s*=\\s*[\'\"]feedflare[\'\"]*>(.*?)</div>", 'gi');

Single-escaping the apostrophe within a double-quoted string literal is
useless ("\'" == "'"), and attr=['"]...['"]* is pointless (the star repeats
the previous expression zero or more times; here: ['"]). It would also be a
lot easier to maintain if you used a RegExp literal instead.

var reg = /<div[^>]class\s*=\s*['"]feedflare['"]>(.*?)<\/div>/gi;

That still does not exclude the possibility of e.g.

<divaclass="feedflare'>...</div>

which is not Valid. As for the element type identifier followed by optional
attributes, you should use

<ident(|\s+attr...)>

because whitespace after the identifier is required if there are attributes.
As for the matching quotes, you should use

('foo'|"foo")

However, RegExp literals and non-greedy matching (`.*?') are not universally
supported, with the latter being the more important fact here. See also:

<http://pointedears.de/scripts/es-matrix/>

Also note that a single regular expression cannot be used to parse an
*arbitrary* fragment of an SGML-based markup language; either it is too
greedy or not greedy enough. For example, in

<div class="foo"><div>bar</div></div>

a non-greedy expression would match `<div class="foo"><div>bar</div>',
with the outer `div' element not being closed.
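
A small sketch of both failure modes, assuming an engine that supports non-greedy quantifiers. The variable names and the extra sibling `div` are invented for this illustration:

```javascript
var nested = '<div class="foo"><div>bar</div></div>';

// Non-greedy: stops at the first </div>, leaving the outer div unclosed.
var lazy = /<div class="foo">.*?<\/div>/.exec(nested);

// Greedy: runs to the last </div> in the string, which overshoots as soon
// as a sibling div follows the one we want.
var withSibling = nested + '<div>baz</div>';
var greedy = /<div class="foo">.*<\/div>/.exec(withSibling);
```

Here `lazy[0]` is `<div class="foo"><div>bar</div>` (too short), while `greedy[0]` is the entire `withSibling` string (too long).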

So, for reliable parsing, you will need to implement a push-down automaton;
however, its parsing algorithm can be made more efficient with regular
expressions.

Unsurprisingly, all this has been discussed here before. Please search
before you post.

<http://jibbering.com/faq/>


PointedEars
 
pr

Tim said:
After a fresh start this morning I got this to work taking into
account white spaces around 'class' and '=' etc and also
allow ' or " to be used

var reg = new RegExp("<div[^>]class\\s*=\\s*[\'\"]feedflare[\'\"]*>(.*?)</div>", 'gi');

To match a string starting with any of the following common permutations:

<div class="feedflare">
<div style="color: red;" class="feedflare">
<div class="feedflare" id="div1">
<div class="class1 feedflare class3">

you will instead need something like:

/<div\b[^>]+\bclass\s*=\s*(['"])[\w\s]*\bfeedflare\b[\w\s]*\1[^>]*>(.*?)<\/div\s*>/gi

I have simplified it by presuming you won't use the characters '.-:' in
class names. But as PointedEars points out, '.*?' is a problem in old
browsers and you're in trouble if there's a nested div in your string.
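
As a rough check, the pattern can be run against the four permutations plus two strings that should be rejected. This is only a sketch: the sample contents, the two negative cases and the variable names are made up here, and the 'g' flag is left off so that every exec() call starts at index 0:

```javascript
// Same pattern as above, without the 'g' flag.
var feedflare = /<div\b[^>]+\bclass\s*=\s*(['"])[\w\s]*\bfeedflare\b[\w\s]*\1[^>]*>(.*?)<\/div\s*>/i;

var samples = [
  '<div class="feedflare">a</div>',
  '<div style="color: red;" class="feedflare">b</div>',
  "<div class='feedflare' id='div1'>c</div>",
  '<div class="class1 feedflare class3">d</div>',
  '<div class="feedflare\'>e</div>',   // mismatched quotes
  '<div class="notfeedflare">f</div>'  // different class name
];

var contents = [];
for (var i = 0; i < samples.length; i++) {
  var m = feedflare.exec(samples[i]);
  // Group 1 is the quote character, group 2 the element content.
  contents.push(m ? m[2] : null);
}
// contents is now ["a", "b", "c", "d", null, null]
```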

Possibly you would be better served by reading the string into the DOM
(using a DOMParser or innerHTML, for example) and extracting information
from it there.
 
Tim Nash (aka TMN)

Thank you PointedEars and pr for your input.

Tim
 
pr

Thomas said:
As for the matching quotes, you should use

('foo'|"foo")

Or

(['"])foo\1
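
To show what the backreference buys you, here is a small sketch comparing it with a plain character class on a value whose quotes do not match. The names `loose`, `strict` and `mismatched` are invented for the example:

```javascript
var loose = /class=['"]feedflare['"]/;  // any quote, then any quote
var strict = /class=(['"])feedflare\1/; // closing quote must equal the opening one

var mismatched = "class=\"feedflare'";

var looseResult = loose.test(mismatched);        // true: accepts "...'
var strictResult = strict.test(mismatched);      // false: \1 demands another "
var strictOk = strict.test("class='feedflare'"); // true
```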


However, RegExp literals and non-greedy matching (`.*?') are not universally
supported, with the latter being the more important fact here.

Does this seem a reasonable feature test to you?

var ngq = /.+?/.exec("ab");
var hasNonGreedyQuantifiers = ngq && ngq[0].length == 1;

I can only lay hands on one browser old enough to fail. I assume the
presence of literal notation, obviously.
 
Thomas 'PointedEars' Lahn

pr said:
Thomas said:
As for the matching quotes, you should use

('foo'|"foo")

Or

(['"])foo\1

Correct. To my surprise, this feature, standardized only with ECMAScript
Ed. 3 (like regular expressions in general), appears to be widely supported:

The bookmarklet

javascript:window.alert(/^(["'])a\1b$/.test("'a'b"));

shows `true' in all my test environments, which currently are:

- Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.1)
Gecko/2008070208 Firefox/3.0.1
- Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14)
Gecko/20080404 Firefox/2.0.0.14
- Mozilla/4.78 [de] (Windows NT 5.0; U)

- Mozilla/5.0 (Windows; U; Windows NT 5.1; de-DE)
AppleWebKit/525.19 (KHTML, like Gecko) Version/3.1.2 Safari/525.21
- Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10_4_11; de-de)
AppleWebKit/525.18 (KHTML, like Gecko) Version/3.1.2 Safari/525.22

- Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; {...};
.NET CLR 1.1.4322; .NET CLR 2.0.50727) (IE 8 beta 1)
- Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; {...};
.NET CLR 1.1.4322; .NET CLR 2.0.50727)
- Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; {...};
.NET CLR 1.1.4322; .NET CLR 2.0.50727)
- Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.1; {...};
.NET CLR 1.1.4322; .NET CLR 2.0.50727)
- Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0;
.NET CLR 1.1.4322; .NET CLR 2.0.50727)
- Mozilla/4.0 (compatible; MSIE 4.01; Windows NT 5.0; {...})

- Opera/9.52 (Windows NT 5.1; U; de)
- Opera/9.51 (Windows NT 5.1; U; de)
- Opera/9.27 (Windows NT 5.1; U; en)
- Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.0
- Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows NT 5.1)
Opera 7.02 [en]

However, RegExp literals and non-greedy matching (`.*?') are not universally
supported, with the latter being the more important fact here.

Does this seem a reasonable feature test to you?

var ngq = /.+?/.exec("ab");
var hasNonGreedyQuantifiers = ngq && ngq[0].length == 1;

No, it could already throw a (non-catchable) SyntaxError when /.+?/ is
parsed, before execution (you can test that with IE 5.0, for example). And
I have yet to devise a bullet-proof test for possibly unsupported syntax (a
more sophisticated application of eval() comes to mind), one that does not
break the ECMAScript program then.

However,

var ngq = null;

try
{
  ngq = new RegExp(".+?");
}
catch (e)
{
}

if (ngq)
{
  // ...
}

would work for script engines that support basic exception handling but not
non-greedy quantifiers (such as JScript 5.1 in IE 5.01; tested positive).
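
The construction test and the behavioural check from the earlier proposal could be combined along these lines. This is only a sketch, assuming the engine supports try...catch; the function name `supportsNonGreedyQuantifiers` is made up for the example:

```javascript
function supportsNonGreedyQuantifiers() {
  var re;

  try {
    // Built from a string, so unsupported syntax throws at run time,
    // where it can be caught, instead of when the script is parsed.
    re = new RegExp(".+?");
  } catch (e) {
    return false;
  }

  // A non-greedy `.+?` should consume only one character of "ab".
  var m = re.exec("ab");
  return !!m && m[0].length === 1;
}

var hasNonGreedyQuantifiers = supportsNonGreedyQuantifiers();
```

An engine without exception handling would still fail to compile this, so it does not help in the oldest environments.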


PointedEars
 
pr

Thomas said:
pr said:
Does this seem a reasonable feature test to you?

var ngq = /.+?/.exec("ab");
var hasNonGreedyQuantifiers = ngq && ngq[0].length == 1;

No, it could already throw a (non-catchable) SyntaxError when /.+?/ is
parsed, before execution (you can test that with IE 5.0, for example).

You're right. IE 5 reports "Unexpected quantifier".
 
