regex match for same number of opening and closing brackets

S

Sascha Bendix

Hi,

I got a little parsing problem right here: I got some strings and want
to ensure, that there are as many opening as closing brackets via a
regex without specifying the exact number.

This regex would be part of a bigger one, so it can't be done in two steps.

Can anybody give me a hint how to do this?

Here are some boundary conditions of the strings I have:

* the number of occurrences differs between 0 an 67
* the strings always start with [a-z]
* there is always [a-z,]+ between two opening brackets

Thanks in advance for every hint.

Regards,

Sascha
 
J

Jürgen Exner

Sascha Bendix said:
I got a little parsing problem right here: I got some strings and want
to ensure, that there are as many opening as closing brackets via a
regex without specifying the exact number.

It cannot be done with pure regular expressions because balanced
brackets require (at least) a context-free language. And a context-free
language cannot be parsed with regular expressions. For more information
please see "Chomsky Hierarchy" in your favourite book about formal
languages.

However Perl's regular expressions are enhanced and there are some means
now.
Can anybody give me a hint how to do this?

See "perldoc -q balanced"

jue
 
S

sln

Hi,

I got a little parsing problem right here: I got some strings and want
to ensure, that there are as many opening as closing brackets via a
regex without specifying the exact number.

This regex would be part of a bigger one, so it can't be done in two steps.

Can anybody give me a hint how to do this?

Here are some boundary conditions of the strings I have:

* the number of occurrences differs between 0 an 67
* the strings always start with [a-z]
* there is always [a-z,]+ between two opening brackets

Thanks in advance for every hint.

Regards,

Sascha

- Try with(out) debug option.
- It can be set to (*FAIL) when first $2 is populated.
- Requires Perl 5.10
- Max 50 recursive levels (shouldn't be a problem).
- Is only based on char delimeters, can be tweeked for
string delimeters.


Good Luck!
-sln
================

use strict;
use warnings;

my $debug = 0;
my $string = " asdf[ ( () ) (((( ] ))) ( ) () a )";
my ($x,$y) =
(
quotemeta '(',
quotemeta ')'
);

my $rxbal = qr {
(?: ($x(?:(?>[^$x$y]+)|(?1))*$y) | ([$x$y]) | . )+
}x;

if ($debug)
{
use re 'eval';
$rxbal = qr {
(?:
( # group 1
$x
(?{ print "(",pos()," "; })
(?:
(?> [^$x$y]+ ) # not $x nor $y and no backtracking
|
(?{ print "-",pos(),"\n"; })
(?1) # Recurse to start of group 1
)*
(?{ print ")",pos()," "; })
$y
)
(?{ print "\n\$1 => $^N",pos()," \n"; })
|
([$x$y]) # $x or $y
(?{ print "\n\$2 => (!!!!!!!!!!bad) $^N",pos()," \n"; })
|
(.) # any char
(?{ print "\n\$3 => $^N",pos()," \n"; })
)+
}x;
}

print "$string\n";

if ($string =~ /$rxbal/)
{
if (defined $2)
{ print "** This is NOT balanced\n" }
else
{ print "** This is balanced\n" }
}
else
{ print "** Nothing to balance\n" }
 
S

sln

Try this modified version. This regex accepts strings
as open and close balanced text. It will still do single
character.
'$x,$y' are the open,close text variables below.

-sln

----------------------------
use strict;
use warnings;

my $debug = 1;

my $string = "ga <a>this<a>{s<a>{ds}</a>g}</a>that-this</a> g <a>dn gd</a> that{fn} asdf </a>ga";

my ($x,$y) =
(
'<a>',
'</a>'
);
my ($xp,$yp) = ($x,$y);
$_ = quotemeta for ($x,$y);


my $rxbal = qr {
(?:
($x (?:(?>(?:(?!$x|$y).)+) | (?1))* $y) # Group 1
| ($x|$y) # Group 2
| .
)+
}xs;


if ($debug)
{
use re 'eval';
$rxbal = qr {
(?:
( # Group 1
$x
(?{ print "\n x => '$xp'",pos(); })
(?:
(?> (?: (?!$x|$y) . )+ ) # no backtracking and not ($x|$y)
|
(?{ print "-",pos(); })
(?1) # Recurse to start of group 1
)*
$y
(?{ print "\n y => '$yp'",pos(); })
)
(?{ print "\n* \$1 => '$^N'",pos(); })
|
($x|$y) # Group 2, $x or $y
(?{ print "\n! \$2 => (____bad____) '$^N'",pos(); })
|
(.) # Group 3, any char
(?{ print "\n \$3 => '$^N'",pos(); })
)+
(?{ print "\n"; })
}xs;
}

print "\nx = '$xp'\ny = '$yp'\n",'-'x20,"\n'$string'\n";

if ($string =~ /$rxbal/)
{
print "\n";
if (defined $2)
{ print "** This is NOT balanced\n" }
elsif (not defined $1)
{ print "** Nothing to balance\n" }
else
{ print "** This is balanced\n" }
}
else
{ print "** The string is empty\n" }

__END__

Output:

x = '<a>'
y = '</a>'
--------------------
'ga <a>this<a>{s<a>{ds}</a>g}</a>that-this</a> g <a>dn gd</a> that{fn} asdf </a>
ga'

$3 => 'g'1
$3 => 'a'2
$3 => ' '3
x => '<a>'6-10
x => '<a>'13-15
x => '<a>'18-22
y => '</a>'26-28
y => '</a>'32-41
y => '</a>'45
* $1 => '<a>this<a>{s<a>{ds}</a>g}</a>that-this</a>'45
$3 => ' '46
$3 => 'g'47
$3 => ' '48
x => '<a>'51-56
y => '</a>'60
* $1 => '<a>dn gd</a>'60
$3 => ' '61
$3 => 't'62
$3 => 'h'63
$3 => 'a'64
$3 => 't'65
$3 => '{'66
$3 => 'f'67
$3 => 'n'68
$3 => '}'69
$3 => ' '70
$3 => 'a'71
$3 => 's'72
$3 => 'd'73
$3 => 'f'74
$3 => ' '75
! $2 => (____bad____) '</a>'79
$3 => 'g'80
$3 => 'a'81

** This is NOT balanced
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,997
Messages
2,570,240
Members
46,830
Latest member
HeleneMull

Latest Threads

Top