/(foo|)/ vs /(foo)?/

kj · Nov 17, 2005

While looking at someone else's code I came across a regular
expression that included a construct like /(foo|)/. As far as I
can tell, it should produce the same result as /(foo)?/. But the
author of the code knows a heck of a lot more Perl than I do, so
I'm wondering why she would have picked the former over the latter.
Any ideas?

Thanks!

kj

P.S. I'm aware of the fact that /(|foo)/ would produce very different
results from /(foo|)/ or /(foo)?/, but that's neither here nor
there.

Ingo Menger · Nov 17, 2005

kj said:
While looking at someone else's code I came across a regular
expression that included a construct like /(foo|)/. As far as I
can tell, it should produce the same result as /(foo)?/. But the
author of the code knows a heck of a lot more Perl than I do, so
I'm wondering why she would have picked the former over the latter.
Any ideas?

Hmmm....

A year ago I had to throw away my laptop, since the keyboard had given
up completely. But before it stopped working altogether, there were
some weeks when only certain keys did not work anymore. Perhaps the
author had a similar problem with his keyboard. After all, on some
keyboards, the '?' key is in the upper right corner and the '|' key is
in the lower left, so it may well be that one of them works while the
other does not.

Anno Siegel · Nov 17, 2005

kj said:
While looking at someone else's code I came across a regular
expression that included a construct like /(foo|)/. As far as I
can tell, it should produce the same result as /(foo)?/. But the
author of the code knows a heck of a lot more Perl than I do, so
I'm wondering why she would have picked the former over the latter.
Any ideas?

The difference is in what is captured when no "foo" is found in the
string. /(foo|)/ matches the empty alternative, so $1 is an empty string
after the match. /(foo)?/ skips the group entirely, so $1 is undefined.
A subtle but relevant difference.
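
A minimal sketch of that difference, for anyone who wants to try it. The helper name capture_state is made up for this illustration; it only reports whether $1 is defined after a successful match.

```perl
use strict;
use warnings;

# Describe what group 1 holds after matching $re against $str.
# (capture_state is a hypothetical helper, not part of any module.)
sub capture_state {
    my ($str, $re) = @_;
    return "no match" unless $str =~ $re;
    return defined $1 ? "defined, length " . length($1) : "undef";
}

my $str = "bar";    # contains no "foo"

# Group 1 takes part in the match through its empty alternative.
print "(foo|): ", capture_state($str, qr/(foo|)/), "\n";   # defined, length 0

# Group 1 is skipped by the ? quantifier, so it never captures.
print "(foo)?: ", capture_state($str, qr/(foo)?/), "\n";   # undef
```

Under "use warnings", the second form is the one that can trigger "uninitialized value" warnings if $1 is later printed or concatenated without a defined() check.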

Anno

Dr.Ruud · Nov 17, 2005

kj:

While looking at someone else's code I came across a regular
expression that included a construct like /(foo|)/. As far as I
can tell, it should produce the same result as /(foo)?/.

(foo|) is short for ((?:foo)?)
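
A small check of that equivalence, as a sketch only: it compares $1 for the two spellings on a handful of sample strings chosen for this illustration. The helper name first_capture is hypothetical.

```perl
use strict;
use warnings;

# Return $1 after a successful match, or undef if the pattern fails.
# (first_capture is a hypothetical helper used only for this comparison.)
sub first_capture {
    my ($str, $re) = @_;
    return $str =~ $re ? $1 : undef;
}

for my $s ("foo", "bar", "xfoo", "") {
    my $alt = first_capture($s, qr/(foo|)/);
    my $opt = first_capture($s, qr/((?:foo)?)/);
    # In both spellings the capturing group always takes part in the
    # match, so $1 is defined (possibly empty) every time.
    printf "%-6s alt=[%s] opt=[%s]\n", "'$s'", $alt, $opt;
}
```

In both forms the capturing parentheses themselves are never optional; only their contents are, which is why $1 comes back as "" rather than undef.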

Eric J. Roode · Nov 18, 2005

$_ = "bar";
print /(foo|) (?(1)|(?!))/x ? "match\n" : "no match\n";
print /(foo)? (?(1)|(?!))/x ? "match\n" : "no match\n";
__END__
match
no match

Abigail, the master weaver of perl regexes, has once again confounded
me. It ain't the first time.

I had never seen (?(...)). I
looked in "perldoc perlre", and it wasn't much help:

"(?(condition)yes-pattern|no-pattern)"
"(?(condition)yes-pattern)"

Conditional expression. "(condition)" should be either an
integer in parentheses (which is valid if the corresponding
pair of parentheses matched), or look-ahead/look-behind/eval-
uate zero-width assertion.

For example:

m{ ( \( )?
   [^()]+
   (?(1) \) )
 }x

matches a chunk of non-parentheses, possibly included in
parentheses themselves.

This.... is vague at best. What is "no-pattern"? What does
"valid" mean? ("matches", I assume, but perhaps one should not use it if
there's a chance that the numbered parentheses don't match?) Must
the look-ahead/look-behind/evaluate match at that point? If so, how
is it any different from having the assertion at that point *not*
within (?(...))?

If anyone can explain, or point me to a better explanation than
perlre, I would be grateful.
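
A sketch that may help with both questions. The two patterns below are invented for this illustration and do not come from perlre; the variable names $by_group and $by_lookahead are likewise made up.

```perl
use strict;
use warnings;

# Group-number condition: (?(1)yes|no).
# If group 1 took part in the match, the yes-pattern ">" must match here;
# otherwise the no-pattern "!" must match here.
my $by_group = qr/^(<)?\w+(?(1)>|!)$/;

# Lookahead condition: the assertion only selects a branch. Unlike a bare
# (?=\d), its failure does not fail the match; it switches to the no-pattern.
my $by_lookahead = qr/^(?(?=\d)\d{3}|[a-z]{2})$/;

for my $s ("<abc>", "abc!", "<abc!", "abc>") {
    printf "%-6s %s\n", $s, $s =~ $by_group ? "match" : "no match";
}
# <abc> and abc! match; <abc! and abc> do not.

for my $s ("123", "ab", "12", "abc") {
    printf "%-6s %s\n", $s, $s =~ $by_lookahead ? "match" : "no match";
}
# 123 and ab match; 12 and abc do not.
```

So "no-pattern" is simply the else-branch, and a condition on a group number appears to be safe to use even when that group may not have matched: that is exactly the case the no-pattern handles.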

--
Eric
`$=`;$_=\%!;($_)=/(.)/;$==++$|;($.,$/,$,,$\,$",$;,$^,$#,$~,$*,$:,@%)=(
$!=~/(.)(.).(.)(.)(.)(.)..(.)(.)(.)..(.)......(.)/,$"),$=++;$.++;$.++;
$_++;$_++;($_,$\,$,)=($~.$"."$;$/$%[$?]$_$\$,$:$%[$?]",$"&$~,$#,);$,++
;$,++;$^|=$";`$_$\$,$/$:$;$~$*$%[$?]$.$~$*${#}$%[$?]$;$\$"$^$~$*.>&$=`

Brad Baxter · Nov 18, 2005

Ingo said:
Hmmm....

A year ago I had to throw away my laptop, since the keyboard had given
up completely. But before it stopped working altogether, there were
some weeks when only certain keys did not work anymore. Perhaps the
author had a similar problem with his keyboard. After all, on some
keyboards, the '?' key is in the upper right corner and the '|' key is
in the lower left, so it may well be that one of them works while the
other does not.

LOL

Dr.Ruud · Nov 18, 2005

Eric J. Roode:

"(?(condition)yes-pattern|no-pattern)"
"(?(condition)yes-pattern)"

Conditional expression. "(condition)" should be either an
integer in parentheses (which is valid if the corresponding
pair of parentheses matched), or look-ahead/look-behind/eval-
uate zero-width assertion.

For example:

m{ ( \( )?
   [^()]+
   (?(1) \) )
 }x

matches a chunk of non-parentheses, possibly included in
parentheses themselves.

This.... is vague at best. What is "no-pattern"?

The pattern that will be used if the test returned false.

What does "valid" mean? ("matches", I assume, but perhaps one should not
use it if there's a chance that the numbered parentheses don't match?)

The (1) is only valid if there exists a corresponding (capturing?)
group.

m{ ( \( )?      # optional opening paren
   [^()]+       # 1 or more non-parens
   (?(1) \) )   # if the 1st group matched, so there was
                # an opening paren, then require a
                # closing one
 }x

I guess that if the "corresponding pair of parens" didn't match+capture,
$1 keeps its old value, because the ? comes after the 1st group.
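
That guess is easy to test. As far as I know, after a *successful* match $1 reflects only that match, so a group that was skipped leaves $1 undefined rather than stale; it is a *failed* match that leaves the earlier values in place. A sketch, with variable names ($chunk, $stale) made up for the illustration:

```perl
use strict;
use warnings;

# The perlre example: an optional "(" that, if present, demands a ")".
my $chunk = qr{ ( \( )?       # group 1: optional opening paren
                [^()]+        # one or more non-parens
                (?(1) \) )    # closing paren only if group 1 matched
              }x;

for my $s ("(abc)", "abc", "(abc") {
    if ($s =~ $chunk) {
        printf "%-6s matched '%s', \$1 is %s\n",
            $s, $&, defined $1 ? "'$1'" : "undef";
    }
}

# Does $1 keep an old value when its group is skipped?
my $first = "foo" =~ /(foo)/;       # $1 is "foo" after this match
my $ok    = "bar" =~ /(foo)?bar/;   # succeeds without using group 1
my $stale = $1;                     # undef: nothing left over from "foo"
print "after second match: ", defined $stale ? "'$stale'" : "undef", "\n";
```

Note that the unanchored pattern still matches "(abc": it simply starts one character later and matches the bare "abc".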

Wade Whitaker · Nov 18, 2005

kj said:
While looking at someone else's code I came across a regular
expression that included a construct like /(foo|)/. As far as I
can tell, it should produce the same result as /(foo)?/. But the
author of the code knows a heck of a lot more Perl than I do, so
I'm wondering why she would have picked the former over the latter.
Any ideas?

Thanks!

kj

P.S. I'm aware of the fact that /(|foo)/ would produce very different
results from /(foo|)/ or /(foo)?/, but that's neither here nor
there.

I ran into a case where this caused me a problem with a regular expression, so
I came to understand it.

(foo)? says give me 1 or 0 occurrences of foo.
(|foo) says give me 0 or 1 occurrences of foo.
(foo|) says give me 1 or 0 occurrences of foo, which means that (foo)? is
redundant syntax in Perl that should always be replaceable, i.e.
search for )? and replace it with a | at the beginning or the end of the group.
*? and +? are OK because they are saying don't be greedy.
Anyone want to argue against that? Show a case where it is not true?

I found this out while trying to write a regular expression to find the
matching quote while parsing files. If I found a '"' I wanted to find the
matching '"' without matching on '\"'. It took me a year to finally learn
enough to do this.

The answer is: m/(["'])(|.*?[^\\])(\1|^Z)/gs works where
m/(["'])(.*?[^\\])?(\1|^Z)/gs did not.

Previous attempts that did not work are:
# $$fptr =~ m/\G((?:.*?[^\\])?)($q|^Z)/gs; # find ",'
# $$fptr =~ m/\G(.*?(?!\\))($q|^Z)/gs; # find ",'
# $$fptr =~ m/\G((?:.*?(?!\\))?)($q|^Z)/gs; # find ",'
# $$fptr =~ m/\G((?:.*?(?!\\$q))?($q|^Z))/gs; # find ",'
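
For comparison, here is a commonly used way to skip escaped quotes. This is not Wade's pattern: it is a sketch that assumes a backslash escapes exactly one following character and that ignores the ^Z end marker above. The names $quoted and quoted_bodies are made up.

```perl
use strict;
use warnings;

# Opening quote, then any run of "backslash + any char" or "a character
# that is neither the opening quote nor a backslash", then the same quote.
my $quoted = qr/(["'])((?:\\.|(?!\1)[^\\])*)\1/s;

# Collect the body of every quoted string found in $text.
# (quoted_bodies is a hypothetical helper for this illustration.)
sub quoted_bodies {
    my ($text) = @_;
    my @bodies;
    while ($text =~ /$quoted/g) {
        push @bodies, $2;
    }
    return @bodies;
}

my $text = q{say "a \"b\" c" and 'it\'s' and ""};
print "[$_]\n" for quoted_bodies($text);
# [a \"b\" c]
# [it\'s]
# []
```

Because the second alternative refuses a bare backslash, the engine cannot backtrack into treating \" as a closing quote, so an unterminated string does not produce a false match.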

Wade

Ingo Menger · Nov 18, 2005

Wade said:
(foo)? says give me 1 or 0 occurrences of foo.
(|foo) says give me 0 or 1 occurrences of foo.
(foo|) says give me 1 or 0 occurrences of foo, which means that (foo)? is
redundant syntax in Perl that should always be replaceable, i.e.
search for )? and replace it with a | at the beginning or the end of the group.
*? and +? are OK because they are saying don't be greedy.
Anyone want to argue against that? Show a case where it is not true?

Did you read the thread?

"bar" =~ m/(foo|)/; print $1;
"bar" =~ m/(foo)?/; print $1;

It's the difference between undefined and ""

Paul Lalli · Nov 18, 2005

Wade said:
I ran into a case where this caused me a problem with a regular expression, so
I came to understand it.

(foo)? says give me 1 or 0 occurrences of foo.
(|foo) says give me 0 or 1 occurrences of foo.
(foo|) says give me 1 or 0 occurrences of foo, which means that (foo)? is
redundant syntax in Perl that should always be replaceable, i.e.
search for )? and replace it with a | at the beginning or the end of the group.
*? and +? are OK because they are saying don't be greedy.
Anyone want to argue against that? Show a case where it is not true?

Er. See previous responses in this thread...

I found this out while trying to write a regular expression to find the
matching quote while parsing files. If I found a '"' I wanted to find the
matching '"' without matching on '\"'. It took me a year to finally learn
enough to do this.

That's unfortunate, because you seem to have spent a year reinventing a
wheel.

http://search.cpan.org/dist/Regexp-Common/lib/Regexp/Common/delimited.pm

The answer is: m/(["'])(|.*?[^\\])(\1|^Z)/gs works where
m/(["'])(.*?[^\\])?(\1|^Z)/gs did not.

The answer is m/$RE{delimited}{-delim=>q{'"}}/

Paul Lalli

Wade Whitaker · Nov 18, 2005

Ingo said:
Wade Whitaker wrote:

Did you read the thread?

"bar" =~ m/(foo|)/; print $1;
"bar" =~ m/(foo)?/; print $1;

It's the difference between undefined and ""

Agreed. It does do that, but is the difference you state essential to your
programming needs or a side effect?

Unlike your example, most of these conditional matches are used in the context
of other "things" that need to be matched as well, so that the whole regular
expression is true or not, and the conditional match is there to expand the
ability of the whole to match.

There is a difference between 0 or 1 occurrences of foo and 1 or 0 occurrences
of foo. 0 or 1 occurrences means try 0 first and then try every other
combination afterward before coming back and trying 1 occurrence of foo.
(foo)? always matches foo first.
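
The ordering described here can be observed directly. A sketch, with the hypothetical helper show reporting $1 for each spelling; the lazy form (foo)?? is added for comparison and is not mentioned in the post above.

```perl
use strict;
use warnings;

# Show $1 for each spelling; "undef" marks a group that did not take part.
# (show is a hypothetical helper for this illustration.)
sub show {
    my ($str, $re) = @_;
    return "no match" unless $str =~ $re;
    return defined $1 ? "'$1'" : "undef";
}

my $s = "foobar";
print "(foo|)  : ", show($s, qr/^(foo|)/),  "\n";   # 'foo'
print "(|foo)  : ", show($s, qr/^(|foo)/),  "\n";   # ''
print "(foo)?  : ", show($s, qr/^(foo)?/),  "\n";   # 'foo'
print "(foo)?? : ", show($s, qr/^(foo)??/), "\n";   # undef

# Once something must follow, backtracking brings the empty-first forms
# back to "foo".
print "(|foo)bar  : ", show($s, qr/^(|foo)bar/),  "\n";   # 'foo'
print "(foo)??bar : ", show($s, qr/^(foo)??bar/), "\n";   # 'foo'
```

So (|foo) behaves like the lazy (foo)?? as far as ordering goes, while still differing from it in what $1 holds when nothing is captured.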

Could you write a regular expression where you want 0 or 1 occurrences of foo
and it returns undef if there are 0 occurrences? In terms of the whole? This is
the side effect you are defending.

My position is: is the syntax needed, preferable, better than the other? Not:
does it have different side effects?

This is a style question and I think (|foo) and (foo|) are greatly better than
(foo)?.

Regards,
Wade

Ingo Menger · Nov 25, 2005

Wade said:
Ingo Menger wrote:
Agreed. It does do that, but is the difference you state essential to your
programming needs or a side effect?

Both.
The difference between undefined and "" is fundamental.
And, since perl programs usually are imperative and rely heavily on
side effects (this is as it should be in imperative languages), yes,
side effects such as assigning a string or undef to $1 may be very
essential.
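
One illustration of where the distinction earns its keep, as a sketch: the "key" / "key=value" line format and the name parse_setting are invented for this example. An optional group lets the caller tell "no value given" apart from "empty value given".

```perl
use strict;
use warnings;

# Hypothetical "key" or "key=value" line format, for illustration only.
sub parse_setting {
    my ($line) = @_;
    return unless $line =~ /^(\w+)(?:=(.*))?$/;
    return ($1, $2);    # $2 is undef when there was no "=" at all
}

for my $line ("color=red", "color=", "color") {
    my ($key, $value) = parse_setting($line);
    printf "%-10s value is %s\n", $line,
        defined $value ? "'$value'" : "undef";
}
# color=red  value is 'red'
# color=     value is ''
# color      value is undef
```

Written with an empty alternative instead, the last two cases would both come back as "" and the caller could no longer distinguish them.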

My position is: is the syntax needed, preferable, better than the other? Not:
does it have different side effects?

Very silly question, IMHO.
Consider the following question: is the syntax
if (expr) { expr }
better than
while (expr) { expr }
or not?

This is a style question and I think (|foo) and (foo|) are greatly better than
(foo)?.

No, as you pointed out before, it is a question of achieving a desired
side effect, namely a side effect on variable $1.

robic0 · Nov 26, 2005

On 25 Nov 2005 06:22:56 -0800, "Ingo Menger"

[snip]

Very silly question, IMHO.
Consider the following question: is the syntax
if (expr) { expr }
better than
while (expr) { expr }
or not?

$expr = 7;
if ($expr) { $expr }
while ($expr) { $expr }

I would say not! Maybe you need an if-while here...

robic0 · Nov 26, 2005

Eric J. Roode:

"(?(condition)yes-pattern|no-pattern)"
"(?(condition)yes-pattern)"

Conditional expression. "(condition)" should be either an
integer in parentheses (which is valid if the corresponding
pair of parentheses matched), or look-ahead/look-behind/eval-
uate zero-width assertion.

For example:

m{ ( \( )?
   [^()]+
   (?(1) \) )
 }x

matches a chunk of non-parentheses, possibly included in
parentheses themselves.

This.... is vague at best. What is "no-pattern"?

The pattern that will be used if the test returned false.

What does "valid" mean? ("matches", I assume, but perhaps one should not
use it if there's a chance that the numbered parentheses don't match?)

The (1) is only valid if there exists a corresponding (capturing?)
group.

m{ ( \( )?      # optional opening paren
   [^()]+       # 1 or more non-parens
   (?(1) \) )   # if the 1st group matched, so there was
                # an opening paren, then require a
                # closing one
 }x

This is just amazing: singular regex concepts aggregated in
a nested theoretical proof. A random number generator plus a
character code/stream will break this type of nested proof
every time. You can't depend on the regex module to unwind
its stack like this. The reason you think this is valid
is because your sole input matches an expected output.
Try the random generator for a few months and capture the
successes (and the failures, if possible). You may be in for a
shock!
