[regex] grep for chars in any order

viki · Jun 18, 2008

How can I build a regex that matches all characters of the string $STR
in any order, with .* allowed between any two characters,
and without generating all N! permutations (where N is the length of
$STR)?
Example:
For $STR "abc", I want a match equivalent to:
/(a.*b.*c)|(a.*c.*b)|(b.*a.*c)|(b.*c.*a)|(c.*a.*b)|(c.*b.*a)/

Generating all permutations is not feasible for larger lengths of
$STR.
/[abc].*[abc].*[abc]/ is easy and fast but gives false positives.
What is a good solution?

Thanks
vkm

Ben Bullock · Jun 18, 2008

I don't think a single regular expression can do that, because there is
some logic involved which doesn't fit the regular expression mentality -
you have to work out what the first character matched was, then change
the second character to match depending on that, and so on.

I would use the easy and fast method and then remove the false positives
by checking that the matched string contains all n characters, perhaps
using s/// or tr/// n times and dropping out of the loop as soon as one
of the substitutions fails.

The following seems to work, although I haven't tested it extensively:

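#!/usr/local/bin/perl
use warnings;
use strict;

sub matches
{
    my ($s, $STR) = @_;
    # Unique characters of $STR, sorted.
    my %chars = map {$_ => 1} split ('', $STR);
    my @chars = sort keys %chars;
    # quotemeta keeps characters such as "]" or "^" literal inside the class.
    my $anychar = quotemeta(join '', @chars);
    # Fast prefilter: one character class per wanted character
    # (there's a better way).
    my $matchany = join '.*', map "[$anychar]", @chars;
    if ($s =~ /$matchany/) {
        # Weed out false positives: each character must actually occur.
        my $copy = $s;
        for my $c (@chars) {
            return unless $copy =~ s/\Q$c\E//g;
        }
        return 1;
    }
    return;
}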

print "OK\n" if (matches('naninuneno','aeiou'));
print "OK\n" if (matches('naninunene','aeiou'));

Ben Bullock · Jun 18, 2008

print 'matched' if $STR =~ /a(?=.*b)(?=.*c)|b(?=.*a)(?=.*c)|c(?=.*a)(?=.*b)/

Great! That is better than the solution I posted. But I have an
improvement:

/(?=.*a)(?=.*b)(?=.*c)/

without any actual matching string also works, reducing the regex length
from O(n^2) to O(n), where n is the number of characters.
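
For an arbitrary $STR, the lookahead pattern above could be generated rather than typed by hand. The following is only a sketch: build_all_chars_regex is a made-up helper name, and the ^ anchor and the /s modifier are additions of mine (so that a failed match is attempted only once and "." can cross newlines), not part of the regex shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: builds /^(?=.*a)(?=.*b)(?=.*c)/s from the
# characters of $STR, one lookahead per distinct character.
sub build_all_chars_regex {
    my ($STR) = @_;
    my %seen;
    my @chars = grep { !$seen{$_}++ } split //, $STR;    # drop repeats
    # quotemeta protects characters such as "." or "*" that are
    # special inside a regex.
    my $lookaheads = join '', map { '(?=.*' . quotemeta($_) . ')' } @chars;
    return qr/^$lookaheads/s;
}

my $re = build_all_chars_regex('abc');
print "matched\n"  if     'xxcxbxa' =~ $re;
print "no match\n" unless 'baby'    =~ $re;
```

The pattern grows by one lookahead per distinct character, so its length stays linear in the length of $STR.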

So you don't have to create all the combinations but you do need all the
permutations (if I have my terminology correct)

You mean that you need all the combinations of initial characters, but
not all the permutations (which would be O(n!)).

nolo contendere · Jun 18, 2008

use a math module to get permutations:
http://search.cpan.org/~allenday/Math-Combinatorics-0.09/lib/Math/Combinatorics.pm

then from those, build your regexes.

Paul Lalli · Jun 18, 2008

#!/usr/bin/perl
use strict;
use warnings;
use List::MoreUtils qw/all/;

my $STR = "whatever";

if (all { $STR =~ /$_/ } qw/a b c/) {
    print "Matched all of a, b, c\n";
}

__END__

Paul Lalli

Jürgen Exner · Jun 18, 2008

Maybe I am missing something, but isn't that the same as the text begins
and ends with a character from $STR and all the other characters of $STR
are included somewhere in the text?
It should be fairly easy to find an algorithm to check for that. You
just need to be careful about how to handle duplicate characters in $STR
and/or the text.
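
One possible way to handle duplicates is to count characters. This sketch assumes that a character appearing k times in $STR must appear at least k times in the text, which is what the permutation regex would demand for something like "aab". The name has_chars_with_counts is invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $text contains every character of $STR
# at least as often as $STR does.
sub has_chars_with_counts {
    my ($text, $STR) = @_;
    my %need;
    $need{$_}++ for split //, $STR;          # required count per character
    for my $char (keys %need) {
        # Count the occurrences of $char in $text.
        my $have = () = $text =~ /\Q$char\E/g;
        return 0 if $have < $need{$char};
    }
    return 1;
}

print "banana has a, a, b\n" if has_chars_with_counts('banana', 'aab');
print "bat lacks a second a\n" unless has_chars_with_counts('bat', 'aab');
```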

jue

nolo contendere · Jun 18, 2008

Those are both great points. Perhaps the OP could further refine the
requirements, or state the larger goal.

nolo contendere · Jun 18, 2008

If you're not stuck on creating one regular expression:

sub has_all_chars {
    my ($string, $chars) = @_;
    my $matched = 1;
    foreach my $char (split //, $chars) {
        if (index($string, $char) == -1) {
            $matched = 0;
            last;
        }
    }
    matched;

what is this? ^^^

}
has_all_chars("foobar","rb"); # ==> 1
has_all_chars("foobar","abc"); # ==> 0

This is pretty much what Paul suggested--this amounts to about the
same thing as List::MoreUtils's all() function.

sub all (&@) {
    my $f = shift;
    return if ! @_;
    for (@_) {
        return 0 if ! $f->();
    }
    return 1;
}
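
For reference, a runnable sketch of has_all_chars, on the assumption that the bare "matched" in the quoted code was meant to be $matched. Under "use strict" the bareword is a compile-time error, and without strict it is just the string "matched", which is always true. The early return is my simplification; the flag variable works just as well.

```perl
use strict;
use warnings;

sub has_all_chars {
    my ($string, $chars) = @_;
    for my $char (split //, $chars) {
        # index returns -1 when $char does not occur in $string.
        return 0 if index($string, $char) == -1;
    }
    return 1;
}

print has_all_chars("foobar", "rb"),  "\n";    # 1
print has_all_chars("foobar", "abc"), "\n";    # 0
```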

smallpond · Jun 18, 2008

This is a really clever solution. The only thing I would
do differently is to use chop instead of split. Why create
a list unless you need the list?

while (my $char = chop $chars) {
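
A possible full version of the chop loop, as a sketch only (has_all_chars_chop is a made-up name). One caveat with the loop condition as written above: chop returns the removed character, and the character "0" is false in Perl, so the loop would stop early if $chars contained a zero. Testing the remaining length avoids that.

```perl
use strict;
use warnings;

sub has_all_chars_chop {
    my ($string, $chars) = @_;      # $chars is a copy, so chopping it is safe
    # Loop on the remaining length, not on the chopped character:
    # chop may return "0", which is false.
    while (length $chars) {
        my $char = chop $chars;     # removes and returns the last character
        return 0 if index($string, $char) == -1;
    }
    return 1;
}

print has_all_chars_chop("foobar", "rb"),  "\n";    # 1
print has_all_chars_chop("foobar", "abc"), "\n";    # 0
```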

--S

Mario D'Alessio · Jun 18, 2008

The way I see the solution, you can have any of the $STR characters,
followed by .*, followed by another of any of the $STR characters:

/[$STR].*[$STR]/

Or am I missing something?

Mario

Mario D'Alessio · Jun 18, 2008

Ignore my post. I realize my mistake. I missed the
part about the regex matching ALL of the characters.

Mario

Bart Lateur · Jun 18, 2008

Glenn said:
Nice. I wonder (without bothering to benchmark it) if anchoring the
expression would be an optimization:

/^(?=.*a)(?=.*b)(?=.*c)/

It would, in case it doesn't match. The anchored version will only try a
match at the start of the string; the unanchored one will try again at
every character position, which is dead stupid, of course.

Be aware of the possibility of the string containing newlines.

/^(?=.*a)(?=.*b)(?=.*c)/s
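
A small demonstration of the newline issue. The test string is made up for illustration.

```perl
use strict;
use warnings;

my $text = "a\nb\nc";

# Without /s, "." stops at the first newline, so the lookaheads
# for "b" and "c" never get past the first line.
print "without /s: ",
    ($text =~ /^(?=.*a)(?=.*b)(?=.*c)/ ? "match" : "no match"), "\n";

# With /s, "." also matches newlines and each lookahead can scan
# the whole string.
print "with /s:    ",
    ($text =~ /^(?=.*a)(?=.*b)(?=.*c)/s ? "match" : "no match"), "\n";
```

This should report "no match" for the first pattern and "match" for the second.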

jl_post · Jun 19, 2008

Dear Viki,

If you don't mind using several regular expressions (one for each
letter), you can easily write:

/a/ and /b/ and /c/

You can even put it in a Perl grep() statement (which I presume is
what you intend to use it for) like this:

my @firstList = ('cab', 'back', 'cat', 'crab', 'dog', 'baby');
my @secondList = grep { /a/ and /b/ and /c/ } @firstList;

In this way, @secondList would contain 'cab', 'back', and 'crab',
but not 'baby' (which would have been a false positive in your
previous example).

Of course, this approach uses one regular expression for each
letter that you're looking for (instead of a single regular
expression), but depending on how you're writing your code, that may
be acceptable.

I hope this helps, Viki.

-- Jean-Luc

John W. Krahn · Jun 19, 2008

I haven't tested this but this may do what you want:

( Assuming the data you are searching is in $data )

$data =~ s/[^\Q$STR\E]+//g;
print "matched!\n"
    if join( '', sort split //, $data ) eq join( '', sort split //, $STR );

John

Ben Bullock · Jun 20, 2008

This fails (gives a false negative) if $data = "abcabc" and $STR = "ab",
because the result of the first "join" is "aabb" and the second "join" is
"ab". You need to do some kind of unique sort.
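
One way the unique sort might look, as a sketch. uniq_sorted_chars and contains_all are names invented here, and the code assumes that repeated characters in $STR should count only once. It also works on a copy instead of modifying $data.

```perl
use strict;
use warnings;

# Sorted, de-duplicated characters of a string, joined back together.
sub uniq_sorted_chars {
    my ($text) = @_;
    my %seen;
    my @unique = grep { !$seen{$_}++ } split //, $text;
    return join '', sort @unique;
}

sub contains_all {
    my ($data, $STR) = @_;
    return 1 if $STR eq '';                       # nothing to look for
    # Keep only the characters that also occur in $STR.
    (my $kept = $data) =~ s/[^\Q$STR\E]+//g;
    return uniq_sorted_chars($kept) eq uniq_sorted_chars($STR) ? 1 : 0;
}

print "matched!\n" if contains_all('abcabc', 'ab');
```

With the duplicates collapsed, both sides reduce to "ab" for the failing case above.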
