Pilcrow said:
Suppose I have a long string $a, and a test string $b.
I want to find all the substrings in $a whose length is the same as
$b's, with at most n mismatches.
For example, string 'abcdef' and string 'aacdxf' have two mismatches
at the 2nd character and the 5th character.
I'm wondering if this can be done easily in Perl. Can I use a regular
expression to solve this problem?
It's not too hard to do the simple way.
-------------------------------------------------------------------
#!/usr/bin/perl
use strict; use warnings;
my $a =
"abcdefbacdefabbdefaaaaacdxfaaacdefcdefbacdefabbdefaaaaacdxfaaacdefaacdxfaacdfx";
my $b = "aacdxf";
my @a = split //,$a;
my @b = split //,$b;
my $limit = 2;
my $cnt;
my $lenb = length $b;
my @substrings = ();
print "\nmatching '$a' against '$b'\n";
OUTER:
for (my $i = 0; $i <= $#a-$lenb+1; $i++) {
    $cnt = 0;
    for (my $j = 0; $j <= $#b; $j++) {
        $cnt++ unless $a[$i+$j] eq $b[$j];
        next OUTER if $cnt > $limit;
    }
    my $sub = substr($a,$i,$lenb);
    # push @substrings, $sub; # alternate output
    print "match '$sub' at offset $i\n";
}
Or, more simply:
#!/usr/bin/perl
use strict;
use warnings;
my $a =
'abcdefbacdefabbdefaaaaacdxfaaacdefcdefbacdefabbdefaaaaacdxfaaacdefaacdxfaacdfx';
my $b = 'aacdxf';
my $limit = 2;
my $lenb = length $b;
print "\nmatching '$a' against '$b'\n";
while ( $a =~ /(?=(.{$lenb}))/sog ) {
    next if ( $1 ^ $b ) =~ tr/\0//c > $limit;
    print "match '$1' at offset $-[0]\n";
}
__END__
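For anyone puzzling over the ( $1 ^ $b ) =~ tr/\0//c line: XORing two equal-length strings gives a "\0" byte wherever the characters agree, so counting the bytes that are not "\0" counts the mismatches. Here is a rough sketch of just that step, using the two strings from the question. The helper names count_mismatches and mismatch_positions are made up for illustration (they are not in the scripts above), and the sketch assumes equal-length strings of single-byte characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# count_mismatches and mismatch_positions are hypothetical helpers,
# not part of the scripts above. Both assume the two strings have the
# same length and contain single-byte characters.

sub count_mismatches {
    my ($x, $y) = @_;
    my $diff = $x ^ $y;          # bytewise XOR: "\0" wherever the characters agree
    return $diff =~ tr/\0//c;    # count the bytes that are NOT "\0"
}

sub mismatch_positions {
    my ($x, $y) = @_;
    my $diff = $x ^ $y;
    my @positions;
    while ( $diff =~ /[^\0]/g ) {
        push @positions, $-[0] + 1;    # report 1-based character positions
    }
    return @positions;
}

print count_mismatches('abcdef', 'aacdxf'), "\n";              # prints 2
print join(' ', mismatch_positions('abcdef', 'aacdxf')), "\n"; # prints 2 5
```

This matches the example in the question: two mismatches, at the 2nd and 5th characters.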
Very elegant and fast, but you don't need all of the /sog modifiers on the
regex; just /g will do.
Actually, he needs 'so' as well. If this is used in a multi-line
way, the '.' needs the 's' modifier if it is possible that a '\n' is encountered
and could be part of the pattern.
The 'o' suggests not recompiling the pattern, which might otherwise happen; I don't
know, anything is possible. By including both, no harm, no foul.
sln
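To see what the 's' modifier changes, here is a small sketch run on a made-up input with an embedded newline. Without /s, '.' will not match "\n", so every window that spans the line break is silently skipped. The helper name window_offsets and the sample string are assumptions for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# window_offsets is a hypothetical helper: it returns the offsets of every
# window of length $len that the lookahead pattern is able to capture.
sub window_offsets {
    my ($text, $len, $dot_matches_newline) = @_;
    my $re = $dot_matches_newline
        ? qr/(?=(.{$len}))/s     # with /s, '.' also matches "\n"
        : qr/(?=(.{$len}))/;     # without /s, '.' stops at "\n"
    my @offsets;
    while ( $text =~ /$re/g ) {
        push @offsets, $-[0];
    }
    return @offsets;
}

my $text = "abc\ndef";    # made-up sample containing a newline

print "without /s: @{[ window_offsets($text, 3, 0) ]}\n";   # without /s: 0 4
print "with /s:    @{[ window_offsets($text, 3, 1) ]}\n";   # with /s:    0 1 2 3 4
```

So if $a can never contain a newline, /g alone gives the same result; if it can, leaving off /s quietly drops the candidate substrings that cross a line break.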