Hi,
I am looking for a neat way to match a series of tokens against
another string. E.g.:
$tg1 = "abdcadbcdadcbacbacbadbdcadbcbdcdcbcadabadbcadbc";
$qy1 = "abdca dadcbacb dbdcadbc cbcad dbcadbc";
Because every token in $qy1 occurs in $tg1, I want the match to be
true. Whereas:
$tg1 = "abdcadbcdadcbacbacbadbdcadbcbdcdcbcadabadbcadbc";
$qy2 = "abdca dadcbacb aaaaaaaa cbcad dbcadbc";
Now $qy2 has a middle token that is not compatible with $tg1, so the
match should be false.
Any suggestions?
One way to look at this problem is through "Algorithm::Diff" glasses:
use strict;
use warnings;
use Algorithm::Diff qw(sdiff);
my $tg1 = "abdcadbcdadcbacbacbadbdcadbcbdcdcbcadabadbcadbc";
my $qy1 = "abdca dadcbacb dbdcadbc cbcad dbcadbc";
my $qy2 = "abdca dadcbacb aaaaaaaa cbcad dbcadbc";
print "case-a: first string : '$tg1'\n";
print "case-a: second string : '$qy1'\n";
print "case-a: degree of diff : ", degree_of_difference($tg1, $qy1),
"\n";
print "\n";
print "case-b: first string : '$tg1'\n";
print "case-b: second string : '$qy2'\n";
print "case-b: degree of diff : ", degree_of_difference($tg1, $qy2),
"\n";
print "\n";
sub degree_of_difference {
    my ($string_x, $string_y) = @_;
    # strip all whitespace from both (copied) strings:
    s{\s}{}xmsg for $string_x, $string_y;
    # the longest string always comes first:
    if (length($string_x) < length($string_y)) {
        my $temp = $string_x;
        $string_x = $string_y;
        $string_y = $temp;
    }
    # compare the two strings character by character:
    my @chain_x = split m{}xms, $string_x;
    my @chain_y = split m{}xms, $string_y;
    my @sd = sdiff(\@chain_x, \@chain_y);
    # count each kind of sdiff entry: '+' (only in y), '-' (only in x),
    # 'c' (changed), 'u' (unchanged):
    my $inserts = () = grep {$_->[0] eq '+'} @sd;
    my $deletes = () = grep {$_->[0] eq '-'} @sd;
    my $changes = () = grep {$_->[0] eq 'c'} @sd;
    my $unchanged = () = grep {$_->[0] eq 'u'} @sd;
    # characters of the shorter string that could not be aligned:
    return $inserts + $changes;
}
The output is:
case-a: first string : 'abdcadbcdadcbacbacbadbdcadbcbdcdcbcadabadbcadbc'
case-a: second string : 'abdca dadcbacb dbdcadbc cbcad dbcadbc'
case-a: degree of diff : 0
case-b: first string : 'abdcadbcdadcbacbacbadbdcadbcbdcdcbcadabadbcadbc'
case-b: second string : 'abdca dadcbacb aaaaaaaa cbcad dbcadbc'
case-b: degree of diff : 5
One could argue that a "degree-of-diff" of 0 in case-a implies that
the match is true.
By the same argument, a "degree-of-diff" of 5 in case-b implies that
the match is false.
This is only one way to look at the problem; I am sure there are
many others.
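
For comparison, here is a minimal sketch of a stricter, regex-based
reading of the problem. It rests on assumptions that are not stated in
the original question: each whitespace-separated token is treated as a
literal substring, and the tokens must occur in the target in the same
order, with arbitrary characters allowed between them. The sub name
tokens_match_in_order is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns 1 if every
# whitespace-separated token of $query occurs in $target as a literal
# substring, in order and without overlapping; returns 0 otherwise.
sub tokens_match_in_order {
    my ($target, $query) = @_;
    my @tokens = split ' ', $query;
    return 1 unless @tokens;    # assumption: an empty query matches
    # Escape each token, then allow any characters between tokens.
    my $pattern = join '.*', map { quotemeta($_) } @tokens;
    return $target =~ m{$pattern}s ? 1 : 0;
}

my $tg1 = "abdcadbcdadcbacbacbadbdcadbcbdcdcbcadabadbcadbc";
my $qy1 = "abdca dadcbacb dbdcadbc cbcad dbcadbc";
my $qy2 = "abdca dadcbacb aaaaaaaa cbcad dbcadbc";

print "qy1: ", (tokens_match_in_order($tg1, $qy1) ? "match" : "no match"), "\n";
print "qy2: ", (tokens_match_in_order($tg1, $qy2) ? "match" : "no match"), "\n";
```

Under these assumptions $qy1 matches and $qy2 does not, because
"aaaaaaaa" never occurs in $tg1. Unlike the diff-based measure, this
gives a plain yes/no answer and does not say how far off a failed
match is.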