trying to match series of tokens to string

avilella

Hi,

I am looking for a neat way to match a series of tokens against
another string. E.g.:

$tg1 = "abdcadbcdadcbacbacbadbdcadbcbdcdcbcadabadbcadbc";
$qy1 = "abdca dadcbacb dbdcadbc cbcad dbcadbc";

Because every token in $qy1 occurs in $tg1, I want the match to be
true. Whereas:


$tg1 = "abdcadbcdadcbacbacbadbdcadbcbdcdcbcadabadbcadbc";
$qy2 = "abdca dadcbacb aaaaaaaa cbcad dbcadbc";

Now $qy2 has a middle token that is not compatible with $tg1, so the
match should be false.

Any suggestions?

Cheers,

Albert.
 
J. Gleixner

Use a regular expression: replace the spaces in $qy1 with a pattern
such as ".*" (any run of characters) or '.' (exactly one character),
then match the result against $tg1.

perldoc perlre
perldoc perlop
....
m/PATTERN/msixogc
/PATTERN/msixogc
Searches a string for a pattern match, and in scalar context
....
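
A minimal sketch of the ".*" suggestion, assuming the tokens in the query are separated by whitespace and must appear in the target in the same order. The helper name tokens_match_in_order is made up for illustration; quotemeta is used so that any regex metacharacters in a token are treated literally.

```perl
use strict;
use warnings;

my $tg1 = "abdcadbcdadcbacbacbadbdcadbcbdcdcbcadabadbcadbc";
my $qy1 = "abdca dadcbacb dbdcadbc cbcad dbcadbc";
my $qy2 = "abdca dadcbacb aaaaaaaa cbcad dbcadbc";

# Hypothetical helper (not from the thread): build one pattern from the
# whitespace-separated tokens, joined by ".*", and match it against the target.
sub tokens_match_in_order {
    my ($target, $query) = @_;

    # split ' ' splits on runs of whitespace and ignores leading whitespace
    my $pattern = join '.*', map { quotemeta } split ' ', $query;

    # /s lets "." also match a newline, in case the target spans lines
    return $target =~ /$pattern/s ? 1 : 0;
}

print tokens_match_in_order($tg1, $qy1) ? "match\n" : "no match\n";
print tokens_match_in_order($tg1, $qy2) ? "match\n" : "no match\n";
```

With the two example queries this prints "match" and then "no match". Using '.' instead of ".*" would only allow a single character between tokens, which is too strict for this target.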
 
Dilbert

One way to look at this problem is through "Algorithm::Diff" glasses:

use strict;
use warnings;
use Algorithm::Diff qw(sdiff);

my $tg1 = "abdcadbcdadcbacbacbadbdcadbcbdcdcbcadabadbcadbc";
my $qy1 = "abdca dadcbacb dbdcadbc cbcad dbcadbc";
my $qy2 = "abdca dadcbacb aaaaaaaa cbcad dbcadbc";

print "case-a: first string : '$tg1'\n";
print "case-a: second string : '$qy1'\n";
print "case-a: degree of diff : ", degree_of_difference($tg1, $qy1), "\n";
print "\n";

print "case-b: first string : '$tg1'\n";
print "case-b: second string : '$qy2'\n";
print "case-b: degree of diff : ", degree_of_difference($tg1, $qy2), "\n";
print "\n";

sub degree_of_difference {
    my ($string_x, $string_y) = @_;

    # strip all whitespace from both (copied) strings
    s{\s+}{}xmsg for $string_x, $string_y;

    # the longest string always comes first:
    if (length($string_x) < length($string_y)) {
        ($string_x, $string_y) = ($string_y, $string_x);
    }

    # compare the two strings character by character
    my @chain_x = split m{}xms, $string_x;
    my @chain_y = split m{}xms, $string_y;

    my @sd = sdiff(\@chain_x, \@chain_y);

    # count each kind of sdiff entry:
    # '+' only in the second list, '-' only in the first,
    # 'c' changed, 'u' unchanged
    my $inserts = () = grep {$_->[0] eq '+'} @sd;
    my $deletes = () = grep {$_->[0] eq '-'} @sd;
    my $changes = () = grep {$_->[0] eq 'c'} @sd;
    my $unchanged = () = grep {$_->[0] eq 'u'} @sd;

    # characters of the shorter string with no counterpart in the longer one
    return $inserts + $changes;
}

The output is:

case-a: first string : 'abdcadbcdadcbacbacbadbdcadbcbdcdcbcadabadbcadbc'
case-a: second string : 'abdca dadcbacb dbdcadbc cbcad dbcadbc'
case-a: degree of diff : 0

case-b: first string : 'abdcadbcdadcbacbacbadbdcadbcbdcdcbcadabadbcadbc'
case-b: second string : 'abdca dadcbacb aaaaaaaa cbcad dbcadbc'
case-b: degree of diff : 5

Here the "degree-of-diff" counts the characters of the shorter string
that sdiff could not line up with the longer one (insertions plus
changes). One could argue that a degree of 0 in case-a implies that
the match is true.

By the same argument, the degree of 5 in case-b implies that the match
is false.

This is only one way to look at the problem; I am sure there are many
others.
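
For a plain true/false answer without the module, the "degree of diff is 0" test can be approximated with core Perl. This is a sketch of a different, simpler technique (a greedy subsequence check), not the sdiff code above. It assumes sdiff aligns the two lists along a longest common subsequence, so that a degree of 0 means every character of the query appears in the target in order. The helper name is_subsequence is made up for illustration.

```perl
use strict;
use warnings;

my $tg1 = "abdcadbcdadcbacbacbadbdcadbcbdcdcbcadabadbcadbc";
my $qy1 = "abdca dadcbacb dbdcadbc cbcad dbcadbc";
my $qy2 = "abdca dadcbacb aaaaaaaa cbcad dbcadbc";

# Hypothetical helper: true if the characters of $query (whitespace
# removed) occur in $target in the same order, gaps allowed.
sub is_subsequence {
    my ($target, $query) = @_;

    # work on the local copies only
    s/\s+//g for $target, $query;

    my $pos = 0;
    for my $ch (split //, $query) {
        # find the next occurrence of this character at or after $pos
        $pos = index $target, $ch, $pos;
        return 0 if $pos < 0;
        $pos++;
    }
    return 1;
}

print is_subsequence($tg1, $qy1) ? "match\n" : "no match\n";
print is_subsequence($tg1, $qy2) ? "match\n" : "no match\n";
```

This prints "match" and then "no match" for the two examples. Note that a character-level check is looser than a token-level one: a token may be spread over non-adjacent characters of the target and still count, which may or may not be what the original question wants.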
 
sln

You could use index if the tokens are constant.

use strict;
use warnings;

my $String = "abdcadbcdadcbacbacbadbdcadbcbdcdcbcadabadbcadbc";
my @Toks = qw(abdca dadcbacb dbdcadbc aaaaaaaa cbcad dbcadbc);

print "\n'$String'\n\n";
for my $tok (@Toks) {
    # index returns the position of the first occurrence, or -1 if absent
    my $pos = index $String, $tok;
    if ($pos >= 0) {
        printf "found (%2d): %s\n", $pos, $tok;
    }
    else {
        printf "not found : %s\n", $tok;
    }
}

-sln
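
The loop above reports each token on its own. If a single true/false result is wanted and the tokens also have to appear in order, one possible variation is to pass index a starting offset that moves past each match. This is only a sketch under that assumption; the helper name tokens_in_order is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if every token occurs in $string, each one
# starting at or after the end of the previous token's match.
sub tokens_in_order {
    my ($string, @tokens) = @_;

    my $offset = 0;
    for my $tok (@tokens) {
        # third argument to index: where to start searching
        my $pos = index $string, $tok, $offset;
        return 0 if $pos < 0;
        $offset = $pos + length $tok;
    }
    return 1;
}

my $String = "abdcadbcdadcbacbacbadbdcadbcbdcdcbcadabadbcadbc";
my @Toks1  = qw(abdca dadcbacb dbdcadbc cbcad dbcadbc);
my @Toks2  = qw(abdca dadcbacb aaaaaaaa cbcad dbcadbc);

print tokens_in_order($String, @Toks1) ? "match\n" : "no match\n";
print tokens_in_order($String, @Toks2) ? "match\n" : "no match\n";
```

This prints "match" for the first token list and "no match" for the second. Taking the leftmost occurrence each time is enough here, because an earlier match never leaves less room for the remaining tokens.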
 
