Quotemeta & Regex question re-posted as plain text

Jürgen Exner

ela said:
I hope pasting the content doesn't lose any characters. My program and
data are as follows:

You can see data1 is simply a substring of data2, so I pass the
file containing data2 as $file1 and the other one as $file2 to my Perl
program. The content of $file1 is split on the tab delimiter and checked
to see whether it contains any pattern that exists in data1.

Is your data1 a regular expression? If not, then there is no need to
wield the big RE stick: a simple call to index() will tell you if it is
a substring of some other string.
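
A minimal sketch of the index() approach, assuming strings shaped like the
sample data in the original post. The helper name contains_literal and the
variables $short and $long are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $needle occurs literally anywhere in $haystack.
# index() returns the 0-based position of the first occurrence, or -1 if
# there is none, so no regex meta-characters ever need escaping.
sub contains_literal {
    my ($haystack, $needle) = @_;
    return index($haystack, $needle) != -1;
}

my $short = 'NZ_AAJX02000024.1|_revcom_54779..55912|beta-lactamase';
my $long  = $short . ' precursor|identified';

print contains_literal($long, $short) ? "found\n" : "not found\n";
```

Characters such as '.', '|' and '-' are compared as ordinary characters here,
which is the point of skipping the regex engine for plain substrings.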

If you insist on using m//, then you need to escape all RE
meta-characters in your pattern ...

[...]
my $pattern = $aref1->[0];
$pattern = quotemeta $pattern ;
$aref2 = quotemeta $aref2;

.... which apparently you are doing here.
if ( $pattern !~ /$aref2/ ) {

But why are you calling the string "pattern" and the regular expression
pattern "aref"? Are you trying to confuse your readers?

And why on earth are you doing a quotemeta on your string? Now your
string of maybe
(Hello-all)
has become
\(Hello\-all\)
and obviously that will not be matched by e.g. the quotemeta'ed
o\-a
which would be searching for a literal o, followed by a dash, followed
by an a.
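
To make that concrete, here is a small sketch using the (Hello-all) example.
The variable names are illustrative only. The idea is to escape only the
pattern side, with quotemeta or the equivalent \Q...\E, and to leave the
string being searched alone:

```perl
use strict;
use warnings;

my $string = '(Hello-all)';   # the text being searched
my $needle = 'o-a';           # the literal text to look for

# Escaping both sides: the string itself now contains backslashes,
# so the literal text "o-a" no longer occurs in it.
my $escaped_string = quotemeta $string;   # \(Hello\-all\)
my $escaped_needle = quotemeta $needle;   # o\-a
my $both_escaped   = $escaped_string =~ /$escaped_needle/ ? 1 : 0;

# Escaping only the pattern: \Q...\E applies quotemeta to the
# interpolated variable inside the regex.
my $pattern_only = $string =~ /\Q$needle\E/ ? 1 : 0;

print "both escaped: $both_escaped\n";   # prints 0
print "pattern only: $pattern_only\n";   # prints 1
```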

jue
 
ela

I hope pasting the content doesn't lose any characters. My program and
data are as follows:

You can see data1 is simply a substring of data2, so I pass the
file containing data2 as $file1 and the other one as $file2 to my Perl
program. The content of $file1 is split on the tab delimiter and checked
to see whether it contains any pattern that exists in data1.

I would appreciate advice on what is going wrong, as I cannot simply replace
all the special characters in the file in advance.


<DATA1>
NZ_AAJX02000024.1|_revcom_54779..55912|beta-lactamase

<DATA2>
NZ_AAJX02000024.1|_revcom_54779..55912|beta-lactamase precursor|identified
by match to protein family HMM PF00144 A


#PROGRAM

#!/usr/bin/perl
use warnings; use strict;

my ( $file1, $file2, $outname) = @ARGV;

my $name = "wholeline";
my $cmpsout = "";

if ($outname ne "") {
    $cmpsout = $outname . ".xls";
} else {
    $cmpsout = $file1 . "_AND_$name" . "_$file2.xls";
}
open( my $FP1, '<', $file1) or die "could not open '$file1' $!";
open( my $FP2, '<', $file2) or die "could not open '$file2' $!";

open my $CMPS, '>', $cmpsout or die "could not open ' $cmpsout' $!";

my $i=0;
my @row1s;
my $line;
#read file1 into row
while ( $line = <$FP1> ) {
    chomp $line;
    $row1s[$i]= [ split(/\t/, $line) ];
    $i++;
}

my @row2s;
#read file2 into row
my $j=0;
while ( $line = <$FP2> ) {
    chomp $line;
    $row2s[$j]= $line;
    $j++;
}

for my $aref1 (@row1s) {

    FILE2: for my $aref2 (@row2s) {

        my $match = 1;

        my $pattern = $aref1->[0];
        $pattern = quotemeta $pattern ;
        $aref2 = quotemeta $aref2;

        print "pattern: $pattern";<STDIN>;
        print "aref2: $aref2"; <STDIN>;
        if ( $pattern !~ /$aref2/ ) {
            $match = 0;
            next;
        }

        print "MATCH-$match\n";<STDIN>;
        if ($match == 1) {
            print $CMPS "$aref1->[0]\n$aref1->[1]\n";
            last FILE2;
        }
    }
}
 
