Quotemeta & Regex question re-posted as plain text

Jürgen Exner · Jan 26, 2011

ela said:
I hope pasting the content does not lose any characters; my program and
data are as follows:

You can see data1 is simply a substring of data2, and therefore I pass the
file containing data2 as $file1 and then the longer one as $file1 to my Perl
program. The content of $file1 is split on the tab delimiter and checked
for whether it contains any pattern that exists in data1.

Is your data1 a regular expression? If not, then there is no need to
wield the big RE stick: a simple call to index() will tell you if it is
a substring of some other string.
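
A minimal sketch of the index() approach, using the sample strings from the original post. The variable names $needle and $haystack are illustrative assumptions and do not appear in the poster's program.

```perl
use strict;
use warnings;

# Sample data from the original post; $needle and $haystack are
# illustrative names, not taken from the poster's program.
my $needle   = 'NZ_AAJX02000024.1|_revcom_54779..55912|beta-lactamase';
my $haystack = 'NZ_AAJX02000024.1|_revcom_54779..55912|beta-lactamase precursor|identified';

# index() returns the zero-based position of the first occurrence of
# $needle in $haystack, or -1 if it does not occur. No escaping is
# needed because nothing is interpreted as a regular expression.
my $pos = index($haystack, $needle);

print $pos >= 0 ? "found at offset $pos\n" : "not found\n";
```

Because index() compares plain characters, the dots and the pipe characters in the data need no special treatment.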

If you insist on using m//, then you need to escape all RE
meta-characters in your pattern ...

[...]

my $pattern = $aref1->[0];
$pattern = quotemeta $pattern ;
$aref2 = quotemeta $aref2;

... which apparently you are doing here.

if ( $pattern !~ /$aref2/ ) {

But why are you calling the string "pattern" and the regular expression
pattern "aref"? Are you trying to confuse your readers?

And why on earth are you doing a quotemeta on your string? Now your
string of maybe
(Hello-all)
has become
\(Hello\-all\)
and obviously that will not be matched by e.g. the quotemeta'ed
o\-a
which would be searching for a literal o, followed by a dash, followed
by an a.
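
To illustrate the point, here is a small sketch built on the example strings above. It assumes the goal is a literal substring test; the variable names are made up for the illustration. quotemeta (or its inline form \Q...\E) belongs on the pattern only, while the string being searched stays untouched.

```perl
use strict;
use warnings;

my $string  = '(Hello-all)';   # text being searched, left as-is
my $literal = 'o-a';           # literal substring to look for

# quotemeta backslash-escapes every character that is not a word character.
my $quoted_string  = quotemeta $string;    # \(Hello\-all\)
my $quoted_literal = quotemeta $literal;   # o\-a

# Escaping both sides: the regex o\-a looks for the text "o-a", but the
# escaped string now contains "o\-a" with a backslash, so nothing matches.
my $both_escaped = $quoted_string =~ /$quoted_literal/ ? 'match' : 'no match';

# Escaping only the pattern: the untouched string contains "o-a".
my $pattern_only = $string =~ /$quoted_literal/ ? 'match' : 'no match';

# \Q...\E applies quotemeta inline to the interpolated variable.
my $inline_form = $string =~ /\Q$literal\E/ ? 'match' : 'no match';

print "both escaped:    $both_escaped\n";
print "pattern escaped: $pattern_only\n";
print "inline \\Q...\\E:  $inline_form\n";
```

Only the first test fails to match, which is the situation described above.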

jue

ela · Jan 26, 2011

I hope pasting the content does not lose any characters; my program and
data are as follows:

You can see data1 is simply a substring of data2, and therefore I pass the
file containing data2 as $file1 and then the longer one as $file1 to my Perl
program. The content of $file1 is split on the tab delimiter and checked
for whether it contains any pattern that exists in data1.

I would appreciate your advice on what is going wrong, as I cannot simply
replace all the special characters in the file in advance.

<DATA1>
NZ_AAJX02000024.1|_revcom_54779..55912|beta-lactamase

<DATA2>
NZ_AAJX02000024.1|_revcom_54779..55912|beta-lactamase precursor|identified
by match to protein family HMM PF00144 A

#PROGRAM

#!/usr/bin/perl
use warnings; use strict;

my ( $file1, $file2, $outname) = @ARGV;

my $name = "wholeline";
my $cmpsout = "";

if ($outname ne "") {
    $cmpsout = $outname . ".xls";
} else {
    $cmpsout = $file1 . "_AND_$name" . "_$file2.xls";
}
open( my $FP1, '<', $file1) or die "could not open '$file1' $!";
open( my $FP2, '<', $file2) or die "could not open '$file2' $!";

open my $CMPS, '>', $cmpsout or die "could not open ' $cmpsout' $!";

my $i=0;
my @row1s;
my $line;
#read file1 into row
while ( $line = <$FP1> ) {
    chomp $line;
    $row1s[$i]= [ split(/\t/, $line) ];
    $i++;
}

my @row2s;
#read file2 into row
my $j=0;
while ( $line = <$FP2> ) {
    chomp $line;
    $row2s[$j]= $line;
    $j++;
}

for my $aref1 (@row1s) {

    FILE2: for my $aref2 (@row2s) {

        my $match = 1;

        my $pattern = $aref1->[0];
        $pattern = quotemeta $pattern ;
        $aref2 = quotemeta $aref2;

        print "pattern: $pattern";<STDIN>;
        print "aref2: $aref2"; <STDIN>;
        if ( $pattern !~ /$aref2/ ) {
            $match = 0;
            next;
        }

        print "MATCH-$match\n";<STDIN>;
        if ($match == 1) {
            print $CMPS "$aref1->[0]\n$aref1->[1]\n";
            last FILE2;
        }
    }
}
