Thomas Isenbarger
I have never posted to this group before, so please forgive me if I am
posting in the wrong place.
In the Perl script below, I am trying to construct a string within a
regex, based on an earlier captured match.
I think something in the last elsif block (the reverse-complement case)
doesn't work: the regex in the while loop near the bottom (just before the
subroutine) returns no matches for sequences that should match.
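The script relies on the (??{ code }) construct: the code runs during the match, can read captures made so far (such as $1), and whatever string it returns is then matched as a sub-pattern at that point. Because the script assembles $regex as a string and interpolates it, it also needs use re 'eval'; without it, Perl refuses to compile the pattern with an error along the lines of "Eval-group not allowed at runtime". A minimal sketch of both cases, using the palindrome (P) idea on a made-up target ACCA; scalar is there so that reverse reverses the characters of $1 rather than a one-element list:

```perl
use strict;
use warnings;
use re 'eval';   # needed only for the pattern that is assembled at run time

my $target = 'ACCA';

# Literal pattern: the code block is compiled with the program, no pragma needed.
my $literal_ok = $target =~ /^(AC)(??{ scalar reverse $1 })$/ ? 1 : 0;

# The same pattern built as a string first, the way the script builds $regex.
my $built      = '^(AC)(??{ scalar reverse $1 })$';
my $runtime_ok = $target =~ /$built/ ? 1 : 0;

print "literal: $literal_ok, run time: $runtime_ok\n";   # prints: literal: 1, run time: 1
```

Removing the use re 'eval' line should leave the literal match working while the second match dies.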
For the molecular biologists out there: I am trying to find the reverse
complement on the fly inside the regex. The @pairing array is something
like @pairing = (AU, CG, GC, UA) and defines the substitutions to be made
in order to build a reverse complement.
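The revcomp subroutine in the script below calls split, and the perlre documentation for Perls of this vintage warns that code run from (??{ }) must not invoke the regex engine, either directly with m// or s/// or indirectly through functions such as split. That may be why the match fails. A hedged sketch of a drop-in replacement (same name and argument order as the posted sub) that builds the same pattern using only substr and a scalar reverse. It also declares %match with my, so the table starts empty on every call. Turning a base with no pairing rule into (?!), a sub-pattern that can never match, is my own assumption; the original simply appends nothing for it.

```perl
use strict;
use warnings;

# Sketch of a drop-in revcomp that never calls the regex engine
# (no split, m// or s///), so it should be safe to call from inside (??{ }).
sub revcomp {
    my ($sequence, @pairing) = @_;
    my %match;                                   # lexical: rebuilt on every call
    for my $pair (@pairing) {
        my $first  = substr($pair, 0, 1);
        my $second = substr($pair, 1, 1);
        $match{$first} .= $second;               # a base may collect several partners
    }
    for my $key (keys %match) {
        # several partners become a character class, e.g. G => [CU]
        $match{$key} = "[$match{$key}]" if length($match{$key}) > 1;
    }
    my $reversed = reverse $sequence;            # scalar assignment: reverses the characters
    my $rc = '';
    for my $i (0 .. length($reversed) - 1) {
        my $base = substr($reversed, $i, 1);
        # assumption: a base with no pairing rule becomes (?!), which never matches
        $rc .= exists $match{$base} ? $match{$base} : '(?!)';
    }
    return $rc;
}

print revcomp('AC', qw(AU GC CG UA NN)), "\n";   # prints GU
```

With G-U wobble pairs added, e.g. qw(AU UA GC CG GU UG), revcomp('GU', ...) returns [AG][CU], which (??{ }) then compiles as a two-position sub-pattern.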
For the non-molecular biologists out there: a sequence that should match is
'acgu', if you need a test case.
Input for the program:
use "acgu" for the target sequence;
then, for the sequence elements, enter "1s=ac", RETURN, "1r", RETURN, RETURN.
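For that input the script assembles the pattern (AC)(??{revcomp ($1, @pairing)}) and runs it against ACGU. A self-contained sketch of the same match, with a fixed A-U/G-C table and tr/// (transliteration, not a regex match) standing in for revcomp; the helper name rc_watson_crick is made up for illustration. It should collect a single hit, ACGU ending at position 4, which is what the script's while loop ought to print.

```perl
use strict;
use warnings;

# Hypothetical helper: fixed A-U / G-C table. tr/// is transliteration,
# not a regex match, so calling this from (??{ }) does not re-enter the engine.
sub rc_watson_crick {
    (my $rc = reverse $_[0]) =~ tr/ACGU/UGCA/;
    return $rc;
}

my $target = 'ACGU';
my @hits;

# Same shape as the pattern built for the elements "1S=AC" and "1R".
while ($target =~ /(AC)(??{ rc_watson_crick($1) })/g) {
    push @hits, "$& " . pos($target);   # same "match position" format as the script
}
print "$_\n" for @hits;                  # prints: ACGU 4
```

If this works on a given Perl but the script does not, the difference is in the helper called from (??{ }): this one never calls split or any other regex operation.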
Please contact me at isen (AT) mgh (DOT) molbio (DOT) harvard (DOT) edu
if you are willing to help and/or need more information.
Thank you,
Tom Isenbarger
#!/usr/bin/perl -w
use re 'eval'; # allows execution of code within regex expressions (??{})
@pairing = qw(AU GC CG UA NN); # default pairing rules: first base pairs with second
print "enter target sequence ";
$target = uc<STDIN>;
chomp($target);
if ($target =~ /[^ACGTU]/) {
die "invalid characters found in target sequence. exiting isenfind\n\n";
}
#ask to keep IUB codes or convert them?
#ask to leave N or not?
print "enter sequence elements, one line at a time (return to stop)\n";
$input = "";
$i = 0;
do {
$input = uc<STDIN>;
chomp ($input);
$input =~ s/\s+//g;
$element[$i] = $input;
$i++;
} until ($input eq "");
for ($i = 0; $i < @element; $i++) {
$item = $element[$i];
print "$item\n";
if ($item =~ /^\{/) { # a base-pairing rule, e.g. {AU,GC,CG,UA}
$pairingstring = $item;
$pairingstring =~ s/[{}]//g;
@pairing = split /,/, $pairingstring;
#check for valid pair rule syntax
print "pairing now @pairing\n";
}
elsif ($item =~ /^(\d+)S=([ACGUBDHKMNRSVWY]+)/) { # a specific sequence element to remember
$pattern = "(".$2.")";
$regex .= $pattern;
$regexpos[$1] = ($i+1); # record the position of this sequence in the element list
}
elsif ($item =~ /^(\d+)S=(\d+)-(\d+)/) { # a non-specific sequence element to remember
$pattern = "([ACGU]{$2,$3})";
$regex .= $pattern;
$regexpos[$1] = ($i+1);
}
elsif ($item =~ /^(\d+)S/) { # a repeat of an earlier saved element (backreference)
$lookwhere = $regexpos[$1];
$pattern = "(\\".$lookwhere.")";
$regex .= $pattern;
}
elsif ($item =~ /^[ACGUBDHKMNRSVWY]+/) { # a specific sequence element
$pattern = "(".$item.")";
$regex .= $pattern;
}
elsif ($item =~ /^(\d+)-(\d+)/) { # a non-specific sequence element
$pattern = "([ACGU]{$1,$2})";
$regex .= $pattern;
}
elsif ($item =~ /^(\d+)P/) { # a palindrome (reversal) of an earlier saved element
$lookwhere = $regexpos[$1];
$pattern = "(??{scalar reverse \$".$lookwhere."})"; # scalar: reverse the string, not a one-item list
$regex .= $pattern;
}
elsif ($item =~ /^(\d+)R/) { # a reverse complement of an earlier saved element
$lookwhere = $regexpos[$1];
$pattern = "(??{revcomp (\$".$lookwhere.', @pairing)})'; # use ' quotes so that @pairing is not interpolated
$regex .= $pattern;
}
}
use re 'debug';
print "regex $regex\n";
print "target $target\n";
while ($target =~ /$regex/g) {
$position = pos $target;
print "$& $position\n";
}
### subroutines
sub revcomp {
my $sequence = shift (@_);
my @pairing = @_;
my $rc = undef;
my %match; # lexical, so the table is rebuilt on every call instead of growing
foreach my $pair (@pairing) {
my ($first, $second) = split //, $pair;
$match{$first} .= $second;
}
foreach $key (keys(%match)) {
if (length($match{$key}) > 1) {
$match{$key} = "[".$match{$key}."]";
}
}
@string = split (//, reverse ($sequence)); # process string one char at a time using substitutions in %match
foreach $base (@string) {
$rc .= $match{$base};
}
print "in sub rc = $rc\n\n";
return $rc; # return reverse complement of sequence
}
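A second, independent problem: $regexpos[$1] = ($i+1) stores the element's position in the list, not its capture-group number. The two agree only while every earlier element adds exactly one capturing group; a {...} pairing rule adds none, and neither do the P and R elements, because (??{ ... }) is not a capturing group. A hedged sketch of counting groups separately; build_regex and %groupof are hypothetical names, and only the pairing-rule, NS=SEQUENCE and NR element types are handled.

```perl
use strict;
use warnings;

# Hypothetical helper: turns element strings into a pattern while counting
# the capture groups actually emitted, instead of using list positions.
sub build_regex {
    my @elements = @_;
    my ($regex, $groups, %groupof) = ('', 0);
    for my $item (@elements) {
        if ($item =~ /^\{/) {
            next;                                # pairing rule: adds no capture group
        }
        elsif ($item =~ /^(\d+)S=([ACGUBDHKMNRSVWY]+)$/) {
            $regex .= "($2)";                    # one capturing group
            $groups++;
            $groupof{$1} = $groups;              # real group number of saved element $1
        }
        elsif ($item =~ /^(\d+)R$/) {
            my $n = $groupof{$1};
            die "element $1 has not been saved yet\n" unless defined $n;
            # (??{ }) is not a capturing group, so $groups is not incremented
            $regex .= "(??{revcomp(\$$n, \@pairing)})";
        }
    }
    return $regex;
}

print build_regex('{AU,GC,CG,UA}', '1S=AC', '1R'), "\n";
```

With the posted loop, the same three elements would record 1S at position 2 and produce a reference to $2, which is never set.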