Subroutine on Mutation of Codons

C

carriehecker

I am trying to write a subroutine for the mutation of only the third
nucleotide in a series of three. I don't really know perl all that
well, so I was hoping someone would be able to tell me where I am
going wrong, and why I am getting a lot of errors when I edit my code.
Here is my subroutine:

sub strategy2
{
# split on the empty pattern; returns individual letters
my @nucleotides = split ( //, $origDNASeq );
my $nuc = (@nucleotides);
my $nucFile = "hsp70_nuc.fasta.txt";
open ($origDNASeq, '<', $nucFile);

# build new sequence
my $newDNASeq = "";

while ($origDNASeq = <$nucFile>)
{
if ($origDNASeq =~ m/^((.)(.)(.).*)/)
{
if (rand(1) < $mutationRate)
{
# randomly mutate $nuc to a new nucleotide (A, C, G, T)
$newDNASeq = $newDNASeq . $1 . $2 . randNuc($nuc);
}
else
{
# append original nucleotide to new sequence
$newDNASeq = $newDNASeq . $1 . $2 . $3;
}
}
else
{
}
}
#print "$newDNASeq";
# translate new DNreadInDNA( $aaFile )A sequence to protein sequence
my $newProtSeq = translate( $newDNASeq );

# calculate the percent identity
my $identity = calcIdentity( $newProtSeq, $origProtSeq );

return $identity;
}


Could anyone provide some helpful hints on how to get this thing
working correctly? I'd be happy to answer any more questions if I
didn't make anything clear enough. Thanks so much for your time!

Carrie H.
 
J

Jürgen Exner

I am trying to write a subroutine for the mutation of only the third
nucleotide in a series of three. I don't really know perl all that
well, so I was hoping someone would be able to tell me where I am
going wrong, and why I am getting a lot of errors when I edit my code.

[code snipped]

After declaring $origDNASeq, $mutationRate, and $origProtSeq your code
snippet compiles just fine and even runs without any error message. It
doesn't produce any output, but I guess that is to be expected
considering that I didn't have any input data.
Could anyone provide some helpful hints on how to get this thing
working correctly? I'd be happy to answer any more questions if I
didn't make anything clear enough. Thanks so much for your time!

For further investigation you will have to provide more information like
e.g. the exact error messages (Copy&paste!!! Don't paraphrase or
retype!!!) or preferably a minimal self-contained program that we can
run.

jue
 
J

John W. Krahn

I am trying to write a subroutine for the mutation of only the third
nucleotide in a series of three. I don't really know perl all that
well, so I was hoping someone would be able to tell me where I am
going wrong, and why I am getting a lot of errors when I edit my code.
Here is my subroutine:

sub strategy2
{
# split on the empty pattern; returns individual letters
my @nucleotides = split ( //, $origDNASeq );
my $nuc = (@nucleotides);

You don't need to split. Change that to:

my $nuc = length $origDNASeq;
my $nucFile = "hsp70_nuc.fasta.txt";
open ($origDNASeq, '<', $nucFile);

You should *always* verify that the file opened correctly. You are
using the file name for the filehandle in the while loop but here you
are assigning the filehandle to $origDNASeq.
# build new sequence
my $newDNASeq = "";

while ($origDNASeq = <$nucFile>)

open DNASEQ, '<', 'hsp70_nuc.fasta.txt' or die
"hsp70_nuc.fasta.txt: $!";
# build new sequence
my $newDNASeq = '';
while ( $origDNASeq = said:
{
if ($origDNASeq =~ m/^((.)(.)(.).*)/)

You are capturing four strings from $origDNASeq but you only use three
of them so that can be simplified to:

if ($origDNASeq =~ m/^((.)(.).+)/)
{
if (rand(1) < $mutationRate)
{
# randomly mutate $nuc to a new nucleotide (A, C, G, T)
$newDNASeq = $newDNASeq . $1 . $2 . randNuc($nuc);
}
else
{
# append original nucleotide to new sequence
$newDNASeq = $newDNASeq . $1 . $2 . $3;
}
}
else
{
}
}
#print "$newDNASeq";
# translate new DNreadInDNA( $aaFile )A sequence to protein sequence
my $newProtSeq = translate( $newDNASeq );

# calculate the percent identity
my $identity = calcIdentity( $newProtSeq, $origProtSeq );

return $identity;
}


Could anyone provide some helpful hints on how to get this thing
working correctly? I'd be happy to answer any more questions if I
didn't make anything clear enough. Thanks so much for your time!


John
 
X

xhoster

I am trying to write a subroutine for the mutation of only the third
nucleotide in a series of three. I don't really know perl all that
well, so I was hoping someone would be able to tell me where I am
going wrong, and why I am getting a lot of errors when I edit my code.

There is a nearly infinite number of ways one could edit code that would
introduce errors. If you don't show us the errors, and don't show us the
edited code, it makes it rather difficult to help you.

Are you using "use strict" and "use warnings"?

Here is my subroutine:

sub strategy2
{
# split on the empty pattern; returns individual letters
my @nucleotides = split ( //, $origDNASeq );

Where is $origDNASeq coming from?
my $nuc = (@nucleotides);

Since @nucleotides doesn't seem to be used anywhere, you could just capture
the number of characters in $origDNASeq directly without creating an
intermediate variable by doing:

my $nuc = length $origDNASeq;
my $nucFile = "hsp70_nuc.fasta.txt";
open ($origDNASeq, '<', $nucFile);

opening a file to a "handle" which is a variable that has some arbitrary
data in it is not a good idea. And under "use strict", which is a very
good idea, it will fail. Maybe that is one of the errors you refer to?
You should also check to see if open succeeded.

open (my $seq_handle, "<", $nucFile) or die $!;
# build new sequence
my $newDNASeq = "";

while ($origDNASeq = <$nucFile>)

$nucFile contains the *name* of the file. It does not have a file handle
in it, which is what you try to read from.

And what is going on with $origDNASeq? First you assume it starts out set
to something, we know not what. Then you try to make it a filehandle,
now you try to make it the line of text read from a file. Don't re-use
variables willy-nilly.

while (my $line = said:
{
if ($origDNASeq =~ m/^((.)(.)(.).*)/)

This will only operate on the first codon of each line of the file. Maybe
that is what you want, but it seems odd to me.

Also, the entire line will be in $1; and the first three individual bases
will be in $2, $3, and $4. Drop the outermost capturing parenthesis. And
once you do there is not point in having the trailing .*

if ($origDNASeq =~ m/^(.)(.)(.)/)

{
if (rand(1) < $mutationRate)
{
# randomly mutate $nuc to a new nucleotide (A, C, G, T)
$newDNASeq = $newDNASeq . $1 . $2 . randNuc($nuc);

This will copy all of $newDNASeq each time. If $newDNASeq gets quite
large, this will be slow. It would be faster (but not change the end
result) to tell Perl to append directly to the end of it.

$newDNASeq .= $1 . $2 . randNuc($nuc);


Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
 
S

sln

[snip]

You might wan't to ask your high school Biology teacher. Don't
expect me to be taking any pill produced from a Perl subroutine.


sln
 
L

Leon Timmermans

I am trying to write a subroutine for the mutation of only the third
nucleotide in a series of three. I don't really know perl all that well,
so I was hoping someone would be able to tell me where I am going wrong,
and why I am getting a lot of errors when I edit my code. Here is my
subroutine:
[snip]

Could anyone provide some helpful hints on how to get this thing working
correctly? I'd be happy to answer any more questions if I didn't make
anything clear enough. Thanks so much for your time!

One problem of your line by line approach is that you potentially miss
nucleotides if the number of nucleotides in one line isn't dividable by
3. The safest approach would be to slurp it as a whole. If that is not an
acceptable solution (because of memory constraints) it's going to be a
lot more complicated, but it doesn't seem like you're working with whole
chromosomes at once ;-).

Your code is very very confusing, so I had to make a few hunches, here it
goes:

use File::Slurp;

sub strategy2 {
my $newDNASeq = read_file($nucFile);
$newDNASeq =~ s/ ^ > [^\n]* \n //x; # Remove the fasta header!
$newDNASeq =~ s/ \S //gx; # Remove all whitespace
$newDNASeq =~ s{ (?<=[agtc]{2}) ([agtc]) }
{ (rand(1) < $mutationRate) ? randNuc() : $nuc }gexi;

return calcIdentity( translate($newDNASeq), $origProtSeq );
}


Regards,

Leon Timmermans
 
J

John W. Krahn

Leon said:
I am trying to write a subroutine for the mutation of only the third
nucleotide in a series of three. I don't really know perl all that well,
so I was hoping someone would be able to tell me where I am going wrong,
and why I am getting a lot of errors when I edit my code. Here is my
subroutine:
[snip]
Could anyone provide some helpful hints on how to get this thing working
correctly? I'd be happy to answer any more questions if I didn't make
anything clear enough. Thanks so much for your time!

One problem of your line by line approach is that you potentially miss
nucleotides if the number of nucleotides in one line isn't dividable by
3. The safest approach would be to slurp it as a whole. If that is not an
acceptable solution (because of memory constraints) it's going to be a
lot more complicated, but it doesn't seem like you're working with whole
chromosomes at once ;-).

Your code is very very confusing, so I had to make a few hunches, here it
goes:

use File::Slurp;

sub strategy2 {
my $newDNASeq = read_file($nucFile);
$newDNASeq =~ s/ ^ > [^\n]* \n //x; # Remove the fasta header!
$newDNASeq =~ s/ \S //gx; # Remove all whitespace

Your comment says remove whitespace but your code says remove
non-whitespace.
$newDNASeq =~ s{ (?<=[agtc]{2}) ([agtc]) }
{ (rand(1) < $mutationRate) ? randNuc() : $nuc }gexi;

return calcIdentity( translate($newDNASeq), $origProtSeq );
}


John
 
L

Leon Timmermans

Your comment says remove whitespace but your code says remove
non-whitespace.

Oops! I made a typo, you're right! That should have been \s

Thanks,

Leon
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,969
Messages
2,570,161
Members
46,710
Latest member
bernietqt

Latest Threads

Top