Actually, that is what I want it to do: something like a sliding
frame. The values from STDIN are numbers, so I can specify an
arbitrary stretch of the sequence for it to iterate over.
Although this example seems ridiculous, it's actually a simplification
of a first-order Markov chain transition matrix.
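For reference, the sliding frame can be sketched roughly as follows. This is only an illustration: the sequence is a made-up stand-in, count_pairs is a hypothetical helper name, and it uses a direct hash lookup on each two-base frame rather than matching every key with a regex.

```perl
use strict;
use warnings;

# Slide a two-base frame along the sequence one position at a time,
# so that consecutive frames overlap by one base.
sub count_pairs {
    my ($sequence) = @_;
    my %count;
    for my $i (0 .. length($sequence) - 2) {
        my $pair = substr($sequence, $i, 2);
        $count{$pair}++;
    }
    return %count;
}

# Hypothetical stand-in for the real sequence.
my %count = count_pairs('ACGTACGT');
print "$_ $count{$_}\n" for sort keys %count;
```

Stopping at length - 2 keeps the last frame a full two bases wide.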
I'm getting an error when I run this. I modified my code to fit your
suggestion (essentially what you suggested, but with a print statement
at the end), and I get this error: "Use of uninitialized value in string
at C:\Perl\bin\markovgen2.pl line 107, <STDIN> line 2."
This is my exact code:
print "\nThere are a total of 18990 bases in the entire sequence\n";
print "\nWhat is the starting base number? Keep in mind that Perl begins the tally";
print "\nwith 0. So if you wanted to start from base 30, input in 29\n";
#Here we can define where we want the sequence to begin and end
print "Input Starting Number:\n";
$beginning_of_sequence = <STDIN>;
print "Input Ending Base Number:\n";
$end_of_sequence = <STDIN>;
$length_of_sequence = $end_of_sequence-$beginning_of_sequence+1;
%dinucleotidepair = (
AT => 0,
AC => 0,
AG => 0,
AA => 0,
TA => 0,
TC => 0,
TG => 0,
TT => 0,
CA => 0,
CT => 0,
CG => 0,
CC => 0,
GA => 0,
GT => 0,
GC => 0,
GG => 0,
);
#$AC = 'AC';
#$ACcountss = 0;
#for my $i (0 .. $length_of_sequence-1) {
# my $dinuc = substr($fastasequence, $i, 2);
# if ($dinuc =~ $AC) {
# $ACcountss++;
# }
#}
for my $i (0 .. $length_of_sequence-1) {
    foreach (keys %dinucleotidepair) {
        $dinucleotidepair{$_}++ if substr($fastasequence, $i, 2) =~ /$_/;
    }
}
print "$dinucleotidepair";
Can anyone explain to me the reason the error is popping up? Thanks.
~Frank
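A likely cause, judging only from the code posted above: in Perl, $dinucleotidepair and %dinucleotidepair are two separate variables, so print "$dinucleotidepair"; interpolates a scalar that was never assigned a value. Below is a minimal sketch of printing the hash contents instead, using a shortened hash with made-up counts and a hypothetical helper named format_counts.

```perl
use strict;
use warnings;

# Shortened, hypothetical version of the hash from the post.
my %dinucleotidepair = (AT => 3, AC => 4, TG => 0);

# Build one "KEY COUNT" line per entry; sorting keeps the order stable.
sub format_counts {
    my (%counts) = @_;
    return join '', map { "$_ $counts{$_}\n" } sort keys %counts;
}

print format_counts(%dinucleotidepair);
```

With use strict enabled, the stray scalar would be reported at compile time rather than showing up as a runtime warning.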
Actually, I solved my own problem, but have a fresh problem. Here is
the working script:
print "\nThere are a total of 18990 bases in the entire sequence\n";
print "\nWhat is the starting base number? Keep in mind that Perl begins the tally";
print "\nwith 0. So if you wanted to start from base 30, input in 29\n";
#Here we can define where we want the sequence to begin and end
print "Input Starting Number:\n";
$beginning_of_sequence = <STDIN>;
print "Input Ending Base Number:\n";
$end_of_sequence = <STDIN>;
$length_of_sequence = $end_of_sequence-$beginning_of_sequence+1;
%dinucleotidepair = (
AT => 0,
AC => 0,
AG => 0,
AA => 0,
TA => 0,
TC => 0,
TG => 0,
TT => 0,
CA => 0,
CT => 0,
CG => 0,
CC => 0,
GA => 0,
GT => 0,
GC => 0,
GG => 0,
);
#$AC = 'AC';
#$ACcountss = 0;
#for my $i (0 .. $length_of_sequence-1) {
# my $dinuc = substr($fastasequence, $i, 2);
# if ($dinuc =~ $AC) {
# $ACcountss++;
# }
#}
for my $i (0 .. $length_of_sequence-1) {
    foreach (keys %dinucleotidepair) {
        $dinucleotidepair{$_}++ if substr($fastasequence, $i, 2) =~ /$_/;
    }
}
while ( my($keys,$values) = each(%dinucleotidepair) ) {
    print "$keys $values\n";
}
#print "The Fasta sequence segment has $ACcountss AC's in $beginning_of_sequence to $end_of_sequence",
#printf "for a relative frequency of %f\n", $ACcountss/$length_of_sequence;
My new problem is this. I have to calculate the relative frequencies
for everything. That means that if one of the keys in my hash has any
occurrences, I have to divide its count by $length_of_sequence to
find the relative frequency. That relative frequency will then be
used in another Perl script.
My question is this:
How do I manipulate individual elements in a hash
only when the value for that key is not zero?
For example, in $fastasequence, the first 30 nucleotide bases contain
4 occurrences of the dinucleotide "AC" yet no occurrences of the base
combination "TG".
Thus, the relative frequency of AC is 4/30 = 0.133.
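One possible way to do this, as a rough sketch: loop over the keys, skip any whose count is zero, and store count divided by length in a second hash. The helper name relative_frequencies and the two-entry hash are made up for illustration; the numbers follow the AC/TG example above.

```perl
use strict;
use warnings;

# Return a hash of relative frequencies, skipping pairs that never occurred.
sub relative_frequencies {
    my ($counts, $length) = @_;    # hash reference, sequence length
    my %freq;
    for my $pair (keys %$counts) {
        next if $counts->{$pair} == 0;
        $freq{$pair} = $counts->{$pair} / $length;
    }
    return %freq;
}

# Hypothetical counts matching the example: 4 AC and 0 TG in 30 bases.
my %freq = relative_frequencies({ AC => 4, TG => 0 }, 30);
printf "%s %.3f\n", $_, $freq{$_} for sort keys %freq;    # prints "AC 0.133"
```

Keeping the frequencies in their own hash leaves the original counts untouched, which may be handy if both are needed later.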
Secondly, how do I save 0.133 as a variable that can be carried over
and used by another Perl script for calculation purposes?
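A variable cannot outlive the script that created it, so one simple option (among several) is to write the frequencies to a text file and read them back in the second script. The sketch below assumes a tab-separated file; the file name, its location, and the helper names save_frequencies and load_frequencies are all placeholders.

```perl
use strict;
use warnings;
use File::Spec;

# Writer side: one "PAIR<TAB>FREQUENCY" line per entry.
sub save_frequencies {
    my ($file, %freq) = @_;
    open my $out, '>', $file or die "Cannot write $file: $!";
    print {$out} "$_\t$freq{$_}\n" for sort keys %freq;
    close $out or die "Cannot close $file: $!";
}

# Reader side: this part would live in the second script.
sub load_frequencies {
    my ($file) = @_;
    my %freq;
    open my $in, '<', $file or die "Cannot read $file: $!";
    while (my $line = <$in>) {
        chomp $line;
        my ($pair, $value) = split /\t/, $line;
        $freq{$pair} = $value;
    }
    close $in;
    return %freq;
}

# The file name and location are placeholders.
my $file = File::Spec->catfile(File::Spec->tmpdir, 'frequencies.txt');
save_frequencies($file, AC => 4 / 30);
my %loaded = load_frequencies($file);
printf "AC %.3f\n", $loaded{AC};    # prints "AC 0.133"
```

A plain text file is easy to inspect by hand; the core Storable module would be another route if the data structure grows more complicated.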
Thanks!
~Frank