Help: Replace Help


Amy Lee

Hello,

I'm going to process some RNA sequence files, and I made a small script
to reverse these sequences. However, it gives the wrong result because
of an ordering problem.

These are my file contents:
seq1 ACGU
seq2
GUACCGU

And I wanna replace A with C, C with A, G with U, and U with G. So from my
point of view, the reversed file should look like this:
seq1 CAUG
seq2
UGCAAUG

This is my code:

if (@ARGV == 1)
{
    $file = $ARGV[0];
    unless (-e $file)
    {
        print "***Error: $file does not exist.\n";
        next;
    }
    unless (open $FILE_IN, '<', $file)
    {
        print "***Error: Cannot read $file.\n";
        next;
    }
    while (<$FILE_IN>)
    {
        unless (/^>.*$/)
        {
            s/A/C/g;
            s/C/A/g;
            s/G/U/g;
            s/U/G/g;
        }
        print $_;
    }
    close $FILE_IN;
}

When I run the script, the output looks like this:
seq1 AAGG
seq2
GGAAAGG

And I don't wanna use BioPerl to solve this tiny problem; I'm trying to
learn how to do it myself.

So how can I solve this kind of ordering problem? I suppose the
replacements must happen at the same time.

Thank you very much~

Regards,

Amy Lee
 

A. Sinan Unur

Re: Help: Replace Help

You have just wasted your subject line by repeating the word 'Help'.
Clearly, by posting a question here, you are asking for help. Repeating
the word 'help' does not serve any useful purpose.
This is my file contents.

GUACCGU

And I wanna replace A to C, C to A, G to U, U to G. So from my point
the reversed file should be viewed like this.

UGCAAUG

This is my codes.

You are missing

use strict;
use warnings;

if (@ARGV == 1)
{
    $file = $ARGV[0];
    unless (-e $file)
    {
        print "***Error: $file does not exist.\n";
        next;
    }
    unless (open $FILE_IN, '<', $file)
    {
        print "***Error: Cannot read $file.\n";
        next;
    }

I do not understand what the 'next's are for. You should not send error
messages to STDOUT, since STDOUT may also carry output you want to use
further. You should show the reason for the error in your error messages
by including the $! variable. In short, replace all of the above with:

my ($file) = @ARGV;

open my $FILE_IN, '<', $file
    or die "Cannot open '$file': $!";

Now, if you are processing files in a loop, replace die with warn.
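A minimal sketch of the loop case, assuming the script may be given several file names on the command line. `process_files` is a hypothetical helper name, and the tr/// line is just one way to do the swap. Errors go to STDERR via warn, and the loop moves on to the next file:

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper: swap bases on every non-header line of each file.
# Files that cannot be opened are reported on STDERR and skipped.
sub process_files {
    my @files     = @_;
    my $processed = 0;

    for my $file (@files) {
        open my $fh, '<', $file or do {
            warn "Cannot open '$file': $!\n";
            next;    # skip this file, keep going with the rest
        };

        while ( my $line = <$fh> ) {
            # Leave FASTA-style header lines (starting with '>') alone.
            $line =~ tr/ACGU/CAUG/ unless $line =~ /^>/;
            print $line;
        }

        close $fh;
        $processed++;
    }

    return $processed;
}

process_files(@ARGV);
```

The return value (number of files actually processed) is only there so a caller can tell whether anything was skipped.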

Let's suppose $_ contains ACGU.

s/A/C/g;

Now it is CCGU.

s/C/A/g;

Now it is AAGU.

s/G/U/g;

Now it is AAUU.

s/U/G/g;

Now it is AAGG.

I am assuming this is not what you wanted.
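The same trace can be reproduced with a short script. `trace_sequential` is a hypothetical name; the four pairs are the substitutions from the original post, applied one after another:

```perl
use strict;
use warnings;

# Apply the four s///g substitutions in the original order and record
# the string after each step, to show how earlier results get overwritten.
sub trace_sequential {
    my ($s) = @_;
    my @steps;
    for my $pair ( [ 'A', 'C' ], [ 'C', 'A' ], [ 'G', 'U' ], [ 'U', 'G' ] ) {
        my ( $from, $to ) = @$pair;
        $s =~ s/$from/$to/g;
        push @steps, $s;
    }
    return @steps;
}

print join( ' -> ', 'ACGU', trace_sequential('ACGU') ), "\n";
# prints: ACGU -> CCGU -> AAGU -> AAUU -> AAGG
```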
And I don't wanna

s/wanna/want to/

wanna makes you sound childish.
use BioPerl

Well, I do not know a thing about BioPerl so ...

#!/usr/bin/perl

use strict;
use warnings;

my %subst = qw( A C C A G U U G );
my @strings = qw( ACGU GUACCGU );

print "Before:\t@strings\n";

s/([ACGU])/$subst{$1}/g for @strings;

print "After:\t@strings\n";

__END__
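To apply the same lookup table to the file from the original question, one option is to run the substitution line by line and skip header lines. This sketch assumes the headers start with '>' (as the /^>/ test in the original script suggests); `swap_line` is a hypothetical helper, and the sample data is read from an in-memory filehandle only for demonstration:

```perl
use strict;
use warnings;

my %subst = qw( A C  C A  G U  U G );

# Swap every base in one line via the lookup table. Each letter is
# looked up once, in a single left-to-right pass, so nothing is
# swapped twice. Header lines starting with '>' are returned unchanged.
sub swap_line {
    my ($line) = @_;
    return $line if $line =~ /^>/;
    $line =~ s/([ACGU])/$subst{$1}/g;
    return $line;
}

# Demo input (made-up sample), opened as an in-memory "file".
my $data = ">seq1\nACGU\n>seq2\nGUACCGU\n";
open my $in, '<', \$data or die "Cannot open in-memory data: $!";
print swap_line($_) while <$in>;
close $in;
```

For a real file, the in-memory `open` would be replaced by opening the file name, as in the `open ... or die` line suggested earlier.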

--
A. Sinan Unur <[email protected]>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/
 

Amy Lee

Sorry, I made a fundamental mistake in my post.

I meant to replace A with U, U with A, C with G, and G with C.

Regards,

Amy
 

Jürgen Exner

Amy Lee said:
GUACCGU

And I wanna replace A to C, C to A, G to U, U to G. So from my point the
reversed file should be viewed like this.

UGCAAUG

This is my codes.
[4 individual s///]
So how to solve this kind of order problem? I suppose that the replacement
must process at the same time.

Long-winded option: replace A with some temporary value, e.g. X, then C
to A, then X to C. And then the same for G and U.

Much better option: use tr{}{}

tr {ACGU}{CAUG};
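A quick check of the tr{}{} approach on the two sample sequences. tr translates each listed character in a single pass, so a letter that has just been changed is never changed again. The `( my $copy = $s ) =~ tr...` idiom leaves the original strings intact:

```perl
use strict;
use warnings;

my %swapped;
for my $s (qw( ACGU GUACCGU )) {
    # Copy first, then translate the copy in place.
    ( my $copy = $s ) =~ tr{ACGU}{CAUG};
    $swapped{$s} = $copy;
    print "$s -> $copy\n";
}
# prints:
# ACGU -> CAUG
# GUACCGU -> UGCAAUG
```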

jue
 

A. Sinan Unur

Sorry, I did a principle mistake in my post.

I hope replace A to U, U to A, C to G, G to C.

You can easily adapt either Jürgen's (better for single-character,
lookup-table-driven substitutions) or mine to work with whatever you need.
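For the corrected mapping (A to U, U to A, C to G, G to C), adapting both approaches only means changing the data. A sketch; the expected strings in the comments were worked out by hand from that mapping:

```perl
use strict;
use warnings;

my @strings = qw( ACGU GUACCGU );

# 1. tr/// version: the second list gives the replacement for each
#    letter in the first list, position by position.
my @via_tr = map { my $s = $_; $s =~ tr/ACGU/UGCA/; $s } @strings;

# 2. Lookup-table version with the same pairs.
my %subst    = qw( A U  U A  C G  G C );
my @via_hash = map { my $s = $_; $s =~ s/([ACGU])/$subst{$1}/g; $s } @strings;

print "tr:   @via_tr\n";      # tr:   UGCA CAUGGCA
print "hash: @via_hash\n";    # hash: UGCA CAUGGCA
```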

Sinan

 

Ben Bullock

So how to solve this kind of order problem? I suppose that the
replacement must process at the same time.

For single letters you can use

tr/ACGU/CAUG/;

If the strings to swap are longer than a single character,

s/A/unlikely/g;
s/C/A/g;
s/unlikely/C/g;
s/G/unlikely/g;
s/U/G/g;
s/unlikely/U/g;

where "unlikely" is a string which is unlikely to occur in your data.
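The same three-step placeholder trick can be wrapped in a small helper. `swap_via_placeholder` is a hypothetical name; \Q...\E is used so that strings containing regex metacharacters are matched literally. It still assumes the placeholder never occurs in the data and does not overlap either of the strings being swapped:

```perl
use strict;
use warnings;

# Swap every occurrence of $x with $y and vice versa in $s, using
# $tmp as a temporary placeholder. Assumes $tmp does not occur in $s.
sub swap_via_placeholder {
    my ( $s, $x, $y, $tmp ) = @_;
    $s =~ s/\Q$x\E/$tmp/g;      # x   -> tmp
    $s =~ s/\Q$y\E/$x/g;        # y   -> x
    $s =~ s/\Q$tmp\E/$y/g;      # tmp -> y
    return $s;
}

my $seq = 'GUACCGU';
$seq = swap_via_placeholder( $seq, 'A', 'C', 'unlikely' );
$seq = swap_via_placeholder( $seq, 'G', 'U', 'unlikely' );
print "$seq\n";    # prints UGCAAUG
```

Each pair needs its own call, so the number of passes over the string grows with the number of pairs.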
 

Amy Lee

Jürgen Exner said:
Much better option: use tr{}{}

tr {ACGU}{CAUG};
Thank you very much. I've solved my problem. And could you tell me what
{} stands for?

Thank you again~

Amy
 

RedGrittyBrick

Amy said:
could you tell me what {} stands for?

{} stands for {}

They are just used to group the characters to be replaced and their
replacements.

The following are all equivalent

tr/ACGU/CAUG/;
tr!ACGU!CAUG!;
tr-ACGU-CAUG-;
tr.ACGU.CAUG.;

tr{ACGU}{CAUG};
tr(ACGU)(CAUG);
tr[ACGU][CAUG];
tr<ACGU>(CAUG);

Perl lets you use almost any non-whitespace character as the delimiter
that separates the two groups of characters; alternatively, you can
enclose each group in one of a few kinds of brackets: (), [], {} or <>.

Choose whatever characters make the code clearest to readers. The oldest
form is the first shown above, but people can use one of the other forms
for greater clarity if, for example, they need to translate '/' to
something else.
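One way to convince yourself that the forms are interchangeable is to run several of them on the same input and compare. A sketch using the sample sequence from the original question:

```perl
use strict;
use warnings;

my $in = 'GUACCGU';
my @out;

# Each line copies $in and translates the copy with a different
# choice of delimiters.
( $out[0] = $in ) =~ tr/ACGU/CAUG/;
( $out[1] = $in ) =~ tr!ACGU!CAUG!;
( $out[2] = $in ) =~ tr{ACGU}{CAUG};
( $out[3] = $in ) =~ tr[ACGU][CAUG];
( $out[4] = $in ) =~ tr<ACGU>(CAUG);

print "@out\n";    # prints UGCAAUG five times
```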
 

A. Sinan Unur

For single letters you can use

tr/ACGU/CAUG/;

If the strings to swap are longer than a single character,

s/A/unlikely/g;
s/C/A/g;
s/unlikely/C/g;
s/G/unlikely/g;
s/U/G/g;
s/unlikely/U/g;

where "unlikely" is a string which is unlikely to occur in your data.

A simple lookup table driven solution would obviate the need to make
assumptions about the unlikeliness of a given character as well as
getting rid of the multiple substitutions.

Sinan

 

Jo

RedGrittyBrick said:
The following are all equivalent
tr/ACGU/CAUG/;
...
tr[ACGU][CAUG];

I'd like to add that whitespace is also allowed. This can help in writing
readable code:

tr [ACGU]
   [CAUG];
 

szr

A. Sinan Unur wrote:
[...]
Clearly, by posting a question here, you are asking for help.

I disagree that, in general, by simply posting, one is seeking help. One
could just as well be seeking a discussion, or insight on something, but
not necessarily assistance. After all, this /is/ a *discussion* group
:)
 

Ben Bullock

A. Sinan Unur said:
A simple lookup table driven solution would obviate the need to make
assumptions about the unlikeliness of a given character as well as
getting rid of the multiple substitutions.

And a simple tr/// based solution would obviate the need for you to
write a lookup table solution. But if the strings to swap are longer than
a single character, the lookup table solution is going to be somewhat
complex.

Here is an example of a badly-written lookup table solution:

#!/usr/bin/perl

use strict;
use warnings;

my %subst = qw( A C C A G U U G );
my @strings = qw( ACGU GUACCGU );

print "Before:\t@strings\n";

s/([ACGU])/$subst{$1}/g for @strings;

print "After\t@strings\n";

__END__

The problem here is that the writer has put the same data, the list of
stuff to swap, in three different places. Maybe that kind of clumsy
solution is OK for an example program, but for the real world it's
not. If one uses a lookup table, then the swapping data should only be
in exactly one place:

my %subst = qw/A C G U/; # Do not repeat this data anywhere!!!!!
%subst = (%subst, reverse %subst);
my $substkeys = join ('|',keys %subst); # We want to swap strings so use |
my @strings = qw( ACGU GUACCGU );
s/($substkeys)/$subst{$1}/g for @strings;

If one uses the original solution proposed above, then as the list of
data to swap changes (and remember, the strings consist of more than one
character), bugs will occur unless the programmer is extremely careful
to update both parts of the list of stuff to swap as well as the
left-hand side of the substitution.

So I don't recommend a lookup table, unless one knows what one is doing.
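The `%subst = (%subst, reverse %subst)` line above is what lets each pair be written only once: `reverse` on a hash's list of key/value pairs yields the same pairs with keys and values exchanged, so merging the two gives a two-way table. A small check, assuming (as with the A C G U data) that no value is also a key in the original list; otherwise an entry would be overwritten:

```perl
use strict;
use warnings;

my %subst = qw/A C G U/;                # each pair written once: A<->C, G<->U
%subst = ( %subst, reverse %subst );    # adds C => A and U => G

for my $k ( sort keys %subst ) {
    print "$k => $subst{$k}\n";
}
# prints:
# A => C
# C => A
# G => U
# U => G
```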
 

A. Sinan Unur

Ben Bullock wrote:
And a simple tr/// based solution would obviate the need to for you to
write a lookup table solution. But if the strings to swap are longer
than a single character, the lookup table solution is going to be
somewhat complex.
Granted.

Here is an example of a badly-written lookup table solution:

The problem here is that the writer has put the same data, the list of
stuff to swap, in three different places. Maybe that kind of clumsy
solution is OK for an example program,

and that was the spirit in which those lines were written.
but for the real world it's not. If one uses a lookup table, then the
swapping data should only be in exactly one place:

my %subst = qw/A C G U/; # Do not repeat this data anywhere!!!!!
%subst = (%subst, reverse %subst);
my $substkeys = join ('|',keys %subst); # We want to swap strings so use |
my @strings = qw( ACGU GUACCGU );
s/($substkeys)/$subst{$1}/g for @strings;

If one uses the original solution proposed above, as the list of data
to swap changes, (and since the strings consist of more than one
character, remember), bugs will occur if the programmer is not
extremely careful about updating both parts of the list of stuff to
swap and the left hand side of the substitution.

So I don't recommend a lookup table, unless one knows what one is
doing.

Well, if one uses the solution you proposed above and the list of data
to swap changes to

my %subst = qw( A|C C|A G|U U|G );

there will be issues with the way you build the search string: '|' is a
regex metacharacter, and the keys are interpolated into the pattern
without being quoted.

So:

#!/usr/bin/perl

use strict;
use warnings;

my %replace = qw( A|C C|A G|U U|G A$A Z$Z);
%replace = (%replace, reverse %replace);

my $search = join ('|', map { "(?:\Q$_\E)" } keys %replace);
my @strings = qw( A|C G|U G|UA|CC|AG|U Z$Z A$A );

print "Before:\t@strings\n";

s/($search)/$replace{$1}/g for @strings;

print "After:\t@strings\n";

__END__
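One further caveat, not discussed in this thread: when the strings to swap have different lengths and one is a prefix of another, the order of the alternation matters, and `keys` returns keys in no particular order. A sketch that also sorts the keys longest-first; the pairs AB/C and A/D are made up purely to show the effect, and `swap_all` is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical pairs where one key ('A') is a prefix of another ('AB').
my %replace = qw( AB C  A D );
%replace = ( %replace, reverse %replace );    # make every swap two-way

# Quote each key and try longer keys first, so 'AB' wins over 'A'.
my $search = join '|',
    map { quotemeta($_) } sort { length($b) <=> length($a) } keys %replace;

sub swap_all {
    my ($s) = @_;
    $s =~ s/($search)/$replace{$1}/g;
    return $s;
}

print swap_all('ABA'), "\n";    # prints CD
```

Without the sort, the alternation might try 'A' before 'AB' and turn 'ABA' into 'DBD' instead.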

 

Ben Bullock

A. Sinan Unur said:
Well, if one uses the solution you proposed above and the list of data
to swap changes to

my %subst = qw( A|C C|A G|U U|G );

there will be issues with the way you build the search string.
my $search = join ('|', map { "(?:\Q$_\E)" } keys %replace);

So you agree that the lookup table driven solution isn't simple?

I think my original method of substituting in an unlikely string,
which you objected to, was fairly appropriate for this particular
question. I often use this kind of method for quick jobs.
 

A. Sinan Unur

Ben Bullock wrote:
So you agree that the lookup table driven solution isn't simple?

I think my original method of substituting in an unlikely string,
which you objected to, was fairly appropriate for this particular
question. I often use this kind of method for quick jobs.

Yes. That was the first thing in my response: 'Granted'.

OTOH, the number of repeated substitution operations which the 'unlikely
string' approach requires (especially as the number of
lookups/replacements grows) makes me think that the more complex
approach might end up being simpler to maintain for any 'durable'
program.

Thank you for your corrections.

Sinan

 
