Help: Replace Help

Amy Lee · May 1, 2008

Hello,

I'm going to process some RNA sequence files, so I wrote a small script
to reverse the sequences. However, it gives the wrong result when it runs
because of an ordering problem.

This is my file contents.

>seq1
ACGU
>seq2
GUACCGU

And I wanna replace A to C, C to A, G to U, U to G. So from my point the
reversed file should be viewed like this.

>seq1
CAUG
>seq2
UGCAAUG

This is my codes.

if (@ARGV == 1)
{
    $file = $ARGV[0];
    unless (-e $file)
    {
        print "***Error: $file dose not exist.\n";
        next;
    }
    unless (open $FILE_IN, '<', $file)
    {
        print "***Error: Cannot read $file.\n";
        next;
    }
    while (<$FILE_IN>)
    {
        unless (/^>.*$/)
        {
            s/A/C/g;
            s/C/A/g;
            s/G/U/g;
            s/U/G/g;
        }
        print $_;
    }
    close $FILE_IN;
}

When the script finishes, the output looks like this.

>seq1
AAGG
>seq2
GGAAAGG

And I don't wanna use BioPerl to solve this tiny problem, anyway I'm
trying to know how to do that.

So how to solve this kind of order problem? I suppose that the replacement
must process at the same time.

Thank you very much~

Regards,

Amy Lee

A. Sinan Unur · May 1, 2008

Re: Help: Replace Help

You have just wasted your subject line by repeating the word 'Help'.
Clearly, by posting a question here, you are asking for help. Repeating
the word 'help' does not serve any useful purpose.

This is my file contents.

GUACCGU

And I wanna replace A to C, C to A, G to U, U to G. So from my point
the reversed file should be viewed like this.

UGCAAUG

This is my codes.

You are missing

use strict;
use warnings;

if (@ARGV == 1)
{
    $file = $ARGV[0];
    unless (-e $file)
    {
        print "***Error: $file dose not exist.\n";
        next;
    }
    unless (open $FILE_IN, '<', $file)
    {
        print "***Error: Cannot read $file.\n";
        next;
    }

I do not understand what the 'next's are for. You should not send error
messages to STDOUT lest it also contain output you would like to use
further. You should show the reason for the error in your error messages
by including the $! variable. In short, replace all of the above with:

my ($file) = @ARGV;

open my $FILE_IN, '<', $file
    or die "Cannot open '$file': $!";

Now, if you are processing files in a loop, replace die with warn.
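
As a rough sketch of that loop case (the sub name process_files is made up for illustration, and printing each line stands in for the real work), warn reports the failure on STDERR and next moves on to the following file:

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper: open each named file in turn, skipping any that
# cannot be opened. Returns the number of files actually processed.
sub process_files {
    my @files = @_;
    my $processed = 0;

    for my $file (@files) {
        my $fh;
        unless (open $fh, '<', $file) {
            # warn writes to STDERR and, unlike die, lets the loop continue
            warn "Cannot open '$file': $!\n";
            next;    # meaningful here because we are inside a loop
        }
        while (my $line = <$fh>) {
            print $line;    # placeholder for the real per-line work
        }
        close $fh;
        $processed++;
    }
    return $processed;
}

process_files(@ARGV);
```

This is also the only place where the 'next' statements from the original script would make sense: inside a loop over several files.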

Let's suppose $_ contains ACGU

s/A/C/g;

Now it is CCGU.

s/C/A/g;

Now it is AAGU.

s/G/U/g;

Now it is AAUU.

s/U/G/g;

Now it is AAGG.

I am assuming this is not what you wanted.

And I don't wanna

s/wanna/want to/

wanna makes you sound childish.

use BioPerl

Well, I do not know a thing about BioPerl so ...

#!/usr/bin/perl

use strict;
use warnings;

my %subst = qw( A C C A G U U G );
my @strings = qw( ACGU GUACCGU );

print "Before:\t@strings\n";

s/([ACGU])/$subst{$1}/g for @strings;

print "After:\t@strings\n";

__END__

--
A. Sinan Unur <[email protected]>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/

Amy Lee · May 1, 2008

Sorry, I made a basic mistake in my post.

I want to replace A with U, U with A, C with G, and G with C.

Regards,

Amy

Jürgen Exner · May 1, 2008

Amy Lee said:
GUACCGU

And I wanna replace A to C, C to A, G to U, U to G. So from my point the
reversed file should be viewed like this.

UGCAAUG

This is my codes.

[4 individual s///]

So how to solve this kind of order problem? I suppose that the replacement
must process at the same time.

Long-winded option: replace A with some temporary value, e.g. X, then C
to A, then X to C. And then the same for G and U.

Much better option: use tr{}{}

tr {ACGU}{CAUG};
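
Applied to the loop in the original script, that might look like the sketch below. The sub name swap_bases and the sample lines are made up for illustration; lines starting with '>' are left untouched, as the /^>/ test in the original script intended.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Swap A<->C and G<->U in a single pass. tr maps each character exactly
# once, so a character that has been replaced is never replaced again.
sub swap_bases {
    my ($line) = @_;
    $line =~ tr{ACGU}{CAUG} unless $line =~ /^>/;    # skip header lines
    return $line;
}

# Sample input, assumed to resemble the file in the question.
print swap_bases($_) for ">seq1\n", "ACGU\n", ">seq2\n", "GUACCGU\n";
```

Because the translation happens character by character, the order in which the pairs are listed no longer matters.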

jue

A. Sinan Unur · May 1, 2008

Sorry, I made a basic mistake in my post.

I want to replace A with U, U with A, C with G, and G with C.

You can easily adapt either Jürgen's solution (the better choice for
single-character substitutions) or my lookup table to whatever mapping
you need.
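
For the corrected mapping (A to U, U to A, C to G, G to C), the two adaptations might look roughly like this. The sub names complement_tr and complement_hash are invented for the example.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# tr version: position N in the first list maps to position N in the second.
sub complement_tr {
    my ($seq) = @_;
    $seq =~ tr/ACGU/UGCA/;
    return $seq;
}

# Lookup table version of the same mapping.
my %subst = qw( A U U A C G G C );

sub complement_hash {
    my ($seq) = @_;
    $seq =~ s/([ACGU])/$subst{$1}/g;
    return $seq;
}

for my $seq (qw( ACGU GUACCGU )) {
    print "$seq\t", complement_tr($seq), "\t", complement_hash($seq), "\n";
}
```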

Sinan


Ben Bullock · May 1, 2008

So how to solve this kind of order problem? I suppose that the
replacement must process at the same time.

For single letters you can use

tr/ACGU/CAUG/;

If the strings to swap are longer than a single character,

s/A/unlikely/g;
s/C/A/g;
s/unlikely/C/g;
s/G/unlikely/g;
s/U/G/g;
s/unlikely/U/g;

where "unlikely" is a string which is unlikely to occur in your data.
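
A minimal sketch of the same idea for longer strings, assuming a NUL byte never occurs in the data. The sub name swap_words and the sample text are made up; the explicit check turns a wrong assumption into an error instead of silently corrupting the output.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Swap every occurrence of $x and $y in $text via a placeholder.
# Assumption: the placeholder (a NUL byte) does not appear in the input.
sub swap_words {
    my ($text, $x, $y) = @_;

    die "Placeholder already present in input\n"
        if index($text, "\x00") >= 0;

    $text =~ s/\Q$x\E/\x00/g;    # x -> placeholder
    $text =~ s/\Q$y\E/$x/g;      # y -> x
    $text =~ s/\x00/$y/g;        # placeholder -> y
    return $text;
}

print swap_words("cat chases dog, dog chases cat", "cat", "dog"), "\n";
```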

Amy Lee · May 1, 2008

Jürgen Exner said:
Amy Lee said:

GUACCGU

And I wanna replace A to C, C to A, G to U, U to G. So from my point the
reversed file should be viewed like this.

UGCAAUG

This is my codes.

[4 individual s///]

So how to solve this kind of order problem? I suppose that the replacement
must process at the same time.

Long-winded option: replace A with some temporary value, e.g. X, then C
to A, then X to C. And then the same for G and U.

Much better option: use tr{}{}

tr {ACGU}{CAUG};

jue

Thank you very much. I've solved my problem. And could you tell me what
{} stands for?

Thank you again~

Amy

RedGrittyBrick · May 1, 2008

Amy said:
could you tell me what {} stands for?

{} stands for {}

They are just used to group the characters to be replaced and their
replacements.

The following are all equivalent

tr/ACGU/CAUG/;
tr!ACGU!CAUG!;
tr-ACGU-CAUG-;
tr.ACGU.CAUG.;

tr{ACGU}{CAUG};
tr(ACGU)(CAUG);
tr[ACGU][CAUG];
tr<ACGU>(CAUG);

Perl lets you use almost any character as the delimiter that separates
the two groups of characters. Alternatively, you can use any of a few
kinds of bracket or brace pairs to enclose each of the two sets.

Choose whatever characters make the code clearest to readers. The oldest
form is the first shown above but people can use one of the other forms
for greater clarity if, for example, they need to translate '/' to
something else.
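
To illustrate that last case, here is a small sketch that translates '/' to a backslash. The sub name to_backslashes and the sample path are made up for the example.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Convert forward slashes to backslashes. With '/' as the delimiter the
# search list would have to be escaped (tr/\//\\/); braces avoid that.
sub to_backslashes {
    my ($path) = @_;
    $path =~ tr{/}{\\};    # '\\' inside tr is a single backslash
    return $path;
}

print to_backslashes("usr/local/bin"), "\n";
```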

Jürgen Exner · May 1, 2008

Amy Lee said:
And could you tell me what {} stands for?

Hmmmm, what do you mean? It's just curly brackets or braces, see
http://en.wikipedia.org/wiki/Brackets#Uses_of_.E2.80.9C.7B.E2.80.9D_and_.E2.80.9C.7D.E2.80.9D

And maybe 'perldoc perlop', section 'Quotes and quote-like Operators'.

jue

A. Sinan Unur · May 1, 2008

For single letters you can use

tr/ACGU/CAUG/;

If the strings to swap are longer than a single character,

s/A/unlikely/g;
s/C/A/g;
s/unlikely/C/g;
s/G/unlikely/g;
s/U/G/g;
s/unlikely/U/g;

where "unlikely" is a string which is unlikely to occur in your data.

A simple lookup table driven solution would obviate the need to make
assumptions about the unlikeliness of a given character as well as
getting rid of the multiple substitutions.

Sinan


Jo · May 1, 2008

RedGrittyBrick said:

The following are all equivalent
tr/ACGU/CAUG/;
...
tr[ACGU][CAUG];

I'd like to add that whitespace is allowed also. This can help writing
readable code:

tr [ACGU]
   [CAUG];

szr · May 2, 2008

A. Sinan Unur wrote:
[...]

Clearly, by posting a question here, you are asking for help.

I disagree that, in general, by simply posting, one is seeking help. One
could just as well be seeking a discussion, or insight on something, but
not necessarily assistance. After all, this /is/ a *discussion* group.

Ben Bullock · May 2, 2008

A. Sinan Unur said:
A simple lookup table driven solution would obviate the need to make
assumptions about the unlikeliness of a given character as well as
getting rid of the multiple substitutions.

And a simple tr/// based solution would obviate the need for you to
write a lookup table solution. But if the strings to swap are longer than
a single character, the lookup table solution is going to be somewhat
complex.

Here is an example of a badly-written lookup table solution:

#!/usr/bin/perl

use strict;
use warnings;

my %subst = qw( A C C A G U U G );
my @strings = qw( ACGU GUACCGU );

print "Before:\t@strings\n";

s/([ACGU])/$subst{$1}/g for @strings;

print "After:\t@strings\n";

__END__

The problem here is that the writer has put the same data, the list of
stuff to swap, in three different places. Maybe that kind of clumsy
solution is OK for an example program, but for the real world it's
not. If one uses a lookup table, then the swapping data should only be
in exactly one place:

my %subst = qw/A C G U/; # Do not repeat this data anywhere!!!!!
%subst = (%subst, reverse %subst);
my $substkeys = join ('|',keys %subst); # We want to swap strings so use |
my @strings = qw( ACGU GUACCGU );
s/($substkeys)/$subst{$1}/g for @strings;

If one uses the original solution proposed above, as the list of data
to swap changes, (and since the strings consist of more than one
character, remember), bugs will occur if the programmer is not
extremely careful about updating both parts of the list of stuff to
swap and the left hand side of the substitution.

So I don't recommend a lookup table, unless one knows what one is doing.

A. Sinan Unur · May 2, 2008

Ben Bullock said:
And a simple tr/// based solution would obviate the need for you to
write a lookup table solution. But if the strings to swap are longer
than a single character, the lookup table solution is going to be
somewhat complex.

Granted.

Here is an example of a badly-written lookup table solution:

The problem here is that the writer has put the same data, the list of
stuff to swap, in three different places. Maybe that kind of clumsy
solution is OK for an example program,

and that was the spirit in which those lines were written.

but for the real world it's not. If one uses a lookup table, then the
swapping data should only be in exactly one place:

my %subst = qw/A C G U/; # Do not repeat this data anywhere!!!!!
%subst = (%subst, reverse %subst);
my $substkeys = join ('|',keys %subst); # We want to swap strings so use |
my @strings = qw( ACGU GUACCGU );
s/($substkeys)/$subst{$1}/g for @strings;

If one uses the original solution proposed above, as the list of data
to swap changes, (and since the strings consist of more than one
character, remember), bugs will occur if the programmer is not
extremely careful about updating both parts of the list of stuff to
swap and the left hand side of the substitution.

So I don't recommend a lookup table, unless one knows what one is
doing.

Well, if one uses the solution you proposed above and the list of data
to swap changes to

my %subst = qw( A|C C|A G|U U|G );

there will be issues with the way you build the search string.

So:

#!/usr/bin/perl

use strict;
use warnings;

my %replace = qw( A|C C|A G|U U|G A$A Z$Z);
%replace = (%replace, reverse %replace);

my $search = join ('|', map { "(?:\Q$_\E)" } keys %replace);
my @strings = qw( A|C G|U G|UA|CC|AG|U Z$Z A$A );

print "Before:\t@strings\n";

s/($search)/$replace{$1}/g for @strings;

print "After:\t@strings\n";

__END__
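
One further wrinkle, which does not affect the data above because every key has the same length: when one key is a prefix of another, the regex engine takes the first alternative that matches at a given position, so the order of the alternatives matters. A sketch with made-up data (the keys 'A' and 'AB' and the sub name replace_all are purely illustrative) sorts the keys longest first:

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Made-up data: 'A' is a prefix of 'AB'.
my %replace = ( 'A' => 'x', 'AB' => 'y' );

# Longest keys first, so that 'AB' is tried before 'A' at each position.
my $search = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) || $a cmp $b }
    keys %replace;

sub replace_all {
    my ($s) = @_;
    $s =~ s/($search)/$replace{$1}/g;
    return $s;
}

print replace_all("ABA"), "\n";
```

Without the sort, the result would depend on the order in which keys happens to return the hash keys.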


Ben Bullock · May 2, 2008

A. Sinan Unur said:
Well, if one uses the solution you proposed above and the list of data
to swap changes to

my %subst = qw( A|C C|A G|U U|G );

there will be issues with the way you build the search string.

my $search = join ('|', map { "(?:\Q$_\E)" } keys %replace);

So you agree that the lookup table driven solution isn't simple?

I think my original method of substituting in an unlikely string,
which you objected to, was fairly appropriate for this particular
question. I often use this kind of method for quick jobs.

A. Sinan Unur · May 2, 2008

Ben Bullock said:

So you agree that the lookup table driven solution isn't simple?

I think my original method of substituting in an unlikely string,
which you objected to, was fairly appropriate for this particular
question. I often use this kind of method for quick jobs.

Yes. That was the first thing in my response: 'Granted'.

OTOH, the number of repeated substitution operations which the 'unlikely
string' approach requires (especially as the number of
lookups/replacements grows) makes me think that the more complex
approach might end up being simpler to maintain for any 'durable'
program.

Thank you for your corrections.

Sinan
