Help: Content extraction

Amy Lee

Hello,

I have a problem while I'm processing my sequence file. The file content
is like this:
>seq1
ACGGTC
ACTG
>seq2
CGATCC
ACCTC
>seq3
.......

And I hope to make every sequence into a separate file. For example, the file
"seq1" should contain
ACGGTC
ACTG
and the file "seq2" should contain
CGATCC
ACCTC
and so on.

However, I'm only a newbie in Perl and I don't know what to do. Could
anyone post some sample code to do that? I don't want to use BioPerl,
because the other machines don't have that package installed, although
it's quite useful.

Thank you very much~

Regards,

Amy Lee
 
Jürgen Exner

Amy Lee said:
> I have a problem while I'm processing my sequence file.

I know text files, binary files, random access files, sequential files,
but I've never heard of a sequence file.

> The file content
> is like this:
>
> ......
>
> And I hope to make every sequence into a separate file. For example, the file

What is a sequence?

> "seq1" should contain
> ACGGTC
> ACTG
> and the file "seq2" should contain
> CGATCC
> ACCTC
> and so on.

How is this desired content different from the original content? They
seem to be identical to me.

> However, I'm only a newbie in Perl and I don't know what to do. Could
> anyone post some sample code to do that?

Probably not without a much improved specification.

jue
 
Amy Lee

Jue,

Most of my work is processing DNA, so I save DNA sequences in a format
called FastA, as you've seen before. Say my file is called dna.fasta; its
content is:
>seq1
ACGGTC
ACTG
>seq2
CGATCC
ACCTC
>seq3
.......

The "seq1", "seq2", "seq3" ... "seqx" are the names of these sequences;
you could call them marks. The lines under each "seqx" are its DNA
sequence. My point is quite simple: I want to extract every sequence and
save it as a file. I mean I want to extract the sequences from dna.fasta
and make a separate file for each sequence.

Here's an example.

From dna.fasta I want to make 3 sequence files, named after the marks:
seq1, seq2 and seq3. The content of seq1 is
ACGGTC
ACTG
and the content of seq2 is
CGATCC
ACCTC
and so on. That way I can deal with my sequences easily.
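The format Amy describes can be read with core Perl alone, without BioPerl. The sketch below is only an illustration under assumptions not stated in the thread: a header line starts with '>', the first word after '>' is the record name, blank lines are ignored, and the hypothetical helper read_fasta() returns the records in input order as [name, [lines]] pairs.

```perl
use strict;
use warnings;

# Sketch only (core Perl, no BioPerl): read FASTA-style records from a
# filehandle. Assumed format: a header line starts with '>' and its first
# word is the record name; the following lines up to the next header are
# that record's sequence lines. Returns [name, [lines]] pairs in order.
sub read_fasta {                        # hypothetical helper
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;           # strip LF or CRLF line ending
        next if $line =~ /^\s*$/;       # skip blank lines
        if ($line =~ /^>\s*(\S+)/) {    # header: start a new record
            push @records, [ $1, [] ];
        }
        elsif (@records) {              # sequence line: add to last record
            push @{ $records[-1][1] }, $line;
        }
        else {
            warn "Ignoring line before first header: $line\n";
        }
    }
    return @records;
}

# Usage with the sample data from this thread, held in a string:
my $fasta = ">seq1\nACGGTC\nACTG\n>seq2\nCGATCC\nACCTC\n";
open my $in, '<', \$fasta or die "Cannot open in-memory data: $!";
for my $rec (read_fasta($in)) {
    my ($name, $lines) = @$rec;
    print "$name: @$lines\n";
}
```

Keeping the records in an array rather than a hash preserves the order of the input file, which is usually what you want when writing the sequences back out.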

Thank you very much~

Regards,

Amy Lee
 
Jürgen Exner

Amy Lee said:
> Most of my work is processing DNA, so I save DNA sequences in a format
> called FastA, as you've seen before. Say my file is called dna.fasta; its
> content is:
>
> ......

From your previous description I thought those were 3 separate files.
Obviously I was wrong.

> The "seq1", "seq2", "seq3" ... "seqx" are the names of these sequences;
> you could call them marks. The lines under each "seqx" are its DNA
> sequence. My point is quite simple: I want to extract every sequence and
> save it as a file. I mean I want to extract the sequences from dna.fasta
> and make a separate file for each sequence.

So you want to split the file at each ">seq*" marker.

Well, then why not just loop (while (<>)) through the input file and
whenever you encounter such a marker (m//) close() the current output
file and open() a new one?
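Jürgen's suggestion can be sketched as follows. This is a minimal sketch, not code posted in the thread: the sub name split_fasta() and the script name split_fasta.pl are hypothetical, the file name is assumed to be the first word after '>', output files are created (and overwritten) in a given directory, and, matching Amy's example output, the header line itself is not copied into the files.

```perl
use strict;
use warnings;
use File::Spec;

# Sketch of the loop described above: read the input line by line and,
# each time a '>' header is seen, close() the current output file and
# open() a new one named after the header. Assumptions: the name is the
# first word after '>', files go into $out_dir and are overwritten, and
# the header line is not copied into the output.
sub split_fasta {                       # hypothetical helper
    my ($in, $out_dir) = @_;
    my ($out, @written);
    while (my $line = <$in>) {
        if ($line =~ /^>\s*(\S+)/) {
            my $name = $1;
            die "Refusing unsafe file name '$name'\n" if $name =~ m{[/\\]};
            if ($out) {
                close $out or die "Cannot close '$written[-1]': $!";
            }
            my $path = File::Spec->catfile($out_dir, $name);
            open $out, '>', $path or die "Cannot open '$path': $!";
            push @written, $path;
            next;                       # do not copy the header line
        }
        next unless $out;               # ignore anything before the first header
        print {$out} $line;
    }
    if ($out) {
        close $out or die "Cannot close '$written[-1]': $!";
    }
    return @written;
}

# Usage: perl split_fasta.pl dna.fasta
# Writes seq1, seq2, ... into the current directory and prints their paths.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open '$file': $!";
    print "$_\n" for split_fasta($fh, '.');
    close $fh;
}
```

Opening with '>' rather than '>>' means that running the script twice replaces the old output instead of appending a second copy of each sequence.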

jue
 
John W. Krahn

Amy said:
> I have a problem while I'm processing my sequence file. The file content
> is like this:
>
> ......
>
> And I hope to make every sequence into a separate file. For example, the file
> "seq1" should contain
> ACGGTC
> ACTG
> and the file "seq2" should contain
> CGATCC
> ACCTC
> and so on.

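while ( <> ) {
    # A header line such as ">seq1" starts a new sequence; the text
    # after '>' becomes the output file name.
    if ( /^>(.+)/ ) {
        # Append to (or create) that file and make it the default
        # output handle, so the bare print below writes into it.
        open my $OUT, '>>', $1 or die "Cannot open '$1' $!";
        select $OUT;
    }
    # Copy the current line, including the header line itself.
    print;
}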



John
 
Amy Lee

> while ( <> ) {
>     if ( /^>(.+)/ ) {
>         open my $OUT, '>>', $1 or die "Cannot open '$1' $!";
>         select $OUT;
>     }
>     print;
> }
>
> John

Thank you very much~
I've solved this problem.

Regards,

Amy Lee
 
Amy Lee

> From your previous description I thought those were 3 separate files.
> Obviously I was wrong.
>
> So you want to split the file at each ">seq*" marker.
>
> Well, then why not just loop (while (<>)) through the input file and
> whenever you encounter such a marker (m//) close() the current output
> file and open() a new one?
>
> jue

Yes, you are right, and the code is right for my work.

Thank you again~

Amy
 
Jürgen Exner

Amy Lee said:
> Anyway, could you tell me how to find out the usage of the "select" function?

The usage of each Perl function is described in the first line(s) of the
manual page for that function. It doesn't explicitly say "Usage" as Unix
man pages do, but it has the same format:
select FILEHANDLE

Sometimes, if a function is overloaded, there may be additional usages
farther down the page, too, e.g.
select RBITS,WBITS,EBITS,TIMEOUT
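The one-argument form, select FILEHANDLE, is the one John's code relies on; its full entry can be read with perldoc -f select. The sketch below (the helper capture_default_output() is hypothetical) illustrates the two properties that matter here: after select, a print without an explicit filehandle goes to the selected handle, and select returns the previously selected handle so it can be restored.

```perl
use strict;
use warnings;

# Illustration of one-argument select(FILEHANDLE): it makes FILEHANDLE the
# default for print calls that name no handle, and returns the previously
# selected handle so that it can be restored afterwards.
sub capture_default_output {            # hypothetical helper
    my ($code) = @_;
    my $buffer = '';
    open my $mem, '>', \$buffer or die "Cannot open in-memory handle: $!";
    my $previous = select $mem;         # bare print now writes to $mem
    $code->();
    select $previous;                   # restore the old default (normally STDOUT)
    close $mem or die "Cannot close in-memory handle: $!";
    return $buffer;
}

my $text = capture_default_output(sub { print "ACGGTC\n"; print "ACTG\n" });
print "captured: $text";                # STDOUT is the default again
```

Because the default handle stays switched until the next select, any bare print that follows the loop in a script like John's also goes to the last file opened, not to STDOUT.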

jue
 
