Help: Content extraction

Amy Lee · May 10, 2008

Hello,

I have a problem while I'm processing my sequence file. The file content
is like this.

>seq1
ACGGTC
ACTG
>seq2
CGATCC
ACCTC
>seq3

.......

I'd like to write every sequence into its own file. For example, the
content of file "seq1" would be

>seq1
ACGGTC
ACTG

and the content of file "seq2" would be

>seq2
CGATCC
ACCTC

and so on.

However, I'm only a newbie in Perl and I don't know what to do. Could
anyone post some sample code to do that? I don't want to use BioPerl,
although it's quite useful, because the other machines don't have this
package installed.

Thank you very much~

Regards,

Amy Lee

Jürgen Exner · May 10, 2008

Amy Lee said:
I have a problem while I'm processing my sequence file.

I know text files, binary files, random access files, sequential files,
but I've never heard of a sequence file.

The file content
is like this.

......

And I hope make every sequence into a single file. For example, a file

What is a sequence?

"seq1" content is
ACGGTC
ACTG
And a file "seq2" content is
CGATCC
ACCTC
and so on.

How is this desired content different from the original content? They
seem to be identical to me.

However, I'm only a newbie in perl, I don't know what to do. So could
anyone post some sample codes to do that?

Probably not without some much improved specification.

jue

Amy Lee · May 10, 2008

Jue,

Most of my work is processing DNA, so I save DNA sequences in a format
called FASTA, as you've seen before. Say my file is called dna.fasta; its
content is

>seq1
ACGGTC
ACTG
>seq2
CGATCC
ACCTC
>seq3
.......

"seq1", "seq2", "seq3" and so on ("seqx") are the names of these sequences;
you could call them markers. Under each "seqx" marker is its DNA sequence.
My point is quite simple: I want to extract every sequence from dna.fasta
and save each one as a separate file.

There's an example.

From dna.fasta, I would make 3 sequence files named after the
markers: seq1, seq2 and seq3. The content of file seq1 is

>seq1
ACGGTC
ACTG

The content of file seq2 is

>seq2
CGATCC
ACCTC

And so on. That way I can deal with my sequences easily.

Thank you very much~

Regards,

Amy Lee

Jürgen Exner · May 10, 2008

Amy Lee said:
My most work is to process DNA so I save DNA sequences as a format called
FastA as you've seen before. And you could call my file dna.fasta, the
content is

......

From your previous description I thought those were 3 separate files.
Obviously I was wrong.

The "seq1" "seq2" "seq3" and "seqx" is the names of these sequences. I can
say, it's a mark. And under "seqx" it's DNA sequences. My point is quite
simple, I wanna extract every sequences as a file saved. I mean I can
extract sequences for dna.fasta and make a single file for every sequences.

So you want to split the file at each ">seq*" marker.

Well, then why not just loop (while (<>)) through the input file and
whenever you encounter such a marker (m//) close() the current output
file and open() a new one?

jue
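
The loop described above could look roughly like the following. This is only a minimal sketch: the helper name split_records, the use of a temporary directory, and the in-memory sample data are assumptions made for illustration, not something posted in the thread. It also assumes marker names contain no whitespace or path separators.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: read FASTA-style text from the handle $in and write
# one file per record into the directory $dir. Returns the record names.
sub split_records {
    my ($in, $dir) = @_;
    my ($out, @names);
    while (my $line = <$in>) {
        if (my ($name) = $line =~ /^>(\S+)/) {
            # New marker: close the current output file, then open the next.
            if ($out) { close $out or die "Cannot close output: $!"; }
            my $path = "$dir/$name";
            open $out, '>', $path or die "Cannot open '$path': $!";
            push @names, $name;
        }
        # Lines before the first marker have no file yet and are skipped.
        print {$out} $line if $out;
    }
    if ($out) { close $out or die "Cannot close output: $!"; }
    return @names;
}

# Example run on sample data shaped like the file in this thread.
my $data = ">seq1\nACGGTC\nACTG\n>seq2\nCGATCC\nACCTC\n";
open my $in, '<', \$data or die "Cannot open in-memory data: $!";
my $dir   = tempdir(CLEANUP => 1);   # assumption: output goes to a temp dir
my @names = split_records($in, $dir);
print "wrote: @names\n";             # prints "wrote: seq1 seq2"
```

Opening with '>' overwrites an existing file of the same name; use '>>' instead if records with the same name should be appended to one file.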

John W. Krahn · May 10, 2008

Amy said:
I have a problem while I'm processing my sequence file. The file content
is like this.

......

And I hope make every sequence into a single file. For example, a file
"seq1" content is
ACGGTC
ACTG
And a file "seq2" content is
CGATCC
ACCTC
and so on.

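while ( <> ) {
    if ( /^>(.+)/ ) {
        # Marker line: open (in append mode) a file named after the marker
        # and make it the default handle for print.
        open my $OUT, '>>', $1 or die "Cannot open '$1' $!";
        select $OUT;
    }
    print;
}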

John

Amy Lee · May 10, 2008

John W. Krahn said:
while ( <> ) {
    if ( /^>(.+)/ ) {
        open my $OUT, '>>', $1 or die "Cannot open '$1' $!";
        select $OUT;
    }
    print;
}

John

Thank you very much~
I've solved this problem.

Regards,

Amy Lee

Amy Lee · May 10, 2008

Amy Lee said:
Thank you very much~
I've solved this problem.

Regards,

Amy Lee

Anyway, could you tell me how to find out the usage of the "select" function?

Thank you.

Amy Lee · May 10, 2008

Jürgen Exner said:
From your previous description I thought those were 3 separate files.
Obviously I was wrong.

So you want to split the file at each ">seq*" marker.

Well, then why not just loop (while (<>)) through the input file and
whenever you encounter such a marker (m//) close() the current output
file and open() a new one?

jue

Yes, you are right, and the code is right for my work.

Thank you again~

Amy

Jürgen Exner · May 10, 2008

Amy Lee said:
Anyway, could you tell me how to find out the usage of "select" function?

The usage of each Perl function is described in the first line(s) of the
manual page for this function. It doesn't explicitly say "Usage" as in
Unix man pages, but it has the same format:
select FILEHANDLE

Sometimes, if a function is overloaded, there may be additional usages
farther down the page, too, e.g.
select RBITS,WBITS,EBITS,TIMEOUT

jue
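
The manual entry itself can be read with `perldoc -f select`. As a small illustration of the one-argument form, here is a sketch that redirects the default output handle and then restores it. The in-memory buffer and the variable names are assumptions chosen so the example leaves no files behind.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Write to an in-memory buffer instead of a real file.
my $buffer = '';
open my $fh, '>', \$buffer or die "Cannot open in-memory handle: $!";

# select FILEHANDLE makes $fh the default for print and returns the
# handle that was the default before (usually STDOUT).
my $previous = select $fh;
print "goes to the buffer\n";    # no handle given, so this goes to $fh
select $previous;                # restore the original default handle
close $fh or die "Cannot close in-memory handle: $!";

print "captured: $buffer";       # prints "captured: goes to the buffer"
```

Saving the return value of select and restoring it afterwards keeps later print calls from silently going to the last file that was opened.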
