Need help with a program

evilweasel · Jan 28, 2010

Hi folks,

I am a newbie to python, and I would be grateful if someone could
point out the mistake in my program. Basically, I have a huge text
file similar to the format below:

AAAAAGACTCGAGTGCGCGGA 0
AAAAAGATAAGCTAATTAAGCTACTGG 0
AAAAAGATAAGCTAATTAAGCTACTGGGTT 1
AAAAAGGGGGCTCACAGGGGAGGGGTAT 1
AAAAAGGTCGCCTGACGGCTGC 0

The text is nothing but DNA sequences, each with a number next to
it. What I have to do is ignore the lines that have a 0 in them
and print all the other lines (excluding the number) to a new text file
(in a particular format called FASTA). This is the program I
wrote for that:

seq1 = []
list1 = []
lister = []
listers = []
listers1 = []
a = []
d = []
i = 0
j = 0
num = 0

file1 = open(sys.argv[1], 'r')
for line in file1:
    if not line.startswith('\n'):
        seq1 = line.split()
        if len(seq1) == 0:
            continue

        a = seq1[0]
        list1.append(a)

        d = seq1[1]
        lister.append(d)

b = len(lister)
for j in range(0, b):
    if lister[j] == 0:
        listers.append(j)
    else:
        listers1.append(j)

print listers1
resultsfile = open("sequences1.txt", 'w')
for i in listers1:
    resultsfile.write('\n>seq' + str(i) + '\n' + list1 + '\n')

But this isn't working. I am not able to find the bug in this. I would
be thankful if someone could point it out. Thanks in advance!

Cheers!

Alf P. Steinbach · Jan 28, 2010

* evilweasel:

But this isn't working.

What do you mean by "isn't working"?

I am not able to find the bug in this. I would
be thankful if someone could point it out. Thanks in advance!

What do you expect as output, and what do you actually get as output?

Cheers,

- Alf

Mark Dickinson · Jan 28, 2010

Hi folks,

I am a newbie to python, and I would be grateful if someone could
point out the mistake in my program.

for j in range(0, b):
    if lister[j] == 0:

At a guess, this line should be:

if lister[j] == '0':
...
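
A minimal sketch of why the original test never matches (Python 3 syntax; the sample line is taken from the data above, and the variable names are only illustrative). The fields returned by split() are strings, so comparing one of them to the integer 0 is always False:

```python
# Fields produced by str.split() are strings, never integers.
fields = "AAAAAGACTCGAGTGCGCGGA 0".split()
count = fields[1]

print(count == 0)       # str compared with int: always False
print(count == '0')     # str compared with str: True for this line
print(int(count) == 0)  # converting to int first also works
```

Either comparing against '0' or converting with int() should do; int() has the side benefit of treating '00' as zero as well.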

Krister Svanlund · Jan 28, 2010

I'm not totally sure what you want to do but try this (python2.6+):

newlines = []

with open(sys.argv[1], 'r') as f:
    text = f.read()
    for line in text.splitlines():
        if not line.strip() and line.strip().endswith('1'):
            newlines.append('seq'+line)

with open(sys.argv[2], 'w') as f:
    f.write('\n'.join(newlines))

Krister Svanlund · Jan 28, 2010

I'm not totally sure what you want to do but try this (python2.6+):

newlines = []

with open(sys.argv[1], 'r') as f:
    text = f.read()
    for line in text.splitlines():
        if not line.strip() and line.strip().endswith('1'):
            newlines.append('seq'+line.strip()[:-1].strip())

with open(sys.argv[2], 'w') as f:
    f.write('\n'.join(newlines))

Gah, made some errors

Krister Svanlund · Jan 28, 2010

I'm trying this again:

import sys

newlines = []

with open(sys.argv[1], 'r') as f:
    text = f.read()
    for line in (l.strip() for l in text.splitlines()):
        if line:
            line_elem = line.split()
            # keep only lines of the form "<sequence> 1"
            if len(line_elem) == 2 and line_elem[1] == '1':
                newlines.append('seq'+line_elem[0])

with open(sys.argv[2], 'w') as f:
    f.write('\n'.join(newlines))

D'Arcy J.M. Cain · Jan 28, 2010

I am a newbie to python, and I would be grateful if someone could

Welcome.

point out the mistake in my program. Basically, I have a huge text
file similar to the format below:

You don't say how it isn't working. As a first step you should read
http://catb.org/~esr/faqs/smart-questions.html.

The text is nothing but DNA sequences, and there is a number next to
it. What I will have to do is, ignore those lines that have 0 in it,

Your code doesn't completely ignore them. See below.

and print all other lines (excluding the number) in a new text file
(in a particular format called as FASTA format). This is the program I
wrote for that:

seq1 = []
list1 = []
lister = []
listers = []
listers1 = []
a = []
d = []
i = 0
j = 0
num = 0

This seems like an awful lot of variables for such a simple task.

file1 = open(sys.argv[1], 'r')
for line in file1:

This is good. You aren't trying to load the whole file into memory at
once. If the file is huge as you say then that would have been bad. I
would have made one small optimization that saves one assignment and
one extra variable.

for line in open(sys.argv[1], 'r'):

    if not line.startswith('\n'):
        seq1 = line.split()
        if len(seq1) == 0:
            continue

This is redundant and perhaps not even correct at the end of the file.
It assumes that the last line ends with a newline. Look at what
'\n'.split() gives you and see if you can't improve the above code.

Another small optimization - "if not seq1" is better than "if len(seq1) == 0".
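
A small sketch of what that looks like in practice (Python 3 syntax; the sample lines and the name `kept` are made up for illustration). Calling split() with no arguments discards all whitespace, so a blank line gives an empty list whether or not it ends with a newline, and a single truthiness test covers every case:

```python
# Blank lines, whitespace-only lines, and a last line without a newline.
samples = ['\n', '   \n', 'AAAAAGGTCGCCTGACGGCTGC 0\n', 'AAAAAGGTCGCCTGACGGCTGC 0']

kept = []
for raw in samples:
    fields = raw.split()   # '\n'.split() gives []
    if not fields:         # an empty list is falsy, so no len() call is needed
        continue
    kept.append(fields)

print(kept)
```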

a = seq1[0]
list1.append(a)

Aha! I may have found your bug. Are you mixing tabs and spaces?
Don't do that. Either always use spaces or always use tabs. My
suggestion is to use spaces and choose a short indent such as three or
even two but that's a religious issue.

d = seq1[1]
lister.append(d)

You can also do "a, d = seq1". Of course you must be sure that you
have two fields. Perhaps that's guaranteed for your input but a quick
sanity test wouldn't hurt here.
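
One way the unpacking plus sanity test might look (Python 3 syntax; `parse_line` is a hypothetical helper name, not something from the original program):

```python
def parse_line(line):
    """Return (sequence, count) as strings, or None if the line is malformed."""
    fields = line.split()
    if len(fields) != 2:       # blank line, missing count, or extra columns
        return None
    sequence, count = fields   # tuple unpacking replaces seq1[0] and seq1[1]
    return sequence, count

print(parse_line("AAAAAGGTCGCCTGACGGCTGC 0\n"))
print(parse_line("\n"))
```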

However, I don't understand all of the above. It may also be a source
of problems. You say the files are huge. Are you filling up memory
here? You did the smart thing reading the file but you lose it here.
In any case, see below.

b = len(lister)
for j in range(0, b):

Go look up zip()
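
For instance, a sketch of walking the two lists in step instead of indexing with range() (Python 3 syntax; the list contents are sample values from the post, and `kept` is an illustrative name):

```python
list1 = ['AAAAAGACTCGAGTGCGCGGA', 'AAAAAGATAAGCTAATTAAGCTACTGGGTT']
lister = ['0', '1']

# zip() pairs each sequence with its count, so no index variable is needed.
kept = [seq for seq, count in zip(list1, lister) if count != '0']

print(kept)
```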

if lister[j] == 0:

I think that you will find that lister[j] is "0", not 0.

listers.append(j)
else:
listers1.append(j)

Why are you collecting the input? Just toss the '0' ones and write the
other lines directly to the output.

Hope this helps with this script and in further understanding the power
and simplicity of Python. Good luck.

evilweasel · Jan 28, 2010

I will make my question a little clearer. I have close to 60,000
lines of the data similar to the one I posted. There are various
numbers next to the sequence (this is basically the number of times
the sequence has been found in a particular sample). So, I would need
to ignore the ones containing '0' and write all other sequences
(excluding the number, since it is trivial) in a new text file, in the
following format:

seq59902
TTTTTTTATAAAATATATAGT

seq59903
TTTTTTTATTTCTTGGCGTTGT

seq59904
TTTTTTTGGTTGCCCTGCGTGG

seq59905
TTTTTTTGTTTATTTTTGGG

The number next to 'seq' is the line number of the sequence. When I
run the above program, what I expect is an output file that is similar
to the above output but with the ones containing '0' ignored. But, I
am getting all the sequences printed in the file.

Kindly excuse the 'newbieness' of the program. I am hoping to
improve in the next few months. Thanks to all those who replied. I
really appreciate it.

nn · Jan 28, 2010

People have already given you some pointers to your problem. In the
end you will have to "tweak the details" because only you have access
to the data, not us.

Just as an example, here is another way to do what you are doing:

with open('dnain.dat') as infile, open('dnaout.dat','w') as outfile:
    partgen=(line.split() for line in infile)
    dnagen=(str(i+1)+'\n'+part[0]+'\n'
            for i,part in enumerate(partgen)
            if len(part)>1 and part[1]!='0')
    outfile.writelines(dnagen)

Arnaud Delobelle · Jan 28, 2010

I think that generator expressions are overrated.

What's wrong with:

with open('dnain.dat') as infile, open('dnaout.dat','w') as outfile:
    for i, line in enumerate(infile):
        parts = line.split()
        if len(parts) > 1 and parts[1] != '0':
            outfile.write(">seq%s\n%s\n" % (i+1, parts[0]))

(untested)

John Posner · Jan 28, 2010

Your program is a good first try. It contains a newbie error (looking
for the number 0 instead of the string "0"). But more importantly,
you're doing too much work yourself, rather than letting Python do the
heavy lifting for you. These practices and tools make life a lot easier:

* As others have noted, don't accumulate output in a list. Just write
data to the output file line-by-line.

* You don't need to initialize every variable at the beginning of the
program. But there's no harm in it.

* Use the enumerate() function to provide a line counter:

for counter, line in enumerate(file1):

This eliminates the need to accumulate output data in a list, then use
the index variable "j" as the line counter.

* Use string formatting. Each chunk of output is a two-line string, with
the line-counter and the DNA sequence as variables:

outformat = """seq%05d
%s
"""

... later, inside your loop ...

resultsfile.write(outformat % (counter, sequence))
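
Putting those pieces together, a rough sketch might look like this (Python 3 syntax). The function name `write_nonzero`, the use of io.StringIO in place of real files, and starting the counter at 1 so that it matches the line number are all assumptions made for illustration:

```python
import io

OUTFORMAT = "seq%05d\n%s\n"   # same two-line layout as above, written with \n

def write_nonzero(infile, outfile):
    """Write every sequence whose count is not '0', one line at a time."""
    for counter, line in enumerate(infile, 1):   # 1-based line counter
        fields = line.split()
        if len(fields) == 2 and fields[1] != '0':
            outfile.write(OUTFORMAT % (counter, fields[0]))

# In-memory stand-ins for the input and output files.
sample = io.StringIO("AAAAAGACTCGAGTGCGCGGA 0\n"
                     "AAAAAGATAAGCTAATTAAGCTACTGGGTT 1\n")
result = io.StringIO()
write_nonzero(sample, result)
print(result.getvalue())
```

With real data, the two StringIO objects would be replaced by the opened input and output files.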

HTH,
John

Jean-Michel Pichavant · Jan 28, 2010

Using regexp may increase readability (if you are familiar with it).
What about

import re
import sys

output = open("sequences1.txt", 'w')

for index, line in enumerate(open(sys.argv[1], 'r')):
    # re.match needs the string to search as its second argument
    match = re.match(r'(?P<sequence>[GATC]+)\s+1', line)
    if match:
        output.write('seq%s\n%s\n' % (index, match.group('sequence')))

Jean-Michel

D'Arcy J.M. Cain · Jan 28, 2010

Using regexp may increase readability (if you are familiar with it).

If you have a problem and you think that regular expressions are the
solution then now you have two problems. Regex is really overkill for
the OP's problem and it certainly doesn't improve readability.

Jean-Michel Pichavant · Jan 28, 2010

It depends on the reader's ability to understand a *simple* regexp.
It is also strange to get such an answer after taking so many precautions,
so let me quote myself:

"Using regexp *may* increase readability (*if* you are *familiar* with it)."

I honestly find it quite readable in the sample code I provided, and it
spares all the if-len-startswith-strip logic, but if the OP does not
agree, fine with me. But there's no need to be so certain that I'm
completely wrong.

JM

Steven Howe · Jan 28, 2010

Finally!

After reading 8 or 9 messages about finding a line ending with '1', someone
suggests Regex.
It was my first thought.

Steven

Mensanator · Jan 28, 2010

And as a first thought, it is, of course, wrong.

You don't want lines ending in '1', you want ANY non-'0' amount.

Likewise, you don't want to exclude lines ending in '0' because
you'll end up excluding counts of 10, 20, 30, etc.

You need a regex that extracts ALL the numeric characters at the end
of the line and exclude those that evaluate to 0.
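
A sketch of such a regex (Python 3 syntax; the pattern, the group names, and the helper `nonzero_sequence` are illustrative assumptions, not code from earlier in the thread). It captures the whole trailing run of digits and lets int() decide whether the count is zero:

```python
import re

# One or more bases, whitespace, then the complete trailing run of digits.
LINE_RE = re.compile(r'(?P<sequence>[GATC]+)\s+(?P<count>\d+)\s*$')

def nonzero_sequence(line):
    """Return the sequence if its count evaluates to non-zero, else None."""
    match = LINE_RE.match(line)
    if match and int(match.group('count')) != 0:
        return match.group('sequence')
    return None

print(nonzero_sequence("AAAAAGGTCGCCTGACGGCTGC 10\n"))  # count 10 is kept
print(nonzero_sequence("AAAAAGGTCGCCTGACGGCTGC 0\n"))   # count 0 is dropped
```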

MRAB · Jan 28, 2010

I'm a great fan of regexes, but I never thought of using them for this
because it doesn't look like a regex type of problem to me.

nn · Jan 28, 2010

Arnaud said:
I think that generator expressions are overrated. What's wrong with:

with open('dnain.dat') as infile, open('dnaout.dat','w') as outfile:
    for i, line in enumerate(infile):
        parts = line.split()
        if len(parts) > 1 and parts[1] != '0':
            outfile.write(">seq%s\n%s\n" % (i+1, parts[0]))

(untested)

Nothing really,
After posting I was thinking I should have posted a more
straightforward version like the one you wrote. Now there is! It
probably is more efficient too. I just have a tendency to think in
terms of pipes: "pipe this junk in here, then in here, get output".
Probably damage from too much Unix scripting.

Since I can't resist the urge to post crazy code, here goes the bonus
round (don't do this at work):

open('dnaout.dat','w').writelines(
    'seq%s\n%s\n'%(i+1,part[0])
    for i,part in enumerate(line.split() for line in open('dnain.dat'))
    if len(part)>1 and part[1]!='0')

Arnaud Delobelle · Jan 28, 2010

nn said:
After posting I was thinking I should have posted a more
straightforward version like the one you wrote. Now there is! It
probably is more efficient too. I just have a tendency to think in
terms of pipes: "pipe this junk in here, then in here, get output".
Probably damage from too much Unix scripting.

This is funny, I did think *exactly* this when I saw your code

Johann Spies · Jan 29, 2010

I know this is a python list but if you really want to get the job
done quickly this is one method without writing python code:

$ cat /tmp/y
AAAAAGACTCGAGTGCGCGGA 0
AAAAAGATAAGCTAATTAAGCTACTGG 0
AAAAAGATAAGCTAATTAAGCTACTGGGTT 1
AAAAAGGGGGCTCACAGGGGAGGGGTAT 1
AAAAAGGTCGCCTGACGGCTGC 0
$ grep -v 0 /tmp/y > /tmp/z
$ cat /tmp/z
AAAAAGATAAGCTAATTAAGCTACTGGGTT 1
AAAAAGGGGGCTCACAGGGGAGGGGTAT 1

Regards
Johann
--
Johann Spies Telefoon: 021-808 4599
Informasietegnologie, Universiteit van Stellenbosch

"My son, if sinners entice thee, consent thou not."
Proverbs 1:10
