Processing text using python

nuttydevil

Hey everyone! I'm hoping someone will be able to help me, because I
haven't had success searching on the web so far... I have large chunks
of text (each one a single long string) that are currently in separate
notebook files. I want to use Python to read these strings of text,
THREE CHARACTERS AT A TIME. (I'm studying the genetic code, you see, so
I need to read and analyse each sequence one codon at a time.)
Does anyone have any idea how to do this using Python?


I'm going to be optimistic and thank you for your help in advance!
Samantha.
 
Alex Martelli

Open each file and call thefile.read(3) in a loop, move to the next file
when the current one is exhausted. What part of this is giving you
problems?
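
A minimal sketch of the loop described above, written for Python 3. The helper name `codons_from_files` and the demo file names and contents are made up for illustration; they are not from the original post.

```python
import os
import tempfile


def codons_from_files(paths, size=3):
    """Yield successive `size`-character chunks from each file in turn."""
    for path in paths:
        with open(path) as handle:
            while True:
                chunk = handle.read(size)
                if not chunk:  # an empty string means this file is exhausted
                    break
                yield chunk


# Demo: two throwaway files with hypothetical contents.
tmpdir = tempfile.mkdtemp()
paths = []
for name, text in [("a.txt", "ATGGCC"), ("b.txt", "TTTAAAG")]:
    path = os.path.join(tmpdir, name)
    with open(path, "w") as out:
        out.write(text)
    paths.append(path)

result = list(codons_from_files(paths))
print(result)
```

Note that the last chunk of a file can be shorter than three characters if the file length is not a multiple of three, and that a trailing newline in the file would show up inside a chunk.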


Alex
 
Roy Smith

Don't reinvent the wheel. Take a look at http://www.biopython.org/.
 
Xavier Morel

Since you're reading from files, the "read" operation of file-like
objects takes an argument specifying the number of characters to read
from the stream e.g.
'erization'

Would that be enough for what you need?
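
For illustration, here is a small Python 3 sketch of how `read` behaves when given a character count. It uses `io.StringIO` as a stand-in for an open text file, and the sequence is made up.

```python
import io

stream = io.StringIO("ATGGCCTA")  # stand-in for an open text file

first = stream.read(3)   # the first three characters
second = stream.read(3)  # the next three
third = stream.read(3)   # only two characters are left, so this is shorter
fourth = stream.read(3)  # nothing left: read() returns an empty string

print(repr(first), repr(second), repr(third), repr(fourth))
```

The empty string at the end is what a reading loop can test for to know the stream is exhausted.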
 
danmcleran

I think this is what you want:

file = open(r'c:/test.txt', 'r')

c = file.read(3)
while c:
    print(c)
    c = file.read(3)

file.close()
 
Steven Bethard

danmcleran said:
I think this is what you want:

file = open(r'c:/test.txt', 'r')

c = file.read(3)
while c:
    print(c)
    c = file.read(3)

file.close()


Or:

def read3():
    return file.read(3)

for chars in iter(read3, ''):
    pass  # ... do something with chars ...
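
The two-argument form of `iter` calls the function repeatedly until it returns the sentinel value (here the empty string). A self-contained Python 3 sketch of the same idea, assuming `functools.partial` in place of the small helper function and `io.StringIO` as a stand-in for the open file:

```python
import io
from functools import partial

handle = io.StringIO("ATGGCCTTT")  # stand-in for an open text file

# iter(callable, sentinel) keeps calling handle.read(3) until it returns ''.
chunks = [chars for chars in iter(partial(handle.read, 3), "")]

print(chunks)
```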

STeVe
 
Fredrik Lundh

did you read the string chapter in the tutorial ?

http://docs.python.org/tut/node5.html#SECTION005120000000000000000

around the middle of that chapter, there's a section on slicing:

"substrings can be specified with the slice notation: two indices
separated by a colon"
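
As a quick illustration of that slice notation applied to a sequence string (Python 3, with a made-up sequence):

```python
seq = "ATGGCCTTT"

first_codon = seq[0:3]   # characters at positions 0, 1, 2
second_codon = seq[3:6]  # characters at positions 3, 4, 5
last_codon = seq[-3:]    # the final three characters
past_end = seq[9:12]     # slicing past the end gives '' rather than an error

print(first_codon, second_codon, last_codon, repr(past_end))
```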

</F>
 
Gerard Flanagan

data1 = '''FOOTFALLSECHOINTHEMEMORY
DOWNTHEPASSAGEWHICHWEDIDNOTTAKE
TOWARDSTHEDOORWENEVEROPENED'''

num_codons = len(data1) // 3

codons = [ data1[3*i:3*(i+1)] for i in range( num_codons ) ]

print(codons)

class Codon(object):
    #__slots__ = ['alpha', 'beta', 'gamma']
    def __init__(self, a, b, c):
        self.alpha = a
        self.beta = b
        self.gamma = c

codons = [ Codon(*codon) for codon in codons ]

print(codons[0].alpha, codons[0].beta, codons[0].gamma)

###output####

['FOO', 'TFA', 'LLS', 'ECH', 'OIN', 'THE', 'MEM', 'ORY', '\nDO', 'WNT',
'HEP', 'ASS', 'AGE', 'WHI', 'CHW', 'EDI', 'DNO', 'TTA', 'KE\n', 'TOW',
'ARD', 'STH', 'EDO', 'ORW', 'ENE', 'VER', 'OPE', 'NED']
F O O
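
Note that the output above contains chunks such as '\nDO' and 'KE\n', because the newlines in the string are counted as characters. If that is not wanted, one possible approach (a sketch, not part of the original post) is to strip all whitespace before slicing. The name `cleaned` is just for illustration.

```python
data1 = '''FOOTFALLSECHOINTHEMEMORY
DOWNTHEPASSAGEWHICHWEDIDNOTTAKE
TOWARDSTHEDOORWENEVEROPENED'''

# split() with no arguments breaks on any whitespace, so joining the
# pieces back together removes newlines, spaces and tabs.
cleaned = "".join(data1.split())

codons = [cleaned[i:i + 3] for i in range(0, len(cleaned), 3)]

print(codons)
```

With the newlines gone the string is no longer a multiple of three characters long, so the final chunk here is a single leftover character.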


Gerard
 
plahey

Hi,

you have plenty of good responses. I thought I would add one more:

def data_iter(file_name):
    data = open(file_name)
    while True:
        value = data.read(3)
        if not value:
            break
        yield value
    data.close()

With the above, you can grab the entire data set (3 characters at a
time) like so:

data_set = [ d for d in data_iter('data') ]

Or iterate over it:

for d in data_iter('data'):
    pass  # do stuff
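
As an example of "do stuff", here is a hedged Python 3 sketch that counts how often each three-character chunk occurs. It uses a hypothetical `chunks` generator that takes an already-open file-like object (an `io.StringIO` here, with a made-up sequence) so that it runs without a data file.

```python
import io
from collections import Counter


def chunks(handle, size=3):
    """Yield `size`-character pieces from an open file-like object."""
    while True:
        value = handle.read(size)
        if not value:
            break
        yield value


handle = io.StringIO("ATGGCCATGTTT")  # stand-in for an open data file
counts = Counter(chunks(handle))

print(counts.most_common())
```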

Enjoy!
 
Alex Martelli

Xavier Morel said:
Fredrik, how would you use slices to split a string by groups of 3
characters?

I can't answer for him, but maybe:

[s[i:i+3] for i in range(0, len(s), 3)]

....?
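
One thing to watch with this kind of slicing is that the last piece is shorter than three characters when the string length is not a multiple of three. A small Python 3 sketch, assuming you want the incomplete tail kept separate (the function name `split_codons` is made up):

```python
def split_codons(s, size=3):
    """Return (list of full-size chunks, leftover tail) for string s."""
    full_length = len(s) - len(s) % size
    codons = [s[i:i + size] for i in range(0, full_length, size)]
    leftover = s[full_length:]
    return codons, leftover


codons, leftover = split_codons("ATGGCCTA")
print(codons, repr(leftover))
```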


Alex
 
