patrick.waldo
Hi all,
I started Python just a little while ago and I am stuck on something
that is really simple, but I just can't figure out.
Essentially I need to take a text document with some chemical
information in Czech and organize it into another text file. The
information is always EINECS number, CAS, chemical name, and formula
in tables. I need to organize them into lines with | in between. So
it goes from:
200-763-1 71-73-8
nátrium-tiopentál C11H18N2O2S.Na to:
200-763-1|71-73-8|nátrium-tiopentál|C11H18N2O2S.Na
but if I have a chemical like: kyselina močová
I get:
200-720-7|69-93-2|kyselina|močová
|C5H4N4O3|200-763-1|71-73-8|nátrium-tiopentál
and then it is all off.
How can I get Python to realize that a chemical name may have a space
in it?
Thank you,
Patrick
So far I have:
#take tables in one text file and organize them into lines in another
import codecs

path = "c:\\text_samples\\chem_1_utf8.txt"
path2 = "c:\\text_samples\\chem_2.txt"
input = codecs.open(path, 'r','utf8')
output = codecs.open(path2, 'w', 'utf8')

#read and enter into a list
chem_file = []
chem_file.append(input.read())

#split words and store them in a list
for word in chem_file:
    words = word.split()

#starting values in list
e=0 #EINECS
c=1 #CAS
ch=2 #chemical name
f=3 #formula
n=0
loop=1
x=len(words) #counts how many words there are in the file
print '-'*100

while loop==1:
    if n<x and f<=x:
        print words[e], '|', words[c], '|', words[ch], '|', words[f], '\n'
        output.write(words[e])
        output.write('|')
        output.write(words[c])
        output.write('|')
        output.write(words[ch])
        output.write('|')
        output.write(words[f])
        output.write('\r\n')
        #increase variables by 4 to get next set
        e = e + 4
        c = c + 4
        ch = ch + 4
        f = f + 4
        # increase by 1 to repeat
        n=n+1
    else:
        loop=0

input.close()
output.close()
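
One possible way around the fixed step of 4, as a minimal sketch rather than a definitive fix. It rests on assumptions the post does not confirm: every record starts with an EINECS number shaped like 200-763-1 (three digits, three digits, one digit), the CAS number is always the second token, and the formula is always a single token at the end of the record. Everything in between is then treated as the chemical name, however many words it has. The names `EINECS_RE` and `group_records` are made up for this illustration, and the sample text is just the two records from the post. It is written for Python 3, so it uses no print statements.

```python
import re

# Assumption: an EINECS number is always ddd-ddd-d. A CAS number has only
# two digits in its middle group, so it never matches this pattern.
EINECS_RE = re.compile(r'^\d{3}-\d{3}-\d$')

def group_records(words):
    """Turn a flat list of tokens into [EINECS, CAS, name, formula] rows."""
    records = []
    current = []
    for word in words:
        # A token that looks like an EINECS number starts a new record.
        if EINECS_RE.match(word) and current:
            records.append(current)
            current = []
        current.append(word)
    if current:
        records.append(current)

    rows = []
    for rec in records:
        if len(rec) < 4:
            continue  # incomplete record; simply skipped in this sketch
        # First two tokens and the last token are fixed fields;
        # whatever lies between them is the (possibly multi-word) name.
        name = ' '.join(rec[2:-1])
        rows.append([rec[0], rec[1], name, rec[-1]])
    return rows

# Sample input taken from the two records shown in the post.
text = (u"200-720-7 69-93-2\n"
        u"kyselina močová C5H4N4O3\n"
        u"200-763-1 71-73-8\n"
        u"nátrium-tiopentál C11H18N2O2S.Na")

lines = [u'|'.join(row) for row in group_records(text.split())]
```

Each entry in `lines` could then be written to the output file followed by '\r\n', the same way the loop above does. If a chemical name could itself contain a token that looks like an EINECS number, or a formula could contain a space, this grouping would break and the table layout of the original file would have to be used instead.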