flamingivanova
I have several ASCII files that contain '\ooo' strings, each of which
represents the octal value of a character. I want to convert these files
to Unicode, and I came up with the following script. But it seems to me
that there must be a much simpler way to do it. Could someone more
experienced suggest some improvements?
For example, I want to convert a file containing:
hello \326du
into a Unicode file containing:
hello Ödu
----------8<---------------------------------------
#!/usr/bin/python
import re, string, sys
if len(sys.argv) > 1:
    file = open(sys.argv[1],'r')
    lines = file.readlines()
    file.close()
else:
    print "give a filename"
    sys.exit()
def to_unichr(str):
    oct = string.atoi(str.group(1),8)
    return unichr(oct)
for line in lines:
    line = string.rstrip(unicode(line,'Latin-1'))
    if re.compile(r'\\\d\d\d').search(line):
        line = re.sub(r'\\(\d\d\d)', to_unichr, line)
    line = line.encode('utf-8')
    print line
----------8<---------------------------------------
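For comparison, here is a minimal sketch of the same conversion, assuming Python 3 (the script above is Python 2) and assuming the input can be read as Latin-1, as in the original. The names OCTAL_ESCAPE, decode_octal_escapes and convert_file are made up for illustration. Two small differences from the script above: the pattern accepts only the octal digits 0-7 rather than any decimal digit, and lines are written back with their original line endings instead of being right-stripped.

```python
import re

# A backslash followed by exactly three octal digits, e.g. \326.
# (Assumption: only 0-7 are accepted; the original pattern used \d.)
OCTAL_ESCAPE = re.compile(r'\\([0-7]{3})')


def decode_octal_escapes(text):
    """Replace each \\ooo escape in text with the character it encodes."""
    return OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), text)


def convert_file(src_path, dst_path):
    """Hypothetical helper: read src_path as Latin-1, write dst_path as UTF-8."""
    with open(src_path, encoding='latin-1') as src, \
         open(dst_path, 'w', encoding='utf-8') as dst:
        for line in src:
            dst.write(decode_octal_escapes(line))
```

With this sketch, decode_octal_escapes('hello \\326du') gives 'hello Ödu', and the explicit search before the substitution is not needed because re.sub leaves lines without a match unchanged.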