geoff_ness
Hello, and apologies in advance for the length of this post.
I am having a hard time understanding the errors generated by a
program I've written. The code parses text files that are copied
and pasted from an online game's web pages. The pages are encoded
as ISO-8859-1, but the copied text contains characters from
character sets other than Latin-1.
For instance, one of the lines I need to be able to read is:
196679 Daimyo 石 Druid 145 27 12/09/07 21:40:04 [ Expel ]
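A minimal Python 3 sketch (the post's own code is Python 2.5) confirming that a line like this one cannot be encoded as ISO-8859-1, which only covers code points up to U+00FF, while UTF-8 round-trips it unchanged:

```python
line = "196679 Daimyo 石 Druid 145 27 12/09/07 21:40:04 [ Expel ]"

# UTF-8 can represent any Unicode character, so the round trip is lossless.
round_trip = line.encode("utf-8").decode("utf-8")
print(round_trip == line)  # True

# ISO-8859-1 stops at U+00FF, so the CJK character makes encoding fail.
try:
    line.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError as exc:
    latin1_ok = False
    print("not latin-1:", exc.object[exc.start:exc.end])
```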
I start with the file 'citizen_list' and use this function to read it
and return a list of names (for instance, Daimyo 石 Druid) and ID
numbers:
import codecs

# builds the list of names from the citizens list
def getNames(f):
    """Builds a list from the town list of names
    Returns a list of "ID;name" strings"""
    newlist = []
    for line in f:
        namewords = line.rstrip('[Expel]\n\t ')\
            .rstrip(':/0123456789 ').rstrip('\t ').rstrip('0123456789 ')\
            .rstrip('\t ').rstrip('0123456789 ').rstrip('\t ').split()
        # first word is the ID, the rest make up the name
        entry = ";".join([namewords[0], " ".join(namewords[1:])])
        newlist.append(entry)
    return newlist

citizens = codecs.open('citizen_list', 'r', 'utf-8', 'strict')
listNames = getNames(citizens)
citizens.close()
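Note that str.rstrip treats its argument as a set of characters rather than a literal suffix, so rstrip('[Expel]\n\t ') strips any trailing run of those characters. A regular expression is a different technique that states the expected layout directly. A minimal Python 3 sketch, where CITIZEN_RE and parse_citizen are hypothetical names and the layout (ID, name, two integer columns, date, time, "[ Expel ]") is assumed from the single sample line above:

```python
import re

# Hypothetical pattern, assuming every citizen line looks like the sample:
# ID, name (may contain spaces and non-ASCII), two integers, date, time, [ Expel ]
CITIZEN_RE = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>.+?)\s+\d+\s+\d+\s+"
    r"\d{2}/\d{2}/\d{2}\s+\d{2}:\d{2}:\d{2}\s*\[\s*Expel\s*\]\s*$"
)


def parse_citizen(line):
    """Return (id, name) for one citizen line, or None if it doesn't match."""
    m = CITIZEN_RE.match(line)
    if m is None:
        return None
    return m.group("id"), m.group("name")


print(parse_citizen("196679 Daimyo 石 Druid 145 27 12/09/07 21:40:04 [ Expel ]\n"))
```

Returning None for a line that doesn't match makes unexpected input visible, instead of quietly producing a mangled name.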
I've specified 'utf-8' as the encoding because it seemed the best
candidate for reading all the names in the list. I use the names in
other functions, for example:
def getdamage(warrior, rpt):
    """Reads each line of the war report.
    Updates the warrior's damage, kills and rounds in place."""
    for line in rpt:
        if ((line.startswith(warrior.name) or
             line.startswith('A blue aura surrounds ' + warrior.name))
                and line.find('weapon') > 0):
            warrior.addDamage(int(line[line.find('caused ') + 7:
                                       line.find(' damage')]))
            # check whether the next line reports a death
            if rpt.next().find('is dead') > 0:
                warrior.addKill()
        elif line.startswith(warrior.name + ' is dead'):
            warrior.dies()
            break
        elif line.startswith('Starting round'):
            warrior.addRound()

for cit in listNames:
    c = Warrior(cit.split(';')[0], cit.split(';')[1])
    totalnum += 1
    report = codecs.open('war_report', 'r', 'utf-8', 'strict')
    getdamage(c, report)
    report.close()
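In getdamage the damage figure is sliced out between line.find('caused ') + 7 and line.find(' damage'). If either phrase is missing, find() returns -1 and the slice silently yields the wrong substring, so int() then fails with a confusing ValueError. A Python 3 sketch of the same extraction with a regular expression; the "caused N damage" wording is inferred from that slicing, and extract_damage and the sample line are hypothetical since the real war report format isn't shown:

```python
import re

# Assumed wording: the damage number sits between "caused " and " damage",
# as the slicing in getdamage() implies.
DAMAGE_RE = re.compile(r"caused (\d+) damage")


def extract_damage(line):
    """Return the damage number in a report line, or None if it is absent."""
    m = DAMAGE_RE.search(line)
    return int(m.group(1)) if m else None


# Hypothetical report line; not taken from a real war report.
sample = "Daimyo 石 Druid hits with his weapon and caused 42 damage"
print(extract_damage(sample))  # 42
```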
--[snip]--
def buildString(warrior):
    """Build a string from a warrior's stats
    Returns string for output to warStat."""
    return "!tr!!td!!id!"+str(warrior.ID)+"!/id!!/td!"+\
        "!td!"+str(warrior.damage)+"!/td!!td!"+str(warrior.kills)+\
        "!/td!!td!"+str(warrior.survived)+"!/td!!/tr!"
This code runs fine on my Linux machine, but when I sent it to a
friend running Python on Windows, he got the following error:
Traceback (most recent call last):
  File "D:\Python25\Lib\SITE-P~1\PYTHON~1\pywin\framework\scriptutils.py", line 310, in RunScript
    exec codeObject in __main__.__dict__
  File "C:\Documents and Settings\Administrator\Desktop\reparser_014(2)\parser_1.0.py", line 63, in <module>
    "".join(["%s" % buildString(c) for c in citlistS[:100]])+"!/table!")
  File "C:\Documents and Settings\Administrator\Desktop\reparser_014(2)\iotp_alt2.py", line 169, in buildString
    "!/td!!td!"+str(warrior.survived)+"!/td!!/tr!"
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 0: ordinal not in range(128)
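The failing frame is buildString. In Python 2, str() applied to a unicode object encodes it with the default codec (ASCII), so a non-ASCII character in any of the fields passed to str() raises this error. In Python 3 every str is already Unicode and no such implicit encoding happens. A Python 3 sketch that builds the same row with str.format, where Warrior is a hypothetical namedtuple stand-in for the class snipped from the post and the stats are made-up values:

```python
from collections import namedtuple

# Hypothetical stand-in for the Warrior class, which is not shown in the post.
Warrior = namedtuple("Warrior", "ID name damage kills survived")


def build_string(warrior):
    """Build the same !tr!...!/tr! row as buildString, using str.format."""
    return ("!tr!!td!!id!{0}!/id!!/td!"
            "!td!{1}!/td!!td!{2}!/td!"
            "!td!{3}!/td!!/tr!").format(
        warrior.ID, warrior.damage, warrior.kills, warrior.survived)


row = build_string(Warrior("196679", "Daimyo 石 Druid", 120, 3, True))
print(row)  # !tr!!td!!id!196679!/id!!/td!!td!120!/td!!td!3!/td!!td!True!/td!!/tr!
```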
As I understand it, the error is caused by the ascii codec being
unable to encode the character u'\ufeff'.
The puzzling part is that this error doesn't show up for me, even
though ascii is my default encoding too. Any thoughts or assistance
would be welcome.
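U+FEFF is the byte order mark (BOM). Some Windows tools write it as the bytes EF BB BF at the start of a file saved as UTF-8. Python's 'utf-8' codec keeps it as u'\ufeff' at the start of the decoded text, whereas the 'utf-8-sig' codec strips it. One plausible, unconfirmed explanation is that the friend's copy of citizen_list was saved with a BOM, so the first ID became u'\ufeff196679' and str() on it failed at position 0, while the Linux copy presumably has no BOM. A minimal Python 3 sketch of the difference, using made-up in-memory file contents:

```python
import codecs
import io

# Made-up file contents: a UTF-8 BOM followed by one citizen line.
raw = codecs.BOM_UTF8 + "196679 Daimyo 石 Druid 145 27 12/09/07 21:40:04 [ Expel ]\n".encode("utf-8")

# The plain 'utf-8' codec keeps the BOM as U+FEFF at the start of the text.
with_bom = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8").readline()

# The 'utf-8-sig' codec drops a leading BOM (and also reads files without one).
without_bom = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8-sig").readline()

print(repr(with_bom[0]))        # '\ufeff'
print(without_bom.split()[0])   # 196679

# U+FEFF is not whitespace, so it sticks to the ID; encoding that ID to
# ASCII fails at position 0, much like the friend's traceback.
try:
    with_bom.split()[0].encode("ascii")
except UnicodeEncodeError as exc:
    print(exc)
```

If this is the cause, passing 'utf-8-sig' instead of 'utf-8' to the codecs.open calls should be a quick way to test it; that codec was added in Python 2.5, so it should exist on the friend's Python25 install.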
Cheers