Tim Arnold
Hi, I'm getting the by-now-familiar error:
return codecs.charmap_decode(input,errors,decoding_map)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in position 4615: ordinal not in range(128)
The HTML file I'm working with is in UTF-8. I open it with codecs and try to
feed it to TidyHTMLTreeBuilder, but no luck. Here's my code:
import codecs
from elementtree import ElementTree as ET
from elementtidy import TidyHTMLTreeBuilder
fd = codecs.open(htmfile, encoding='utf-8')
tidyTree = TidyHTMLTreeBuilder.TidyHTMLTreeBuilder(encoding='utf-8')
tidyTree.feed(fd.read())
self.tree = tidyTree.close()
fd.close()
What am I doing wrong? Thanks in advance.
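The error message itself can be reproduced without elementtidy, which may help narrow things down. The sketch below is only that: it assumes (not confirmed for TidyHTMLTreeBuilder) that the trouble comes from codecs.open handing back already-decoded text, which something later tries to encode as ASCII, whereas a plain binary read would hand over the original UTF-8 bytes. The helper names read_both_ways and ascii_encodable and the sample markup are made up for illustration.

```python
import codecs
import os
import tempfile


def read_both_ways(path):
    """Return (text, raw): the file decoded as UTF-8, and its undecoded bytes."""
    # Same call as in the post: read() yields decoded (unicode) text.
    fd = codecs.open(path, encoding="utf-8")
    try:
        text = fd.read()
    finally:
        fd.close()
    # Plain binary read: read() yields the UTF-8 bytes exactly as stored.
    fd = open(path, "rb")
    try:
        raw = fd.read()
    finally:
        fd.close()
    return text, raw


def ascii_encodable(text):
    """True if text survives an ASCII encode, False if it raises the familiar error."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical sample: a tiny UTF-8 file containing the copyright sign (U+00A9).
sample = u"<p>\xa9 2008</p>".encode("utf-8")
handle, path = tempfile.mkstemp(suffix=".html")
os.write(handle, sample)
os.close(handle)

text, raw = read_both_ways(path)
os.remove(path)

# The decoded text cannot be encoded as ASCII; the raw bytes are untouched.
text_is_ascii_safe = ascii_encodable(text)
```

If the assumption holds, feeding the raw bytes (and letting the encoding='utf-8' argument describe them) would sidestep the implicit ASCII step, but that is untested against elementtidy.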
On a related note, I have another question--where/how can I get the
cElementTree.py module? Sorry for something so basic, but I tried installing
cElementTree, but while I could compile with setup.py build, I didn't end up
with a cElementTree.py file anywhere. The directory structure on my system
(HP-UX, no root access) doesn't work well with setup.py install.
Thanks,
--Tim Arnold