elementtree w/utf8

Tim Arnold

Hi, I'm getting the by-now-familiar error:
return codecs.charmap_decode(input,errors,decoding_map)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in position 4615: ordinal not in range(128)

the html file I'm working with is in utf-8, I open it with codecs, try to
feed it to TidyHTMLTreeBuilder, but no luck. Here's my code:
import codecs
from elementtree import ElementTree as ET
from elementtidy import TidyHTMLTreeBuilder

fd = codecs.open(htmfile, encoding='utf-8')
tidyTree = TidyHTMLTreeBuilder.TidyHTMLTreeBuilder(encoding='utf-8')
tidyTree.feed(fd.read())
self.tree = tidyTree.close()
fd.close()

what am I doing wrong? Thanks in advance.

On a related note, I have another question--where/how can I get the
cElementTree.py module? Sorry for something so basic, but I tried installing
cElementTree, but while I could compile with setup.py build, I didn't end up
with a cElementTree.py file anywhere. The directory structure on my system
(HPux, but no root access) doesn't work well with setup.py install.

thanks,
--Tim Arnold
 
Marc 'BlackJack' Rintsch

You feed decoded data to `TidyHTMLTreeBuilder`. As the `encoding`
argument suggests, this class wants bytes, not unicode. Decoding twice
doesn't work.
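
The failure can be reproduced in isolation. This is a minimal sketch written for Python 3, which is an assumption: the thread uses Python 2, where the same conversion happened implicitly whenever a unicode object was passed where a byte string was expected. The variable names are illustrative only.

```python
# Bytes as they sit in a UTF-8 file: the copyright sign is two bytes.
raw = b"\xc2\xa9 2007"

# Decoding once gives text; this is what codecs.open(...).read() returns.
text = raw.decode("utf-8")

# Turning that text back into bytes with the ASCII codec fails, because
# U+00A9 has no ASCII representation. Python 2 did this step silently
# with its default encoding; here it is spelled out explicitly.
try:
    text.encode("ascii")
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

print(error_message)
```

The printed message ends with the same "ordinal not in range(128)" text as the traceback in the question.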

Ciao,
Marc 'BlackJack' Rintsch
 
Diez B. Roggisch

Being too clever for your own good, sorry to say. But
TidyHTMLTreeBuilder takes the encoding for a reason: it expects a
byte string that it will decode itself.

But you decode first, creating a unicode object. When you feed that to
the feed method, which expects a byte string, Python attempts a
conversion to a byte string using the default encoding (ASCII), and
that fails on the non-ASCII character.

Opening the file with plain open() instead of codecs.open() should do
the trick.
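
A rough sketch of that approach follows. It runs on Python 3 and, as an assumption, uses the standard library's `xml.etree.ElementTree.XMLParser` as a stand-in for `TidyHTMLTreeBuilder`, since both take an `encoding` argument and a `feed()`/`close()` pair. The helper name `parse_bytes` and the sample document are made up for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def parse_bytes(path, encoding="utf-8"):
    """Read the file as raw bytes and let the parser do the decoding."""
    with open(path, "rb") as fd:      # binary mode: no decoding here
        raw = fd.read()
    parser = ET.XMLParser(encoding=encoding)  # stand-in for TidyHTMLTreeBuilder
    parser.feed(raw)                  # bytes go in, decoded exactly once
    return parser.close()


# Build a small UTF-8 sample file containing a copyright sign.
with tempfile.NamedTemporaryFile("wb", suffix=".html", delete=False) as tmp:
    tmp.write("<html><body><p>\u00a9 2007</p></body></html>".encode("utf-8"))
    demo_path = tmp.name

root = parse_bytes(demo_path)
copyright_text = root.find("body/p").text
os.remove(demo_path)
```

The point is only the division of labour: the file is opened without decoding, and the parser alone applies the encoding it was given.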

diez
 
Tim Arnold

Marc 'BlackJack' Rintsch said:
You feed decoded data to `TidyHTMLTreeBuilder`. As the `encoding`
argument suggests this class wants bytes not unicode. Decoding twice
doesn't work.

Ciao,
Marc 'BlackJack' Rintsch

well now that you say it, it seems so obvious...
some day I will get the hang of this encode/decode stuff. When I read about
it, I'm fine, it makes sense, etc. maybe even a little boring. And then I
write stuff like the above!

Thanks to you and Diez for straightening me out.
--Tim
 
rzzzwilson

Tim Arnold wrote:
On a related note, I have another question--where/how can I get the
cElementTree.py module? Sorry for something so basic, but I tried installing
cElementTree, but while I could compile with setup.py build, I didn't end up
with a cElementTree.py file anywhere. The directory structure on my system
(HPux, but no root access) doesn't work well with setup.py install.

thanks,
--Tim Arnold

I had the same question a while ago ... and the answer is ElementTree
is now part of the standard library.

http://docs.python.org/lib/module-xml.etree.ElementTree.html
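
A common way to take advantage of this is an import with a fallback. The sketch below assumes the standalone package is importable as `elementtree`, as in the code from the original question, and only falls back to it when the standard-library module is missing.

```python
try:
    # Standard-library location (Python 2.5 and later).
    import xml.etree.ElementTree as ET
except ImportError:
    # Older Pythons: fall back to the separately installed package.
    from elementtree import ElementTree as ET

# Either way, the rest of the code uses the same name.
doc = ET.fromstring("<a><b>x</b></a>")
b_text = doc.find("b").text
```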

Ross
 
Stefan Behnel

Tim said:
On a related note, I have another question--where/how can I get the
cElementTree.py module? Sorry for something so basic, but I tried installing
cElementTree, but while I could compile with setup.py build, I didn't end up
with a cElementTree.py file anywhere.

That's because it compiles into a binary extension module, not a plain Python
module (mind the 'c' in its name, which stands for the C language here).

I don't know what the standard shared-library file extension is under HP-UX,
but look a little closer at the files that weren't there before and you'll
find it.
Depending on what you did to build it, it might also end up in the "build"
directory or as an installable package in the "dist" directory.

The directory structure on my system
(HPux, but no root access) doesn't work well with setup.py install.

That shouldn't be a problem as long as you keep the binary in your PYTHONPATH.
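
If setting the PYTHONPATH environment variable is inconvenient, the same effect can be had from inside the script. This is only a sketch: the directory below is a hypothetical location for the built extension, not a path taken from this thread, and the fallback import is there so the snippet still runs where no such build exists.

```python
import os
import sys

# Hypothetical directory holding the locally built extension module.
build_dir = os.path.join(os.path.expanduser("~"), "cElementTree", "build", "lib")

# Equivalent to adding the directory to PYTHONPATH for this process only.
if build_dir not in sys.path:
    sys.path.insert(0, build_dir)

try:
    import cElementTree as ET             # the locally built binary module
except ImportError:
    import xml.etree.ElementTree as ET    # standard-library fallback

element = ET.fromstring("<a/>")
```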

As suggested before, if you have Python 2.5, you don't even need to install it
yourself.

Stefan
 
Tim Arnold

very nice--thanks. I saw the cElementTree.sl file, but didn't realize it
would work as-is.
thanks,
--Tim
 
