elementtidy, \0 chars and parsing from a string


Steven Bethard

So I see that elementtidy doesn't like strings with \0 characters in them:
Traceback (most recent call last):
  ...
  File "...elementtidy\TidyHTMLTreeBuilder.py", line 90, in close
    stdout, stderr = _elementtidy.fixup(*args)
TypeError: fixup() argument 1 must be string without null bytes, not str

The obvious solution would be to str.replace('\0', '') on the file's
text, but I'm not sure how to ask elementtidy to parse from a string
instead of a file-like object. Do I need to wrap it in a StringIO, or
is there a better way?

STeVe
 

Simon Percivall

Well, it seems you can do:

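import elementtidy.TidyHTMLTreeBuilder
import elementtree.ElementTree

parser = elementtidy.TidyHTMLTreeBuilder.TidyHTMLTreeBuilder()
parser.feed(your_str)  # your_str: the text with '\0' characters already removed
tree = elementtree.ElementTree.ElementTree(element=parser.close())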

Look at the parse() method in the ElementTree class.
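
For anyone who wants to try the pattern without elementtidy installed, here is a minimal sketch that combines the '\0' stripping from the question with the feed()/close() approach above. It uses the standard library's xml.etree.ElementTree.XMLParser as a stand-in builder, on the assumption that the tidy builder exposes the same feed()/close() interface as shown in this thread. The helper names strip_nulls and parse_string are made up for illustration and are not part of either library.

```python
import io
import xml.etree.ElementTree as ET


def strip_nulls(text):
    """Remove NUL characters, which the tidy wrapper refuses to accept."""
    return text.replace("\0", "")


def parse_string(text, builder):
    """Feed cleaned text to any builder that exposes feed() and close()."""
    builder.feed(strip_nulls(text))
    return ET.ElementTree(element=builder.close())


# Stand-in builder from the standard library. With elementtidy installed,
# the assumption is that a TidyHTMLTreeBuilder instance can be passed instead.
tree = parse_string("<html><body>a\0b</body></html>", ET.XMLParser())
body_text = tree.getroot().find("body").text

# The StringIO route from the question also works with the stdlib parse(),
# because parse() accepts a file-like object.
tree2 = ET.parse(io.StringIO(strip_nulls("<a>x\0y</a>")))
```

Whether elementtidy's own parse() accepts a StringIO the same way is not confirmed here, so the feed()/close() route is the safer one to start with.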
 
