John Nagle
"html5lib" is apparently not thread-safe.
(see "http://code.google.com/p/html5lib/issues/detail?id=189")
Looking at the code, I've only found about three problems.
They're all instances of the usual "cached in a global without locking" bug.
A few locks would fix that.
But html5lib calls the XML SAX parser. Is that thread-safe?
Or is there more trouble down at the bottom?
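Until that question is settled, one conservative workaround is to never share a parser object between threads at all. A minimal sketch, assuming the standard library's xml.sax module is the parser in question; thread_parser is a hypothetical helper name, and this sidesteps the question rather than answering it.

```python
import threading
import xml.sax

# One SAX parser per thread, so no XMLReader instance is ever shared
# between threads. This avoids relying on whether sharing is safe.
_local = threading.local()

def thread_parser():
    # Lazily create this thread's parser on first use.
    parser = getattr(_local, "parser", None)
    if parser is None:
        parser = xml.sax.make_parser()
        _local.parser = parser
    return parser
```

Each crawler thread would call thread_parser() instead of using a module-level parser. This does nothing about shared globals deeper down, which is the remaining worry.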
(I run a multi-threaded web crawler, and currently use BeautifulSoup,
which is thread-safe, although dated. I'm looking at converting to
html5lib.)
John Nagle