[repost] Does standard python have BeautifulSoup (or something like it) ?

steve

Hi,

I had sent this question below a couple of months ago but didn't receive any
replies. I forgot about it though, since I had moved on to using BeautifulSoup.
Now however, I am wondering why something like this is not present in the
standard lib. What is the accepted procedure to propose the inclusion of some
module in our beloved 'batteries included' library ?

I'd like to see this module (or something similar) included.

regards,
- steve

-------- Original Message --------
Subject: Does standard python have BeautifulSoup (or something like it) ?
Date: Tue, 14 Jul 2009 23:51:31 +0200
To: (e-mail address removed)

Hi,

After a long time, I decided to play with some web site scraping and realized
that for all the batteries included approach, the standard python lib (still)
does not have a simple to use html parser (i.e. one which does not require one to
implement a class just to extract the url of an image tag) ...or does it ?

I like BeautifulSoup, which I used once long ago, but would prefer to use
something which is part of the standard lib. Is there a module that I completely
missed ? (btw, I'm using python-2.6. If not in 2.6, is there something like
BeautifulSoup in 3.0 ?)

regards,
- steve
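
For reference, the class-based approach the question alludes to can be sketched with the standard library's own parser. This is only a minimal illustration, assuming Python 3, where the module is `html.parser` (in Python 2.6 the equivalent module was named `HTMLParser`). The names `ImageSrcCollector` and `image_urls` are made up for this example and are not part of any library.

```python
from html.parser import HTMLParser


class ImageSrcCollector(HTMLParser):
    """Hypothetical helper: collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs, where value is None for bare attributes.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value is not None:
                    self.sources.append(value)


def image_urls(html_text):
    """Return the src values of all <img> tags found in html_text."""
    parser = ImageSrcCollector()
    parser.feed(html_text)
    parser.close()
    return parser.sources


if __name__ == "__main__":
    sample = '<p><img src="a.png" alt="x"> text <IMG SRC="b.jpg" /></p>'
    print(image_urls(sample))
```

Self-closing tags such as `<img ... />` are covered as well, because the default `handle_startendtag` delegates to `handle_starttag`. The subclass is exactly the boilerplate the question complains about; the standard library offers no BeautifulSoup-style one-liner for this.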

Stefan Behnel

steve said:
I had sent this question below a couple of months ago but didn't receive
any replies. I forgot about it though, since I had moved on to using
BeautifulSoup. Now however, I am wondering why something like this is
not present in the standard lib. What is the accepted procedure to
propose the inclusion of some module in our beloved 'batteries included'
library ?

I'd like to see this module (or something similar) included.

This has been proposed and discussed before and was rejected, IIRC, mainly
due to the lack of a maintainer. But there are also other reasons. For
stdlib inclusion, it's generally required for a module/package to be
best-of-breed. So, why BS instead of html5lib, which, supposedly, is a lot
more 'future proof' than BS? Or lxml, which is a lot faster and more memory
friendly? Or maybe others?

See, for example, the python-dev archives from 2009-03-02.

Stefan