Hello all,
After more than a year, I'd like to announce a new release of scRUBYt! -
so "scRUBYt!".is_vaporware? == false now!
==========
scRUBwhat?
==========
scRUBYt! is a (hopefully) easy to use, yet powerful Web scraping framework
based on Hpricot, Mechanize and/or FireWatir. Its purpose is to free you
from the drudgery of web page crawling - looking up HTML tags, attributes,
XPaths, form names and other typical low-level web scraping stuff - by
figuring these out from your examples copy'n'pasted from the Web page or
Firebug.
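To give a flavour of what that looks like in practice, here is a minimal
extractor sketch in the usual scRUBYt! style - the URL, pattern names and
example strings below are purely illustrative, so treat it as a template
rather than a copy-paste recipe:

  require 'rubygems'
  require 'scrubyt'

  # Define an extractor: navigate to the page, then teach scRUBYt! what to
  # scrape by giving it examples copied straight from the rendered page.
  data = Scrubyt::Extractor.define do
    fetch 'http://www.example.com/products'   # illustrative URL

    # 'record', 'name' and 'price' are pattern names made up for this sketch;
    # the strings are example values copy'n'pasted from the page - scRUBYt!
    # figures out the XPaths behind them for you.
    record do
      name  'Example Product 1'
      price '$19.99'
    end
  end

  puts data.to_xml   # dump everything that matched the learned patterns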
===============
In this release
===============
Finally, it is possible to use FireWatir as the agent for navigation,
enabling AJAX support and more robust scraping via Firefox.
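If you want to try the FireWatir-driven mode, the sketch below shows roughly
how an extractor would opt into it. Note that the :agent => :firefox option
is an assumption made for this sketch - check the 0.4.1 docs/examples for the
exact switch name:

  require 'rubygems'
  require 'scrubyt'

  # Assumption: :agent => :firefox selects FireWatir (driving a real Firefox
  # instance) instead of the default Mechanize agent - verify the exact
  # option name against the 0.4.1 docs/examples.
  data = Scrubyt::Extractor.define :agent => :firefox do
    fetch 'http://www.example.com/ajax-heavy-page'   # illustrative URL

    # click_by_xpath (new in this release) is what makes AJAX-driven
    # pages reachable; the XPath is illustrative.
    click_by_xpath "//a[@id='load_more']"

    item 'Example value copied from the page'        # illustrative pattern
  end

  puts data.to_xml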
The other big news is that the RubyInline, ParseTree and Ruby2Ruby
dependencies were dropped, since we couldn't solve the problems they caused
on win32 for over a year.
Of course a lot of bugs were fixed as well!
=========
CHANGELOG
=========
- [NEW] possibility to use FireWatir as the agent for scraping (credit:
Glenn Gillen)
- [FIX] navigation doesn't crash if a 404/500 is returned (credit: Glenn
Gillen)
- [NEW] navigation action: click_by_xpath to click arbitrary elements
- [MOD] dropped dependencies: RubyInline, ParseTree, Ruby2Ruby (hooray for
win32 users)
- [NEW] scraping through frames (e.g. Google Analytics)
- [MOD] exporting is temporarily unavailable - for now, generated XPaths are
printed to the screen
- [MOD] possibility to wait after clicking a link/filling a text field (to
be able to scrape content inserted via AJAX)
- [NEW] possibility to fetch from a string, by specifying nil as the url
and passing the HTML string with the :html option (see the sketch after
this list)
- [FIX] FireWatir slowness (credit: jak4)
- [FIX] lots of other bug fixes and stability fixes
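To illustrate the string-based fetching mentioned above: pass nil as the URL
and supply the markup via the :html option. The sample HTML and the pattern
name are made up for this sketch:

  require 'rubygems'
  require 'scrubyt'

  # Some HTML we already have in memory - e.g. downloaded earlier,
  # generated in a test, or read from disk.
  html = <<-HTML
    <html><body>
      <div class="product">Example Product 1</div>
      <div class="product">Example Product 2</div>
    </body></html>
  HTML

  data = Scrubyt::Extractor.define do
    # nil URL + :html => string scrapes without hitting the network
    fetch nil, :html => html

    # 'product' is an illustrative pattern name, learned from an example value
    product 'Example Product 1'
  end

  puts data.to_xml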
===========
What's next
===========
The biggest news is that scRUBYt! is going to be rewritten from scratch -
the work has already been started by Glenn Gillen. scRUBYt! has grown too
big for our taste, so we decided to start anew, aiming for 100% RSpec
coverage, refactored code, speed/performance optimization and leaving the
cruft behind. So scRUBYt! 0.4.1, the last release based on the original
scRUBYt!, will be supported until the new, rewritten one (0.5.0) comes out
and takes its place.
We are working on a public scraper repository
(http://github.com/scrubber/scrubyt_examples/tree/master) where you can
post your scRUBYt! snippets and check out what others are doing.
And there is other interesting stuff in the pipeline as well - stay tuned!
Cheers,
Glenn && Peter
___
http://scrubyt.org
http://www.rubypond.com (Glenn)
http://www.rubyrailways.com (Peter)