A
A. S. Bradbury
= Ariel release 0.1.0
== About - Ariel: A Ruby Information Extraction Library
Ariel is a library that allows you to extract information from semi-structured
documents (such as websites). It is different to existing tools because rather
than expecting the developer to write rules to extract the desired
information, Ariel will use a small number of labeled examples to generate
and learn effective extraction rules. It is developed by Alex Bradbury and
released under the MIT license. Ariel was started as a Google Summer of Code
project mentored by Austin Ziegler in 2006.
== Install
gem install ariel
== Announcement
I'm happy to announce the release of Ariel 0.1.0, the result of my Summer of
Code work. This release should be easy to use, very functional, and hopefully
useful - so it's worth trying out. I've put a lot of effort in to writing
clear and straightforward documentation to get your started, so take a look
at the docs available at http://ariel.rubyforge.org. In particular, flick
through the tutorial and quick start guide. If you're interested, you may
also want to take a look at the theory page where I've made a good start on
describing the method Ariel uses to learn extraction rules. If you have any
problems or find any bugs, just send me an email or add it to the issue
tracker (see link below). Enjoy. See the FAQ for a vim snippet to make
labeling examples a little easier.
== Quickstart/Basic usage
* @require 'ariel'@
* Define a structure for the information you wish to extract:
structure = Ariel::Node::Structure.new do |r|
r.item :title
r.item :body
r.list :comments do |c|
c.list_item :comment do |d|
d.item :author
d.item :body
end
end
end
* Collect a few examples of the sort of document you wish to extract
information from (pages from the same website for instance).
* Label each example with tags such as <l:title>, <l:comment> and so on in the
relevant places.
* Ariel.learn structure, labeled_file1, labeled_file2, labeled_file3
* Find the documents you want to extract information from.
* extractions = Ariel.extract structure, unlabeled_file1,
unlabeled_file2
* extractions[0].search('comments/*/body').each {|e| puts e.extracted_text}
=> "Great stuff, loving it", "I love life", .....
* extractions[0].at('comments/34') => nil</tt> (there is no 34th comment, #at
returns the first result rather than an array of matches).
== Credits
Ariel is developed by Alex Bradbury as a Google Summer of Code project under
the mentoring of Austin Ziegler.
== Links
SVN Repository: http://rubyforge.org/projects/ariel
Issue tracker: http://code.google.com/p/ariel/issues/
Documentation/homepage: http://ariel.rubyforge.org
RDoc: http://ariel.rubyforge.org/rdoc/
== About - Ariel: A Ruby Information Extraction Library
Ariel is a library that allows you to extract information from semi-structured
documents (such as websites). It is different to existing tools because rather
than expecting the developer to write rules to extract the desired
information, Ariel will use a small number of labeled examples to generate
and learn effective extraction rules. It is developed by Alex Bradbury and
released under the MIT license. Ariel was started as a Google Summer of Code
project mentored by Austin Ziegler in 2006.
== Install
gem install ariel
== Announcement
I'm happy to announce the release of Ariel 0.1.0, the result of my Summer of
Code work. This release should be easy to use, very functional, and hopefully
useful - so it's worth trying out. I've put a lot of effort in to writing
clear and straightforward documentation to get your started, so take a look
at the docs available at http://ariel.rubyforge.org. In particular, flick
through the tutorial and quick start guide. If you're interested, you may
also want to take a look at the theory page where I've made a good start on
describing the method Ariel uses to learn extraction rules. If you have any
problems or find any bugs, just send me an email or add it to the issue
tracker (see link below). Enjoy. See the FAQ for a vim snippet to make
labeling examples a little easier.
== Quickstart/Basic usage
* @require 'ariel'@
* Define a structure for the information you wish to extract:
structure = Ariel::Node::Structure.new do |r|
r.item :title
r.item :body
r.list :comments do |c|
c.list_item :comment do |d|
d.item :author
d.item :body
end
end
end
* Collect a few examples of the sort of document you wish to extract
information from (pages from the same website for instance).
* Label each example with tags such as <l:title>, <l:comment> and so on in the
relevant places.
* Ariel.learn structure, labeled_file1, labeled_file2, labeled_file3
* Find the documents you want to extract information from.
* extractions = Ariel.extract structure, unlabeled_file1,
unlabeled_file2
* extractions[0].search('comments/*/body').each {|e| puts e.extracted_text}
=> "Great stuff, loving it", "I love life", .....
* extractions[0].at('comments/34') => nil</tt> (there is no 34th comment, #at
returns the first result rather than an array of matches).
== Credits
Ariel is developed by Alex Bradbury as a Google Summer of Code project under
the mentoring of Austin Ziegler.
== Links
SVN Repository: http://rubyforge.org/projects/ariel
Issue tracker: http://code.google.com/p/ariel/issues/
Documentation/homepage: http://ariel.rubyforge.org
RDoc: http://ariel.rubyforge.org/rdoc/