Just a quick check. I am starting into Ruby with little interest in
Ruby on Rails (it's good to know it's there, though). I want to use Ruby
in a more data-centric way. So in my first case I want to extract
data from websites and put it in a suitable format (XML, CSV, etc.) for
use in Excel, or later for use in a db application such as MySQL or
PostgreSQL.
I was looking at several comparisons and I thought that Ruby dealt with
lists and data in a logical fashion.
By "extract data from websites" I assume you mean screen scraping. Here are
two Railscasts about it:
http://railscasts.com/episodes/173-screen-scraping-with-scrapi
http://railscasts.com/episodes/190-screen-scraping-with-nokogiri
Some Nokogiri tutorials about it:
http://nokogiri.org/tutorials
Some Mechanize tutorials about it (you will only need Mechanize if you
need to interact with the site; it uses Nokogiri under the covers. Note
that it can't handle JavaScript, and there are some alternatives if you need
that):
http://mechanize.rubyforge.org/mechanize/EXAMPLES_rdoc.html
Depending on what site you're trying to get info from, it might have an API,
and there might even be a gem for interacting with that API, which would
save you the headache and brittleness of screen scraping.
For outputting to XML, Nokogiri can do that. If you have difficulty getting
it installed, I've also enjoyed using Hpricot (aside from the API, the
biggest difference is that Nokogiri is built on libxml2, a very popular open
source C library, while Hpricot is built on a Ragel parser), and if you have
difficulty with that as well, the standard library provides one called
REXML.
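If you end up on the REXML fallback, writing scraped rows out as XML could
look roughly like the sketch below. The method name rows_to_xml, the element
names ("products", "product") and the sample rows are all made up for
illustration, and it assumes every hash key is a valid XML element name.

```ruby
require 'rexml/document'

# Turn an array of hashes into an XML string using only REXML.
# rows_to_xml and the default element names are hypothetical.
def rows_to_xml(rows, root_name = 'products', row_name = 'product')
  doc = REXML::Document.new
  doc << REXML::XMLDecl.new('1.0', 'UTF-8')
  root = doc.add_element(root_name)

  rows.each do |row|
    node = root.add_element(row_name)
    row.each do |key, value|
      # One child element per field; REXML escapes characters such as &
      # when the document is written out.
      node.add_element(key.to_s).text = value.to_s
    end
  end

  output = +''      # mutable string that the document is written into
  doc.write(output) # compact output, no pretty-printing
  output
end

# Made-up sample data standing in for whatever you scraped.
scraped = [
  { 'name' => 'Widget', 'price' => '9.99' },
  { 'name' => 'Gadget & Co', 'price' => '19.50' }
]
puts rows_to_xml(scraped)
```

The output is written without indentation on purpose, so the text inside
each element comes back unchanged when the file is parsed again.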
Also consider YAML, which is built into the stdlib (but has difficulty, I
found, dealing with huge data sets).
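A minimal sketch of the YAML route, assuming your data is plain arrays,
hashes, strings and numbers. The helper names and the sample rows are
hypothetical.

```ruby
require 'yaml'

# Serialize plain Ruby data to a YAML string.
def rows_to_yaml(rows)
  YAML.dump(rows)
end

# Read it back. safe_load only builds basic types (arrays, hashes,
# strings, numbers and so on), which is all this sketch needs.
def rows_from_yaml(text)
  YAML.safe_load(text)
end

# Made-up sample data.
rows = [
  { 'title' => 'First post', 'comments' => 3 },
  { 'title' => 'Second post', 'comments' => 0 }
]

yaml_text = rows_to_yaml(rows)
puts yaml_text
puts rows_from_yaml(yaml_text) == rows
```

The nice part is that the intermediate file stays readable and editable by
hand, which the other formats only partly give you.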
There are a couple of gems for JSON; I can't remember which one I've used.
For CSV, the fastercsv gem.
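For what it's worth, on Ruby 1.9 and later FasterCSV is what ships as the
standard csv library, and a json library is bundled as well, so a sketch of
both outputs needs no extra gems there. The helper names and sample rows
below are hypothetical, and the sketch assumes every row has the same keys.

```ruby
require 'csv'
require 'json'

# Write an array of hashes as CSV text, using the keys of the first row
# as the header line.
def rows_to_csv(rows)
  return '' if rows.empty?

  headers = rows.first.keys
  CSV.generate do |csv|
    csv << headers
    rows.each { |row| csv << headers.map { |h| row[h] } }
  end
end

# Write the same rows as indented JSON text.
def rows_to_json(rows)
  JSON.pretty_generate(rows)
end

# Made-up sample data; note the comma inside the first name, which the
# CSV library quotes for you.
rows = [
  { 'name' => 'Widget, large', 'price' => '9.99' },
  { 'name' => 'Gadget', 'price' => '19.50' }
]

puts rows_to_csv(rows)
puts rows_to_json(rows)
```

The CSV text can be written to a file and opened directly in Excel, which
may be all you need for the spreadsheet side.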
I am almost certain there are tools for interacting with Excel, but I'm on a
Mac, so not able to really help there.
Depending on what you're doing, you may not need the intermediate form to be
human readable (maybe you just need to persist an array of strings
between runs of your script, or something like that). If that is the
case, you can just marshal the data.
http://ruby-doc.org/core/classes/Marshal.html
Probably the easiest solution, and really fast, but it means your data is
Ruby.
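A small sketch of that idea, keeping an array of strings between runs. The
helper names save_state and load_state and the file name are hypothetical.
One caveat worth knowing: only Marshal.load files your own script wrote,
since loading untrusted data is not safe.

```ruby
require 'tmpdir'

# Dump any marshalable Ruby object to a binary file.
def save_state(path, data)
  File.open(path, 'wb') { |f| Marshal.dump(data, f) }
end

# Load it back, or return a default on the first run when no file exists.
def load_state(path, default = [])
  return default unless File.exist?(path)

  File.open(path, 'rb') { |f| Marshal.load(f) }
end

# Usage: a temp dir stands in for wherever your script keeps its state.
Dir.mktmpdir do |dir|
  path = File.join(dir, 'seen_urls.dump')

  seen = load_state(path)            # first run: the default, []
  seen << 'http://example.com/page1' # placeholder URL
  save_state(path, seen)

  p load_state(path)                 # what the next run would see
end
```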
For dealing with databases, ActiveRecord, DataMapper, and Sequel should be
able to help you out.
ActiveRecord is extremely mature, as it's the de facto M in Rails' MVC,
but it requires a little bit of infrastructure to get going outside of
Rails. If you want to use it, http://guides.rubyonrails.org/ is, IMO, the
best resource. There are also lots of Railscasts that deal with it (note
that AR3 was just released, so the interface is a little different).
DataMapper is another nice project. I like it because you can do it all in
one file without migrations (easy to get up and going): you literally define
your schema in your code. It has some other nice features, such as
guaranteeing that there will only ever be one instance of each DB row in
memory at a time (you can find yourself in some wonky situations with AR,
where it has cached results, or you load the same data twice, and one copy
is unaware of the other). It also has a cool solution to the n+1 problem,
where it will preload data as soon as it recognizes you're going to query
for it in a loop. Unfortunately, it's nowhere near as mature as ActiveRecord. I
finally ended up switching my last project off of DataMapper and onto
ActiveRecord after too many headaches dealing with polymorphism, immature
libraries for it (I needed tagging), and dissatisfaction with the IRC
channel. If you don't need external libraries like that, you probably won't
experience such frustrations. If you're interested in it, it has some good
tutorials on its site:
http://datamapper.org/docs/
I also really liked the $9 Peepcode about Sinatra, which uses DataMapper to
talk to its database.
http://peepcode.com/products/sinatra
I've not used Sequel, but I've seen its creator present at Ruby Midwest. He
_really_ knows his stuff. I've also only heard good things about the
project, such as that it is actively developed and easy to get support for.
But my understanding is that its main strength is in connecting to "non
opinionated" (what AR would call "legacy") databases. If you have the
ability to design yours from the beginning, some of its strengths might not
be necessary.