Ruby screen scraping

Thread starter Chris Gallagher
Start date Nov 19, 2006

Peter Szinek

Nov 20, 2006

Chris said:
Turns out I actually ended up abandoning HTree and the rest. I used
net/http to fetch the page, then took the table on the page that I was
interested in examining and converted it using REXML. I have now been
able to grab the values that I wanted using XPath.
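Chris's approach (fetch with net/http, cut out the table, parse it with REXML, query with XPath) might look roughly like the sketch below. It assumes the table of interest is the first `<table>` on the page and that the cut-out chunk happens to be well-formed XML, since REXML rejects malformed markup, which is common in real HTML. The method names `fetch_page` and `table_to_rexml` and the sample markup are hypothetical; the non-greedy regex also breaks on nested tables.

```ruby
require 'net/http'
require 'uri'
require 'rexml/document'

# Fetch the raw HTML of a page (the URL is supplied by the caller).
def fetch_page(url)
  Net::HTTP.get(URI(url))
end

# Hypothetical helper: cut out the first <table>...</table> chunk with a
# non-greedy regex and parse it with REXML. Assumes the chunk is well-formed
# XML and contains no nested tables.
def table_to_rexml(html)
  chunk = html[%r{<table\b.*?</table>}m] or raise ArgumentError, 'no <table> found'
  REXML::Document.new(chunk)
end

# Made-up sample page standing in for fetch_page(url).
html = <<~HTML
  <html><body>
  <p>Intro text</p>
  <table class="index" width="100%">
    <tr><td class="data"><a href="/m/1">Module A</a></td></tr>
  </table>
  </body></html>
HTML

doc = table_to_rexml(html)
puts REXML::XPath.first(doc, "//td[@class='data']/a").text  # prints "Module A"
```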

If you are keen on XPaths, why not:

table = XPath.first(doc, "//table[@class='index' and @width='100%']")

then use 'table' instead of 'converted_data'...
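XPath combines predicates with the `and` operator rather than Ruby's `&&`. A self-contained sketch on invented markup, calling `REXML::XPath` explicitly (the one-liner above presumably relies on `include REXML` so that a bare `XPath` resolves):

```ruby
require 'rexml/document'

# Made-up page with two tables; only the second has class="index".
xml = <<~XML
  <html><body>
    <table class="nav" width="100%"><tr><td>skip me</td></tr></table>
    <table class="index" width="100%">
      <tr><td class="data"><a href="/m/1">Module A</a></td></tr>
    </table>
  </body></html>
XML

doc = REXML::Document.new(xml)

# Both attribute tests must hold; XPath spells the conjunction `and`.
table = REXML::XPath.first(doc, "//table[@class='index' and @width='100%']")
puts table.attributes['class']  # prints "index"
```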

or even

module_name = XPath.first(doc, "//table[@class='index' and @width='100%']//td[@class='data']/a")

etc.

(Untested since I don't have your doc, but it should more or less work.)
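Drilling down to the link text, sketched on made-up markup: `XPath.first` returns only the first matching element (or `nil` if nothing matches), while `XPath.match` returns every match as an array, which helps if the table has several data rows.

```ruby
require 'rexml/document'

# Invented sample table with two data rows.
xml = <<~XML
  <table class="index" width="100%">
    <tr><td class="data"><a href="/m/1">Module A</a></td></tr>
    <tr><td class="data"><a href="/m/2">Module B</a></td></tr>
  </table>
XML

doc  = REXML::Document.new(xml)
path = "//table[@class='index' and @width='100%']//td[@class='data']/a"

# first: the first matching <a> element, or nil; &. guards against nil
module_name = REXML::XPath.first(doc, path)&.text       # => "Module A"

# match: all matching <a> elements, mapped to their text
all_names = REXML::XPath.match(doc, path).map(&:text)    # => ["Module A", "Module B"]
```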

Cheers,
Peter

__
http://www.rubyrailways.com
