How to get data from html table

Vikash Kumar · Nov 27, 2006

I want to store the values of a table in different variables. I have the
following table structure:

<table width="579">
<tr class="even">
<td class width="65"> Case5-04</td>
<td class width="130">10/11/2006 23:24:33</td>
<td class width="61">Case5-04</td>
<td class width="32">1005</td>
<td class width="59">Sell</td>
<td class width="36">1,000</td>
<td class width="34">ARP</td>
<td class width="52">$36.90</td>
</tr>
<tr class="odd">
<td class width="65"> Case5-03</td>
<td class width="130">10/11/2006 23:20:07</td>
<td class width="61">Case5-03</a></td>
<td class width="32">1005</td>
<td class width="59">Buy</td>
<td class width="36">1,500</td>
<td class width="34">ARP</td>
<td class width="52">$36.70</td>
</tr>
<tr class="even">
<td class width="65"> Case4-04</td>
<td class width="130">10/11/2006 05:28:54</td>
<td class width="61">Case4-04</a></td>
<td class width="32">1004</td>
<td class width="59">Sell</td>
<td class width="36">300</td>
<td class width="34">RIL</td>
<td class width="52">$490.00</td>
</tr>
<tr class="odd">
<td class width="65"> Case4-03</td>
<td class width="130">10/11/2006 05:21:32</td>
<td class width="61">Case4-03</a></td>
<td class width="32">1004</td>
<td class width="59">Buy</td>
<td class width="36">200</td>
<td class width="34">RIL</td>
<td class width="52">$489.90</td>
</tr>
</table>

I want to store the values in variables so that I can compare records.
Please help me with how to do this in Ruby.

Peter Szinek · Nov 27, 2006

One possible way:

require 'rubygems'
require 'hpricot'

Record = Struct.new("Record", :name, :date, :name_again, :some_num,
                    :buy_link, :some_num2, :letters, :price)
records = []

# doc holds the HTML source as a String
doc = Hpricot(doc)
stuff = doc/"/table/tr/td"

# every 8 consecutive cells make up one table row
stuff.map { |elem| elem.inner_html }.each_slice(8) do |slice|
  records << Record.new(*slice)
end

# sort by the price string with the leading "$" dropped
p records.sort_by { |record| record.price[1..-1] }

Note that since I did not know the semantics of the table cells, the
Record struct has some oddly named fields, but you get the idea.

Also, I am not 100% sure whether the sort_by should instead be done on
prices converted with to_f. Probably not, because of rounding problems,
but plain string comparison has issues of its own: strings are compared
character by character, so "1,000.00" would sort before "36.90".
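
The string-versus-number question can be checked with a short sketch. It uses the four prices from the table plus one invented four-digit price, and a hypothetical helper parse_price (not part of the original post) that parses into a Rational so that no float rounding is involved. This is only one way to do it, under those assumptions.

```ruby
# Prices from the last table column, plus one invented value ("$1,000.00")
# added only to expose the string-comparison problem.
prices = ["$36.90", "$36.70", "$490.00", "$489.90", "$1,000.00"]

# Hypothetical helper: drop "$" and thousands separators, then parse the
# decimal string exactly as a Rational (no float rounding).
def parse_price(text)
  Rational(text.delete("$,"))
end

# Sorting on the string after the "$", as in the post above.
by_string = prices.sort_by { |price| price[1..-1] }

# Sorting on the parsed numeric value.
by_value = prices.sort_by { |price| parse_price(price) }

puts by_string.inspect
puts by_value.inspect
```

With these inputs the string sort puts "$1,000.00" first, while the numeric sort puts it last.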

HTH,
Peter

__
http://www.rubyrailways.com

Park Heesob · Nov 27, 2006

Hi,

Here is another way:

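After saving the HTML table text to a file 'w.xml',
you can process the values like this:

require 'rexml/document'
include REXML
# REXML is an XML parser, so w.xml must be well-formed XML
# (for example, no stray </a> closing tags)
doc = Document.new File.new("w.xml")
# visit every table cell and print its text
doc.elements.each("*/tr/td") { |e|
  puts e.texts
}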
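
If the cells should stay grouped by row rather than printed one by one, the same REXML approach might be sketched as follows. The inline XML is a shortened, cleaned-up excerpt of the table (valueless class attributes and stray closing tags dropped, only three columns kept), chosen here as an assumption so that the snippet runs on its own.

```ruby
require 'rexml/document'

# Shortened, well-formed excerpt of the table, used only for illustration.
xml = <<~XML
  <table width="579">
    <tr class="even">
      <td width="65">Case5-04</td>
      <td width="59">Sell</td>
      <td width="52">$36.90</td>
    </tr>
    <tr class="odd">
      <td width="65">Case5-03</td>
      <td width="59">Buy</td>
      <td width="52">$36.70</td>
    </tr>
  </table>
XML

doc = REXML::Document.new(xml)

# Collect one array of cell strings per <tr>.
rows = []
doc.elements.each("table/tr") do |tr|
  rows << tr.elements.to_a("td").map { |td| td.text.strip }
end

rows.each { |row| puts row.join(" | ") }
```

Each entry of rows is then an ordinary array, so two records can be compared field by field.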

Regards,

Park Heesob


Peter Szinek · Nov 27, 2006

Hello,

Digression: when solving a problem like this, it is often much easier to
write a few lines of HTML than to try to use a high-powered library to
accomplish it.

I don't see why it is an advantage here. The first solution in this thread:

-------------------------------------------------------------------
Record = Struct.new("Record", :name, :date, :name_again, :some_num,
                    :buy_link, :some_num2, :letters, :price)
records = []

# doc holds the HTML source as a String
cells = Hpricot(doc)/"/table/tr/td"

# every 8 consecutive cells make up one table row
cells.map { |elem| elem.inner_html }.each_slice(8) do |slice|
  records << Record.new(*slice)
end

# sort by the price string with the leading "$" dropped
p records.sort_by { |record| record.price[1..-1] }
------------------------------------------------------------------

is shorter, does not care about malformed HTML and even does the sorting
which I believe was the main intention of the OP. So why not use a
high-powered library?

Disclaimer: that solution was actually mine, but I am not referring to it
for that reason. Rather, I think that parsing all the cells with a
one-liner using a robust HTML parser is much better in practice than
using a basic set of regexps and then patching the results they yield
with ad-hoc rules (missing close tags etc.) looked up from 3 examples.
I believe the above Hpricot-powered solution will work with 100 records,
too (if the other 97 do not get *really* messed up, but in that case the
regexps will fail miserably too), whereas the
we-do-not-need-any-high-powered-library approach may need another 25
patches due to the other errors in the 100-record HTML...

I do not dispute that parsing the page with regexps and seeing what's
going on under the hood can provide a lot of experience, but I am really
sure that feeding a real-life page to an HTML parser is safer than using
the regexp approach.
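
The "missing close tags" case mentioned above can be reproduced in a few lines. This is only a sketch with two invented one-row inputs; it uses the same cell pattern as the regexp-based script in this thread and shows how a single missing </td> silently merges two cells.

```ruby
# Same cell pattern as the regexp-based script in this thread.
CELL = %r{<td.*?>(.*?)</td>}im

# Invented inputs: the second one lacks the first closing </td>.
well_formed   = '<tr><td>1005</td><td>Sell</td></tr>'
missing_close = '<tr><td>1005<td>Sell</td></tr>'

good = well_formed.scan(CELL).flatten
bad  = missing_close.scan(CELL).flatten

puts good.inspect  # two separate cells
puts bad.inspect   # one merged cell that still contains markup
```

No exception is raised in the second case, so the bad row would only be noticed later, for example when a record turns out to have too few fields.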

Of course if this question is just a theoretical one, and there won't be
100 (or more than 3) records, just these 3, then forget about this mail.

Cheers,
Peter

__
http://www.rubyrailways.com

Vikash Kumar · Nov 28, 2006

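#!/usr/bin/ruby -w

# path to the saved HTML file, taken from the command line
sourcefilename = ARGV[0]
data = File.read(sourcefilename)

output = []

# capture the contents of every <tr>...</tr>
html_rows = data.scan(%r{<tr.*?>(.*?)</tr>}im).flatten

html_rows.each do |row|
  # filter these undesired elements
  row.gsub!("&nbsp;", "")
  row.gsub!("</a>", "")
  # capture the contents of every <td>...</td>, trimming stray whitespace
  cells = row.scan(%r{<td.*?>(.*?)</td>}im).flatten.map { |cell| cell.strip }
  output << cells
end

# done collecting, now display

output.each do |row|
  line = row.join(",")
  puts line
end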

What would be the right solution if someone wants to get data from the
Yahoo site http://finance.yahoo.com/q?s=IBM and then display only some
values, such as Prev Close and Last Trade? Let's suppose we go to the
URL through:

require 'watir'
include Watir
require 'hpricot'
include Hpricot
ie = Watir::IE.new
ie.goto("http://finance.yahoo.com/q?s=IBM")

Now, what's next? Also, suppose we want to get all the values of a
table but we don't know the table structure. What would be the correct
solution then?
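
For the "unknown table structure" part, one possible sketch is to collect every table as an array of rows and every row as an array of cell strings, and only afterwards look for the labels of interest. It assumes the page source is already available as a string. The markup and numbers below are invented for illustration and are not the real Yahoo page, and extract_tables is a hypothetical helper name. It reuses the regexp style from this thread, so the caveats about malformed HTML apply here as well.

```ruby
# Invented markup for illustration only; NOT the real Yahoo Finance page.
html = <<~HTML
  <table>
    <tr><th>Last Trade:</th><td>93.35</td></tr>
    <tr><th>Prev Close:</th><td>93.81</td></tr>
  </table>
  <table>
    <tr><td>Volume:</td><td>4,712,300</td></tr>
  </table>
HTML

# Hypothetical helper: every table becomes an array of rows, every row an
# array of cell strings, whatever the number of rows or columns.
def extract_tables(html)
  html.scan(%r{<table.*?>(.*?)</table>}im).flatten.map do |table|
    table.scan(%r{<tr.*?>(.*?)</tr>}im).flatten.map do |row|
      row.scan(%r{<t[dh].*?>(.*?)</t[dh]>}im).flatten.map { |cell| cell.strip }
    end
  end
end

tables = extract_tables(html)

# Treat each two-cell row as a label/value pair so that values can be
# looked up by label, without knowing which table they came from.
lookup = {}
tables.each do |rows|
  rows.each do |row|
    next unless row.size == 2
    lookup[row[0].chomp(":")] = row[1]
  end
end

puts lookup["Prev Close"]
puts lookup["Last Trade"]
```

On a real page the label text and the number of cells per row would have to be checked against the actual markup first.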
