Don Norcott
This code is conceptually what I want to do with the Nokogiri code below:
s1 = [1,2,3] ; s2 = [4,5,6]; s3 = [7,8,9]
str = [s1,s2,s3]
str.each do |itm|
puts "********"
puts " #{itm[2]}" # select the third item from each of s1, s2, s3
puts "*********"
end
Results are as expected:
********
3
*********
********
6
*********
********
9
*********
I have an HTML page with multiple <table>...</table> elements
(equivalent to str above) and want to process each table (equivalent to
s1, s2, s3) and extract one item, <td class="itemNumbr" ...>, from each
table (equivalent to extracting a single element from any of s1, s2, s3).
I initially thought this was straightforward, but I am missing
something very fundamental when I move the concept to Nokogiri objects.
---------------------- NOKOGIRI CODE ----------------------
require 'open-uri'
require 'nokogiri'
doc = Nokogiri::HTML(open("c:/RUBY_OUT.TXT")) # file containing the web page
doc.xpath("//table[@class='result']").each do |node| # select a table
puts "*************"
puts node.to_html # as expected
puts node.xpath("//td[@class='itemNumbr']") # prints all 15 itemNumbr cells on every pass
puts "*************"
end
---------------------- NOKOGIRI CODE ----------------------
The output below displays each table's HTML as expected, but not the itemNumbr cells I expected:
***********
<table ..................../table> for item 1
<td class ="itemNumbr.....<b1> 1.</b>...../td>
<td class ="itemNumbr.....<b1> 2.</b>...../td>
......
<td class ="itemNumbr.....<b1> 15.</b>...../td>
**********
**********
<table ..................../table> for item 2
<td class ="itemNumbr.....<b1> 1.</b>...../td>
<td class ="itemNumbr.....<b1> 2.</b>...../td>
......
<td class ="itemNumbr.....<b1> 15.</b>...../td>
**********
**********
<table ..................../table> for item 3
<td class ="itemNumbr.....<b1> 1.</b>...../td>
.......
The tables themselves are output as expected: the tables with itemNumbr 1 to 15,
in sequence.
The node.xpath("//td[@class='itemNumbr']") call acts as if node contains all
15 tables, but the node.to_html output indicates otherwise. I thought node should
always contain the HTML for a single table only, but I appear to be wrong.
Also, if I put a subscript on the first xpath,
doc.xpath("//table[@class='result'][5]").each do |node|
to ensure only one table is found, I still get itemNumbrs for all 15 table
elements.
WHAT AM I MISSING HERE?