Vikash Kumar
I am writing a test case in which I first have to log in to a web page,
then go to a particular page on the same web site and extract some data
from it. The data is in a table.
For example, the script first requests http://localhost/login.asp, enters
the user name and password, and clicks the login button. Once logged in,
it goes to http://localhost/achievements.asp, and from that page we want
to extract the data in the HTML table. What would be the right approach
for this?
I can use the code below to extract the data when I do not have to log in
to the web site.
require 'net/http'

# read the page data
http = Net::HTTP.new('kvcrpf.org', 80)
# newer Ruby versions return only the response object from get,
# so the page text is taken from resp.body
resp = http.get('/achievements.htm')
page = resp.body

# BEGIN processing HTML
# return the inner content of every <tag>...</tag> pair found in data
def parse_html(data, tag)
  data.scan(%r{<#{tag}\s*.*?>(.*?)</#{tag}>}im).flatten
end

output = []
table_data = parse_html(page, "table")
table_data.each do |table|
  out_row = []
  row_data = parse_html(table, "tr")
  row_data.each do |row|
    cell_data = parse_html(row, "td")
    # strip any remaining tags from inside each cell
    cell_data.each do |cell|
      cell.gsub!(%r{<.*?>}, "")
    end
    out_row << cell_data
  end
  output << out_row
end
# END processing HTML

# examine the result
# print a nested array as an indented tree, skipping empty items
def parse_nested_array(array, tab = 0)
  n = 0
  array.each do |item|
    if item.size > 0
      puts "#{"\t" * tab}[#{n}] {"
      if item.is_a?(Array)
        parse_nested_array(item, tab + 1)
      else
        puts "#{"\t" * (tab + 1)}#{item}"
      end
      puts "#{"\t" * tab}}"
    end
    n += 1
  end
end

# row 4 of table 2; these indexes are specific to this page
parse_nested_array(output[2][4])
aa, ab, ac, ad = output[2][4]
puts "hello"
puts aa + "\t" + ab + "\t" + ac + "\t" + ad
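
One possible approach for the login step, as a rough sketch only: POST the login form with Net::HTTP, collect the session cookie from the Set-Cookie headers of the response, and send it back with the request for the protected page. The helper names cookie_header and fetch_after_login are hypothetical, and the form field names 'username' and 'password' are assumptions; the real names have to be taken from the input tags of login.asp. It also assumes the site tracks the login with cookies and needs no hidden form fields.

```ruby
require 'net/http'

# Build a Cookie request header from a list of Set-Cookie header values.
# Only the leading name=value pair of each cookie is kept; attributes
# such as path or HttpOnly are dropped.
def cookie_header(set_cookie_values)
  set_cookie_values.map { |c| c.split(';').first.strip }.join('; ')
end

# Log in, then fetch a protected page from the same host.
# ASSUMPTION: the form fields are called 'username' and 'password'.
def fetch_after_login(host, port, login_path, page_path, user, pass)
  Net::HTTP.start(host, port) do |http|
    login = Net::HTTP::Post.new(login_path)
    login.set_form_data('username' => user, 'password' => pass)
    login_resp = http.request(login)

    # get_fields returns nil when the header is absent
    cookies = cookie_header(login_resp.get_fields('set-cookie') || [])

    page_req = Net::HTTP::Get.new(page_path)
    page_req['Cookie'] = cookies unless cookies.empty?
    http.request(page_req).body
  end
end

# Example call (needs the server to be running):
# page = fetch_after_login('localhost', 80, '/login.asp',
#                          '/achievements.asp', 'user', 'secret')
```

The returned page text could then be passed to parse_html in place of the page variable used above.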