Jesse Crockett
Hello, I need some help using Hpricot. I'm simply trying to print out
each link within "ul.class-name" for a CSV file.
ref: http://code.whytheluckystiff.net/hpricot/wiki/HpricotCssSearch
sample source for search "butter oil":
<ul class='mw-search-results'>
<li><a href="/wiki/Coconut_oil" title="Coconut oil">Coconut <span
class='searchmatch'>oil</span></a> ...
<li><a href="/wiki/Cocoa_butter" title="Cocoa butter">Cocoa <span
class='searchmatch'>butter</span></a></li>
</ul>
ruby:
search = "corn+flakes
bananas
milk+lowfat
blueberries+raw"
search = search.split("\n")
a = 0; until a == 3
query = search[a]
doc =
Hpricot(URI.parse("http://en.wikipedia.org/wiki/Special:Search?search=#{query}&fulltext=Search").read)
# grab report list to build from (I need an array of inner html per link)
doc = (doc/"ul.mw-search-results a")
line = "#{a + 1}|#{query}|"
begin
#
line << ??? # fails for all attempts, please help here
#
rescue
printf "%i|%s\n", a + 1, "## Exception Caught ##"
end
sleep 8 # + rand(20) # to avoid
a += 1
end
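One possible way to fill in the `???` line, offered as a minimal sketch rather than a definitive answer: map each matched link to its text and join the results with the same `|` delimiter. It assumes the elements returned by `doc/"ul.mw-search-results a"` respond to `inner_text` (use `inner_html` instead if the `<span>` markup should be kept). The method names `report_line`, `search_links`, and `print_report` are made up for illustration and are not part of Hpricot.

```ruby
# Build one pipe-delimited line: index|query|link text|link text|...
# `links` is any enumerable whose items respond to #inner_text
# (assumed here to be the elements matched by the Hpricot search).
def report_line(index, query, links)
  texts = links.map { |link| link.inner_text.strip }
  ([index, query] + texts).join("|")
end

# Fetch the search page and return the links inside the results list.
# The requires live inside the method so report_line can be tried
# without the hpricot gem installed.
def search_links(query)
  require 'open-uri'
  require 'hpricot'
  url = "http://en.wikipedia.org/wiki/Special:Search?search=#{query}&fulltext=Search"
  doc = Hpricot(URI.parse(url).read)
  doc / "ul.mw-search-results a"
end

# Print one line per query, pausing between requests as in the original loop.
def print_report(queries, pause = 8)
  queries.each_with_index do |query, i|
    begin
      puts report_line(i + 1, query, search_links(query))
    rescue StandardError
      printf "%i|%s\n", i + 1, "## Exception Caught ##"
    end
    sleep pause
  end
end
```

Calling `print_report(search)` after the `split("\n")` step would replace the `until` loop. Note that `each_with_index` visits all four queries, whereas `until a == 3` stops after the third one.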