Hi all
I am new to Ruby and find it an interesting language. Can anyone help
me with some simple Ruby code to check a website for dead and live
links?
Thanks
Rati
It is a wonderful language. For a rough request like this, here is some
(very) simple code to get you started. It uses some great Ruby libraries,
including Rubyful Soup (http://www.crummy.com/software/RubyfulSoup/),
which is available as a gem. As written, it only checks one page; to
check a whole site, you would need to make it "walk" the links
recursively.
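One way to do that walk is a breadth-first crawl that keeps a set of URLs it has already queued. The sketch below rests on several assumptions: `crawl`, its `fetch:` hook and its `max_pages:` limit are hypothetical names; links are pulled out with a crude regex rather than Rubyful Soup; and only pages on the starting host are parsed, so off-site links are checked but not followed.

```ruby
require 'uri'
require 'set'

# Crude href extractor: a regex stand-in for a real HTML parser such as
# Rubyful Soup. It misses unquoted attributes and other edge cases.
HREF_RE = /<a\s[^>]*href\s*=\s*["']([^"']+)["']/i

# Breadth-first walk of a site (hypothetical helper).
# fetch:     any callable that takes a URI and returns the page body,
#            raising an exception if the link is dead.
# max_pages: safety limit on the number of URLs fetched.
# Returns a Hash of dead URL => error message.
def crawl(start_url, fetch:, max_pages: 500)
  start = URI.parse(start_url)
  queue = [start]
  seen = Set[start.to_s]
  dead = {}
  pages_fetched = 0

  until queue.empty? || pages_fetched >= max_pages
    page = queue.shift
    pages_fetched += 1
    begin
      html = fetch.call(page)
    rescue StandardError => e
      dead[page.to_s] = e.message
      next
    end

    # Off-site pages are checked but not parsed, so the walk stays on one site.
    next unless page.host == start.host

    html.scan(HREF_RE).flatten.each do |href|
      link = (page.merge(href.strip) rescue nil) # nil for unparsable hrefs
      next unless link.is_a?(URI::HTTP)          # skips mailto:, javascript:, ...
      link.fragment = nil                        # page#a and page#b are one resource
      queue << link if seen.add?(link.to_s)      # enqueue each URL only once
    end
  end

  dead
end
```

Passing the fetcher in keeps the walk testable without a network. Against a real site it could be `->(uri) { uri.read }` after `require 'open-uri'`, e.g. `crawl('http://www.example.com/', fetch: ->(uri) { uri.read }).each { |url, err| puts "#{url}: #{err}" }`.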
Hope it helps
pth
require 'open-uri'
require 'uri'
require 'rubyful_soup'

url = 'http://www.yahoo.com/'
uri = URI.parse(url)
html = uri.read # open-uri adds #read to http:// and https:// URIs
soup = BeautifulSoup.new(html)

# Search the soup for <a> tags and collect their hrefs (anchors without one are dropped)
links = soup.find_all('a').map { |a| a['href'] }.compact.uniq

# Remove javascript: pseudo-links, which cannot be fetched
links.delete_if { |href| href =~ /\A\s*javascript:/i }

links.each do |l|
  # resolve relative paths against the page URL (URI#merge follows the RFC rules)
  begin
    link = uri.merge(l.strip)
  rescue URI::Error => e
    puts "#{l}: #{e}"
    next
  end
  next unless link.is_a?(URI::HTTP) # skip mailto:, ftp:, etc. (URI::HTTPS is a subclass)

  # check the link
  begin
    link.read
    # if we made it here the link is probably good
  rescue StandardError => e
    # e.g. OpenURI::HTTPError ("404 Not Found"), SocketError, timeouts
    puts "#{link}: #{e}"
  end
end
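Resolving relative links is the fiddliest step of the script. Ruby's standard uri library already implements the RFC reference-resolution rules in `URI#merge` (also reachable as `URI.join`). The short sketch below shows what it does with the usual kinds of href; `resolve` is a hypothetical wrapper that returns nil instead of raising on hrefs that are not valid URIs.

```ruby
require 'uri'

# Resolve an href found on the page at `base` into an absolute URI,
# returning nil (instead of raising) when the href is not a valid URI.
def resolve(base, href)
  base.merge(href.strip)
rescue URI::Error
  nil
end

base = URI.parse('http://www.example.com/dir/page.html')
puts resolve(base, 'other.html')                # http://www.example.com/dir/other.html
puts resolve(base, '/top.html')                 # http://www.example.com/top.html
puts resolve(base, '../up.html')                # http://www.example.com/up.html
puts resolve(base, '#top')                      # http://www.example.com/dir/page.html#top
puts resolve(base, 'http://www.ruby-lang.org/') # http://www.ruby-lang.org/
p resolve(base, 'http://bad host/')             # nil
```

Absolute hrefs come back unchanged, so the same call works for on-site and off-site links.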
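The loop counts any exception from open-uri as a dead link and anything else as "probably good". If you want the actual status code, and would rather not download every page body, a HEAD request with the standard net/http library is one option. This is a sketch under assumptions: `check_link` is a hypothetical helper, 2xx and 3xx answers count as live, redirects are not followed, and servers that reject HEAD (for example with 405) will be reported as dead, so a GET fallback may be worth adding.

```ruby
require 'net/http'
require 'uri'

# Check one URL with a HEAD request (hypothetical helper).
# Returns [:live, code], [:dead, code], [:dead, error message],
# or [:skipped, scheme] for non-HTTP links such as mailto:.
# Assumptions: 2xx and 3xx count as live; redirects are not followed.
def check_link(url, timeout: 10)
  uri = URI(url)
  return [:skipped, uri.scheme] unless uri.is_a?(URI::HTTP) # URI::HTTPS is a subclass

  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == 'https',
                  open_timeout: timeout, read_timeout: timeout) do |http|
    response = http.head(uri.request_uri)
    live = response.is_a?(Net::HTTPSuccess) || response.is_a?(Net::HTTPRedirection)
    [live ? :live : :dead, response.code.to_i]
  end
rescue StandardError => e
  # DNS failures, refused connections, timeouts, invalid URLs, ...
  [:dead, e.message]
end

p check_link('mailto:someone@example.com') # [:skipped, "mailto"]
```

In the script, `check_link(link.to_s)` could replace the `begin`/`rescue` block that reads each link, printing only the results whose first element is `:dead`.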