Get title from URL?

Cisco Ri

Anybody have a code snippet that extracts the title from the <title> tag
from a given URL?
 
Cisco Ri

Heesob said:
require 'rubygems'
require 'mechanize'
# Newer Mechanize releases (1.0 and later) name this class Mechanize, without the WWW:: prefix.
title = WWW::Mechanize.new.get('http://google.com').title
# => "Google"


Regards,
Park Heesob

I used this method for a while, and it was fine for most sites.
However, with wikipedia.org it errored out with a 403 Forbidden error.
The Hpricot/open-uri method works for most sites, including
wikipedia.org, but for thesixtyone.com (a JavaScript-intensive site) it
errors out with a 500 Internal Server Error.

I haven't tried out the open-uri only method yet.

Thanks for the help everyone.
 
Heesob Park

2009/4/28 Cisco Ri said:
I used this method for a while, and it was fine for most sites.
However, with wikipedia.org it errored out with a 403 Forbidden error.
The Hpricot/open-uri method works for most sites, including
wikipedia.org, but for thesixtyone.com (a JavaScript-intensive site) it
errors out with a 500 Internal Server Error.
You can work around the 403 by setting a browser user agent, like this:

require 'rubygems'
require 'mechanize'
agent = WWW::Mechanize.new
agent.user_agent_alias = 'Mac Safari'
title = agent.get('http://wikipedia.org').title


Regards,
Park Heesob
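
For the open-uri-only approach mentioned earlier in the thread, here is a rough sketch rather than a tested recipe. The helper names extract_title and fetch_title and the User-Agent string are made up for illustration; none of them come from the posts above. It assumes a Ruby recent enough to have URI.open (on the Rubies of 2009 you would call plain open after requiring open-uri), and it uses a regular expression instead of a real HTML parser, so unusual markup may trip it up.

```ruby
require 'open-uri'

# Hypothetical helper: pull the text of the first <title> element out of an
# HTML string. Returns nil when the page has no title.
def extract_title(html)
  match = html.match(%r{<title[^>]*>(.*?)</title>}im)
  return nil unless match
  # Collapse runs of whitespace, since titles sometimes span several lines.
  match[1].gsub(/\s+/, ' ').strip
end

# Hypothetical helper: fetch a page and return its title. The User-Agent
# value is an arbitrary example; as with the Mechanize workaround above,
# some sites refuse requests that carry a library's default user agent.
def fetch_title(url, user_agent = 'Mozilla/5.0 (title-fetcher example)')
  html = URI.open(url, 'User-Agent' => user_agent, &:read)
  extract_title(html)
end
```

Calling fetch_title('http://wikipedia.org') should then return the page title as a string, or nil if the page has none. This will not help with pages that build their title in JavaScript, since nothing here runs scripts.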
 
