Converting escaped HTML to UTF-8

Chris Worrall

Hi everyone,

I've looked around online for a solution, but I'm pretty new to Ruby
and to programming in general, so I feel like I'm hitting a wall here.

I'm retrieving data with Hpricot that I'd like to store as UTF-8, but
I can't find a function that converts hex numeric character references (NCRs) such as:

&#xE1; (which should become á)

Surely someone has had to do this before and could point me in
the right direction? Thanks!
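For anyone reading this on Ruby 1.9 or later, where strings carry their own encoding, numeric references can also be decoded straight to UTF-8 with core methods alone. A minimal sketch, assuming the input is a UTF-8 string; decode_ncrs and NCR_PATTERN are hypothetical names, and named entities such as &amp; are deliberately left alone:

```ruby
# Hypothetical helper: decode hex (&#xE1;) and decimal (&#225;) numeric
# character references into UTF-8 characters using only core Ruby.
# Named entities such as &amp; or &eacute; are left untouched.
NCR_PATTERN = /&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/

def decode_ncrs(str)
  str.gsub(NCR_PATTERN) do |ref|
    # Group 1 holds hex digits, group 2 holds decimal digits.
    code_point = $1 ? $1.to_i(16) : $2.to_i
    begin
      code_point.chr(Encoding::UTF_8)
    rescue RangeError
      ref # surrogates or values above U+10FFFF are not valid characters
    end
  end
end

puts decode_ncrs("caf&#xE9; &#xE1; &#225;") # prints "café á á"
```

Because each reference goes through Integer#chr(Encoding::UTF_8), this also covers code points above U+00FF; for named entities a dedicated library remains the better choice.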
 
Chris Worrall

Well, after some more googling, I found a solution, in case anyone is curious:

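require 'cgi'
require 'iconv'   # Ruby 1.8/1.9 only; Iconv left the standard library in Ruby 2.0

n = "&#xE1;"
n = CGI.unescapeHTML(n)                   # Ruby 1.8: code points below 256 become one Latin-1 byte
n = Iconv.conv("UTF-8", "ISO-8859-1", n)  # re-encode that byte as UTF-8, giving "á"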
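Since Iconv is no longer in the standard library from Ruby 2.0 on, String#encode (Ruby 1.9+) can perform the same Latin-1 to UTF-8 step. A minimal sketch, where the literal "\xE1" is an assumed stand-in for the one-byte result CGI.unescapeHTML gave on Ruby 1.8. Note that on current Rubies CGI.unescapeHTML applied to a UTF-8 string already yields UTF-8 text, so there this second step would double-encode and should simply be dropped:

```ruby
# "\xE1" stands in for the single Latin-1 byte that CGI.unescapeHTML
# produced from &#xE1; on Ruby 1.8 (an assumption for illustration).
latin1 = "\xE1".dup.force_encoding(Encoding::ISO_8859_1)

# Transcode instead of Iconv.conv("UTF-8", "ISO-8859-1", n).
utf8 = latin1.encode(Encoding::UTF_8)

puts utf8    # prints "á"
p utf8.bytes # prints [195, 161]
```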
 
Daniel DeLorme

I'm surprised no one has mentioned it, but you could also use the htmlentities gem:

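require "rubygems"      # only needed on Ruby 1.8
require "htmlentities"
puts HTMLEntities.decode_entities("&#x100; &#x108; &#x10E;")
# prints: Ā Ĉ Ď
# (newer releases of the gem use HTMLEntities.new.decode instead)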

Daniel
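Daniel's example decodes characters such as Ā (U+0100), which lie outside ISO-8859-1; that encoding only covers U+0000 to U+00FF, so a route that passes through Latin-1, like the Iconv one earlier in the thread, cannot produce them. A quick check with core String#encode illustrates the limit (latin1_representable? is a hypothetical helper name):

```ruby
# Hypothetical helper: true if every character in str exists in
# ISO-8859-1 (Latin-1), i.e. has a code point no higher than U+00FF.
def latin1_representable?(str)
  str.encode(Encoding::ISO_8859_1)
  true
rescue Encoding::UndefinedConversionError
  false
end

puts latin1_representable?("\u00E1")               # á      -> prints "true"
puts latin1_representable?("\u0100 \u0108 \u010E") # Ā Ĉ Ď -> prints "false"
```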
 
