Ruby1.9: Encoding problems (how to use #force_encoding ?)

  • Thread starter Iñaki Baz Castillo

Iñaki Baz Castillo

Hi, I'm using the geo_location Ruby gem, which returns a hash with the
given IP's geolocation.

I use Ruby 1.9 and UTF-8 works fine, but in this case, when the "city" has
"strange" symbols, the gem gives the string encoded in ASCII-8BIT.

For example:

Alarc'n (theoretically it should be "Alarcón")

I need to send this string to a server which mandates UTF-8 usage, so
sending it as it is fails.

I've tried to convert the encoding but received an error:

result.encode "UTF-8"
=> `encode': "\xF3" from ASCII-8BIT to UTF-8
(Encoding::UndefinedConversionError)

I've also tried with force_encoding:
result.force_encoding "UTF-8"

and then the "result" string is converted to UTF-8 (I've checked
result.encoding) but it's also not valid for the server, and when printing
it I see the same as before.


I need all of this just for a simple demo, so it would be enough for me
just to delete the non-valid UTF-8 chars from the result string, but I
don't know how to do it.

Any help please?
-- 
Iñaki Baz Castillo <[email protected]>
 
Brian Candler

Iñaki Baz Castillo said:
Hi, I'm using the geo_location Ruby gem, which returns a hash with the
given IP's geolocation.

Lots of gems are not ruby-1.9 compatible. You should probably report
problems to the author, ideally with a patch which fixes it, and a test
case which reproduces it.

Iñaki Baz Castillo said:
I use Ruby 1.9 and UTF-8 works fine, but in this case, when the "city"
has "strange" symbols, the gem gives the string encoded in ASCII-8BIT.

All data read from a socket is tagged as ASCII-8BIT by default. That's
probably what's happening in the library you're using.

Iñaki Baz Castillo said:
I need to send this string to a server which mandates UTF-8 usage, so
sending it as it is fails.

That doesn't make much sense. A string, when it hits a socket, is just a
stream of bytes. So you should be sending the same stream of bytes as
you receive.

Iñaki Baz Castillo said:
I've tried to convert the encoding but received an error:

result.encode "UTF-8"
=> `encode': "\xF3" from ASCII-8BIT to UTF-8
(Encoding::UndefinedConversionError)

That's correct. Transcoding tries to *transcode* (replace characters one
at a time), and these high characters in ASCII-8BIT have no Unicode
equivalents.
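
To make the failure concrete, here is a minimal sketch. It assumes the gem hands back the bytes shown in the error message (0xF3 where "ó" should be), which is how ISO-8859-1 encodes that letter; the variable names are only for illustration. If that assumption holds, telling encode what the source encoding really is lets the transcode succeed:

```ruby
# The string as the gem apparently returns it: raw bytes tagged ASCII-8BIT.
raw = "Alarc\xF3n".dup.force_encoding("ASCII-8BIT")

# Transcoding straight from ASCII-8BIT fails, because byte 0xF3 has no
# defined character in that encoding.
error_message = nil
begin
  raw.encode("UTF-8")
rescue Encoding::UndefinedConversionError => e
  error_message = e.message
end

# Assumption: the bytes are really ISO-8859-1. Passing the source encoding
# as the second argument maps 0xF3 to U+00F3, i.e. bytes C3 B3 in UTF-8.
utf8 = raw.encode("UTF-8", "ISO-8859-1")
```

If the gem's data turns out to be in some other 8-bit encoding, the second argument would have to change accordingly.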

Iñaki Baz Castillo said:
I've also tried with force_encoding:
result.force_encoding "UTF-8"

and then the "result" string is converted to UTF-8 (I've checked
result.encoding)

It's not converted, it's just tagged as being a string of UTF-8
characters, which it sounds like it is.
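
A quick way to see the difference between tagging and converting, sketched under the assumption that the city arrives with the 0xF3 byte reported in the error message. force_encoding leaves the bytes alone, and valid_encoding? tells you whether those bytes actually make sense under the new tag:

```ruby
# Raw bytes as apparently returned by the gem (assumed from the error message).
raw = "Alarc\xF3n".dup.force_encoding("ASCII-8BIT")

# Retag a copy as UTF-8. No byte is changed by this call.
tagged = raw.dup.force_encoding("UTF-8")

same_bytes   = (tagged.bytes.to_a == raw.bytes.to_a)
tagged_valid = tagged.valid_encoding?
# 0xF3 would start a four-byte UTF-8 sequence, but it is followed by "n",
# so with these particular bytes the retagged string is not valid UTF-8.
```

With these bytes same_bytes is true and tagged_valid is false, which would explain why printing shows the same thing as before and why a strict UTF-8 server rejects it.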

Iñaki Baz Castillo said:
but it's also not valid for the server

Again, doesn't mean much without seeing the code which is trying to
submit this to the server.

Iñaki Baz Castillo said:
I need all of this just for a simple demo, so it would be enough for me
just to delete the non-valid UTF-8 chars from the result string, but I
don't know how to do it.

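# Tag the string as raw bytes (a no-op if it is already ASCII-8BIT)
str.force_encoding("ASCII-8BIT")
# Remove every byte outside the printable ASCII range 0x20-0x7E
str.gsub!(/[^\x20-\x7e]/, '')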
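
For comparison, a small sketch of two ways to drop the offending bytes, again assuming the sample value "Alarc\xF3n" from the error message. The first is the gsub approach above in non-destructive form; the second uses the replacement options of String#encode, which also leaves the result tagged as UTF-8:

```ruby
raw = "Alarc\xF3n".dup.force_encoding("ASCII-8BIT")

# Approach 1: keep only printable ASCII bytes (also drops tabs and newlines).
ascii_only = raw.gsub(/[^\x20-\x7e]/, "")

# Approach 2: transcode to UTF-8 and replace anything that cannot be
# converted with an empty string. ASCII control characters survive.
stripped = raw.encode("UTF-8", invalid: :replace, undef: :replace, replace: "")
```

Both give "Alarcn" for this input. Note that gsub! (with the bang) returns nil when nothing was replaced, so its return value should not be used as the cleaned string.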
 
Iñaki Baz Castillo

2009/9/2 Axel Etzold said:
Dear Iñaki,

maybe you can use CGI.escape for your encoding problems?

I fetched the Spanish wikipedia page for Alarcón like so:

# encoding: utf-8

require "cgi"
require 'open-uri'

search_what = CGI.escape("Alarcón")
page = "http://es.wikipedia.org/w/index.php?title=Especial:Buscar&search=#{search_what}&fulltext=Buscar"
open(page) { |f| print f.read }

Thanks, but the problem is that the geo_location Ruby gem returns a
wrong string (encoded in ASCII-8BIT), since it contains invalid chars
for the ASCII-8BIT encoding, so Ruby fails when trying to convert it to
another encoding :(

-- 
Iñaki Baz Castillo
<[email protected]>
 
James Edward Gray II

Iñaki Baz Castillo said:
Thanks, but the problem is that the geo_location Ruby gem returns a
wrong string (encoded in ASCII-8BIT), since it contains invalid chars
for the ASCII-8BIT encoding, so Ruby fails when trying to convert it to
another encoding :(

There are no invalid characters in ASCII-8BIT. It's a catch-all
Encoding. So that's definitely not the problem… ;)

James Edward Gray II
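
This is easy to check for yourself. The sketch below builds a string containing every possible byte value and asks Ruby whether it is valid, first as ASCII-8BIT and then retagged as UTF-8 (variable names are only illustrative):

```ruby
# Array#pack with "C*" returns a string tagged ASCII-8BIT.
all_bytes = (0..255).to_a.pack("C*")

# Every byte value is acceptable in ASCII-8BIT.
binary_valid = all_bytes.valid_encoding?

# The same bytes retagged as UTF-8 are not a valid UTF-8 sequence.
utf8_valid = all_bytes.dup.force_encoding("UTF-8").valid_encoding?
```

Here binary_valid is true and utf8_valid is false: the bytes are never "invalid for ASCII-8BIT", they just carry no information about which characters they stand for.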
 
Iñaki Baz Castillo

On Wednesday, 2 September 2009, James Edward Gray II wrote:
There are no invalid characters in ASCII-8BIT. It's a catch-all
Encoding. So that's definitely not the problem… ;)

Ok, that's a good point.
I'll try it.

-- 
Iñaki Baz Castillo <[email protected]>
 
