RTranslate Gem (Open-URI) and Encoding

The Chromag

I'm using the rtranslate gem (sishen-rtranslate) to handle translating
some text. It uses open-uri to scrape Google Translate. If I attempt
to translate text from English to Spanish it outputs some garbage
characters.

Translating the word "where" returns "dÃ³nde" instead of "dónde".

Any idea why this is happening and what I can do to fix this?

Thanks!
 
Richard Conroy

The Chromag said:
I'm using the rtranslate gem (sishen-rtranslate) to handle translating
some text. It uses open-uri to scrape Google Translate. If I attempt
to translate text from English to Spanish it outputs some garbage
characters.

Translating the word "where" returns "dÃ³nde" instead of "dónde"

Any idea why this is happening and what I can do to fix this?

You need to specify the encoding in your Ruby script. Ruby (1.8 at least, I am
not certain of 1.9) will use your system encoding for strings by default.

Set this global variable in your script to make Ruby process strings as UTF-8,
independently of your machine:
$KCODE = 'u'

There is more detail here:
http://blog.grayproductions.net/articles/the_kcode_variable_and_jcode_library
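
For readers on Ruby 1.9 or later, where (as far as I know) $KCODE is ignored and every string carries its own encoding, a rough equivalent of this check is to ask Ruby which encodings it is working with. This is only a minimal sketch; the variable name `literal` is made up for illustration.

```ruby
# encoding: utf-8
# Magic comment (Ruby 1.9+): declares the encoding of this source file.

# "\u00F3" is the escape for "ó"; a \u escape always yields a UTF-8 string,
# which keeps this file pure ASCII.
literal = "d\u00F3nde"

puts "source encoding:  #{__ENCODING__}"
puts "string encoding:  #{literal.encoding}"
puts "default external: #{Encoding.default_external}"
puts "characters: #{literal.length}, bytes: #{literal.bytesize}"
```

If the character count and the byte count differ, Ruby is treating the string as multibyte text rather than as one character per byte.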

Note that Ruby could be processing the Google Translate output correctly (i.e.
you are doing everything above), but if you are writing the result to the
console/standard output (via puts), your machine may still interpret the UTF-8
text according to the host system's encoding. This is a particularly annoying
problem on Windows systems, in my experience. Output to a file instead if you
are seeing this problem.
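
Another way to take the console out of the picture is to print the raw bytes as hex, since hex digits look the same in any terminal. A minimal sketch; `hex_bytes` is a hypothetical helper, not part of rtranslate or open-uri:

```ruby
# Hypothetical helper: render a string's raw bytes as hex, so the result does
# not depend on how a terminal or editor decodes them.
def hex_bytes(str)
  str.unpack("C*").map { |b| "%02X" % b }.join(" ")
end

good    = "d\u00F3nde"        # "dónde" encoded once as UTF-8
doubled = "d\u00C3\u00B3nde"  # the garbled form: UTF-8 bytes encoded a second time

puts hex_bytes(good)     # 64 C3 B3 6E 64 65
puts hex_bytes(doubled)  # 64 C3 83 C2 B3 6E 64 65
```

If the translated string dumps as the first line, the data is fine and only the display is wrong; if it dumps as the second, the text was converted once too often somewhere upstream.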

regards,
Richard.
--
http://richardconroy.blogspot.com
 
The Chromag

Richard said:
You need to specify encoding in your ruby script. Ruby (1.8 at least, I am
not certain of 1.9) will use your system encoding for strings by default.

Set this constant in your script to make Ruby process strings as UTF-8,
independently of your machine
$KCODE = 'u'

There is more detail here:
http://blog.grayproductions.net/articles/the_kcode_variable_and_jcode_library

Note that Ruby could be processing google translate correctly (i.e. you are
doing everything above), but if you are outputting the result to the
console/system out (via puts) your machine may still process the UTF-8 text
according to the host system. This for instance is a particularly annoying
problem on windows systems IME. Output to a file instead if you are seeing
this problem.

I had the $KCODE variable set. It didn't seem to do anything in this
case. I outputted the translated text to a file to see if it was a
display issue with the console and the text was still incorrect in the
file.

Any other ideas?

Thanks.
 
Seebs

The Chromag said:
I'm using the rtranslate gem (sishen-rtranslate) to handle translating
some text. It uses open-uri to scrape Google Translate. If I attempt
to translate text from English to Spanish it outputs some garbage
characters.
Translating the word "where" returns "dónde" instead of "dónde"

The amusing part is that the first one looks fine to me.

I suspect this means that you're getting UTF-8 when you expect something
in some particular encoding, so you should be specifying encodings...

-s
 
Jonathan Nielsen

The Chromag said:
Translating the word "where" returns "dÃ³nde" instead of "dónde"

Seebs said:
The amusing part is that the first one looks fine to me.

Indeed. The first one is properly encoded in UTF-8, the second in ISO-8859-1.
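
To make the difference concrete, here is a small sketch (Ruby 1.9+ string API, variable names invented for illustration) of how the garbage arises: the UTF-8 bytes for "dónde" are read as if each byte were one ISO-8859-1 character.

```ruby
utf8 = "d\u00F3nde"  # "dónde", stored as the UTF-8 bytes 64 C3 B3 6E 64 65

# Relabel the same bytes as ISO-8859-1 (no bytes change), then convert that
# mistaken reading to UTF-8. Each original byte becomes its own character.
misread = utf8.dup.force_encoding("ISO-8859-1").encode("UTF-8")

# A genuine conversion to ISO-8859-1, for comparison: "ó" is the single byte F3.
latin1 = utf8.encode("ISO-8859-1")

puts misread         # d, A-tilde, superscript three, then "nde"
puts misread.length  # 6 characters instead of 5
puts latin1.unpack("C*").map { |b| "%02X" % b }.join(" ")  # 64 F3 6E 64 65
```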

-Jonathan Nielsen
 
The Chromag

Seebs said:
The amusing part is that the first one looks fine to me.

I suspect this means that you're getting UTF8 when you expect something
in some particular encoding, so you should be specifying encodings...

Where would I specify the encoding to fix this problem? And yes, I just
noticed that in a reply it DOES look correct, but when I posted it, it
was not. I'm guessing somewhere (other than $KCODE) I need to set it as
UTF-8.

Thanks.
 
Jonathan Nielsen

The Chromag said:
Where would I specify the encoding to fix this problem? And yes, I just
noticed that in a reply it DOES look correct, but when I posted it, it
was not. I'm guessing somewhere (other than $KCODE) I need to set it as
UTF-8.

Are you using ruby 1.8 or ruby 1.9? In 1.9, you should do
string.force_encoding("UTF-8") or
string.force_encoding("ISO-8859-1")... as needed. In ruby 1.8, I
think it just works with the bits you provide it and it's your
terminal that determines what actually gets displayed.
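
A rough sketch of what "as needed" can mean on 1.9, assuming you already know which bytes you were handed. `force_encoding` only changes the label on the string, while `encode` actually converts the bytes; the variable names below are made up for illustration.

```ruby
# Case 1: the bytes are already UTF-8 but the string carries a binary label.
# pack("C*") returns an ASCII-8BIT string, much like raw network data.
raw_utf8 = [0x64, 0xC3, 0xB3, 0x6E, 0x64, 0x65].pack("C*")
fixed1 = raw_utf8.force_encoding("UTF-8")  # relabels in place, bytes untouched

# Case 2: the bytes really are ISO-8859-1 ("ó" is the single byte F3).
# Label them correctly first, then convert to UTF-8.
raw_latin1 = [0x64, 0xF3, 0x6E, 0x64, 0x65].pack("C*")
fixed2 = raw_latin1.force_encoding("ISO-8859-1").encode("UTF-8")

# Case 3: the text was encoded twice. Undo one layer by converting back to
# ISO-8859-1 bytes and relabelling those bytes as UTF-8.
doubled = "d\u00C3\u00B3nde"
fixed3 = doubled.encode("ISO-8859-1").force_encoding("UTF-8")

[fixed1, fixed2, fixed3].each do |s|
  puts "#{s.encoding} valid=#{s.valid_encoding?} chars=#{s.length}"
end
```

All three lines should report a valid five-character UTF-8 string. Picking the wrong case for your data produces exactly the kind of garbage described at the top of the thread, so it is worth checking the raw bytes before choosing.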

-Jonathan Nielsen
 
The Chromag

Jonathan said:
Are you using ruby 1.8 or ruby 1.9? In 1.9, you should do
string.force_encoding("UTF-8") or
string.force_encoding("ISO-8859-1")... as needed. In ruby 1.8, I
think it just works with the bits you provide it and it's your
terminal that determines what actually gets displayed.

I'm using 1.8.7. I don't think it's the terminal but I'm not entirely
sure. I'm outputting the translation to a text file, but technically
I'm viewing it in a terminal app (Putty) so it may be screwing up there.

Thanks.
 
Eric Christopherson

The Chromag said:
I'm using 1.8.7. I don't think it's the terminal but I'm not entirely
sure. I'm outputting the translation to a text file, but technically
I'm viewing it in a terminal app (Putty) so it may be screwing up there.

Thanks.

Make sure PuTTY is set for UTF-8.
 
The Chromag

Eric said:
Make sure PuTTY is set for UTF-8.

Aha! Well that fixed the problem of being able to see the correct
output in the terminal. It should greatly help the debugging process
now. I'm then taking the encoded string and transferring it with XML via
a socket connection. I'll have to look into the transfer to see if it's
breaking there.

Thanks for the help.
 
