Convert \uXXXX to character

born in USSR

I have a string, '\u041f\u0440\u0438\u0432\u0435\u0442!', and I need to
convert it to a string such as 'Привет!'.
I can convert the string to '041f 0440 0438 0432 0435 0442', then convert each code to
decimal, and finally convert each code to a character with:
str.scan(/[0-9]+/).each {|x| result_str << x.to_i}

But I don't think that is the most rational way.
 
Justin Collins

irb(main):001:0> RUBY_VERSION
=> "1.9.1"
irb(main):002:0> puts '\u041f\u0440\u0438\u0432\u0435\u0442!'
\u041f\u0440\u0438\u0432\u0435\u0442!
=> nil
irb(main):003:0> puts "\u041f\u0440\u0438\u0432\u0435\u0442!"
Привет!
=> nil

Note the difference in single quotes versus double quotes.

-Justin
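
To make the quoting difference concrete, here is a minimal sketch (assuming Ruby 1.9 or later, with variable names chosen only for illustration). A single-quoted literal keeps the backslash and the letters as six separate characters, while a double-quoted literal lets the parser turn the escape into one character:

```ruby
single = '\u041f'   # backslash, "u", and four hex digits are kept literally
double = "\u041f"   # the parser decodes the escape to one character

puts single.length        # 6
puts double.length        # 1
puts double.ord.to_s(16)  # 41f
```

This only helps when the text is written as a literal in source code. A string that arrives at run time with literal backslashes still has to be decoded some other way.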
 
Gary Wright

If I understand you correctly you can leverage Ruby's parser to
interpret your string literal:

irb> x = '\u041f\u0440\u0438\u0432\u0435\u0442!'
=> "\\u041f\\u0440\\u0438\\u0432\\u0435\\u0442!"
irb> eval("\"#{x}\"")
=> "Привет!"

Be careful with eval, though: make sure the string to be evaluated
doesn't contain any untrusted code.

Gary Wright
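
To illustrate the warning about eval, here is a rough sketch of how interpolation inside the input gets executed, followed by one possible guard. The names `SAFE_ESCAPES` and `eval_unescape` are hypothetical and not part of the original reply; the guard assumes the input should contain nothing but \uXXXX escapes and characters other than a backslash, a double quote, or `#`.

```ruby
# Interpolation in the input runs as code once it is wrapped in double quotes.
x = '#{1 + 1}'
puts eval("\"#{x}\"")   # 2

# Hypothetical guard: allow only \uXXXX escapes plus characters that cannot
# start an escape, close the literal, or begin an interpolation.
SAFE_ESCAPES = /\A(?:\\u\h{4}|[^\\"#])*\z/

def eval_unescape(str)
  raise ArgumentError, "unsafe input for eval" unless str =~ SAFE_ESCAPES
  eval(%("#{str}"))
end

puts eval_unescape('\u041f\u0440\u0438\u0432\u0435\u0442!')   # Привет!
```

The guard is deliberately strict, so it rejects some harmless strings (for example, any text containing `#`). That trade-off seems reasonable for a sketch, but it may be too restrictive for real data.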
 
Markus Schirp

I think the JSON parser is able to decode these Unicode escapes
correctly!

The JSON parser will not decode a bare string, so you have to wrap the
string in array syntax and extract it after parsing:

mbj@mbj ~ $ irb
irb(main):001:0> require 'json'
=> true
irb(main):002:0> x = '\u041f\u0440\u0438\u0432\u0435\u0442!'
=> "\\u041f\\u0440\\u0438\\u0432\\u0435\\u0442!"
irb(main):003:0> JSON.parse('["'+x+'"]')[0]
=> "Привет!"
irb(main):004:0>

IMHO better than eval ;)
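
The array-wrapping trick above could be packaged as a small helper. This is only a sketch: the method name `unescape_via_json` is made up for illustration, and it assumes the input contains no unescaped double quotes or other sequences that are invalid inside a JSON string.

```ruby
require 'json'

# Hypothetical helper: wrap the text in a one-element JSON array,
# parse it, and take the single element back out.
def unescape_via_json(str)
  JSON.parse(%(["#{str}"])).first
end

puts unescape_via_json('\u041f\u0440\u0438\u0432\u0435\u0442!')   # Привет!

# Input that is not a valid JSON string body is rejected, not executed.
begin
  unescape_via_json('say "hi"')
rescue JSON::ParserError
  puts "not a valid JSON string"
end
```

Unlike eval, malformed input here leads to a parse error and no code from the input is run.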
 
Benoit Daloze

str = '\u041f\u0440\u0438\u0432\u0435\u0442!'
p str.gsub(/\\u(\h{4})/) {
  $1.to_i(16).chr('UTF-8')
}

What do you say to this?
Well, I was looking for something along the lines of String#unpack, like

p str.gsub(/\\u(\h{4})/) {
  [$1.to_i(16)].pack('U')
}

but as we are scanning the escapes one by one, it is not that interesting, and it needs an
extra array like the JSON approach (but it is 1.8 compatible).

B.D.
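
For reuse, the gsub approach could be wrapped in a method. The sketch below uses the pack('U') variant; the name `unescape_unicode` is hypothetical. Because it converts each \uXXXX escape on its own, characters written as UTF-16 surrogate pairs would not be combined.

```ruby
# Hypothetical helper: replace every literal \uXXXX escape with the
# character for that hexadecimal code point.
def unescape_unicode(str)
  str.gsub(/\\u(\h{4})/) { [$1.hex].pack('U') }
end

escaped = '\u041f\u0440\u0438\u0432\u0435\u0442!'
decoded = unescape_unicode(escaped)

puts decoded          # Привет!
puts decoded.length   # 7
```

Text that contains no escapes passes through unchanged, so the method can be applied to mixed input.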