Short question about encoding.

Gabriel Lichard

Hello everybody at the Ruby Forums!

After getting more into Ruby I stumbled across a little problem: how
would I go about making a string with a "square root" sign (√) inside
it, so that it gets encoded in UTF-8 and so that I can later save it in
a file?

Also, I read that Ruby does have encoding-related issues, so, if that's
the case, I can probably live without the "square root" sign.

puts "√".encode("UTF-8") # converts it into a "v"
puts "√" # -//-
puts "√".force_encoding("UTF-8") # -//-

Any ideas, please?

--
Posted via http://www.ruby-forum.com/.
 
Ammar Ali

One possibility is using the hex codes for that code point, like:

"\xE2\x88\x9A"

There is also a Unicode escape, \u (here, "\u221A"). For more information, take a look at:

http://ruby.runpaint.org/strings#escapes-summary

HTH,
Ammar
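
To make the two escape styles concrete, here is a minimal sketch that compares them. It assumes a UTF-8 source file; the variable names are only for illustration. The escapes are used so that the script does not depend on how the editor or console handles the literal character.

```ruby
# Two ways to spell the square root sign (U+221A) without typing it.

# Raw UTF-8 bytes via \x escapes. The dup and force_encoding make the
# intended encoding explicit, whatever the source encoding is.
from_bytes = "\xE2\x88\x9A".dup.force_encoding("UTF-8")

# Unicode code point escape. A string containing \u is always UTF-8.
from_escape = "\u221A"

puts from_bytes == from_escape        # true
puts from_escape.bytes.to_a.inspect   # [226, 136, 154]
puts from_escape.encoding             # UTF-8
```

Both literals produce the same three bytes, so either form should work for building the string. Whether it then displays correctly depends on the output device.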
 
Rob Biedenharn

It seems to be fine with Ruby 1.9.2-p0.

irb> sqrt = "√"
=> "√"
irb> sqrt.encoding
=> #<Encoding:UTF-8>
irb> sqrt.bytes
=> #<Enumerator: "√":bytes>
irb> sqrt.bytes.to_a
=> [226, 136, 154]
irb> sqrt.chars.to_a
=> ["√"]
irb> puts sqrt
√
=> nil
irb> puts sqrt.encode("US-ASCII")
Encoding::UndefinedConversionError: U+221A from UTF-8 to US-ASCII
from (irb):10:in `encode'
from (irb):10
from /Users/rab/.rvm/rubies/ruby-1.9.2-p0/bin/irb:17:in `<main>'
irb> puts sqrt.force_encoding("US-ASCII")
√
=> nil
irb> sqrt.force_encoding("US-ASCII").chars.to_a
=> ["\xE2", "\x88", "\x9A"]

Are you perhaps sending the output to a device that does not
understand UTF-8?

-Rob

Rob Biedenharn
(e-mail address removed) http://AgileConsultingLLC.com/
(e-mail address removed) http://GaslightSoftware.com/
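
The irb session above shows that a strict encode to US-ASCII raises an error. If living without the sign is acceptable, one option is to ask String#encode to substitute a placeholder instead. This is a minimal sketch; the "sqrt" placeholder and the sample expression are made up for illustration.

```ruby
sqrt = "\u221A"

# Strict conversion raises, as in the irb session.
begin
  sqrt.encode("US-ASCII")
rescue Encoding::UndefinedConversionError => e
  puts "conversion failed: #{e.message}"
end

# With :undef => :replace, characters the target encoding cannot
# represent are swapped for the :replace string instead of raising.
ascii = "#{sqrt}(2)".encode("US-ASCII", undef: :replace, replace: "sqrt")

puts ascii            # sqrt(2)
puts ascii.encoding   # US-ASCII
```

Unlike force_encoding, which only relabels the existing bytes, encode actually converts the string, so the result is safe to send to an ASCII-only device.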
 
Gabriel Lichard

Are you perhaps sending the output to a device that does not understand
UTF-8?

I'm guessing that's the issue.

I'm on Windows 7, and in the Command Prompt I can copy/paste the "√"
character and it appears fine in the Courier New font and everything.
But when I run Ruby (1.9.2p0) or irb, I can't copy/paste that character
anymore. When I save this in a file and run it:

s = "√"
p s
puts s

it outputs:

"\u221A"
ΓêÜ

But then again when I just do:

File.open("test.txt", "w") { |x| x << "√" }

and run it, it makes the test.txt file and saves it without any problems
and with the actual square root character in the file.
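
For the file case, here is a minimal sketch that makes the encoding explicit and then checks what landed on disk. The scratch path under Dir.tmpdir is a stand-in for test.txt so the example cleans up after itself.

```ruby
require "tmpdir"

# Hypothetical scratch file standing in for test.txt.
path = File.join(Dir.tmpdir, "sqrt_test.txt")

# "w:UTF-8" sets the external encoding used when writing.
File.open(path, "w:UTF-8") { |f| f << "\u221A" }

# Read the raw bytes back to see exactly what was stored.
bytes_on_disk = File.binread(path).bytes.to_a
puts bytes_on_disk.inspect   # [226, 136, 154]

# Read it back as text, again naming the encoding explicitly.
text = File.read(path, encoding: "UTF-8")
puts text == "\u221A"        # true

File.delete(path)
```

The three bytes are the UTF-8 form of U+221A, which is consistent with the file looking right in an editor even when the console output does not.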

Any idea what I'm doing wrong or why it won't appear in the console?
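
The three odd characters can be reproduced in Ruby itself, which suggests where they come from. The sketch below assumes the console is using code page 437 (IBM437). The thread never states the active code page, so that is an inference from the output.

```ruby
# UTF-8 bytes of the square root sign.
utf8_bytes = "\u221A".bytes.to_a
puts utf8_bytes.inspect   # [226, 136, 154]

# Pretend to be a console that reads those same bytes as code page 437
# (an assumption), then convert the result to UTF-8 for comparison.
raw = utf8_bytes.pack("C*")
misread = raw.force_encoding("IBM437").encode("UTF-8")

# U+0393, U+00EA, U+00DC are the three characters seen in the output.
puts misread == "\u0393\u00EA\u00DC"   # true
```

If that assumption holds, Ruby is writing correct UTF-8 and the console is decoding each byte as a separate code page 437 character.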

 
Gabriel Lichard

I tried the chcp thing too :/ :

system "chcp 65001"
s = "√"
p s
puts s

outputs:

Active code page: 65001
"ΓêÜ"
ΓêÜ

I'm still doing something wrong :/ Any more ideas?

 
Phillip Gawlowski

If you have Vista or Win 7 installed, try your script in PowerShell.
Otherwise, install PowerShell, and try your script.

PowerShell is a .NET-based (almost) drop-in replacement for cmd.exe,
and, AFAIK, fully Unicode-aware.

--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.
 
Gabriel Lichard

Yup, I'm on Windows 7.

About PowerShell: I tried the classic PowerShell.exe, which printed out
the same weird characters. Then I tried "type .\test.rb", and that gave
an error. So I Googled about PowerShell and Unicode and read that the
PowerShell ISE supports Unicode, so I tried that one. There,
"type .\test.rb" works fine and displays the character well, but
"ruby .\test.rb" (1.9.2p0) prints the weird characters again.

Any more suggestions, please?

PowerShell ISE log:

_________________________________________________________________
PS C:\Users\Ye Olde Poopsmith\Desktop> ruby -v
ruby 1.9.2p0 (2010-08-18) [i386-mingw32]

_________________________________________________________________
PS C:\Users\Ye Olde Poopsmith\Desktop> ruby .\test.rb
"\u221A"
#<Encoding:UTF-8>
ΓêÜ

_________________________________________________________________
PS C:\Users\Ye Olde Poopsmith\Desktop> type .\test.rb
#system "chcp 65001"
s = "√"
p s
p s.encoding
puts s.to_s
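
One thing that may be worth trying is to transcode the string to the console's own code page before printing, rather than changing the console. This is a sketch under the assumption that the console uses code page 437, which happens to contain the square root sign. The console_encoding name is hypothetical, and the thread does not confirm this fixes the display.

```ruby
s = "\u221A"

# Assumption: the console decodes output as code page 437.
console_encoding = "IBM437"

# Convert to the console's encoding. Characters it lacks become "?".
for_console = s.encode(console_encoding, undef: :replace, replace: "?")

# Code page 437 stores the square root sign as the single byte 0xFB.
puts for_console.bytes.to_a.inspect   # [251]
```

Writing for_console with puts would then send one byte, 0xFB, which a code page 437 console should render as the square root sign. On a console with a different code page the encoding name would need to change.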
