Same character shows different codes on two machines


Ryan Smith

One Chinese character shows different byte codes on two different machines.

machine A: \243\244
machine B: \302\245

So I have to use a different pattern on each machine, like this:
machine A: text.split("\243\244")
machine B: text.split("\302\245")

I know this is not the proper way, but could someone tell me:
What is the root cause?
What is the difference between machine A and machine B?
What is the proper way to handle this?

Thanks very much!

-ryan
 

Ryan Smith

Thanks, Walton,

Do I need to include something?

irb(main):006:0> "Hello".encoding
NoMethodError: undefined method `encoding' for "Hello":String
from (irb):6
 

Brian Candler

Ryan said:
One Chinese character shows different byte codes on two different machines.

machine A: \243\244
machine B: \302\245

In hex those are:
machine A: \xa3\xa4
machine B: \xc2\xa5
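Ruby 1.8's String#inspect shows non-ASCII bytes as octal escapes, which is why the original post shows \243\244. A minimal sketch for turning them into the hex form used here relies on String#unpack, which exists in 1.8 and later; hex_bytes is a hypothetical helper name, not a library method:

```ruby
# Bytes as they appeared in the original post (octal escapes)
machine_a = "\243\244"
machine_b = "\302\245"

# Hypothetical helper: render each byte as a lowercase \xNN escape.
# String#unpack("C*") returns the bytes as unsigned 8-bit integers.
def hex_bytes(str)
  str.unpack("C*").map { |byte| format("\\x%02x", byte) }.join
end

puts hex_bytes(machine_a)  # prints \xa3\xa4
puts hex_bytes(machine_b)  # prints \xc2\xa5
```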

The first is not valid UTF-8. I suppose it might be UTF-16: U+A3A4 if
big-endian or U+A4A3 if little-endian. Or it could be some older
proprietary Asian encoding.
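On Ruby 1.9 or later, these readings can be checked by relabelling the same two bytes with each candidate encoding (String#force_encoding changes the label without converting the bytes). A sketch; codepoint_in is a hypothetical helper:

```ruby
# Requires Ruby 1.9+ (force_encoding, valid_encoding? and ord on multibyte strings).
bytes_a = "\xA3\xA4"

# Hypothetical helper: relabel the bytes (no conversion) and return the
# codepoint of the first character, or nil if the bytes are invalid there.
def codepoint_in(bytes, encoding)
  str = bytes.dup.force_encoding(encoding)
  str.valid_encoding? ? str.ord : nil
end

p codepoint_in(bytes_a, "UTF-8")     # nil: 0xA3 is a continuation byte, it cannot start a character
p codepoint_in(bytes_a, "UTF-16BE")  # 41892, i.e. U+A3A4
p codepoint_in(bytes_a, "UTF-16LE")  # 42147, i.e. U+A4A3
```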

The second of these could be UTF-8. If so, it would be codepoint 165
(U+00A5), the 'yen' symbol. Or it could be U+C2A5 in big-endian UTF-16.
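On Ruby 1.9 or later, relabelling the second pair with String#force_encoding (which does not convert the bytes) shows both readings. A minimal sketch:

```ruby
# Requires Ruby 1.9+.
bytes_b = "\xC2\xA5"

# Read as UTF-8: one valid two-byte character
as_utf8 = bytes_b.dup.force_encoding("UTF-8")
p as_utf8.valid_encoding?   # true
p as_utf8.ord               # 165, i.e. U+00A5 YEN SIGN
p as_utf8.length            # 1 (one character made of two bytes)

# Read as big-endian UTF-16: one 16-bit code unit
as_utf16 = bytes_b.dup.force_encoding("UTF-16BE")
p as_utf16.ord.to_s(16)     # "c2a5", i.e. U+C2A5
```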
 

Marnen Laibow-Koser

Ryan said:
Thanks, Walton,

Do I need to include something?

irb(main):006:0> "Hello".encoding
NoMethodError: undefined method `encoding' for "Hello":String
from (irb):6

No, I don't think that method exists in 1.8.
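String#encoding was added in Ruby 1.9. A small sketch that avoids the NoMethodError on 1.8 and, on 1.9+, prints the locale-dependent defaults worth comparing between machine A and machine B (that the two machines differ here is an assumption, not something the thread confirms):

```ruby
# String#encoding, Encoding.default_external and Encoding.locale_charmap
# exist only in Ruby 1.9+; respond_to? lets this script also run on 1.8.
if "".respond_to?(:encoding)
  puts "literal encoding: #{'Hello'.encoding}"
  puts "default_external: #{Encoding.default_external}"
  puts "locale charmap:   #{Encoding.locale_charmap}"
else
  puts "Ruby #{RUBY_VERSION}: String#encoding is not available (added in 1.9)"
end
```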

Best,
-- 
Marnen Laibow-Koser
http://www.marnen.org

Brian Candler

Ryan said:
The first is not valid UTF-8. I suppose it might be UTF-16: U+A3A4 or
U+A4A3 depending on little or big-endian. Or it could be some older
proprietary Asian encoding.

[Ryan] How do I correct this (convert it to UTF-8)? It is an English XP Pro
with PRC as the system locale.

Sorry, I have no idea. Are you sure that \xa3\xa4 corresponds exactly to
that one character? Is the rest of the encoding variable length or fixed
length? (For example, are all characters two bytes long, even a western
letter "A"?)

Questions about Microsoft operating systems and what encodings they use
really belong in a Microsoft users' forum, as it's not anything to do
with Ruby.
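Brian's fixed-versus-variable-length question can be tested on Ruby 1.9+ by reading a sample that puts a western letter in front of the mystery bytes. In this sketch the helper char_byte_lengths and the sample string are hypothetical, and GBK is only one candidate encoding (an assumption, chosen because code page 936, GBK, is the Windows ANSI code page for a PRC system locale):

```ruby
# Requires Ruby 1.9+.
# Hypothetical helper: bytes per character when the raw bytes are read in
# the given encoding, or nil if they are not valid in that encoding.
def char_byte_lengths(raw, encoding)
  str = raw.dup.force_encoding(encoding)
  return nil unless str.valid_encoding?
  str.each_char.map(&:bytesize)
end

sample = "A\xA3\xA4"  # a western "A" followed by the two mystery bytes

p char_byte_lengths(sample, "GBK")    # [1, 2]: variable length, "A" is a single byte
p char_byte_lengths(sample, "UTF-8")  # nil: not valid UTF-8
p "A".encode("UTF-16LE").bytesize     # 2: in UTF-16 even "A" takes two bytes
```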
 

Ryan Smith


I have no idea either, but I will upgrade to Ruby 1.9 to take advantage of
the String#encoding feature. Thank you.
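For reference, one way to handle this on Ruby 1.9+ is to convert each machine's input to UTF-8 from its actual encoding before splitting, so that one pattern works everywhere. The sketch rests on assumptions the thread does not confirm: machine A's text is GBK/CP936 (the Windows code page for a PRC system locale) and machine B's is UTF-8. Under those assumptions \xA3\xA4 in GBK is U+FFE5 FULLWIDTH YEN SIGN while \xC2\xA5 in UTF-8 is U+00A5 YEN SIGN, so even after conversion the two characters differ and the pattern has to match both. split_on_yen and YEN_SIGNS are hypothetical names.

```ruby
# Requires Ruby 1.9+ (String#encode, force_encoding).
# Assumption: machine A produces GBK (CP936) bytes, machine B produces UTF-8.

# U+00A5 YEN SIGN and U+FFE5 FULLWIDTH YEN SIGN
YEN_SIGNS = /[\u00A5\uFFE5]/

# Hypothetical helper: label the raw bytes with their real encoding,
# transcode to UTF-8, then split with a single UTF-8 pattern.
def split_on_yen(raw, source_encoding)
  utf8 = raw.dup.force_encoding(source_encoding).encode("UTF-8")
  utf8.split(YEN_SIGNS)
end

text_a = "abc\xA3\xA4def"  # as read on machine A
text_b = "abc\xC2\xA5def"  # as read on machine B

p split_on_yen(text_a, "GBK")    # ["abc", "def"]
p split_on_yen(text_b, "UTF-8")  # ["abc", "def"]
```

Transcoding once at the input boundary keeps the rest of the program working in a single encoding; the hard-coded source encodings are the part to replace with whatever each machine actually uses.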
 
