Japanese / Chinese characters

Luis G.

Is there a way to test if a string contains any Japanese or Chinese
characters? Is that possible?

Thanks,

Luis
 
Richard Conroy

[Note: parts of this message were removed to make it a legal post.]

If you know the encoding of the input string (preferably Unicode), you can
test whether the code points of its characters fall within the ranges
assigned to the Han character sets.

So yeah, it is possible, though not trivial. You might want to look for
libraries that achieve the same thing.
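
A minimal sketch of that range test, assuming the input is UTF-8. The constant `CJK_RANGES` and the method `contains_cjk?` are names made up for illustration, not part of any library, and the list of ranges is deliberately incomplete: it covers the main CJK Unified Ideographs block, Extension A, and the two Japanese kana blocks, and leaves out the other extension and compatibility blocks.

```ruby
# Hypothetical helper for illustration. Assumes UTF-8 input.
# The range list is partial: other CJK extension blocks are omitted.
CJK_RANGES = [
  0x3040..0x309F, # Hiragana
  0x30A0..0x30FF, # Katakana
  0x3400..0x4DBF, # CJK Unified Ideographs Extension A
  0x4E00..0x9FFF  # CJK Unified Ideographs
].freeze

def contains_cjk?(str)
  # unpack("U*") decodes the UTF-8 bytes into an array of code points.
  str.unpack("U*").any? do |cp|
    CJK_RANGES.any? { |range| range.include?(cp) }
  end
end

p contains_cjk?("hello")        # => false
p contains_cjk?("hello \u5B57") # => true (U+5B57 is a Han character)
```

Because `any?` stops at the first hit, a string that contains such a character early on is not scanned to the end, although `unpack` still decodes the whole string up front.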
 
Luis G.

I found this out:

irb(main):003:0> p "裏字幕組".unpack("U*")
[35023, 23383, 24149, 32068]

So, I can unpack it and check if it is within the range you talked about,
right? If so, now I just need to find the range for the Chinese and
Japanese characters...

Isn't this a heavy operation? I have lots of sentences to test, each
no longer than 512 characters.

--
Posted via http://www.ruby-forum.com/.
 
Richard Conroy

Yeah, that is one way of doing it.

With respect to the speed issue, the range boundaries that define the Han
characters (or any character range, for that matter) have significance at the
bit level. You could use bit-level algorithms for speed, though it is possible
that in Ruby you would not achieve the speed increase that you might
get with C or Java.
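
To illustrate what "significance at the bit level" can mean, here is a rough sketch for the main CJK Unified Ideographs block only (U+4E00 to U+9FFF). That block starts and ends on 256-code-point boundaries, so shifting away the low 8 bits and comparing the remaining "row" number gives the same answer as the full range comparison. The method names are made up for this example, and whether this is any faster in Ruby is something you would have to measure.

```ruby
# Hypothetical names; covers only U+4E00..U+9FFF, not kana or extensions.
def han_by_range?(cp)
  cp >= 0x4E00 && cp <= 0x9FFF
end

def han_by_high_byte?(cp)
  # Dropping the low 8 bits leaves the 256-code-point row number.
  row = cp >> 8
  row >= 0x4E && row <= 0x9F
end

# Check that both tests agree on every code point up to U+10FFFF.
agree = (0..0x10FFFF).all? { |cp| han_by_range?(cp) == han_by_high_byte?(cp) }
p agree # => true
```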

You might also want to look into specifying Unicode ranges in your regexes.
I remember that the Java regular expression library had shortcuts for
specifying localised characters (like Han characters). I don't think the Ruby
regex API has these shortcuts, but in the end it is just a Unicode range.
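
A sketch of the regex approach, assuming UTF-8 strings. For what it is worth, Ruby 1.9 and later do accept Unicode script properties such as `\p{Han}`, `\p{Hiragana}` and `\p{Katakana}` in regexes, so both forms are shown. The constant names are made up for this example, the explicit range is partial (kana plus the main ideograph block), and `match?` needs Ruby 2.4 or later.

```ruby
# Explicit code point ranges: Hiragana and Katakana (U+3040..U+30FF)
# plus the main CJK Unified Ideographs block (U+4E00..U+9FFF).
CJK_BY_RANGE = /[\u3040-\u30FF\u4E00-\u9FFF]/

# Script properties; the string must be in a Unicode encoding such as UTF-8.
CJK_BY_SCRIPT = /\p{Han}|\p{Hiragana}|\p{Katakana}/

sample = "subtitles by \u88CF\u5B57\u5E55\u7D44"
p sample.match?(CJK_BY_RANGE)          # => true
p sample.match?(CJK_BY_SCRIPT)         # => true
p "plain ASCII".match?(CJK_BY_SCRIPT)  # => false
```

The two patterns are not exactly equivalent: the script properties also cover Han characters outside the main block, while the explicit range includes a few punctuation-like code points in the Katakana block.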

