John Joyce
Many characters in these two sets of Chinese (in fact, including the Chinese
characters used in Japanese and Korean...) are the same. Aren't they
encoded to the same codes when they are identical?
Yes. There is lots of overlap, so there is not always a clean
separation line. But the Japanese and Korean phonetic characters
will each be in a range of their own. You might never use all the
kanji/hanzi Chinese characters, and a few are Japanese only (very few).
You must mean Unicode range.
http://www.khngai.com/chinese/charmap/tbluni.php?page=0
Yes that's exactly what he means.
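
To make the "Unicode range" idea concrete, here is a minimal sketch in Ruby. The BLOCKS table and the block_of helper are made up for illustration and are not from any library; the ranges are the standard Unicode blocks for Hiragana, Katakana, CJK Unified Ideographs and Hangul Syllables, and the table is far from complete.

```ruby
# encoding: utf-8

# Hypothetical lookup table: a few Unicode blocks relevant to CJK text.
BLOCKS = {
  "Hiragana"               => 0x3040..0x309F,
  "Katakana"               => 0x30A0..0x30FF,
  "CJK Unified Ideographs" => 0x4E00..0x9FFF,
  "Hangul Syllables"       => 0xAC00..0xD7AF
}

# Return the block name for the first character of a string,
# or nil when the code point is not covered by the table above.
def block_of(char)
  codepoint = char.unpack("U").first # decode UTF-8 into a code point
  name, _range = BLOCKS.find { |_name, range| range.include?(codepoint) }
  name
end

%w[あ カ 漢 한 A].each do |ch|
  puts format("%s  U+%04X  %s", ch, ch.unpack("U").first, block_of(ch) || "other")
end
```

The shared kanji/hanzi all land in the one ideograph block, while the Japanese and Korean phonetic characters fall into blocks of their own.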
.... hmmmmm.. if only I could find out what it does...
John Joyce wrote:
I took a look at it. It's the database of characters, sort of. It is
a big text file list. Not a proper gem at all, actually. The same db
file can be downloaded from Unicode.org separately. It doesn't
contain the actual characters, just their codes and some comments and
groupings.
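
For anyone who wants to poke at such a file, here is a rough sketch. It assumes the file uses the semicolon-separated layout of UnicodeData.txt from Unicode.org; the thread does not say which file the gem ships, so treat that as a guess. SAMPLE, Entry and parse_line are names invented for this example.

```ruby
# encoding: utf-8

# Two sample lines in the UnicodeData.txt layout (assumed format).
SAMPLE = <<~DATA
  0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
  3042;HIRAGANA LETTER A;Lo;0;L;;;;;N;;;;;
DATA

# Hypothetical record type holding only the first three fields.
Entry = Struct.new(:codepoint, :name, :category)

# Split one line on ";" and keep the code (hex), name and general category.
def parse_line(line)
  fields = line.chomp.split(";", -1) # -1 keeps trailing empty fields
  Entry.new(fields[0].to_i(16), fields[1], fields[2])
end

entries = SAMPLE.each_line.map { |line| parse_line(line) }

entries.each do |e|
  # The file only stores the code; pack("U") turns it back into a character.
  puts format("U+%04X  %s  (%s)  %s", e.codepoint, e.name, e.category, [e.codepoint].pack("U"))
end
```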
Interesting subject indeed it is.
Today I tried this(!!!!under RoR console!!!!):

c = %w{“ ”。 ， ！ ＜ ｛ ； ‘ ！ ＠ ＃ ＄ ％ … ＊ （ ） 一 俿 倀 凿 勿 叿 哿 囿 姿 寿 崁
忄忿 恘 扉 掵 曆 桶 檗 泗 濗 瀖 燿 狧 珗 痿 眀 秊 竗 篿 紀 翹 退 釽 鎷 閈 阀 韗 饧
骠 鶆 龥}
=> ["“", "”。", "，", "！", "＜", "｛", "；", "‘", "！", "＠", "＃", "＄", "％",
"…", "＊", "（", "）", "一", "俿", "倀", "凿", "勿", "叿", "哿", "囿", "姿", "寿",
"崁", "忄忿", "恘", "扉", "掵", "曆", "桶", "檗", "泗", "濗", "瀖", "燿", "狧", "珗",
"痿", "眀", "秊", "竗", "篿", "紀", "翹", "退", "釽", "鎷", "閈", "阀", "韗", "饧",
"骠", "鶆", "龥"]

c.collect.map{|o| o[0]}
=> [226, 226, 239, 239, 239, 239, 239, 226, 239, 239, 239, 239, 239,
226, 239, 239, 239, 228, 228, 229, 229, 229, 229, 229, 229, 229, 229,
229, 229, 230, 230, 230, 230, 230, 230, 230, 230, 231, 231, 231, 231,
231, 231, 231, 231, 231, 231, 231, 233, 233, 233, 233, 233, 233, 233,
233, 233, 233]

c.collect.map{|o| o[0]}.sort
=> [226, 226, 226, 226, 228, 228, 229, 229, 229, 229, 229, 229, 229,
229, 229, 229, 230, 230, 230, 230, 230, 230, 230, 230, 231, 231, 231,
231, 231, 231, 231, 231, 231, 231, 231, 233, 233, 233, 233, 233, 233,
233, 233, 233, 233, 239, 239, 239, 239, 239, 239, 239, 239, 239, 239,
239, 239, 239]

c.collect.map{|o| o[0]}.sort.uniq
=> [226, 228, 229, 230, 231, 233, 239]

These punctuation marks are the ones commonly used in China.
The Chinese characters were picked at random from
http://www.khngai.com/chinese/charmap/tbluni.php?page=0
(from all six pages.)
Maybe 226 to 239 is the range I need.
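
A quick sanity check on that guess, as a sketch only. In UTF-8, a first byte of 226 to 239 (0xE2 to 0xEF) covers every code point from U+2000 to U+FFFF, so it also matches kana, hangul and symbols such as the euro sign. The basic CJK Unified Ideographs block (U+4E00 to U+9FFF) only uses first bytes 228 to 233. The helpers lead_byte and han? are hypothetical names for this example. Note that in Ruby 1.9 and later o[0] returns a one-character string rather than a byte, so the sketch uses bytes.first instead.

```ruby
# encoding: utf-8

# First byte of the string's UTF-8 encoding (what o[0] gave in Ruby 1.8).
def lead_byte(str)
  str.bytes.first
end

# Hypothetical check: is the first character in the basic
# CJK Unified Ideographs block (U+4E00..U+9FFF)?
def han?(str)
  (0x4E00..0x9FFF).include?(str.unpack("U").first)
end

%w[“ ， 一 龥 あ 한 €].each do |s|
  puts format("%s  U+%04X  lead byte %d  han? %s",
              s, s.unpack("U").first, lead_byte(s), han?(s))
end
```

Testing the decoded code point against a block range is more precise than testing the first byte, since the byte range lets through a lot of non-Chinese characters.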
If you have access to a Macintosh, the Character Palette is pretty
helpful for exploring CJK character ranges as subgroupings within the
range.