How can I get a character, given its Unicode index?

Raymundo · Aug 30, 2009

Hello,

First, I apologize that my English is not very good.

To represent a Unicode character in a string or in a regexp, I can use
"\x{hex}" notation.

my $char = "\x{AC00}";
# $char = "가" -- a Korean character, pronounced "GA"
(I'm not sure you can see this Korean character in your browser.
Please tell me if you can't)

However, it seems that this representation works only when it is hard-
coded. That means, I can't use a variable for the hex value:

my $index = "AC00";
my $char = "\x{$index}"; # This doesn't work.
print length($char),"\n";
print "[$char]\n";

./test.pl

1 -- $char has one character, but...
[] -- that character is not "가" (GA). It isn't even a printable
character.
(In fact, $char seems to be the null character "\0". I found this by
redirecting the output into a file and viewing the file with a hex editor.)

Anyway, I tried several variations, including double quotes, single
quotes, the s/// operator, etc.

Finally I found code that works:

(code)
#!/usr/bin/perl

my $index = "AC00";
my $char = eval( "\"\\x{$index}\"" );
print length($char),"\n";
print "[$char]\n";

(output)

./test.pl

1
Wide character in print at ./test.pl line 6.
[가]

I had to build a string that consists of:
double quote " (escaped with a backslash)
backslash \ (escaped)
x
brace {
Unicode index
brace }
double quote " (escaped)
Then I have to eval that string... This seems too complicated to me.

I think there may be a better way to do this. I found that the
Unicode::Char module provides a u() subroutine:

my $u = Unicode::Char->new();
my $char = $u->u('AC00'); # u() returns the character at Unicode index AC00
(http://search.cpan.org/~dankogai/Unicode-Char-0.02/lib/Unicode/Char.pm)

But I still wonder if there is a Perl internal function or standard
module that does the same thing. I want to know what the most popular
way is.

Thanks.
G.Y.Park from South Korea.

Klaus · Aug 30, 2009

To represent a Unicode character in a string or in a regexp, I can use
"\x{hex}" notation.

my $char = "\x{AC00}";
# $char = "가" -- a Korean character, pronounced "GA"
[...]

However, it seems that this representation works only when it is hard-
coded. That means, I can't use a variable for the hex value:

my $index = "AC00";
my $char = "\x{$index}"; # This doesn't work.
[...]

Finally I found the code that works:
[...]

my $index = "AC00";
my $char = eval( "\"\\x{$index}\"" );
[...]

Then I have to eval that string... This is, I think, so complicated.

I think there may be a better way to do this. I found that
Unicode::Char module provides u() subroutine:

my $u = Unicode::Char->new();
my $char = $u->u('AC00'); # u() returns the character at Unicode index AC00
(http://search.cpan.org/~dankogai/Unicode-Char-0.02/lib/Unicode/Char.pm)

But I still wonder if there is a Perl internal function or standard
module that does the same thing.

perldoc -f chr
perldoc -f oct

The easiest would be (oct() treats a string with a leading "0x" as
hexadecimal):

my $index = "AC00";
my $char = chr(oct("0x$index"));
print length($char),"\n";
print "[$char]\n";
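
A note on the "Wide character in print" warning from the original post: it typically appears when a character above 255 is printed to a handle that has no encoding layer. The following is a minimal sketch of the same snippet with an output layer added. The binmode line and the assumption that the terminal expects UTF-8 are not from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: the terminal expects UTF-8. With an encoding layer on STDOUT,
# print can emit characters above 255 without the "Wide character" warning.
binmode(STDOUT, ':encoding(UTF-8)');

my $index = "AC00";                # code point as a hex string
my $char  = chr(oct("0x$index"));  # oct() reads a leading "0x" as hexadecimal

print length($char), "\n";
print "[$char]\n";
```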

Jürgen Exner · Aug 30, 2009

Raymundo said:
To represent a Unicode character in a string or in a regexp, I can use
"\x{hex}" notation.

my $char = "\x{AC00}";
# $char = "?" -- a Korean character, pronounced "GA"
(I'm not sure you can see this Korean character in your browser.
Please tell me if you can't)

You need a Korean font to view Korean characters. As I don't have one
installed, I can't see it.

However, it seems that this representation works only when it is hard-
coded. That means, I can't use a variable for the hex value:

my $index = "AC00";
my $char = "\x{$index}"; # This doesn't work.

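Right. But it isn't a matter of being "hardcoded"; think of \x{...} as a
notation for a character. Perl reads it as part of the string literal
itself, so a variable inside the braces is not interpolated.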

If you do
$wh = 'wh';
{$wh}ile (someCondition) {...}
then you don't get a while loop, either.
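
The same point can be sketched for strings. Between the braces of \x{...} Perl expects literal hex digits, so the text $index is taken as is and never interpolated; building the character at run time needs a function call instead. The variable names below are illustrative only, and the warning Perl issues for the first assignment (silenced here) may differ between versions.

```perl
use strict;
use warnings;

my $index = "AC00";

# Escape handled as part of the string literal: the braces contain the text
# '$index', which has no valid hex digits. The original poster observed "\0".
my $from_escape = do { no warnings; "\x{$index}" };

# Conversion done at run time: hex() parses the string, chr() builds the character.
my $from_chr = chr(hex($index));

printf "escape: U+%04X\n", ord($from_escape);
printf "chr:    U+%04X\n", ord($from_chr);
```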

my $char = eval( "\"\\x{$index}\"" );

Arggg, that's ugly!

I think there may be a better way to do this. I found that

perldoc -f chr

jue

John W. Krahn · Aug 30, 2009

Klaus said:
To represent a Unicode character in a string or in a regexp, I can use
"\x{hex}" notation.

my $char = "\x{AC00}";
# $char = "가" -- a Korean character, pronounced "GA"
[...]

But I still wonder if there is a Perl internal function or standard
module that does the same thing.

perldoc -f chr
perldoc -f oct

the easiest would be:

my $index = "AC00";
my $char = chr(oct("0x$index"));

Or:

my $char = chr hex $index;

print length($char),"\n";
print "[$char]\n";

John

Raymundo · Aug 31, 2009

Or:

my $char = chr hex $index;

Oops, so "chr" can take a Unicode index as its argument.

I had thought it accepted only byte values (0-255)... I'm sorry for
bothering you.

Thank you all.
G.Y.Park in South Korea
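
For readers who shared that assumption: chr() takes any code point, not only 0-255, and ord() is its inverse. The sketch below also encodes the character to UTF-8 with the core Encode module, to show the difference between one character and its multi-byte encoding. Using Encode here is an illustration, not something suggested in the thread.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $char  = chr(0xAC00);             # one character, code point above 255
my $bytes = encode('UTF-8', $char);  # its UTF-8 encoding as a byte string

printf "code point: U+%04X\n", ord($char);
printf "characters: %d\n", length($char);
printf "bytes:      %d\n", length($bytes);
print $bytes, "\n";  # a byte string prints without "Wide character"
```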

Keith Thompson · Aug 31, 2009

John W. Krahn said:
Klaus said:

To represent a Unicode character in a string or in a regexp, I can use
"\x{hex}" notation.

my $char = "\x{AC00}";
# $char = "가" -- a Korean character, pronounced "GA"
[...]

But I still wonder if there is a Perl internal function or standard
module that does the same thing.

perldoc -f chr
perldoc -f oct

the easiest would be:

my $index = "AC00";
my $char = chr(oct("0x$index"));

Or:

my $char = chr hex $index;

Or:

my $index = 0xAC00;
my $char = chr $index;

Though if you have the index as a string, you'll need to use hex().
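
When the index arrives as text from outside the program, it may carry a prefix or contain stray characters. A small sketch of one way to handle that follows. The helper name char_from_hex and the accepted prefixes are hypothetical choices for illustration, not part of Perl or of this thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): accept "AC00", "0xAC00" or
# "U+AC00" and return the corresponding character.
sub char_from_hex {
    my ($text) = @_;

    # Strip an optional "U+" or "0x" prefix and require 1 to 6 hex digits.
    my ($digits) = $text =~ /\A(?:U\+|0x)?([0-9A-F]{1,6})\z/i
        or die "Not a hexadecimal code point: '$text'\n";

    my $code = hex $digits;
    die "Beyond the Unicode range: '$text'\n" if $code > 0x10FFFF;

    return chr $code;
}

my $char = char_from_hex("U+AC00");
printf "U+%04X, length %d\n", ord($char), length($char);
```

Validating the digits first avoids the "Illegal hexadecimal digit" warning that hex() issues for malformed input.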
