Need help with printing Unicode! (C++ on CentOS)


Zerex71

Greetings,

I'm sure this has been addressed before but I've hunted all over the
web and no one seems to provide a comprehensive answer. I just want
to do one thing: Under CentOS, in a simple C++ program, I'd like to be
able to print Unicode characters to a console output. For example,
I'd like to print the musical flat, natural, and sharp signs.

Here's what I've done so far:
1. Using Eclipse, created a small C++ console project.
2. Declared three variables of type wchar_t and assigned them their
Unicode values (0x266d, 0x266e, 0x266f).
3. Attempted to print them out using wprintf().
4. Set my output console to a font which can represent the characters
(glyphs?) - Lucida Console

A few observations:
1. I can go to a Unicode code page website, copy the characters
displayed, and paste them into my source file, which is in the same font
(that was my first trick, which ultimately blew me out of the water
because Eclipse was bitching about not being able to save the files due
to encoding... I tried changing it... then it promptly deleted all my
lines and left me with a bunch of NULs).
2. Mixing cout and wprintf results in the wprintf statements being
totally ignored.
3. Using only wprintf results in "Sign: ?" displayed in the console
output, even though the console can display the glyphs correctly when I
paste them (see observation 1).
4. Calling setlocale() as directed by an example has no effect on my
program.
5. Using fwide() to determine if my setup is legit works because I
don't hit the exit condition that I wrote for that test.

So, I don't know what else to try to get this to work. There's a lot
of stuff about Unicode on Windows out there but I'm not doing Windows,
and figured the Linux community might have an answer.

Thanks.
 

Zerex71

I am not sure about CentOS, but in Linux generally UTF-8 is used. One
should have a UTF-8 locale (e.g. LANG=en_US.utf8). If your code
internally uses wchar_t, then it should be converted to UTF-8 before
output. I am not sure if wprintf() or wcout() can do that automatically.
In our software we use UTF-8 and std::string internally, and it is
working perfectly in Linux.

hth
Paavo

Hi Paavo,

Here's my locale setting:

(mfeher) mfeher-l4 [~] > locale
LANG=en_US.UTF-8
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=C

I was under the impression that I had more of an "environment setup"
issue than a coding issue, i.e. I was unaware that I had to do
anything more to the code than change from cout/printf to wprintf.
Also, from a brief, brief reading of all this material on the
Internet, I don't want UTF-8 because that's too small to hold the
character codes I wish to print. Here's the code I am trying:

#include <iostream>
#include <clocale>  // setlocale()
#include <cstdio>   // fwide(), stdout
#include <cwchar>   // wprintf()
using namespace std;

int main() {
//  cout << "Testing Unicode" << endl; // prints Testing Unicode
    // If you try to mix Unicode printing with non-Unicode printing,
    // the switch causes you to lose output!
    setlocale(LC_ALL, ""); // Does nothing

    // Let's check our orientation...it never fails
    if (fwide(stdout, 1) < 0)
    {
        cerr << "ERROR: Output not set to wide. Exiting..." << endl;
        return -1;
    }

    // Declare a Unicode character and try to print it out
    wchar_t mychar = 0x266d; // The music flat sign
    wprintf(L"Here's mychar: %lc\n", mychar);
    return 0;
}
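One detail worth noting in the locale listing above: LC_ALL=C overrides
LANG=en_US.UTF-8, so setlocale(LC_ALL, "") selects the plain "C" locale,
whose wide-to-narrow conversion handles only ASCII. A minimal diagnostic
sketch (assuming the shell environment shown above):

    #include <clocale>
    #include <cstdio>

    int main() {
        // With LC_ALL=C in the environment this prints "C", not
        // "en_US.UTF-8"; in the C locale, converting U+266D to
        // narrow output fails, which is one way to end up with '?'.
        const char* loc = setlocale(LC_ALL, "");
        printf("Effective locale: %s\n", loc ? loc : "(setlocale failed)");
        return 0;
    }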
 

Juha Nieminen

Zerex71 said:
I'm sure this has been addressed before but I've hunted all over the
web and no one seems to provide a comprehensive answer. I just want
to do one thing: Under CentOS, in a simple C++ program, I'd like to be
able to print Unicode characters to a console output. For example,
I'd like to print the musical flat, natural, and sharp signs.

Here's what I've done so far:
1. Using Eclipse, created a small C++ console project.
2. Declared three variables of type wchar_t and assigned them their
Unicode values (0x266d, 0x266e, 0x266f).
3. Attempted to print them out using wprintf().

You can't output raw unicode values and expect your terminal emulator
to understand them. You have to output them *encoded* with the same
encoding scheme as your terminal. Usually this will be UTF-8.

Either output the encoded values directly, or use a UTF-8 encoding
library to convert your raw unicode values into UTF-8 codes. One such
library is, for example: http://utfcpp.sourceforge.net/
 

Juha Nieminen

Zerex71 said:
I don't want UTF-8 because that's too small to hold the
character codes I wish to print.

I think you have a misunderstanding of what UTF-8 is. UTF-8 can
represent the entire unicode address space.

You might not "want" it, but you have no option because your terminal
emulator most probably wants UTF-8. It doesn't want raw unicode values.
 

Zerex71

  I think you have a misunderstanding of what UTF-8 is. UTF-8 can
represent the entire unicode address space.

  You might not "want" it, but you have no option because your terminal
emulator most probably wants UTF-8. It doesn't want raw unicode values.

I probably do have a misunderstanding, but like I said, it appears
that UTF-8 is "smaller" or more restrictive than UTF-16, UTF-32, etc.
Is that the case? If the -8 means 8 bits, there's no way I can
convert numbers in the upper ranges (e.g. 0x266d) into 8 bits and even
expect to get my glyphs out on-screen.

So it sounds like my terminal/environment is set up to UTF-8, and I
just have to add a little code to my program before, during, or after
the wprintf() call to make sure they are displayed properly on-
screen. At least this is what I gather from your responses.

Mike
 

Juha Nieminen

Zerex71 said:
I probably do have a misunderstanding, but like I said, it appears
that UTF-8 is "smaller" or more restrictive than UTF-16, UTF-32, etc.
Is that the case?

No.
If the -8 means 8 bits

It means that the unicode values are encoded into units of 8 bits.
Larger unicode values are encoded into more than one 8-bit unit.

There exists also UTF-7, where each unit is 7 bits (in other words, no
byte will have a value larger than 127). Larger values are encoded using
even more units.

UTF-16 uses units of 16 bits. Unicode values which don't fit in them
are encoded using two 16-bit units, similarly to the previous two.

UTF-8 is the most popular because it "compresses" typical English text
the best, while still allowing the full range of unicode characters to
be represented.
So it sounds like my terminal/environment is set up to UTF-8, and I
just have to add a little code to my program before, during, or after
the wprintf() call to make sure they are displayed properly on-
screen. At least this is what I gather from your responses.

What you do is that you convert your unicode values into a stream of
UTF-8 encoded bytes, and then output those bytes to the terminal. As I
mentioned in another post, there are libraries to help you do this.
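A minimal sketch of that conversion done by hand, assuming a UTF-8
terminal (it covers code points up to U+FFFF; anything higher needs a
fourth byte, omitted here; the helper name toUtf8 is just for
illustration):

    #include <iostream>
    #include <string>

    // Encode one Unicode code point (<= U+FFFF) as UTF-8 bytes.
    std::string toUtf8(unsigned int cp) {
        std::string out;
        if (cp < 0x80) {                 // 1 byte:  0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                         // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        return out;
    }

    int main() {
        std::cout << "flat: " << toUtf8(0x266D) << "\n"; // emits E2 99 AD
        return 0;
    }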
 

James Kanze

I'm sure this has been addressed before but I've hunted all
over the web and no one seems to provide a comprehensive
answer. I just want to do one thing: Under CentOS, in a
simple C++ program, I'd like to be able to print Unicode
characters to a console output.

I've never heard of CentOS, so I can't address any system
specific problems here (and they would be off topic).
For example, I'd like to print the musical flat, natural, and
sharp signs.
Here's what I've done so far:
1. Using Eclipse, created a small C++ console project.
2. Declared three variables of type wchar_t and assigned them their
Unicode values (0x266d, 0x266e, 0x266f).
3. Attempted to print them out using wprintf().
4. Set my output console to a font which can represent the characters
(glyphs?) - Lucida Console

What locale are you using? And what encoding does the font use?
You need to ensure that the encoding in the locale is the same
as the one used by the renderer for the font.
A few observations:
1. I can go to a Unicode code page website and copy the
characters displayed and paste them into my source file which
is in the same font (that was my first trick which ultimately
blew me out of the water because Eclipse was bitching about
not being able to save the files due to encoding...tried changing
it...then it promptly deleted all my lines and left me with a
bunch of NUL).

First, a source file isn't in a "font". A source file is a
sequence of text characters, in a certain encoding. A font
defines how specific characters will be rendered.

Secondly, in order to be displayable everywhere, I think that
the Unicode code pages use images, and not characters, for the
characters in the code pages. This allows displaying characters
which aren't in any font installed on the machine. There's no
way copy/pasting an image to your source file can possibly work.
2. Mixing cout and wprintf results in the wprintf statements being
totally ignored.

You've raised an interesting point. According to the C standard
(relevant to wprintf), you can't mix wide and narrow output on
the same stream (in this case, stdout). C++ has a similar
restriction---if you've output to cout, use of wcout becomes
illegal, and vice versa. And since stdout and cout/wcout are
supposed to use the same stream, and are synchronized with one
another (by default), I'm pretty sure that the intent is not to
allow this either. In general, all of your IO to a given source
or sink should be of the same type; if you want to output
wchar_t somewhere, all output should be as wchar_t.
3. Using only wprintf results in "Sign: ?" displayed in the
console output, even though it can display the glyphs
correctly when I pasted them (1.)

Probably a question of locale. In the "C" locale, most
implementations only allow characters in the range 0...127 when
converting wchar_t to char.

For wprintf, you'll have to set the global locale. For
std::wcout, you'll have to imbue the desired locale (since the
object was constructed using the global locale before you could
modify the global locale).
4. Calling setlocale() as directed by an example has no effect
on my program.

What did you use as an argument to setlocale()? (But this is
very OS dependent. I know how it works under Unix, but not for
other systems.)
5. Using fwide() to determine if my setup is legit works
because I don't hit the exit condition that I wrote for that
test.
So, I don't know what else to try to get this to work.
There's a lot of stuff about Unicode on Windows out there but
I'm not doing Windows, and figured the Linux community might
have an answer.

Linux is pretty simple. Just use a UTF-8 locale and a UTF-8
encoded font, and everything works pretty well. For that
matter, under Unix, if all you're concerned with is a few
special characters, I'd just manually encode them as strings in
UTF-8, and output them as char.  Most (if not all) of the
locales simply pass all char straight through, without worrying
whether they're legal or not. So instead of a wchar_t with
0x266D, you'd use:
char const flat[] = "\xE2\x99\xAD" ;
and output that directly. (At least, that's what I think should
happen. I don't get any output for the above, but it works with
other Unicode characters, so I suspect that the problem is
simply that my fonts don't contain the characters you give.  All
of the symbols in that block (codes 2600 to 26FF) display as a
simple blank on my Linux machine.)
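Following that suggestion, a short sketch with all three music signs
hard-coded as UTF-8 byte strings (the sequences for U+266D, U+266E and
U+266F; it assumes a UTF-8 terminal and a font that has the glyphs):

    #include <cstdio>

    int main() {
        char const flat[]    = "\xE2\x99\xAD"; // U+266D music flat sign
        char const natural[] = "\xE2\x99\xAE"; // U+266E music natural sign
        char const sharp[]   = "\xE2\x99\xAF"; // U+266F music sharp sign
        printf("flat: %s  natural: %s  sharp: %s\n", flat, natural, sharp);
        return 0;
    }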
 

James Kanze

Here's my locale setting:
(mfeher) mfeher-l4 [~] > locale
LANG=en_US.UTF-8
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=C
I was under the impression that I had more of an "environment
setup" issue than a coding issue, i.e. I was unaware that I
had to do anything more to the code than change from
cout/printf to wprintf. Also, from a brief, brief reading of
all this material on the Internet, I don't want UTF-8 because
that's too small to hold the character codes I wish to print.

UTF-8, UTF-16 and UTF-32 are "transformation formats",
specifying how to "present" any Unicode (UCS-4) character as a
sequence of 8 bit bytes, 16 bit words, or 32 bit words. Since
all of the data interfaces under Unix are 8 bits, UTF-8 is the
transformation format you need.
Here's the code I am trying:
#include <iostream>
#include <clocale>  // setlocale()
#include <cstdio>   // fwide(), stdout
#include <cwchar>   // wprintf()
using namespace std;

int main() {
//  cout << "Testing Unicode" << endl; // prints Testing Unicode
    // If you try to mix Unicode printing with non-Unicode printing,
    // the switch causes you to lose output!
    setlocale(LC_ALL, ""); // Does nothing

    // Let's check our orientation...it never fails
    if (fwide(stdout, 1) < 0)
    {
        cerr << "ERROR: Output not set to wide. Exiting..." << endl;
        return -1;
    }

    // Declare a Unicode character and try to print it out
    wchar_t mychar = 0x266d; // The music flat sign
    wprintf(L"Here's mychar: %lc\n", mychar);
    return 0;
}

That should work, unless the font doesn't have a rendering for
0x266D (the ones I have installed under Linux don't). This is
easily checked---try some more "usual" Unicode character, e.g.
0x00E9 (an é). If that displays, then the problem is almost
certainly that the font doesn't contain a rendering for the
character you want. In which case, there's no way you'll be
able to display it (other than by finding some font which does
support it, installing it and using it).
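The test James describes, as a sketch (same setlocale() call as in the
program above; if the é comes out but the flat sign does not, the font
is the likely culprit):

    #include <clocale>
    #include <cwchar>

    int main() {
        setlocale(LC_ALL, ""); // needs a UTF-8 locale in the environment
        wprintf(L"e-acute: %lc  flat: %lc\n",
                static_cast<wint_t>(0x00E9),   // é, present in most fonts
                static_cast<wint_t>(0x266D));  // music flat sign
        return 0;
    }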
 

Zerex71

[snip]

That should work, unless the font doesn't have a rendering for
0x266D (the ones I have installed under Linux don't).  This is
easily checked---try some more "usual" Unicode character, e.g.
0x00E9 (an é).  If that displays, then the problem is almost
certainly that the font doesn't contain a rendering for the
character you want.

Hi James,

The font I am using (Lucida Console) supports the characters. My
assertion of this is based on the fact that I can go to one of the
websites with extended character maps, copy the symbol(s) desired, and
stick them into my source file, which also uses the same font. The
characters appeared just fine, but I couldn't save the file and ran
immediately into encoding problems (I am using Eclipse C++) that
resulted in me basically being unable to save or open the file
anymore, so I copied my source into a new project and started over.
But I was able to copy the original symbols and drop them directly in
my file editor (Lucida Console) and they displayed fine.

Also, help me understand in your example how my code 0x266D gets
turned into "\xE2\x99\xAD".

Mike
 

Zerex71

  You can't output raw unicode values and expect your terminal emulator
to understand them. You have to output them *encoded* with the same
encoding scheme as your terminal. Usually this will be UTF-8.

  Either output the encoded values directly, or use an UTF-8 encoding
library to convert your raw unicode values into UTF-8 codes. One such
library is, for example: http://utfcpp.sourceforge.net/

That encoding library looked way too involved for what I want to do,
and in the end, I didn't see any simple method to set my encoding or
do whatever I need to do to print my characters. I just want to pass
my Unicode code string to a function and have it print out correctly.
Thanks.
 

Zerex71

What locale are you using?  And what encoding does the font use?
You need to ensure that the encoding in the locale is the same
as the one used by the renderer for the font.

How do I check what encoding the font has?
 

Zerex71

[snip]

Linux is pretty simple.  Just use a UTF-8 locale and a UTF-8
encoded font, and everything works pretty well.  For that
matter, under Unix, if all you're concerned with is a few
special characters, I'd just manually encode them as strings in
UTF-8, and output them as char.

So let me see if I can explain my understanding of this whole thing
(because I want to finally solve this problem, having been trying to
figure it out off and on for quite a while):

1. Let's say I have a file, and it's nothing more than a string of 1s
and 0s when you get right down to it.
2. The encoding that I will use to read/display the file specifies to
the OS how to group and treat the bits.
3. A selected encoding then specifies (for lack of a better term) a
set of codepages from which to select the characters to display (i.e.
based on a particular grouping of bits/bytes, this will index into an
appropriate set of characters).
4. The bytes are presented to the display portion of the OS and it
will reference the operable font in the window, editor, dialog, etc.
to display the individual characters.
5. If the specified font doesn't have a glyph for a given byte
combination, the resulting behavior will be unpredictable.
6. If it does, it will basically do a table lookup for the appropriate
glyph, and fetch that glyph and dump it to the screen.

Is any of this correct?
 

Zerex71

For reference, the page I am using to obtain my chars is
http://www.atm.ox.ac.uk/user/iwi/charmap.html.
I select 2 and 6 from the codepage dropdowns at top, then click on 6D,
6E, and 6F, and it places the chars up above in an output box. I can
copy those chars and paste them into my source-code editor and they
display properly, but I run immediately into problems if I try to save
or compile.

Mike
 

Juha Nieminen

Zerex71 said:
That encoding library looked way too involved for what I want to do,

That's because you didn't really find out how to use it. You were most
probably confused by the large example at the beginning. The library is
really simple to use.

std::string encodedString;
for(size_t i = 0; i < unicodeValues.size(); ++i)
    utf8::append(unicodeValues[i], std::back_inserter(encodedString));
std::cout << encodedString;
and in the end, I didn't see any simple method to set my encoding or
do whatever I need to do to print my characters. I just want to pass
my Unicode code string to a function and have it print out correctly.
Thanks.

The above code does exactly that.
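Put together as a complete program (a sketch assuming utfcpp's
header-only utf8::append(cp, output_iterator), as in the snippet above;
the three music code points stand in for whatever input you have):

    #include <iostream>
    #include <string>
    #include <vector>
    #include <iterator>
    #include "utf8.h" // utfcpp, header-only

    int main() {
        std::vector<unsigned int> unicodeValues;
        unicodeValues.push_back(0x266D); // flat
        unicodeValues.push_back(0x266E); // natural
        unicodeValues.push_back(0x266F); // sharp

        std::string encodedString;
        for (size_t i = 0; i < unicodeValues.size(); ++i)
            utf8::append(unicodeValues[i], std::back_inserter(encodedString));

        // The encoded bytes pass straight through to a UTF-8 terminal.
        std::cout << encodedString << std::endl;
        return 0;
    }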
 

Zerex71

Zerex71 said:
That encoding library looked way too involved for what I want to do,

  That's because you didn't really find out how to use it. You were most
probably confused by the large example at the beginning. The library is
really simple to use.

std::string encodedString;
for(size_t i = 0; i < unicodeValues.size(); ++i)
    utf8::append(unicodeValues[i], std::back_inserter(encodedString));
std::cout << encodedString;
and in the end, I didn't see any simple method to set my encoding or
do whatever I need to do to print my characters.  I just want to pass
my Unicode code string to a function and have it print out correctly.
Thanks.

  The above code does exactly that.


Thanks for your information, but you know what, I really don't care to
spend hours looking for something that should be fairly simple to do.
I wasn't "confused" by anything. I just don't have time an interest
in becoming a Unicode expert to do something very simple. I wonder if
this means I have to download and install the utfcpp library, or if I
can just do this as-is in the code above.
 

Zerex71

Zerex71 <[email protected]> kirjutas:


Also, help me understand in your example how my code 0x266D gets
turned into "\xE2\x99\xAD".

Presumably this is the UTF-8 encoding of your character.

One thing is the encoding your source file uses, and the other is what
you want to output. I'm not familiar with Eclipse so I cannot comment on
the former. If needed, you can use iconv() to convert from your encoding
to UTF-8.

The following program works for me on a SuSE Linux and produces some kind
of music sign on the console. My locale is LANG=en_US.utf8.

#include <stdio.h>

int main() {
  const unsigned char test[4]={0xE2, 0x99, 0xAD, 0};
  printf("Test: %s\n", test);

}

hth
Paavo

Hi Paavo,

Thanks for the help. I will try that. I still do not see how 0x266D
=> E299AD. Where is the conversion for that explained?
 

Zerex71

Paavo said:
[snip]

The following program works for me on a SuSE Linux and produces some kind
of music sign on the console. My locale is LANG=en_US.utf8.

[snip]

I just tried that, but it did not work for me. However, I'm sending the
console output to the Eclipse console tab, not to an xterm.
 

Juha Nieminen

Zerex71 said:
Thanks for your information, but you know what, I really don't care to
spend hours looking for something that should be fairly simple to do.

Yet you have already spent days asking about it in this newsgroup. If
you had googled about it instead and read a few pieces of documentation,
you would have probably saved yourself a lot of trouble.

(Not that it's wrong to ask here or anywhere else for help. It's just
that your attitude feels a bit picky. When someone suggests a relatively
easy solution to your problem you dismiss it without even trying to see
how that solution works.)
I wasn't "confused" by anything. I just don't have time an interest
in becoming a Unicode expert to do something very simple.

Unicode and its encodings are, unfortunately, not a simple matter.
Fortunately people have already gone through the trouble and offer free
libraries to do the hard part.
I wonder if
this means I have to download and install the utfcpp library, or if I
can just do this as-is in the code above.

You don't have to install it. It's just a set of header files. You put
them anywhere your compiler will find them (e.g. inside your project
directory) and then just #include the appropriate header and start using
it. I gave you a simple example of the usage.

Don't immediately dismiss a solution just because you don't understand
it in 10 seconds.
 

Juha Nieminen

Paavo said:
In the URL I posted a few days ago: http://en.wikipedia.org/wiki/UTF-8

The conversion itself is relevant only if you are going to do it
yourself (or are genuinely interested in how it works, which wouldn't be
a bad idea, really; general knowledge about things never hurts).

There exist libraries (like the one I gave a link to in another post)
to do the conversion for you.
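For reference, the conversion Zerex71 asked about, worked by hand: 0x266D
needs 14 significant bits, so it takes the standard three-byte UTF-8
pattern 1110xxxx 10xxxxxx 10xxxxxx. A sketch of the arithmetic:

    #include <cstdio>

    int main() {
        unsigned int cp = 0x266D;                     // 0010 0110 0110 1101
        unsigned char b0 = 0xE0 | (cp >> 12);         // 1110 0010 -> 0xE2
        unsigned char b1 = 0x80 | ((cp >> 6) & 0x3F); // 10 011001 -> 0x99
        unsigned char b2 = 0x80 | (cp & 0x3F);        // 10 101101 -> 0xAD
        printf("%02X %02X %02X\n", b0, b1, b2);       // prints: E2 99 AD
        return 0;
    }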
 

Zerex71

[snip]
Hi Paavo,
Thanks for the help.  I will try that.  I still do not see how 0x266D
=> E299AD.  

UTF sequences are usually not expressed as a single number because they
are of variable length.
Where is the conversion for that explained?

In the URL I posted a few days ago: http://en.wikipedia.org/wiki/UTF-8

Ah, I see that. Thanks.
 
