Ray
Hi. I'm trying to print Unicode characters to standard output, and failing.
Before you ask: yes, my terminal programs (I've tried six) are set to UTF-8
encoding, and yes, I'm using a font that has a glyph for the "a with
dieresis" character that the examples/attempts use.
I have checked terminal mode and font by typing ä directly at the
prompt, where it shows up just fine.
But I haven't been able to get wide-character output from a C program.
Here are a number of minimal example programs that didn't work.
First attempt: This prints a question mark.
#include <stdio.h>
#include <wchar.h>
#include <assert.h>

int main(){
    wchar_t vowel;
    char utf[8];

    /* set output stream to wide-character mode, or halt. */
    assert(fwide(stdout, 1) > 0);

    /* unicode value of a with dieresis, U+00E4 */
    vowel = 0x00e4;

    /* attempt to print. */
    wprintf(L"%Lc \n", vowel);
}
Drat, I said, maybe it doesn't represent wchar_t characters using
Unicode values. So I read the man pages and found that there are
standard functions to convert UTF-8 to wchar_t, and went, "oh,
so UTF-8 is what it deals with, and this function has to know
how to translate that into whatever wchar_t format it's using,
right?"
Second attempt:
#include <stdio.h>
#include <wchar.h>
#include <stdlib.h>
#include <assert.h>

int main(){
    wchar_t vowel;

    /* set output stream to wide-character mode, or halt. */
    assert(fwide(stdout, 1) > 0);

    /* convert utf8 to wide character. This assert succeeds
       so it's getting something.... */
    assert(mbrtowc(&vowel, "ä", 3, NULL) > 0);

    /* but this assert fails so what it got was apparently
       a long-character NUL. WTF? */
    assert(vowel != 0);

    /* attempt to print */
    wprintf(L"%Lc \n", vowel);
}
After seeing what happened with the second attempt, I realized that
C compilers don't *have* to do anything meaningful with UTF-8 text in
the source either, although the point of providing a UTF-8 conversion
function if they don't is sorta murky to me... so I tried writing out
the UTF-8 bytes directly by hand and then calling mbrtowc on them.
Third attempt:
#include <stdio.h>
#include <wchar.h>
#include <stdlib.h>
#include <assert.h>

int main(){
    wchar_t vowel;
    char utf[8];

    /* set output stream to wide-character mode, or halt. */
    assert(fwide(stdout, 1) > 0);

    /* utf8 encoding of a with dieresis U+00E4 */
    utf[0] = 0xc3;
    utf[1] = 0xa4;
    utf[2] = 0;

    /* convert utf8 to wide character. This assert succeeds
       so it's getting something.... */
    assert(mbrtowc(&vowel, utf, 3, NULL) > 0);

    /* but this assert fails so what it got was apparently
       a long-character NUL. */
    assert(vowel != 0);

    /* attempt to print */
    wprintf(L"%Lc \n", vowel);
}
Now I'm completely at a loss. My source is pure ASCII, I don't
rely on the compiler's handling of Unicode, I provide by hand the
UTF-8 encoding that mbrtowc's man page says it understands, and it's
still failing. How can I get a C program to write out wide characters?
Bear