> If I assign a string literal such as "xyz\xc2\xbfwww", then at run
> time (in VC++) the value of this string is "xyz¿www". Is this runtime
> value UTF-8 encoded? How can I check this?
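Walk the string and print it out as hex, byte by byte. The escapes
\xc2\xbf store exactly the two bytes C2 BF, which is the UTF-8 encoding
of U+00BF (¿); in Latin-1 or Windows-1252, ¿ would be the single byte BF.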
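On my Linux system, GCC encodes all narrow string literals as UTF-8 and
all wide string literals as UCS-4 (its defaults; see the -fexec-charset
and -fwide-exec-charset options). How they are presented to the user
(the output encoding) depends on the locale (LC_CTYPE): narrow strings
are written out byte for byte, while wide strings are converted to the
locale's multibyte encoding as they are written, so the text is recoded
on the fly where required.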
The following test should be portable C99, but it does require that
your compiler accept UTF-8 source (recode the file if required).
Regards,
Roger
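#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

int main(void)
{
    /* Use the character encoding of the user's locale for output. */
    setlocale(LC_ALL, "");

    const char *narrow = "Test Unicode (narrow): ïà ý ÐÐ¾Ñ けたいと願う!\n";
    fprintf(stdout, "%s\n", narrow);

    /* Dump the bytes the compiler stored for the narrow literal. */
    fprintf(stdout, "Narrow bytes:\n");
    for (size_t i = 0; i < strlen(narrow); ++i)
        fprintf(stdout, "%3zu: %02X\n", i, (unsigned int) ((const unsigned char *) narrow)[i]);

    /* A stream is either byte- or wide-oriented, never both, so narrow
       output goes to stdout and wide output to stderr. */
    if (fwide(stderr, 1) <= 0)
        fprintf(stdout, "Failed to set stderr to wide orientation\n");

    const wchar_t *wide = L"Test Unicode (wide): ïà ý ÐÐ¾Ñ けたいと願う!\n";
    fwprintf(stderr, L"\n%ls\n", wide);
    /* %s in a wide format converts the multibyte string using the locale. */
    fwprintf(stderr, L"\nNarrow-to-wide: %s\n", narrow);
    /* %ls in a narrow format converts the wide string using the locale. */
    fprintf(stdout, "\nWide-to-narrow: %ls\n", wide);

    /* Dump the in-memory representation of the wide literal (UCS-4 with
       GCC on Linux; the byte order within each wchar_t follows the CPU). */
    fprintf(stdout, "Wide bytes:\n");
    for (size_t i = 0; i < wcslen(wide) * sizeof(wchar_t); ++i)
        fprintf(stdout, "%3zu: %02X\n", i, (unsigned int) ((const unsigned char *) wide)[i]);

    return 0;
}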
-- 
Roger Leigh
Printing on GNU/Linux?
http://gimp-print.sourceforge.net/
Debian GNU/Linux
http://www.debian.org/
GPG Public Key: 0x25BFB848. Please sign and encrypt your mail.