UTF-8 and C string

Mike · Mar 13, 2005

Hi there,

Here is my question:

If I assign a value such as "xyz\xc2\xbfwww" to a string, then the
runtime value of this string (in VC++) is shown as "xyzÂ¿www". Is this
runtime value UTF-8 encoded? How can I check this?

Thanks a lot.

Mike

Roger Leigh · Mar 13, 2005

> If I pass a value to a string, like "xyz\xc2\xbfwww", then the
> runtime value (VC++) of this string is "xyzÂ¿www". Is this runtime
> value in UTF-8 encoding? How can I check this?

Walk the string and print it out as hex, byte by byte.
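
As a minimal sketch of that idea (the helper name hex_dump and its buffer-based interface are my own choices for illustration, not anything from the standard library), something like this formats the bytes so they can be printed or compared:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write the bytes of s into out as space-separated
   uppercase hex, e.g. "78 79 7A". Returns the number of bytes dumped,
   or 0 if s is empty or out is too small (needs 3 * strlen(s) bytes). */
size_t hex_dump(const char *s, char *out, size_t out_size)
{
    static const char digits[] = "0123456789ABCDEF";
    size_t len = strlen(s);
    char *p = out;

    if (out_size == 0)
        return 0;
    out[0] = '\0';
    if (len == 0 || out_size < len * 3)
        return 0;

    for (size_t i = 0; i < len; ++i) {
        /* Go through unsigned char so bytes >= 0x80 are not sign-extended. */
        unsigned char b = (unsigned char)s[i];
        *p++ = digits[b >> 4];
        *p++ = digits[b & 0x0F];
        /* Separator between bytes, terminator after the last one. */
        *p++ = (i + 1 < len) ? ' ' : '\0';
    }
    return len;
}
```

Calling it on "xyz\xc2\xbfwww" and printing the buffer with puts() should show the two bytes C2 BF between the bytes for "xyz" and "www". Those two bytes are the UTF-8 encoding of U+00BF (¿); a viewer that interprets each byte as Latin-1 or Windows-1252 shows them as "Â¿" instead, which would explain what the debugger displays.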

On my Linux system, GCC encodes narrow string literals as UTF-8 and
wide string literals as UCS-4. How they are displayed to the user (the
output encoding) depends on the locale, which causes them to be recoded
on the fly if required.

The following test should be portable, but it does require that your
compiler accept UTF-8 source (recode the file if required).

Regards,
Roger

#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

int main(void)
{
    /* Use the user's locale so that output is recoded as required. */
    setlocale(LC_ALL, "");

    const char *narrow = "Test Unicode (narrow): ïàý Нос けたいと願う!\n";
    fprintf(stdout, "%s\n", narrow);

    /* Dump the narrow string byte by byte. */
    fprintf(stdout, "Narrow bytes:\n");
    for (size_t i = 0; i < strlen(narrow); ++i)
        fprintf(stdout, "%3zu: %02X\n", i, (unsigned int) *((const unsigned char *)narrow + i));

    /* stderr carries the wide output; stdout stays byte-oriented. */
    if (fwide(stderr, 1) <= 0)
        fprintf(stdout, "Failed to set stderr to wide orientation\n");

    const wchar_t *wide = L"Test Unicode (wide): ïàý Нос けたいと願う!\n";
    fwprintf(stderr, L"\n%ls\n", wide);

    /* In a wide format string, %s converts a multibyte string to wide. */
    fwprintf(stderr, L"\nNarrow-to-wide: %s\n", narrow);

    /* In a narrow format string, %ls converts a wide string to multibyte. */
    fprintf(stdout, "\nWide-to-narrow: %ls\n", wide);

    /* Dump the in-memory bytes of the wide string (byte order depends on the platform). */
    fprintf(stdout, "Wide bytes:\n");
    for (size_t i = 0; i < wcslen(wide) * sizeof(wchar_t); ++i)
        fprintf(stdout, "%3zu: %02X\n", i, (unsigned int) *((const unsigned char *)wide + i));

    return 0;
}
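
A hex dump still leaves the "is it UTF-8?" judgement to the reader. As a rough sketch of how that check could be automated (utf8_decode and is_valid_utf8 are hypothetical names I made up for this example, not library functions), the code below decodes one sequence at a time and rejects anything that is not well-formed:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence from s (at most len bytes).
   On success, store the code point in *cp and return the bytes consumed (1-4).
   Return 0 if the bytes are not well-formed UTF-8. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    /* Smallest code point that legitimately needs 1, 2, 3 or 4 bytes. */
    static const uint32_t min_cp[] = { 0x0, 0x80, 0x800, 0x10000 };
    size_t extra;      /* number of continuation bytes expected */
    uint32_t value;

    if (len == 0)
        return 0;
    if (s[0] < 0x80)                { extra = 0; value = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { extra = 1; value = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { extra = 2; value = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { extra = 3; value = s[0] & 0x07; }
    else
        return 0;                   /* stray continuation or invalid lead byte */

    if (len <= extra)
        return 0;                   /* sequence is truncated */
    for (size_t k = 1; k <= extra; ++k) {
        if ((s[k] & 0xC0) != 0x80)
            return 0;               /* not a continuation byte (10xxxxxx) */
        value = (value << 6) | (s[k] & 0x3F);
    }
    if (value < min_cp[extra])
        return 0;                   /* overlong encoding */
    if (value >= 0xD800 && value <= 0xDFFF)
        return 0;                   /* UTF-16 surrogate range */
    if (value > 0x10FFFF)
        return 0;                   /* beyond the Unicode range */

    *cp = value;
    return extra + 1;
}

/* True if the len bytes at str form a well-formed UTF-8 string. */
bool is_valid_utf8(const char *str, size_t len)
{
    const unsigned char *s = (const unsigned char *)str;
    while (len > 0) {
        uint32_t cp;
        size_t n = utf8_decode(s, len, &cp);
        if (n == 0)
            return false;
        s += n;
        len -= n;
    }
    return true;
}
```

For the string in the question, the bytes C2 BF decode to U+00BF, so the check passes. Bear in mind that this only shows the bytes are well-formed UTF-8. It cannot prove that UTF-8 was intended, since the same bytes are also a legal Latin-1 string.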

--
Roger Leigh
Printing on GNU/Linux? http://gimp-print.sourceforge.net/
Debian GNU/Linux http://www.debian.org/
GPG Public Key: 0x25BFB848. Please sign and encrypt your mail.
