attempting to print unicode characters.


BartC

Ben Bacarisse said:
That's fine, but there is some evidence that the OP wants to use
standard C (for example, fwide is not at all standard in MS C). A
version of C that prints wide strings with %s is not standard C and is
just going to further confuse matters.

But with wprintf(). Note the 'w' prefix (which I assume stands for 'wide').
To further complicate things, my quick review of the docs suggests that
wprintf uses %ls as per standard for wide strings (VS 2010).
http://msdn.microsoft.com/en-us/library/tcxf1dw6.aspx

Seems to be down at the minute; I'll check later.
All the more odd to use %s then!

With wprintf() as I said, which also takes wide format strings.
One explanation is that your source is not UTF-8 encoded. The
Windows-1252 encoding where the euro is 0x80 seems likely.

Yes. I forgot my source uses TXT/ANSI 8-bit format. But converting to both
Unicode and UTF-8 didn't really help as, not knowing the compiler switches
necessary (if they even exist), they caused even more errors.

However this is why I suggested constructing the string rather than using
string literals.
 

Ben Bacarisse

BartC said:
But with wprintf(). Note the 'w' prefix (which I assume stands for
'wide').

The w determines the type of the format (as you correctly say below) not
the type of the things you can print. %s is for multi-byte strings and
%ls is for wide ones. This is as true for wprintf as it is for printf.
The only special thing about the w* family is the type of the format
string.
Seems to be down at the minute; I'll check later.


With wprintf() as I said, which also takes wide format strings.

Yes, wide *format* strings but like printf it can print either wide or
multi-byte strings.
Yes. I forgot my source uses TXT/ANSI 8-bit format. But converting to
both Unicode and UTF-8 didn't really help as, not knowing the compiler
switches necessary (if they even exist), they caused even more errors.

Well, unless you are using some weird MS version of wprintf, it won't
work unless you use the right conversion specifier: %ls.

You initially wanted to help so maybe you don't care, but if you want to
try to sort out why this is not working for you, post some code and the
errors you get. I was able to guess a problem from one piece of output
but that was just luck.
However this is why I suggested constructing the string rather than
using string literals.

That makes writing a program that speaks anything but "computer US" very
hard. It's so much simpler if L"..." just works as it should.
Alternatively you can use multi-byte sequences and forget about wide
strings for a very large number of applications.
 

Ray

Ben said:
No. The problem here is, I think, setting the stream orientation before
setting the locale. I think the IO system needs to know the locale when
it sets up the stream's orientation.
A locale setting should almost always be one of the first things a
program does. Put the locale setting first and I think it will work.

You are absolutely right.

Another attempt, and I'm no longer counting how many:

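#include <stdio.h>
#include <wchar.h>
#include <stdlib.h>
#include <locale.h>

int main(void)
{
    wchar_t str[8];
    str[0] = 0x20AC;   /* U+20AC EURO SIGN */
    str[1] = 0;

    /* Calling fwide(stdout, 1) before setlocale was the problem.
       setlocale is not wrapped in assert(): with NDEBUG defined the
       whole call would disappear. */
    if (setlocale(LC_CTYPE, "en_US.UTF-8") == NULL) {
        fputs("setlocale failed\n", stderr);
        return EXIT_FAILURE;
    }
    wprintf(L"str:%ls code:%X \n", str, (unsigned)str[0]);
    return 0;
}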

Does in fact print, as hoped:

str:€ code:20AC

Thank you. Now I can get on with my project. This "gotcha" really
needs to be in the man pages of fwide, setlocale, and wprintf.
Also, it seems appropriate to use LC_CTYPE instead of LC_ALL so as
NOT to change the users' other locale-dependent settings (number
formatting, collation and so on).


Bear
 

Ray

Ben said:
That's fine, but there is some evidence that the OP wants to use
standard C (for example, fwide is not at all standard in MS C). A
version of C that prints wide strings with %s is not standard C and is
just going to further confuse matters.

Yes, I do. I'm writing something that's going to live for a long time,
and, if successful, be ported to many different systems. Writing for
portability means writing to the standard and not to any particular
compiler.
Alternatively, the system may not be using Unicode at all. C does not
(yet) require Unicode/UTF-8 to be used as the wide and multi-byte
encodings.

This is true. Reliable C source is ASCII-only. gcc accepts UTF-8 as
an extension, and MSVC accepts Windows code page characters as an
extension. For portability, I'll be using escape sequences to spell out
the UTF-8 bytes of each code point outside the ASCII range, and then
mbstowcs to convert to whatever wide-character representation the
local implementation uses for printing etc.

But, seriously, this whole thread has been about an important
requirement *NOT* mentioned in the man pages.

Bear
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,083
Messages
2,570,591
Members
47,212
Latest member
RobynWiley

Latest Threads

Top