Guest
From thread
http://groups.google.com/group/comp.lang.c++/browse_thread/thread/79d767efa42df516
P.J. Plauger said:
In practice they're not broken and you can write Unicode characters.
As with any other Standard C++ library, you need an appropriate
codecvt facet for the code conversion you favor. See our add-on
library, which includes a broad assortment.
I'll take this at face value and I'll have to suppose that I don't
understand what the streams should do.
I guess the root of my problem is my expectation that if I use a
std::ofstream it will write a char sequence to disk, and if I use a
std::wofstream it will write a wchar_t sequence to disk. I presume,
then, that this is wrong?
I also have to assume that if I write a UTF-16 sequence to std::wcout
then I should not expect it to display correctly on a platform that
uses UTF-16?
The code below summarises my expectation of what I would be able to do,
so I guess my understanding is off. What should the code below do?
#include "stdafx.h" // This header is empty
#include <iostream>
#include <conio.h>
#include <fstream>
int wmain( int /*argc*/, wchar_t* /*argv*/[] )
{
    std::wcout << L"Hello world!" << std::endl;
    // Surname with AE ligature
    std::wcout << L"Hello Kirit S\x00e6lensminde" << std::endl;
    // Kirit transliterated (probably badly) into Greek
    std::wcout << L"Hello \x039a\x03b9\x03c1\x03b9\x03c4" << std::endl;
    // Kirit transliterated into Thai
    std::wcout << L"Hello \x0e04\x0e35\x0e23\x0e34\x0e17" << std::endl;

    //if ( std::wcout )
    //    std::cout << "\nstd::wcout still good" << std::endl;
    //else
    //    std::cout << "\nstd::wcout gone bad" << std::endl;

    _cputws( L"\n\n\n" );
    _cputws( L"Hello Kirit S\x00e6lensminde\n" );       // AE ligature
    _cputws( L"Hello \x039a\x03b9\x03c1\x03b9\x03c4\n" ); // Greek
    _cputws( L"Hello \x0e04\x0e35\x0e23\x0e34\x0e17\n" ); // Thai

    std::wofstream wout1( "test1.txt" );
    wout1 << L"12345" << std::endl;

    //if ( wout1 )
    //    std::cout << "\nwout1 still good" << std::endl;
    //else
    //    std::cout << "\nwout1 gone bad" << std::endl;

    std::wofstream wout2( "test2.txt" );
    wout2 << L"Hello world!" << std::endl;
    wout2 << L"Hello Kirit S\x00e6lensminde" << std::endl;
    wout2 << L"Hello \x039a\x03b9\x03c1\x03b9\x03c4" << std::endl;
    wout2 << L"Hello \x0e04\x0e35\x0e23\x0e34\x0e17" << std::endl;

    //if ( wout2 )
    //    std::cout << "\nwout2 still good" << std::endl;
    //else
    //    std::cout << "\nwout2 gone bad" << std::endl;

    return 0;
}
I've compiled this with Microsoft Visual Studio 2003 and it reports the
following command-line switches for a debug build (i.e. Unicode as the
character set and wchar_t as a built-in type):
/Od /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /Gm
/EHsc /RTC1 /MLd /Zc:wchar_t /Zc:forScope /Yu"stdafx.h"
/Fp"Debug/wcout.pch" /Fo"Debug/" /Fd"Debug/vc70.pdb" /W3 /nologo /c
/Wp64 /ZI /TP
If I run this directly from the IDE then it clearly does some odd
narrowing of the output, as the Greek _cputws() line displays:
Hello ????t
which to me looks like a failure in the character substitution from
Unicode to what I presume is some OEM encoding. Now don't get me wrong: I
think this is a poor default for running something on a Unicode platform
(this is on Windows 2003 Server), but it does seem to be beside the point
for this discussion.
If I run it from a command prompt with Unicode I/O turned on (cmd.exe
/u) then the output is somewhat more encouraging, but not a lot:
Hello world!
Hello Kirit Sµlensminde
Hello
Hello Kirit Sælensminde
Hello Κιριτ
Hello คีริท
The _cputws calls all work as I would expect, but std::wcout doesn't
work at all. Worse, uncommenting the stream tests shows that there is an
error on std::wcout, rendering it unusable from then on. Note also that
it has translated the AE ligature into what looks to me like a Greek
lower-case mu. The Greek capital kappa has wedged the stream.
The two txt files are interesting. test1.txt is seven bytes long,
exactly half the size I would naively expect, and test2.txt is 45
bytes long, exactly the length I'd expect from a char stream that only
went up to, but didn't include, the Greek capital kappa.
Now, if this is all by design then I presume there is something
fairly simple I can do to have all of this work in the way that I
naively expect, or does the C++ standard in some way mandate that it is
going to be really hard? Maybe it's a quality-of-implementation issue
and we just have to buy the library upgrade or write our own codecvt
implementations?
What we've done is to use our own implementation of a UTF-16 to UTF-8
converter (which we know works properly, as it drives our web interfaces)
and just send that sequence to a std::ofstream. We've had to more or
less give up on meaningful, pipeable console output.
K