Ralf Goertz
Hi,
since my previous post
<[email protected]> is still
unanswered, I'd like to rephrase my question. In order to read or write a
wstring in UTF-8 encoding it is *not* sufficient to imbue the stream
with a locale like "de_DE.UTF-8". Doing so only takes care of the numeric
facets (decimal point and the like). Rather, one has to call
locale::global(locale("de_DE.UTF-8")). Does this behaviour conform to the
standard? And if so, why? I mean, why wouldn't
wcin.imbue(locale("de_DE.UTF-8")) make wcin accept UTF-8 multibyte
characters while still allowing 5,7 to be parsed as 5.7?
file wcintest.cc:
-------------
#include <iostream>
#include <string>
#include <locale>
using namespace std;

float f;
wstring euro;

int main() {
    locale l("de_DE.UTF-8");
    wcin.imbue(l);
    locale::global(l); // (*)
    wcin >> f >> euro;
    wcout.imbue(locale("en_US.UTF-8"));
    wcout << f << L" " << euro << endl;
}
-------------
Calling
$ echo "5,70 €" |./wcintest
in a UTF-8 environment gives
5.70 €
but only if the line marked (*) is present. Otherwise you only get
5.70
It seems as if the encoding part of the locale is ignored by the imbue
calls, but I don't see why this should be the case.
I use g++ (GCC) 4.1.0 under Linux (i386).
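The numeric half of the observation can be isolated without depending on
an installed "de_DE.UTF-8" locale. The following is only a minimal sketch
under stated assumptions: comma_numpunct and parse_with_comma are
hypothetical names that are not part of wcintest.cc, and a string stream
stands in for wcin. It shows that imbue alone is enough to make a wide
stream accept a decimal comma while the global locale is left untouched.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet (not from wcintest.cc): behaves like the classic
// numpunct except that the decimal point is a comma, as in a German locale.
struct comma_numpunct : std::numpunct<wchar_t> {
protected:
    wchar_t do_decimal_point() const override { return L','; }
};

// Hypothetical helper: parses a float from `text` using a stream that has
// only been imbued; locale::global is never called here.
float parse_with_comma(const std::wstring& text) {
    std::wistringstream in(text);
    // The locale object takes ownership of the facet allocated with new.
    in.imbue(std::locale(std::locale::classic(), new comma_numpunct));
    float value = 0.0f;
    in >> value;
    return value;
}
```

Calling parse_with_comma(L"5,70") should yield a value close to 5.7. This
says nothing about the encoding conversion, which is the part that appears
to depend on the line marked (*).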
Ralf