UNICODE I/O

Ali

I would like to write a program that can handle non-ANSI characters.
The following code does not work; the output is empty:

#include <cstdlib>
#include <fstream>
#include <string>
using namespace std;

int main() {
    wofstream out(L"log");
    if (!out)
        exit(127);
    wstring s(L"õûíÕÛÍ");
    out << s << endl;
    out.close();
    return 0;
}

The log file is in ANSI char encoding, and probably that is why the
output fails.

Could anyone help me with this?

Many thanks.
 
Ali

Well, the example string L"õûíÕÛÍ" is not displayed correctly in my
previous e-mail (even though UTF-8 char encoding is used).
 
Michael DOUBEZ

That is not what the header of the mail says:
First mail:
Content-Type: text/plain; charset=ISO-8859-2
Second mail:
Content-Type: text/plain; charset=ISO-8859-1

Michael
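
To illustrate the point about the headers: the same bytes decode to different characters under the two charsets, which would explain why ő and ű showed up as õ and û. A minimal sketch covering only the six bytes of the sample string; the table kSample and the function decode are made up for this illustration and are not part of any library:

```cpp
#include <string>

// Code points for the six bytes of the sample string, as decoded under
// ISO-8859-1 and ISO-8859-2. Only these bytes are covered; this is not a
// complete table for either character set.
struct ByteMeaning {
    unsigned char byte;
    char32_t latin1;  // ISO-8859-1 maps every byte to the same code point
    char32_t latin2;  // ISO-8859-2 differs for some bytes above 0xA0
};

static const ByteMeaning kSample[] = {
    {0xF5, 0x00F5, 0x0151},  // õ vs ő
    {0xFB, 0x00FB, 0x0171},  // û vs ű
    {0xED, 0x00ED, 0x00ED},  // í in both
    {0xD5, 0x00D5, 0x0150},  // Õ vs Ő
    {0xDB, 0x00DB, 0x0170},  // Û vs Ű
    {0xCD, 0x00CD, 0x00CD},  // Í in both
};

// Decodes `bytes` with the chosen charset. Bytes outside the sample table
// are passed through unchanged, which is only correct for ASCII.
std::u32string decode(const std::string& bytes, bool as_latin2) {
    std::u32string out;
    for (unsigned char b : bytes) {
        char32_t cp = b;
        for (const ByteMeaning& m : kSample)
            if (m.byte == b) cp = as_latin2 ? m.latin2 : m.latin1;
        out.push_back(cp);
    }
    return out;
}
```

Decoding the bytes F5 FB as ISO-8859-2 gives U+0151 U+0171 (ő ű), while decoding them as ISO-8859-1 gives U+00F5 U+00FB (õ û).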
 
Ali

Dear Michael,

This is mysterious. The web page I sent the mail from uses UTF-8.
I have no idea how the messages became ISO-8859-1 and ISO-8859-2, and I do
not know how to control the encoding of the messages.

A good candidate for the troublemaker is the letter o" (ő, or &#337; as an
HTML entity). It is displayed correctly in the IDE inside wstring s, but it is
not written to the file.

Thanks,

Ali
 
Michael DOUBEZ

I guess your problem with writing to a file is the same as writing the
post :)

Have you tried imbuing the stream with a locale?

typedef std::codecvt_byname<wchar_t, char, std::mbstate_t> WCvt;
const std::locale unicode(std::locale("C"), new WCvt("UCS2"));

int main() {
    wofstream out;
    // imbue before opening, so the conversion applies from the first character
    out.rdbuf()->pubimbue(unicode);
    out.open(L"log");
    if (!out)
        exit(127);
    wstring s(L"őűíŐŰÍ");
    out << s << endl;
    out.close();
    return 0;
}

Michael
 
Michael DOUBEZ

A correction to the code in my previous post:

out.open(L"log");

should be

out.open("log");

open() takes a const char* for the filename.
 
Ali

Dear Michael,

Thank you for your help.

The following code works here:

#include <cstdlib>
#include <fstream>
#include <locale>
#include <string>
using namespace std;

int main() {
    // use the user's environment locale instead of the default "C" locale
    std::locale::global(std::locale(""));
    wofstream out(L"log");
    out.imbue(std::locale());
    if (!out)
        exit(127);
    wstring s(L"őűíŐŰÍ");
    out << s << endl;
    const bool state = out.good();
    out.close();
    return 0;
}

I will try your code now.

Thanks,

Ali
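
A likely reason the first program produced an empty file is that the default "C" locale could not convert the characters, which puts the stream into a failed state without any visible error. A small sketch that makes the failure observable; the helper name write_wide_line is hypothetical, and the fallback to the global locale is an assumption about what one would want when the environment locale is unknown:

```cpp
#include <fstream>
#include <locale>
#include <stdexcept>
#include <string>

// Writes `text` plus a newline to `path` through a wide stream imbued with
// the user's environment locale. Returns false if the file cannot be opened
// or if the stream fails while writing (for example, when the locale's
// codecvt facet cannot represent a character).
bool write_wide_line(const char* path, const std::wstring& text) {
    std::locale loc;  // copy of the current global locale
    try {
        loc = std::locale("");  // environment locale; may throw if unknown
    } catch (const std::runtime_error&) {
        // keep the global locale as a fallback
    }

    std::wofstream out;
    out.imbue(loc);  // imbue before open so the filebuf uses it from the start
    out.open(path);
    if (!out) return false;

    out << text << L'\n';
    out.flush();  // the conversion happens when the buffer is written out
    return out.good();
}
```

Checking the return value after writing the sample string shows whether the conversion succeeded, instead of leaving a silently empty log.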
 
Tobias Blomkvist

Remember that not everything "wide" is automatically Unicode.
Try saving the source file as UTF-8, opening ordinary non-wide
streams, and outputting the string. Then you need to open the log
file with a program that understands UTF-8 and can display the
actual characters.

The following outputs correctly on my terminal when I save the
source file as UTF-8:

#include <iostream>

int main() {
    std::cout << "őűíŐŰÍ" << std::endl;
    return 0;
}
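
Whether a UTF-8 string literal survives compilation depends on the source and execution character sets the compiler assumes. Spelling the bytes out avoids that dependency. A minimal sketch of the same narrow-stream approach writing to a file; the function names sample_utf8 and write_utf8_line are made up for this example:

```cpp
#include <fstream>
#include <string>

// "őűíŐŰÍ" spelled as explicit UTF-8 bytes, so the result does not depend on
// how the source file itself is saved.
std::string sample_utf8() {
    return "\xC5\x91" "\xC5\xB1" "\xC3\xAD"   // ő ű í
           "\xC5\x90" "\xC5\xB0" "\xC3\x8D";  // Ő Ű Í
}

// Writes the bytes unchanged; a narrow stream opened in binary mode performs
// no character conversion.
bool write_utf8_line(const char* path, const std::string& utf8) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out << utf8 << '\n';
    return out.good();
}
```

The resulting file holds 12 bytes of text plus the newline and needs a UTF-8 aware viewer to show the six characters.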
 
Ali

Dear Michael,

Just one more question. How can I convert (preferably using the C++ STL)
a UTF-16 wstring to a UTF-8 array of chars, as needed for sqlite3?

I guess the solution begins with something similar to the lines

typedef std::codecvt_byname<wchar_t, char, std::mbstate_t> WCvt;
const std::locale unicode(std::locale("C"), new WCvt("UCS2"));

that you wrote, but I do not understand them.

Many thanks,

Ali
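
For reference, the conversion asked about here can also be done by hand, with no locale involved. The following is only a sketch: to_utf8 and append_utf8 are hypothetical helper names, and the code assumes wchar_t holds either UTF-16 code units (as on Windows) or whole code points (as on most Unix systems).

```cpp
#include <cstddef>
#include <string>

// Appends the UTF-8 encoding of one code point (at most U+10FFFF) to `out`.
static void append_utf8(std::string& out, unsigned long cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Converts a wide string to UTF-8. Surrogate pairs are combined; unpaired
// surrogates and out-of-range values are replaced with U+FFFD.
std::string to_utf8(const std::wstring& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(in[i]);
        if (cp >= 0xD800 && cp <= 0xDBFF) {            // high surrogate
            unsigned long lo = (i + 1 < in.size())
                ? static_cast<unsigned long>(in[i + 1]) : 0;
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;                                   // consumed the pair
            } else {
                cp = 0xFFFD;
            }
        } else if ((cp >= 0xDC00 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            cp = 0xFFFD;                               // stray or invalid
        }
        append_utf8(out, cp);
    }
    return out;
}
```

The result of to_utf8(s).c_str() can then be passed to an API that expects UTF-8 text in a char array.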
 
Ali

The function size_t wcstombs(char *s, const wchar_t *pwcs, size_t n) looks
like what I need; LC_CTYPE should be changed to tell the function that I
would like the output in UTF-8.

However, I failed to change LC_CTYPE. I tried setlocale(LC_CTYPE,
"en_US.utf8") and variants, but none of them worked (MS VS2005). How
should I do this?

(I am aware of the function WideCharToMultiByte, and it works fine, but it
is an MS function; I would prefer a standard function.)
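
A sketch of the wcstombs route that checks whether the locale was actually accepted. setlocale returns a null pointer when the name is not supported, and as far as I know the Microsoft C runtime of that era rejects UTF-8 code pages, which would explain the failure described here. The helper name narrow_with_c_locale is hypothetical:

```cpp
#include <clocale>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Converts `in` with wcstombs under the C locale named `locale_name`.
// Returns false if the locale is not available or a character cannot be
// converted. The previous LC_CTYPE setting is restored before returning.
bool narrow_with_c_locale(const std::wstring& in, const char* locale_name,
                          std::string& out) {
    const char* current = std::setlocale(LC_CTYPE, nullptr);
    const std::string saved = current ? current : "C";

    if (std::setlocale(LC_CTYPE, locale_name) == nullptr)
        return false;  // locale name not supported by this C library

    // MB_CUR_MAX is the maximum number of bytes per character in the
    // locale that is active at this point.
    std::vector<char> buf(in.size() * MB_CUR_MAX + 1);
    const std::size_t n = std::wcstombs(&buf[0], in.c_str(), buf.size());

    std::setlocale(LC_CTYPE, saved.c_str());
    if (n == static_cast<std::size_t>(-1))
        return false;  // some character has no representation

    out.assign(&buf[0], n);
    return true;
}
```

Calling it with a name such as "en_US.UTF-8" only yields UTF-8 where the C library provides that locale; a false return tells the caller to fall back to another conversion.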
 
Michael DOUBEZ

wcstombs should work.

mbstowcs and wcstombs follow the LC_CTYPE category of the current C locale,
which LC_ALL also sets. Have you tried instead:

setlocale(LC_ALL, "en_US.utf8");

or

std::locale en("en_US.utf8"); // or "en_US.UTF-8"
std::locale::global(en);

Alternatively, you can convert strings the hard way, without changing the
global locale:

// the command to convert
std::wstring cmd;
// the locale you want
const std::locale en("en_US.utf8");

// get the conversion facet (by reference: facets cannot be copied)
typedef std::codecvt<wchar_t, char, std::mbstate_t> cvt_t;
const cvt_t& conv = std::use_facet<cvt_t>(en);
// start from the initial conversion state
std::mbstate_t state = std::mbstate_t();

// reserve enough space
std::vector<char> str_output(
    (cmd.size() + 1) * conv.max_length(), '\0');

const wchar_t* input = cmd.c_str();
char* output = &str_output[0];

// proceed to conversion; res should be compared with cvt_t::ok
cvt_t::result res = conv.out(state,
    cmd.c_str(), cmd.c_str() + cmd.size(), input,
    &str_output[0], &str_output[0] + str_output.size(), output);

// copy only the bytes that were written
std::string cmd_utf8(&str_output[0], output);


Perhaps there are other ways.


Michael
 
 
