Dancefire
Hi, everyone,
I'm writing a program that uses wstring (wchar_t) as its internal
string type. The problem arises when I convert multibyte character set
strings in various encodings to wstring (which is Unicode: UCS-2LE
(BMP only) on Win32, and UCS-4 on Linux, I believe).
I have two ways to do the job:
1) Use std::locale: set std::locale::global(), then call mbstowcs() and
wcstombs() to do the conversion.
2) Use platform-dependent functions, such as libiconv on Linux, or
MultiByteToWideChar() and WideCharToMultiByte() on Win32.
At first glance, solution 1) seems like the obvious choice, since it is
the C++ way. And in fact the codecvt facet is just a wrapper that calls
libiconv on Linux, or MultiByteToWideChar() and WideCharToMultiByte()
on Win32, depending on the STL implementation, to do the real work (if
my understanding is correct).
However, I have two problems.
First, I have to set the global locale before doing the conversion,
which has two side effects. In a multithreaded program, changing the
global locale affects any other thread that is converting with a
different encoding at the same time. Yes, I could serialize the
conversions with a lock, but that feels wrong and performs badly.
Also, every call to std::locale::global() is expensive: constructing a
locale object and installing it as the global locale is not a
lightweight operation, so it really hurts performance.
The second problem is that the system-dependent conversion functions
seem to support many more encodings than std::locale() does in each STL
implementation. For example, libiconv supports the UCS-2LE encoding,
but g++'s std::locale doesn't; MultiByteToWideChar() supports UTF-8
conversion, but MSVC (8.0)'s std::locale doesn't accept ".65001" for
code page 65001, which is UTF-8.
A third problem might be that locale names differ between platforms,
but I can easily work around that with #ifdef/#endif.
So, back to the original question: how should I handle MBCS strings in
C++?
Thanks.