PEK
I need some code that converts a multi-byte string to a Unicode string,
and a Unicode string back to multi-byte. I work mostly on Windows and
know how to solve it there, but I would also like some
platform-independent code.
I have tried mbstowcs/wcstombs, but I'm not satisfied with the
result. If wcstombs finds a character that can't be converted, it
returns (size_t)-1 and stops. I would like to replace such characters
with a special character and convert as much as possible.
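A minimal sketch of the wcstombs behaviour described above, assuming the program is still in the default "C" locale (no setlocale call), where wide characters outside ASCII generally have no multi-byte representation. StandardConvert is a hypothetical helper name. When wcstombs returns (size_t)-1, the caller cannot tell how much of the buffer was filled, so the sketch discards it.

```cpp
#include <cstdlib>   // std::wcstombs, MB_CUR_MAX
#include <string>
#include <vector>

// Hypothetical helper: converts with the standard wcstombs() and returns
// the number of bytes written, or (size_t)-1 if some wide character has
// no representation in the current locale. On failure `out` is cleared.
std::size_t StandardConvert(const std::wstring& ws, std::string& out)
{
    // Worst case: MB_CUR_MAX bytes per wide character plus the '\0'.
    std::vector<char> buf(ws.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(buf.data(), ws.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) {
        out.clear();   // conversion stopped; partial output is unusable
        return n;
    }
    out.assign(buf.data(), n);
    return n;
}
```

In the "C" locale, StandardConvert(L"abc", s) yields "abc", while a string containing a CJK character such as L'\x4E2D' fails as a whole instead of yielding the convertible part.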
So I have written my own functions, based on mbtowc and wctomb. I have
successfully converted text to and from different code pages (I have
tried 437, 1252 and 949 [Korean, where some characters take two
bytes]). So I think the code is OK, but I would appreciate it if
someone else looked at it (so I have someone to blame ;-).
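Both mbtowc and wctomb convert according to the current LC_CTYPE locale, so testing against code pages 437, 1252 and 949 means switching that locale first. A minimal sketch, assuming a portable program simply tries several candidate names until one is accepted; SelectCtypeLocale is a hypothetical helper, and which names exist is platform- and installation-specific (the Microsoft C runtime accepts ".code_page" strings such as ".949", while glibc systems typically offer names like "ko_KR.euckr" only if that locale is installed).

```cpp
#include <clocale>   // std::setlocale, LC_CTYPE

// Hypothetical helper: tries each candidate locale name in turn and returns
// the first one setlocale() accepts for LC_CTYPE (the category that mbtowc
// and wctomb use), or nullptr if none is available. Note that setlocale
// changes the locale of the whole process.
const char* SelectCtypeLocale(const char* const* candidates, int count)
{
    for (int i = 0; i < count; ++i) {
        if (std::setlocale(LC_CTYPE, candidates[i]) != nullptr)
            return candidates[i];
    }
    return nullptr;   // failed setlocale calls leave the locale unchanged
}
```

For code page 949 one might pass { ".949", "ko_KR.euckr" }; a null result means no suitable locale is available and the conversions would silently use whatever locale was active before.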
The code:
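#include <cstdlib>   // mbtowc, wctomb, MB_CUR_MAX
#include <string>
#include <vector>

using std::string;
using std::wstring;

// Converts a multi-byte string (current LC_CTYPE locale) to a wide string.
// Invalid byte sequences are skipped one byte at a time.
void ConvertCharToWstring(const char* from, wstring &to)
{
    to = L"";
    size_t pos = 0;
    wchar_t temp;
    mbtowc(NULL, NULL, 0);   // reset the internal shift state
    while(true)
    {
        // mbtowc returns int: 0 at '\0', -1 on error, else bytes used
        int len = mbtowc(&temp, from + pos, MB_CUR_MAX);
        //Found end
        if(len == 0)
            return;
        else if(len == -1)
        {
            //Invalid or truncated sequence (possible with bad input):
            //skip one byte; the shift state is now unspecified, so reset it
            mbtowc(NULL, NULL, 0);
            pos++;
        }
        else
        {
            to += temp;
            pos += len;
        }
    }
}

// Converts a wide string to a multi-byte string (current LC_CTYPE locale).
// Characters that cannot be represented are replaced by unknownchar, and
// *datalost (if not NULL) is set to true when that happens.
void ConvertWcharToString
    (const wchar_t* from, string &to,
     bool* datalost, char unknownchar)
{
    to = "";
    if(datalost != NULL)
        *datalost = false;
    std::vector<char> temp(MB_CUR_MAX);   // released automatically
    wctomb(NULL, L'\0');                  // reset the internal shift state
    while(*from != L'\0')
    {
        int len = wctomb(&temp[0], *from);
        if(len == -1)
        {
            //Replace with unknown character
            to += unknownchar;
            if(datalost != NULL)
                *datalost = true;
            wctomb(NULL, L'\0');   // shift state is unspecified after an error
        }
        else
        {
            //Copy all bytes of the converted character
            to.append(&temp[0], len);
        }
        from++;
    }
}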
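For comparison, a sketch of the same two conversions built on the restartable functions mbrtowc and wcrtomb from <cwchar>. This is a different technique from the mbtowc/wctomb approach above: the conversion state lives in an explicit mbstate_t instead of hidden static state, and std::string/std::wstring lengths allow embedded '\0'. MbToWide and WideToMb are hypothetical names, the '?' default replacement is an assumption, and the current LC_CTYPE locale still controls the encoding.

```cpp
#include <climits>   // MB_LEN_MAX
#include <cwchar>    // std::mbrtowc, std::wcrtomb, std::mbstate_t
#include <string>

// Multi-byte -> wide. Invalid sequences become `replacement` (one per
// skipped byte); a sequence cut off by the end of input becomes one.
std::wstring MbToWide(const std::string& from, wchar_t replacement = L'?')
{
    std::wstring to;
    std::mbstate_t state = std::mbstate_t();   // initial shift state
    std::size_t pos = 0;
    while (pos < from.size()) {
        wchar_t wc = 0;
        std::size_t len = std::mbrtowc(&wc, from.data() + pos,
                                       from.size() - pos, &state);
        if (len == static_cast<std::size_t>(-2)) {
            to += replacement;   // incomplete sequence at end of input
            break;
        } else if (len == static_cast<std::size_t>(-1)) {
            to += replacement;   // invalid sequence: skip one byte
            state = std::mbstate_t();
            ++pos;
        } else if (len == 0) {
            to += L'\0';         // embedded '\0' byte
            ++pos;
        } else {
            to += wc;
            pos += len;
        }
    }
    return to;
}

// Wide -> multi-byte. Unrepresentable characters become `replacement`;
// *datalost (if given) reports whether that happened.
std::string WideToMb(const std::wstring& from, char replacement = '?',
                     bool* datalost = nullptr)
{
    std::string to;
    std::mbstate_t state = std::mbstate_t();
    char buf[MB_LEN_MAX];   // MB_LEN_MAX >= MB_CUR_MAX in every locale
    if (datalost) *datalost = false;
    for (wchar_t wc : from) {
        std::size_t len = std::wcrtomb(buf, wc, &state);
        if (len == static_cast<std::size_t>(-1)) {
            to += replacement;
            state = std::mbstate_t();   // state is unspecified after an error
            if (datalost) *datalost = true;
        } else {
            to.append(buf, len);
        }
    }
    // For stateful encodings, return to the initial shift state: wcrtomb
    // stores any shift sequence followed by '\0'; keep all but the '\0'.
    std::size_t len = std::wcrtomb(buf, L'\0', &state);
    if (len != static_cast<std::size_t>(-1) && len > 1)
        to.append(buf, len - 1);
    return to;
}
```

In the default "C" locale, WideToMb(L"a\x4E2D" L"b") gives "a?b" and sets *datalost, which is the "convert as much as possible" behaviour asked for above.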
/PEK