unicode std::string

Panjandrum

Rolf said:
So you say Unicode is only UTF-8? And std::string is always UTF-16?

std::wstring and std::string are both not appropriate for
variable-length character encodings like UTF-8.
 
Rolf Magnus

Panjandrum said:
std::wstring and std::string are both not appropriate for
variable-length character encodings like UTF-8.

Yes, but there are Unicode encodings that don't need variable-length
characters.
 
Old Wolf

std::wstring does not specify any encoding type or any character set.

The most common implementations of std::wstring use the Unicode
character set, with either UCS-4, UCS-2 or UTF-16 encoding.
But this is not a requirement.

Panjandrum said:
std::wstring and std::string are both not appropriate for
variable-length character encodings like UTF-8.

std::string is appropriate for UTF-8.

But you must remember that functions like size() and find() will
apply to the encoded bytes, not to the decoded version.
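
To make the bytes-versus-characters point concrete, here is a minimal sketch. It assumes the string holds valid UTF-8 and does no validation; the helper names utf8_code_points and utf8_size_demo are made up for illustration and are not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts Unicode code points in a string ASSUMED to hold valid UTF-8.
// Continuation bytes look like 10xxxxxx; every other byte starts a new
// code point. No validation is performed.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}

// "naive" with U+00EF (i with diaeresis) spelled as its UTF-8 bytes 0xC3 0xAF.
void utf8_size_demo() {
    const std::string s = "na\xC3\xAFve";
    assert(s.size() == 6);             // size() counts encoded bytes
    assert(utf8_code_points(s) == 5);  // but there are five code points
    assert(s.find("ve") == 4);         // find() returns a byte offset, not a character index
}
```

So the member functions still do something well defined; you just have to interpret their results as byte counts and byte offsets.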

If that was your issue, then you could also say that std::wstring
is not appropriate for any Unicode encoding, because of combining
characters (i.e. the string length won't match the number of
display characters).
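
A rough sketch of the combining-character problem follows. It uses std::u32string rather than std::wstring so the element width is fixed at 32 bits (that substitution is my assumption; the same effect shows up with wstring). The helper rough_display_length is hypothetical and only skips the Combining Diacritical Marks block, which is far from real grapheme segmentation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts code points outside the Combining Diacritical Marks block
// (U+0300..U+036F). This is only a crude stand-in for "display
// characters"; proper segmentation needs much more Unicode data.
std::size_t rough_display_length(const std::u32string& s) {
    std::size_t n = 0;
    for (char32_t c : s) {
        if (c < 0x0300 || c > 0x036F) ++n;
    }
    return n;
}

void combining_demo() {
    const std::u32string decomposed = U"e\u0301";  // 'e' + COMBINING ACUTE ACCENT
    const std::u32string precomposed = U"\u00E9";  // LATIN SMALL LETTER E WITH ACUTE
    assert(decomposed.size() == 2);                // two elements, one displayed character
    assert(precomposed.size() == 1);
    assert(decomposed != precomposed);             // same appearance, different contents
    assert(rough_display_length(decomposed) == 1);
}
```

Even with one fixed-width element per code point, size() and operator== do not line up with what the user sees.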

Of course the correct answer is that you should engage your
brain when manipulating Unicode strings, and be aware of
these issues. Many applications do indeed use std::string for
UTF-8 processing.
 
Panjandrum

Old Wolf said:
std::string is appropriate for UTF-8.

But you must remember that functions like size() and find() will
apply to the encoded bytes, not to the decoded version.

You mean std::string is appropriate, "just" most member functions don't
work?

Old Wolf said:
If that was your issue, then you could also say that std::wstring
is not appropriate for any Unicode encoding, because of combining
characters (i.e. the string length won't match the number of
display characters).

You use normalized UTF-16.

Old Wolf said:
Of course the correct answer is that you should engage your
brain when manipulating Unicode strings, and be aware of
these issues. Many applications do indeed use std::string for
UTF-8 processing.

Disburden the brain by using sufficient tools for the task.
 
msalters

Panjandrum said:
You mean std::string is appropriate, "just" most member functions don't
work?

"don't work" is nonsense. They work, but the results are not what you
would like.

Panjandrum said:
You use normalized UTF-16.

Since Unicode 3.1, not every Unicode character can be encoded in 16
bits. Even with normalization you need 21 bits, or pairs of 16-bit units.
However, there are implementations in which wchar_t is 32 bits.
(Note: wchar_t doesn't have to be Unicode)
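
As an illustration of the "pairs of 16 bits" case, here is a small sketch of UTF-16 surrogate-pair encoding. The function name to_utf16 is hypothetical, and the code assumes the input is a valid code point (at most U+10FFFF and not itself a surrogate); it does not check this.

```cpp
#include <cassert>
#include <string>

// Encodes one code point as UTF-16 code units.
// ASSUMES cp is valid: <= U+10FFFF and not in U+D800..U+DFFF.
std::u16string to_utf16(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));  // fits in one 16-bit unit
    } else {
        const char32_t v = cp - 0x10000;           // 20-bit value to split
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));    // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));  // low surrogate
    }
    return out;
}

void surrogate_demo() {
    const std::u16string clef = to_utf16(0x1D11E);  // U+1D11E MUSICAL SYMBOL G CLEF
    assert(clef.size() == 2);                       // one character, two code units
    assert(clef[0] == 0xD834 && clef[1] == 0xDD1E);
    assert(to_utf16(0x20AC).size() == 1);           // U+20AC EURO SIGN fits in one unit
}
```

With a 32-bit element type such as char32_t (or a 32-bit wchar_t) the same character occupies a single element.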

Regards,
Michiel Salters
 
