UTF-8 in char*

Jared Dykstra

Christian Bau said:
That is not what I said. UTF8 is better than that. Consider the case
where x, y, and a are single characters that are all three encoded to
two bytes each. It would be possible that the last byte of x matches the
first byte of a, and the first byte of y matches the second byte of a,
so the encoding of xy has the encoding of a as a substring. This is not
the case with UTF8.

Yes, but that is not a problem when comparing strings with
strcmp()-style logic. It would be a problem when searching for a
substring in a larger string. However, when two strings are compared
from beginning to end it is not a problem, because the byte comparison
never starts on the 2nd byte of a multibyte character.
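
To make the distinction concrete, here is a rough sketch. The two-byte encoding is invented purely for illustration (any byte value may appear in either position), and same_string() and find_bytes() are hypothetical helper names, not anything from the standard library beyond the strcmp() and strstr() calls they wrap.

```c
#include <assert.h>
#include <string.h>

/* Whole-string equality. strcmp() walks both strings from offset 0 in
   lockstep, so it never starts comparing in the middle of a character. */
int same_string(const char *s, const char *t)
{
    assert(s != NULL && t != NULL);
    return strcmp(s, t) == 0;
}

/* Byte offset of the first occurrence of needle in haystack, or -1.
   strstr() tries every byte offset, including offsets that fall on the
   second byte of a multibyte character. */
long find_bytes(const char *haystack, const char *needle)
{
    const char *hit;

    assert(haystack != NULL && needle != NULL);
    hit = strstr(haystack, needle);
    return hit != NULL ? (long)(hit - haystack) : -1L;
}
```

Assume x is the bytes 81 83, y is 85 81 and a is 83 85 in the made-up encoding. Then same_string("\x81\x83\x85\x81", "\x83\x85") reports that the strings differ, as it should, while find_bytes("\x81\x83\x85\x81", "\x83\x85") returns offset 1: a false hit that straddles x and y.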

This is getting quite hypothetical. Suffice it to say that it's not a
problem with UTF-8, which we've both agreed on and which is what the
OP was concerned with.
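
For the UTF-8 case, a minimal sketch of why plain strstr() is safe: continuation bytes always have the bit pattern 10xxxxxx and lead bytes never do, so a valid UTF-8 needle can only match where a character starts. The function names below are hypothetical, and the check assumes both arguments are valid, NUL-terminated UTF-8.

```c
#include <assert.h>
#include <string.h>

/* UTF-8 continuation bytes have the form 10xxxxxx (0x80..0xBF). */
int is_utf8_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Returns 1 if every occurrence of needle in haystack begins on a
   character boundary, 0 if some occurrence begins on a continuation
   byte. For valid UTF-8 input the result is expected to be 1. */
int all_matches_aligned(const char *haystack, const char *needle)
{
    const char *p = haystack;

    assert(haystack != NULL && needle != NULL);
    if (*needle == '\0')
        return 1;                 /* an empty needle matches trivially */

    while ((p = strstr(p, needle)) != NULL) {
        if (is_utf8_continuation((unsigned char)*p))
            return 0;             /* match started mid-character */
        p++;                      /* keep scanning after this match */
    }
    return 1;
}
```

With the haystack "\xC3\xA9\xC3\xBC" (e-acute followed by u-umlaut) and the needle "\xC3\xBC", the only match is at offset 2, which is a character boundary. A misaligned match only shows up if the needle itself is not valid UTF-8, for example "\xA9\xC3".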
 
