Jared Dykstra
Christian Bau said:
That is not what I said. UTF8 is better than that. Consider the case
where x, y, and a are single characters that are all three encoded to
two bytes each. It would be possible that the last byte of x matches the
first byte of a, and the first byte of y matches the second byte of a,
so the encoding of xy has the encoding of a as a substring. This is not
the case with UTF8.
Yes, but that is not a problem when comparing strings with
strcmp()-esque logic. It would be a problem when looking for a
substring in a larger string. However, if strings are compared from
beginning to end, it is not a problem, because the byte comparison
never starts on the 2nd byte of a multibyte character.
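To make the distinction concrete, here is a rough sketch in C. It assumes a made-up encoding in which every character is exactly two bytes; the byte values, and the names XY, A, find_bytes and find_chars2, are invented for illustration and do not correspond to any real character set or library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical encoding: every character is exactly two bytes.
   x = 81 A1, y = A2 82, a = A1 A2 (values invented for illustration). */
const char XY[] = "\x81\xA1\xA2\x82";   /* the two characters x, y */
const char A[]  = "\xA1\xA2";           /* the single character a  */

/* Byte-wise search: byte offset of needle in hay, or -1 if absent. */
long find_bytes(const char *hay, const char *needle)
{
    const char *p = strstr(hay, needle);
    return p ? (long)(p - hay) : -1;
}

/* Character-aware search: only tries offsets that fall on a
   two-byte character boundary. */
long find_chars2(const char *hay, const char *needle)
{
    size_t hlen = strlen(hay), nlen = strlen(needle);
    for (size_t i = 0; i + nlen <= hlen; i += 2)
        if (memcmp(hay + i, needle, nlen) == 0)
            return (long)i;
    return -1;
}
```

With these values, find_bytes(XY, A) reports a hit at byte offset 1, in the middle of x, while find_chars2(XY, A) finds nothing. strcmp(XY, A) is unaffected, since it always starts both strings at byte 0.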
This is getting quite hypothetical. Suffice to say that it's not a
problem with UTF-8--which we've both agreed on--and with which the OP
was concerned.
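For completeness, a small sketch of why UTF-8 avoids the issue: continuation bytes always have the bit pattern 10xxxxxx and lead bytes never do, so a match for a valid UTF-8 needle cannot begin in the middle of a character. The helper names utf8_is_start and utf8_find are my own, not from any library.

```c
#include <string.h>

/* True if b can begin a UTF-8 encoded character, i.e. it is not a
   continuation byte of the form 10xxxxxx. */
int utf8_is_start(unsigned char b)
{
    return (b & 0xC0) != 0x80;
}

/* Byte offset of needle in hay using plain strstr, or -1 if absent.
   Both arguments are assumed to be valid UTF-8. */
long utf8_find(const char *hay, const char *needle)
{
    const char *p = strstr(hay, needle);
    return p ? (long)(p - hay) : -1;
}
```

For example, searching "\xC3\xA9\xC3\xA8" (e-acute followed by e-grave) for "\xC3\xA8" yields offset 2, which is a character boundary; the byte at offset 1 (0xA9) is a continuation byte and can never start a match for a valid needle.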