Well, those two are easy because of their fixed character lengths.
No they're not.
It's really not something I plainly assumed, it's something I've already
*implemented* myself. And it was a pain. Especially if you want halfway
decent recognition of whitespace (Unicode provides "hard" non-breaking
spaces etc.) or quotation marks (the regular " is easy, but there's also
„ (and its upper form ") and the French »« ones). Saying "I want the
third character" isn't trivial anymore, since there is no direct mapping
of characters to bytes.
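To make the point concrete, here is a minimal sketch (in Python, working on raw bytes) of what "give me the n-th character" turns into with UTF-8. The helper name `char_offset` is hypothetical, not from any library:

```python
def char_offset(data: bytes, n: int) -> int:
    """Byte offset of the n-th code point (0-based) in UTF-8 data.

    UTF-8 continuation bytes all look like 0b10xxxxxx, so only bytes
    that do NOT match that pattern start a new code point.
    """
    count = 0
    for i, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:   # not a continuation byte
            if count == n:
                return i
            count += 1
    raise IndexError(n)

text = "héllo".encode("utf-8")      # 'é' occupies two bytes (0xC3 0xA9)
assert char_offset(text, 2) == 3    # the third character 'l' starts at byte 3, not 2
```

The point: character index and byte index diverge as soon as one multi-byte character appears, so every "n-th character" lookup has to walk the buffer.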
But exactly the same problems affect UTF-16 and UTF-32. Some
characters require more than one element; some characters may
have multiple representations, etc., etc.
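Both pitfalls are easy to demonstrate. The snippet below shows a character outside the Basic Multilingual Plane taking two UTF-16 code units (a surrogate pair), and one character with two distinct Unicode representations:

```python
import unicodedata

# U+1F600 lies outside the BMP, so UTF-16 needs a surrogate pair:
emoji = "\U0001F600"
assert len(emoji.encode("utf-16-le")) == 4      # two 16-bit units, not one

# "é" exists precomposed (U+00E9) and decomposed (U+0065 + U+0301):
nfc = unicodedata.normalize("NFC", "e\u0301")   # -> "\u00e9"
nfd = unicodedata.normalize("NFD", "\u00e9")    # -> "e\u0301"
assert nfc != nfd                               # same text, different code points
```

So "one element per character" fails in UTF-16 just as it does in UTF-8, and equality of code-point sequences fails in all three encodings unless you normalize first.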
Implementing good character handling, anytime you go beyond the
basic character set, is difficult. I agree. But using UTF-8
doesn't make it substantially more difficult, and it generally
results in much faster code because of better locality. In
fact, I switched from UTF-32 to UTF-8 because it ended up
simpler.
Finding the n-th character in a long text is O(n), compared to
O(1) with ISO 8859-1/UTF-16/UTF-32 strings. So one has to
implement this cleverly, so that indexing into large texts
doesn't take too much time.
Just thinking of it upsets my stomach a little, seriously. It
ain't pretty.
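One common trick for this, sketched here with a hypothetical `IndexedText` class (not any particular editor's implementation), is to cache the byte offset of every k-th code point so a lookup scans at most k characters from the nearest checkpoint instead of the whole buffer:

```python
class IndexedText:
    """UTF-8 buffer with a byte-offset checkpoint every k code points."""

    def __init__(self, data: bytes, k: int = 64):
        self.data = data
        self.k = k
        self.checkpoints = [0]          # byte offsets of code points 0, k, 2k, ...
        count = 0
        for i, byte in enumerate(data):
            if (byte & 0xC0) != 0x80:   # byte starts a code point
                if count and count % k == 0:
                    self.checkpoints.append(i)
                count += 1

    def byte_offset(self, n: int) -> int:
        """Jump to the nearest checkpoint, then scan at most k code points."""
        i = self.checkpoints[n // self.k]
        for _ in range(n % self.k):
            i += 1
            while (self.data[i] & 0xC0) == 0x80:   # skip continuation bytes
                i += 1
        return i

text = IndexedText("héllo".encode("utf-8"), k=2)
assert text.byte_offset(3) == 4     # one short scan from checkpoint at byte 3
```

This turns random access into O(k) for a modest memory cost; an editor would keep the checkpoints per line or per buffer chunk and patch them on edits.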
Having implemented complicated character handling code in both
UTF-32 and UTF-8, I can assure you that if you do it correctly,
UTF-8 is no more difficult (and maybe even a little bit easier)
than UTF-32. The only real difference is that if you do it
incorrectly, you'll probably hit the problem immediately (even
with purely English text) with UTF-8, whereas you'll only hit
it in exceptional cases with UTF-32. (And of course, it really
depends on what you are doing. If you limit yourself to NFC and
European languages, UTF-16/UTF-32 is probably simpler for
something like an editor, but UTF-8 would still be simpler for
parsing.)