Helmut Richter said:
[...]
I'm going to ignore the rest of this text because you aren't telling
the truth, you know that, I know that, and you know that I know that.
Addition: A discussion of the relative merits of either approach for
handling 'extended characters' could be interesting. However, I'm not
interested in trying to argue for both sides, i.e., against my own
standpoint, and these "the Gods have chosen wisely and now it is for
the mortals to obey" declarations of faith (or fandom) are pointless.
My wording of "the Gods have chosen wisely and now it is for the mortals
to obey" was "I, too, have doubts that they chose the best solution."
I have only much more serious doubts that your idea to publish
implementation details as interface would have been better.
But this isn't my idea, that's just a totally generic label you have
chosen to attach to a certain standpoint regarding how 'unicode
strings' should be handled. It is also wrong to refer to this as 'my
idea', since it isn't my idea, and to refer to it as not published,
because it *is* part of the published documentation of perl. For
instance, to this day, the perlguts manpage contains the following
text:
To fix this, some people formed Unicode, Inc. and produced a
new character set containing all the characters you can
possibly think of and more. There are several ways of
representing these characters, and the one Perl uses is called
UTF-8. UTF-8 uses a variable number of bytes to represent a
character.
http://perldoc.perl.org/perlguts.html#Unicode-Support
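
To make the "variable number of bytes" remark in the quoted passage concrete, here is a minimal sketch. It is written in Python purely for illustration and says nothing about how perl lays out strings internally; the helper name utf8_length and the sample characters are assumptions made up for this example.

```python
# Illustration only: shows UTF-8 byte counts per character.
# This is not Perl and does not model perl's internal string storage.

def utf8_length(ch: str) -> int:
    """Return the number of bytes the UTF-8 encoding of ch occupies."""
    return len(ch.encode("utf-8"))

# Hypothetical sample: ASCII, two Latin-1 range characters, the euro
# sign, and one character outside the Basic Multilingual Plane.
samples = ["a", "ä", "§", "€", "\U0001F600"]

for ch in samples:
    encoded = ch.encode("utf-8")
    # Print only ASCII (code point and hex bytes) so the output does
    # not depend on the terminal's encoding.
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s), {encoded.hex(' ')}")
```

One character can therefore occupy anywhere from one to four bytes, which is why "length in characters" and "length in bytes" are different questions for such a string.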
Another example: I am mostly using Emacs as text editor. I do know that
when I type the character "ä" or "§" when entering text, exactly this
character will appear in the file in the encoding I choose when saving the
file. I have no idea how this character is stored internally while Emacs
is running. And that's absolutely fine with me. Why should perl not do
likewise?
Because Perl is a programming language and not a text editor, and
depending on the kind of program, different strategies for UTF-8
decoding might make sense. A nice discussion of this is available in
the 'Converting the tools' section of this paper:
http://plan9.bell-labs.com/sys/doc/utf.html
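
As a rough sketch of what "different strategies" can mean in practice (my own illustration in Python, not code taken from the paper or from perl): a program that only splits records on an ASCII separator can work on the raw bytes, because every byte of a multibyte UTF-8 sequence has its high bit set and so can never be mistaken for an ASCII character. A program that counts or indexes characters has to decode first. The function names and the sample record are hypothetical.

```python
# Illustration only: two ways to split a UTF-8 encoded record.

def split_fields_raw(line: bytes, sep: bytes = b":") -> list:
    """Split on an ASCII separator without decoding.

    Safe for UTF-8 input: bytes inside a multibyte sequence are all
    >= 0x80, so an ASCII separator byte cannot occur inside one.
    """
    return line.split(sep)

def split_fields_decoded(line: bytes, sep: str = ":") -> list:
    """Decode the whole line to characters first, then split."""
    return line.decode("utf-8").split(sep)

# Hypothetical sample record containing non-ASCII characters.
record = "Müller:Straße 5:Köln".encode("utf-8")

raw = split_fields_raw(record)
decoded = split_fields_decoded(record)

# Both strategies find the same field boundaries ...
assert [field.decode("utf-8") for field in raw] == decoded

# ... but only the decoded form answers "how many characters?".
print(len(raw[0]), "bytes vs", len(decoded[0]), "characters")
```

Which of the two is appropriate depends on what the program does with the fields afterwards, which is the point being argued here.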