Peter J. Holzer
RTFS. tr/// is pretty highly optimised; in particular, the 'count
characters' case has its own implementation that does no copying, except
when counting non-SvUTF8 characters in a SvUTF8 string. In that case
obviously every character of the string being counted has to be
individually converted to UTF-32, so there's no allocation but there is
effectively copying.
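For reference, the counting form being discussed looks roughly like this. This is only a sketch of the idiom: count_chars is a made-up helper name, and the two character ranges are arbitrary choices for illustration. With an empty replacement list and no /d, tr/// leaves the string untouched and just returns the number of matching characters.

```perl
use strict;
use warnings;

# count_chars is a hypothetical helper, not anything from perl itself.
# With an empty replacement list and no /d modifier, tr/// does not
# modify its target; it only returns how many characters matched.
sub count_chars {
    my ($str) = @_;
    my $lower = ($str =~ tr/a-z//);                 # ASCII lowercase letters
    my $high  = ($str =~ tr/\x{80}-\x{10FFFF}//);   # anything outside ASCII
    return ($lower, $high);
}

my ($lower, $high) = count_chars("abc\x{e9}\x{1F4A9}");
print "lower=$lower high=$high\n";   # prints "lower=3 high=2"
```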
This sort of inefficiency is unavoidable when using UTF-8 as an internal
representation, which is why certain people are trying so hard to make
perl's internal representation opaque. Everyone now knows that using
UTF-8 was a mistake, but it can't be fixed until people get used to
keeping their fingers out.
In Pike (like Perl a vaguely C-like interpreted language) strings always
consist of elements of equal length: All characters in a string
are either 1 byte or 2 bytes or 4 bytes in length. That may waste some
space if you have a string with lots of ASCII characters and one 💩 in
it, but it makes most string operations simpler.
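To put a number on that trade-off, here is a rough sketch in Perl. It does not show how Pike implements anything; fixed_width and storage_bytes are hypothetical helpers, and the thresholds (0xFF for 1 byte, 0xFFFF for 2 bytes) are my assumption about where the width changes.

```perl
use strict;
use warnings;

# Hypothetical helper: the smallest fixed width (1, 2 or 4 bytes per
# character) that can hold every character of $str. The thresholds
# are assumed for illustration.
sub fixed_width {
    my ($str) = @_;
    my $max = 0;
    for my $cp (map { ord } split //, $str) {
        $max = $cp if $cp > $max;
    }
    return $max <= 0xFF ? 1 : $max <= 0xFFFF ? 2 : 4;
}

# Hypothetical helper: bytes needed if every character uses that width.
sub storage_bytes {
    my ($str) = @_;
    return length($str) * fixed_width($str);
}

my $str = ("a" x 99) . "\x{1F4A9}";   # 99 ASCII characters and one U+1F4A9
my $utf8 = $str;
utf8::encode($utf8);                  # encode a copy to UTF-8 bytes

print "fixed width: ", storage_bytes($str), " bytes\n";   # prints 400
print "UTF-8:       ", length($utf8), " bytes\n";         # prints 103
```

The price is that one wide character widens the whole string, but in exchange the n-th character sits at a fixed offset, so indexing and counting need no decoding.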
Theoretically, Perl could switch to such a model without breaking
programs (except XS code). Practically ...
hp