David Mathog
Does any standard C function support reading or writing UTF-8?
I'm not talking about the trivial case where the text is just the
ASCII subset of UTF-8. Rather, I'm referring to a hypothetical
function that could read UTF-8 when 2-, 3-, or even 4-byte encodings are
present and store each decoded character (code point) in, I guess, an array
of 32-bit integers.
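For reference: C11 added `<uchar.h>` with `mbrtoc32`, which converts one multibyte character in the *current locale* to a `char32_t`. It only yields UTF-32 from UTF-8 input if the locale is a UTF-8 one and the implementation defines `__STDC_UTF_32__`, so it is not a locale-independent UTF-8 decoder. (`mbstowcs` has the same locale dependence, and `wchar_t` is only 16 bits on some platforms.) A minimal, locale-independent sketch of the decoding step, assuming well-formed UTF-8 as defined by RFC 3629, might look like this. The name `utf8_decode` is hypothetical and is not a standard function:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the C standard).
 * Decode the UTF-8 bytes src[0..len) into code points stored in dst
 * (capacity cap). Returns the number of code points written, or
 * (size_t)-1 on malformed input or if dst is too small. Rejects
 * overlong forms, UTF-16 surrogates, and values above U+10FFFF. */
size_t utf8_decode(const unsigned char *src, size_t len,
                   uint32_t *dst, size_t cap)
{
    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char b = src[i];
        uint32_t cp, min;
        size_t extra;

        /* The lead byte tells us how many continuation bytes follow. */
        if (b < 0x80)                { cp = b;        extra = 0; min = 0;       }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; min = 0x80;    }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; min = 0x800;   }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; min = 0x10000; }
        else return (size_t)-1;      /* stray continuation or invalid lead byte */

        if (extra > len - i - 1)
            return (size_t)-1;       /* sequence truncated by end of input */

        /* Each continuation byte must be 10xxxxxx and adds 6 bits. */
        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = src[i + k];
            if ((c & 0xC0) != 0x80)
                return (size_t)-1;
            cp = (cp << 6) | (c & 0x3F);
        }

        /* Reject overlong encodings, surrogates, and out-of-range values. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        if (n == cap)
            return (size_t)-1;       /* output array is full */
        dst[n++] = cp;
        i += extra + 1;
    }
    return n;
}
```

Once the text is in a `uint32_t` array, "character N" is just `dst[N]`.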
I'm guessing that there _might_ be functions for this somewhere
in the C standard because trying to apply typical text manipulations
on a UTF-8 string directly seems to be quite messy and slow.
For instance, even a simple operation like "swap characters 1002->1005
with 2007->2010" would be a pain: you'd pretty much have to
parse from the beginning of the UTF-8 string
just to find the specified ranges, and the two ranges might occupy
different numbers of bytes. So even though the number of characters is
the same, they couldn't just be swapped byte for byte.
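To illustrate the point, here is a small sketch, assuming the text has already been decoded into an array of code points. Swapping equal-length character ranges then becomes plain index arithmetic, and re-encoding to UTF-8 shows that the swapped ranges can have different byte lengths. Both `swap_ranges` and `utf8_encode` are hypothetical helpers, not standard library functions:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: swap code-point ranges [a, a+len) and [b, b+len)
 * in place. Assumes the ranges do not overlap and lie inside the array. */
void swap_ranges(uint32_t *cps, size_t a, size_t b, size_t len)
{
    for (size_t k = 0; k < len; k++) {
        uint32_t t = cps[a + k];
        cps[a + k] = cps[b + k];
        cps[b + k] = t;
    }
}

/* Hypothetical helper: encode n code points as UTF-8 into dst (capacity
 * cap). Returns the number of bytes written, or (size_t)-1 if dst is too
 * small. Assumes every code point is a valid Unicode scalar value. */
size_t utf8_encode(const uint32_t *cps, size_t n,
                   unsigned char *dst, size_t cap)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t c = cps[i];
        /* Byte count depends on the magnitude of the code point. */
        size_t need = c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
        if (need > cap - o)
            return (size_t)-1;
        switch (need) {
        case 1:
            dst[o++] = (unsigned char)c;
            break;
        case 2:
            dst[o++] = (unsigned char)(0xC0 | (c >> 6));
            dst[o++] = (unsigned char)(0x80 | (c & 0x3F));
            break;
        case 3:
            dst[o++] = (unsigned char)(0xE0 | (c >> 12));
            dst[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            dst[o++] = (unsigned char)(0x80 | (c & 0x3F));
            break;
        default:
            dst[o++] = (unsigned char)(0xF0 | (c >> 18));
            dst[o++] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            dst[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            dst[o++] = (unsigned char)(0x80 | (c & 0x3F));
            break;
        }
    }
    return o;
}
```

For example, with `{ 'A', 'B', U+00E9, U+20AC }`, swapping characters 0-1 with 2-3 moves a 2-byte range and a 5-byte range, which a byte-for-byte swap on the UTF-8 string could not do. If the positions in the question are 1-based and inclusive, that swap would be `swap_ranges(cps, 1001, 2006, 4)`.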
Thanks,
David Mathog