Jake Barnes
I'm afraid the text below is almost gibberish to me. What do these 5
formulations have in common? Is it true that they all specify the same
character? How is that possible?
====================================
http://www.cl.cam.ac.uk/~mgk25/unicode.html#ucs
An important note for developers of UTF-8 decoding routines: For
security reasons, a UTF-8 decoder must not accept UTF-8 sequences that
are longer than necessary to encode a character. For example, the
character U+000A (line feed) must be accepted from a UTF-8 stream only
in the form 0x0A, but not in any of the following five possible
overlong forms:
0xC0 0x8A
0xE0 0x80 0x8A
0xF0 0x80 0x80 0x8A
0xF8 0x80 0x80 0x80 0x8A
0xFC 0x80 0x80 0x80 0x80 0x8A
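
One way to see what the five forms have in common is to decode them by hand. In UTF-8 the lead byte announces how many bytes follow and carries a few payload bits, and each continuation byte (10xxxxxx) carries six more. Padding the payload with leading zero bits does not change the number, so a longer sequence can spell out the same code point. The sketch below is only an illustration, not code from the quoted page: naive_decode and strict_decode are hypothetical names, and the 5- and 6-byte forms are included because the original UTF-8 design allowed them (RFC 3629 now stops at 4 bytes).

```python
def naive_decode(seq: bytes) -> int:
    """Decode one sequence of the original 1-to-6-byte UTF-8 scheme.

    Deliberately performs NO overlong check, to show the problem.
    """
    lead = seq[0]
    if lead < 0x80:                 # 0xxxxxxx: plain ASCII
        n, value = 1, lead
    elif lead >> 5 == 0b110:        # 110xxxxx
        n, value = 2, lead & 0x1F
    elif lead >> 4 == 0b1110:       # 1110xxxx
        n, value = 3, lead & 0x0F
    elif lead >> 3 == 0b11110:      # 11110xxx
        n, value = 4, lead & 0x07
    elif lead >> 2 == 0b111110:     # 111110xx
        n, value = 5, lead & 0x03
    elif lead >> 1 == 0b1111110:    # 1111110x
        n, value = 6, lead & 0x01
    else:
        raise ValueError("invalid lead byte")
    if len(seq) != n:
        raise ValueError("sequence length does not match lead byte")
    for b in seq[1:]:
        if b >> 6 != 0b10:          # continuation bytes look like 10xxxxxx
            raise ValueError("invalid continuation byte")
        value = (value << 6) | (b & 0x3F)
    return value


# Smallest code point that genuinely needs a sequence of each length.
MIN_FOR_LENGTH = {1: 0x00, 2: 0x80, 3: 0x800, 4: 0x10000,
                  5: 0x200000, 6: 0x4000000}


def strict_decode(seq: bytes) -> int:
    """Same as naive_decode, but reject sequences longer than necessary."""
    value = naive_decode(seq)
    if value < MIN_FOR_LENGTH[len(seq)]:
        raise ValueError("overlong UTF-8 sequence")
    return value


OVERLONG_LINE_FEEDS = [
    bytes([0xC0, 0x8A]),
    bytes([0xE0, 0x80, 0x8A]),
    bytes([0xF0, 0x80, 0x80, 0x8A]),
    bytes([0xF8, 0x80, 0x80, 0x80, 0x8A]),
    bytes([0xFC, 0x80, 0x80, 0x80, 0x80, 0x8A]),
]

if __name__ == "__main__":
    for seq in OVERLONG_LINE_FEEDS:
        print(seq.hex(" "), "->", hex(naive_decode(seq)))
```

Under these assumptions every one of the five sequences comes out of naive_decode as 0xa, the same value as the single byte 0x0A, while strict_decode accepts only the one-byte form. That is the security point of the quoted note: a filter that looks for the byte 0x0A would miss the longer spellings if a later, lenient decoder still turned them into a line feed.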