I'm afraid the text below is almost gibberish to me. What do these five
formulations have in common? Is it true that they all specify the same
character? How is that possible?
====================================
http://www.cl.cam.ac.uk/~mgk25/unicode.html#ucs
An important note for developers of UTF-8 decoding routines: For
security reasons, a UTF-8 decoder must not accept UTF-8 sequences that
are longer than necessary to encode a character. For example, the
character U+000A (line feed) must be accepted from a UTF-8 stream only
in the form 0x0A, but not in any of the following five possible
overlong forms:
0xC0 0x8A
0xE0 0x80 0x8A
0xF0 0x80 0x80 0x8A
0xF8 0x80 0x80 0x80 0x8A
0xFC 0x80 0x80 0x80 0x80 0x8A
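
One way to see what the five forms have in common is to decode them the
way a careless decoder would: read the number of leading 1 bits in the
first byte as the sequence length, keep the remaining bits of that byte,
and append the low six bits of each following byte. The sketch below does
exactly that. The names naive_decode, strictly_valid and FORMS are
made up for this illustration, and the naive routine is deliberately
missing the shortest-form check that the quoted note demands; it is not
meant as a real decoder.

```python
def naive_decode(seq: bytes) -> int:
    """Decode one UTF-8-style sequence WITHOUT rejecting overlong forms.

    Hypothetical helper for illustration only. It also does not verify
    that the trailing bytes look like 10xxxxxx continuation bytes.
    """
    lead = seq[0]
    if lead < 0x80:
        # 0xxxxxxx: a single byte carries the code point directly.
        return lead

    # The number of leading 1 bits in the lead byte is the total length.
    length = 0
    mask = 0x80
    while lead & mask:
        length += 1
        mask >>= 1

    # Bits below the first 0 bit of the lead byte are payload.
    code_point = lead & (mask - 1)

    # Each following byte contributes its low six bits.
    for b in seq[1:length]:
        code_point = (code_point << 6) | (b & 0x3F)
    return code_point


def strictly_valid(seq: bytes) -> bool:
    """Return True if Python's built-in strict UTF-8 decoder accepts seq."""
    try:
        seq.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The shortest form followed by the five overlong forms from the quote.
FORMS = [
    bytes([0x0A]),
    bytes([0xC0, 0x8A]),
    bytes([0xE0, 0x80, 0x8A]),
    bytes([0xF0, 0x80, 0x80, 0x8A]),
    bytes([0xF8, 0x80, 0x80, 0x80, 0x8A]),
    bytes([0xFC, 0x80, 0x80, 0x80, 0x80, 0x8A]),
]

for form in FORMS:
    print(form.hex(" "), hex(naive_decode(form)), strictly_valid(form))
```

In every overlong form the payload bits are all zero except for the final
001010, so the naive routine arrives at code point 0x0A each time; the
extra bytes only contribute leading zeros. That is the security problem:
a filter that looks for the byte 0x0A in the raw stream would miss these
sequences, while a lenient decoder further along would still turn them
into a line feed. Python's own strict decoder accepts only the first
entry and rejects the other five.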