Fri, 03 Jun 2011 02:12:18 +0200, /Thomas 'PointedEars' Lahn/:
> I don't really see what the "display functionality" has to do with
> whether one is permitted to have \uD802\uDD0E in a string literal.
It is impossible (and unnecessary) to know whether the troll is
thoroughly confused or just trying to confuse us. We wrote as if he did
not know the difference between a Unicode character and a JavaScript
character (or, to put it the way the troll wants everyone to, no matter
how clumsy it is: the character concept as defined and used in the
ECMA-262 standard), which corresponds to the Unicode concept of
“code point”.
In a string literal, any \uXXXX (where X is a hexadecimal digit) is
allowed, even if it corresponds to an unassigned code point, a code
point designated as a noncharacter, or a surrogate code point. Display
functionality indeed has nothing to do with this.
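For example (a minimal sketch; the particular escapes and the alerts
are only illustrative), each \uXXXX escape simply contributes one
16-bit string element, with no check against the Unicode repertoire:

  var s = "\uD802\uDD0E";  // surrogate pair for U+1090E
  var t = "\uFDD0";        // a noncharacter code point
  var u = "\uD800";        // a lone (unpaired) surrogate
  alert(s.length);                      // 2
  alert(s.charCodeAt(0).toString(16));  // "d802"
  alert(s.charCodeAt(1).toString(16));  // "dd0e"
  alert(t.length + " " + u.length);     // "1 1" - still valid strings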
The interpretation of code points is a different issue, and it is a
somewhat grey area what should happen when a string is to be displayed
via alert() or inserted into an HTML document via document.write(), for
example. It is absurd to label the behavior of implementations a
“quirk” when they do the most sensible thing that can be imagined for a
surrogate pair: interpret it as a single Unicode character.
I tested on Firefox, IE, Opera, Safari, and Chrome, and on all of them,
"\uD802\uDD0E" was treated as U+1090E (i.e., the character denoted by
the surrogate pair D802 DD0E when using UTF-16) when written with
document.write().
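The mapping from the surrogate pair to the code point is plain
arithmetic; a minimal sketch of such a test (the variable names and the
exact output are mine, not the only way to check this):

  var hi = 0xD802, lo = 0xDD0E;
  // UTF-16: (D802 - D800) * 400 + (DD0E - DC00) + 10000 = 1090E (hex)
  var cp = (hi - 0xD800) * 0x400 + (lo - 0xDC00) + 0x10000;
  document.write(cp.toString(16).toUpperCase());  // "1090E"
  document.write("\uD802\uDD0E");  // one character, font permitting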
Browsers may have difficulties in _rendering_ non-BMP characters, but
then they have the same difficulties when the character is included in
static content, as with the reference. (You need a font like
Code2001 or some other very special font to render Phoenician letters.
And something has happened to www.code2000.net, which used to host
Code2001.
And the glyph for U+1090E in Code2001 looks odd – rather different from
the representative glyph in the Unicode standard.)