OK. That makes sense. I didn't see that at w3.org.
In HTML 4.01, for example, the rule is presented in a somewhat cryptic
way, namely in an SGML declaration:
<!ELEMENT OPTION - O (#PCDATA) -- selectable choice -->
http://www.w3.org/TR/REC-html40/interact/forms.html#edef-OPTION
HTML5 says it in more commonly understood terms:
"Content model:
Text."
http://www.w3.org/TR/html5/forms.html#the-option-element
However, if you want to know exactly what "text" means here, it gets a
bit more complicated. When you follow the link "Text", you might get
worried about the statement "must not contain control characters other
than space characters" if you need characters that control
directionality, like U+200E LEFT-TO-RIGHT MARK. But such a character
passes validation at http://validator.nu so apparently "control
character" is not meant to cover characters in the category Cf [Other,
Format] in Unicode, probably just Cc [Other, Control].
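
If you want to check which category a given character falls into, here is a minimal sketch using Python's standard unicodedata module. The helper name general_category and the constants LRM and BEL are just illustrative names, and this only reports Unicode's own classification; it says nothing definitive about how the HTML5 spec or the validator defines "control character".

```python
import unicodedata

def general_category(ch: str) -> str:
    """Return the Unicode general category (e.g. 'Cf', 'Cc') of one character."""
    return unicodedata.category(ch)

LRM = "\u200e"  # LEFT-TO-RIGHT MARK, a format character
BEL = "\u0007"  # BELL, a C0 control character

print(general_category(LRM))  # Cf
print(general_category(BEL))  # Cc
```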
<option value="xyz">
אבג English
</option> [...]
What happens then is that words have their inherent directionality set
by the directionality properties of characters, so e.g. the Hebrew
letters here run properly right-to-left.
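
To see the directionality properties being talked about here, a rough sketch with Python's unicodedata module follows. The function name bidi_classes is made up for illustration; the sample string is the option text from above. It only lists each character's bidirectional class and does not implement the bidi algorithm that browsers actually run.

```python
import unicodedata

def bidi_classes(text: str) -> list[tuple[str, str]]:
    """Pair each non-space character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text if not ch.isspace()]

sample = "אבג English"
for ch, cls in bidi_classes(sample):
    # Hebrew letters report 'R' (right-to-left), Latin letters 'L' (left-to-right)
    print(repr(ch), cls)
```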
I didn't know that. I had found it a bit of a mystery why it worked.
Not that I can read a word of Hebrew, but I was told it was correct.
Hebrew is mostly Greek to me, and that's why I use simple Hebrew
characters in my tests - like the first three letters of the Hebrew
alphabet as above. Everyone knows the aleph, right?
<gripe>
I had a great deal of trouble importing Unicode utf-8. Microsoft
products like Excel look for a BOM (byte order mark) which I'm told
shouldn't be needed in utf-8, only in utf-16 and utf-32. But this isn't
the first time that MS has insisted on guessing wrong (enctypes).
</gripe>
It is not uncommon to see warnings about BOM, even on W3C pages. The
warnings partly reflect the shortcomings of very ancient browsers and
partly problems that still exist in PHP, which cannot handle BOM properly
when joining PHP files. But BOM does not cause problems for any *browser* that
you might see in the wild these days. BOM actually *helps* even in a
utf-8 encoded file by making sure that browsers will interpret it as
utf-8, even if HTTP headers are missing or wrong.
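
For what it's worth, here is a small sketch of writing and detecting a utf-8 BOM with Python's standard codecs. The helper names with_bom, has_utf8_bom and decode_any are hypothetical, and whether a particular program such as Excel then picks the right encoding is an assumption you would have to test yourself.

```python
import codecs

def with_bom(text: str) -> bytes:
    """Encode text as utf-8 with a leading BOM (bytes EF BB BF)."""
    return text.encode("utf-8-sig")

def has_utf8_bom(data: bytes) -> bool:
    """Check whether the byte string starts with the utf-8 BOM."""
    return data.startswith(codecs.BOM_UTF8)

def decode_any(data: bytes) -> str:
    """Decode utf-8 bytes, dropping a leading BOM if one is present."""
    return data.decode("utf-8-sig")

data = with_bom("אבג English")
print(has_utf8_bom(data))  # True
print(decode_any(data) == "אבג English")  # True
```

The "utf-8-sig" codec accepts input with or without the BOM when decoding, so decode_any can be used on files from either kind of source.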
And Latin letters run
I've been setting this only in CSS.
HTML5 CR says:
"Authors are strongly encouraged to use the dir attribute to indicate
text direction rather than using CSS, since that way their documents
will continue to render correctly even in the absence of CSS (e.g. as
interpreted by search engines)."
http://www.w3.org/TR/html5/dom.html#the-dir-attribute
Works in IE8 which is the most
primitive browser I have. CanIuse is silent on support.
Well, you could use both HTML dir attribute and CSS direction property.
But you hardly need that.
CanIuse mostly deals with HTML5 novelties (in a rather broad sense), and
the dir attribute is good old HTML 4. According to
http://reference.sitepoint.com/html/core-attributes/dir
there is support in IE 5.5+, which should be enough these days.