Naming rules for JavaScript/HTML scripting

VK

09/30/03 Phil Powell posted his "Radio buttons do not appear checked"
question.
This question led to a long discussion about the naming rules applying to
variables, objects, methods and properties in JavaScript/JScript and
HTML/XML elements.
Without trying to get famous :) but thinking it would be interesting to
others I decided to post the following summary:


1. Variable names in JavaScript/JScript
[ Here 'variable' means a name/value pair declared via the "var name =
value" statement. ]
JavaScript and JScript are both built on ECMA-262 language specifications,
approved as international standard ISO/IEC 16262.
The official name for both languages is ECMAScript (in the standardization
papers).
The latest version of ISO/IEC 16262 can be obtained at http://www.iso.org
Unfortunately ISO charges 218 CHF per download (why so much
and why in Swiss Francs ??)
So you either trust me, or pay ISO, or find a bootlegged copy like I did
:)

ECMA-262, Edition 3 Final, paragraph 7.6 "Identifiers", my summary:

Any variable name has to start with one of:
_ (underscore)
$ (dollar sign)
a letter from the [a-z][A-Z] range
a Unicode letter in the form \uAABB (where AA and BB are hex values)

This start can be followed by any combination of _, $, letters and numbers,
both in
[a-z][A-Z] and \u form.

There are no restrictions on the name length except your good common sense.

PROPER variable names:
myVariable
$myVariable (if you like Perl style)
_myVariable (if you like to confuse people)
\u79C1\u306E\u5909\u6570 (私の変数 - means "my variable" in Japanese,
written in \u escaped Unicode characters)

WRONG variable names:
1st (starts with number)
myVariable#2nd (contains illegal character)

P.S. This paragraph specifically states that \u-turns do not turn an illegal
character into a legal one.
Either you put "[" as it is or as \u005B - it still doesn't fly.
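
One way to try the names listed above is to let the script engine decide. This is only a rough sketch, not something taken from ECMA-262: the helper name isValidVariableName is made up for the illustration, and it relies on the Function constructor throwing a SyntaxError for source text that does not parse. It assumes the candidate is a single token, it also rejects reserved words, and a current engine follows a later edition than Edition 3, so it accepts a wider range of Unicode letters.

```javascript
// Hypothetical helper: true if `name` parses as the name in a `var` statement.
// Assumes `name` is a single token (no spaces, commas or operators).
function isValidVariableName(name) {
  try {
    // Only compiles the source text; the generated function is never called.
    new Function("var " + name + ";");
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

var candidates = [
  "myVariable",
  "$myVariable",
  "_myVariable",
  "\\u79C1\\u306E\\u5909\\u6570", // the escapes exactly as typed in source code
  "1st",
  "myVariable#2nd",
  "\\u005B"                        // an escaped "[" is still not a letter
];

for (var i = 0; i < candidates.length; i++) {
  console.log(candidates[i] + " -> " + isValidVariableName(candidates[i]));
}
```

The first four candidates should be reported as usable and the last three as not.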


2. Names of objects on the page (HTML elements, form elements etc.)
[ These names are set via "name" or "id" attributes inside the tag ]
The rules for "name" and "id" are regulated by the HTML specifications
of the W3 Consortium. The latest official release of the HTML Specification
can be obtained at http://www.w3.org (this time for absolutely free!)

HTML 4.01 Specification, paragraph 6.2, direct quote:
ID and NAME tokens must begin with a letter ([A-Za-z])
and may be followed by any number of letters, digits ([0-9]),
hyphens ("-"), underscores ("_"), colons (":"), and periods (".").
(direct link http://www.w3.org/TR/html401/types.html#h-6.2)

So these rules are much more strict than the ECMAScript rules.
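
The quoted sentence translates almost word for word into a regular expression. A minimal sketch, assuming nothing beyond that one sentence of the HTML 4.01 text; the names HTML4_TOKEN and isHtml4Token are invented here and are not part of any specification or browser API.

```javascript
// Pattern transcribed from the HTML 4.01 sentence quoted above:
// a letter first, then any number of letters, digits, "-", "_", ":" and ".".
var HTML4_TOKEN = /^[A-Za-z][A-Za-z0-9\-_:.]*$/;

// Hypothetical helper name, used only for this illustration.
function isHtml4Token(value) {
  return HTML4_TOKEN.test(value);
}

console.log(isHtml4Token("main-menu")); // letters and a hyphen
console.log(isHtml4Token("section.1")); // period and digit after the first letter
console.log(isHtml4Token("_private"));  // starts with an underscore
console.log(isHtml4Token("[:)"));       // contains "[" and ")"
```

The first two values match the pattern and the last two do not. Note that "_private" is fine as a script variable name but not as a token, while "main-menu" is the other way round.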


3. Identifier vs. Property confusion
[ Special thanks to <Lasse Reichstein Nielsen> and <Richard Cornford>
who spent their time and energy explaining it to me. ]

If you really want to, then sometimes you CAN give any imaginable name to
any object.
In IE, for example, you can create a checkbox like this:
<input type="checkbox" name="[:)" value="0" checked>
or a div:
<div id="^[:) ]#">...</div>

In IE nothing really bad will happen at first.
The sky will not fall on the earth, the checkbox will remain "checkable", and
the div will display its content on the page.

It happens because of the dual nature of the "name" and "id" attributes.
On the one hand they are supposed to be the identifiers of the corresponding
objects:
NameYouGave.propertyName or IdYouGave.propertyName
On the other hand they are just string values of the object.name and object.id
properties.
So a good-hearted browser may say: "OK, it cannot be used as an identifier,
but it's still a valid string value for a property, so let it be". I don't
think it's right, but it is what it is.
Let's take the <input type="checkbox" name="[:)"> sample:
We cannot use its name as an identifier:
alert(document.forms[0].[:).name) will make your script very surprised.
But we can access its name as a value:
alert(document.forms[0].elements[0].name)
The question remains: who would want to assign identifiers
which cannot be used as identifiers, and why?
Another question: should browsers be so liberal? Maybe a better practice
would be to force authors to follow the HTML specs?


4. End
"Any name has to start with a letter and consist of any combination of
letters, numbers and underscores to the total length of 255 characters.
(Where the "letter" means any character from the [a-z][A-Z] range)".

Always and everywhere follow this simple rule, and you will never have to
worry about boring standards and compatibility issues.
At least not in the naming.
 
Richard Cornford

VK said:
09/30/03 Phil Powell posted his "Radio buttons do not
appear checked" question.
This question led to a long discussion about the naming rules
applying to variables, objects, methods and properties in
JavaScript/JScript and HTML/XML elements.
Without trying to get famous :) but thinking it would be
interesting to others I decided to post the following summary:

1. Variable names in JavaScript/JScript
[ Here 'variable' means a name/value pair declared via the
"var name = value" statement. ]
ECMA-262, Edition 3 Final, paragraph 7.6 "Identifiers", my summary:

Any variable name has to start with
_ (underscore)
$ (currency sign)
a letter from [a-z][A-Z] range
a Unicode letter in the form \uAABB (where AA and BB are hex values)

This start can be followed by any combination of _, $,
letters and numbers both in [a-z][A-Z] and \u form.

2. Names of objects on the page (HTML elements, form elements etc.)
[ These names are set via "name" or "id" attributes inside tag ]
The rules for "name" and "id" are regulated by HTML specifications
of W3 Consortium. The latest official release of HTML Specification
can be obtained at http://www.w3.org (this time for absolutely free!)

HTML 4.01 Specification, paragraph 6.2, direct quote:
ID and NAME tokens must begin with a letter ([A-Za-z])
and may be followed by any number of letters, digits ([0-9]),
hyphens ("-"), underscores ("_"), colons (":"), and periods (".").
(direct link http://www.w3.org/TR/html401/types.html#h-6.2)

I was under the misguided impression that this paragraph from types.html
applied to HTML NAME attributes so it might be useful to clarify how
these restrictions do apply. There are two separate concepts here; NAME
and ID attributes and the NAME and ID "tokens" referred to in the
paragraph above. The HTML versions of the HTML 4 DTDs make the
relationship clear. For example:-

<!--============== Generic Attributes ====================-->
<!ENTITY % coreattrs
id ID #IMPLIED -- document-wide unique id --
...
- from the DTD defines the generic attribute - id - (lowercase) as
containing a string that conforms to the definition of the token - ID -
(uppercase) in types.html. This is clear because uppercase ID is a link
with href="../types.html#type-id" and the fragment identifier "#type-id"
is to be found in types.html in the paragraph quoted above. As a result
all HTML id attributes are restricted to the character set and sequence
described in that paragraph.

Similarly:-

<!ELEMENT META - O EMPTY -- generic metainformation -->
<!ATTLIST META
...
name NAME #IMPLIED -- metainformation name --
...
- associates the - name - (lowercase) attribute with the - NAME -
(uppercase) token and it links to href="../types.html#type-name", with
the "#type-name" fragment identifier also present in the above quoted
paragraph.

However:-

<!ELEMENT A - - (%inline;)* -(A) -- anchor -->
<!ATTLIST A
...
name CDATA #IMPLIED -- named link end --
...
- associates the - name - (lowercase) attribute with the - CDATA -
token. The CDATA link uses href="../types.html#type-cdata", and the
"#type-cdata" fragment identifier is to be found in types.html within
the LI element that precedes the LI element in which the above paragraph
appears. That means that the rules that apply to the contents of this
CDATA name attribute are specified by the preceding part of the HTML
specification and the comments about NAME and ID tokens do not apply to
it.

The majority of - name - attributes specified in the DTDs are associated
with the CDATA specification.

So these rules are much more strict than the ECMAScript rules.

3. Identifier vs. Property confusion
[ Special thanks to <Lasse Reichstein Nielsen> and <Richard
Cornford> who spent their time and energy explaining it to me. ]

But apparently we still did not get across the important distinction
between ECMA Script Identifiers (to which the restrictions that you
quoted above do apply) and ECMA Script property names, to which no
restrictions apply (according to the specification).

If you really want to, then sometimes you CAN give any
imaginable name to any object.
In IE for example you can create checkbox like this:
<input type="checkbox" name="[:)" value="0" checked>

And that one is valid HTML.

or a div:
<div id="^[:) ]#">...</div>

But that one is not, as the rules for the ID token apply to the id
attribute.

It happens because of the dual nature of the "name" and "id"
attributes. From one side they are supposed to be the
identifiers of the corresponding objects:

The values of the name and id attributes are not identifiers (in the
sense that ECMA Script defines Identifier). They will be used as
property names in the DOM, but ECMA Script places no restrictions on
property names.

The ECMA Script definition of - Identifier - _only_ applies to ECMA
Script source code. It does not impact on the naming of properties in
the DOM (or any other ECMA Script objects).

So a good-hearted browser may say: "OK, it cannot be used as
an identifier, but it's still a valid string value for a property,
so let it be". I don't think it's right,

It conforms to the ECMA Specification so it is right and has nothing to
do with the browser being "good-hearted".

but it is what it is. Let's take the
<input type="checkbox" name="[:)"> sample :

Valid HTML.

We cannot use its name as identifier:
alert(document.forms[0].[:).name) will make your script very
surprised.

It is a very obvious syntax error.

But we can access its name as a value:
alert(document.forms[0].elements[0].name)

And you can access it as:-

document.forms[0]['[:)'].name
-or-
document.forms[0].elements['[:)'].name

- either of which are completely valid ECMA Script Property Accessors.
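
The two accessor forms can be tried without a browser. This is a small sketch in which a plain object stands in for document.forms[0].elements; the object and the extra "agree" entry are made up for the illustration and are not real DOM objects.

```javascript
// A plain object standing in for document.forms[0].elements (no browser needed).
var elements = {};
elements["[:)"] = { name: "[:)", type: "checkbox", checked: true };

// Bracket notation accepts any string as a property name.
console.log(elements["[:)"].name);

// The string can also come from a variable.
var key = "[:)";
console.log(elements[key].checked);

// Dot notation needs an Identifier after the dot, so elements.[:) cannot be
// written; for ordinary names both forms reach the same property.
elements.agree = { name: "agree" };
console.log(elements.agree === elements["agree"]);
```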

The question remains who and why would want to assign
identifiers which cannot be used as identifiers?

That question should be "The question remains who and why would want to
assign *property names* which cannot be used as identifiers?". And the
answer is: anyone who wants to (as it is completely ECMA Script legal).

Another question: should browsers be so liberal? Maybe a
better practice would be to force authors to follow HTML specs?

A question that should be addressed to the browser authors as ECMA
Script can cope with any character sequence in DOM property names so it
is not really relevant to this group if the browsers choose to allow
names/ids that are not valid HTML.

On the other hand, the paragraph you quoted still allows valid id
attributes that could not be valid ECMA Script Identifiers. Which means
that it would still be a good thing that ECMA Script allows any property
name to be used if browsers were strict in their interpretation of HTML.
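
To illustrate that last point, here is a sketch with three ids that satisfy the quoted HTML rule but are not Identifiers. The ids and the byId object are hypothetical; a plain object stands in for whatever collection a browser exposes.

```javascript
// Stand-in for a collection of elements keyed by id (hypothetical ids).
var byId = {
  "main-menu": "menu element",
  "section.1": "section element",
  "ns:item": "namespaced element"
};

// Bracket notation reaches all three.
console.log(byId["main-menu"]);
console.log(byId["section.1"]);
console.log(byId["ns:item"]);

// Dot notation quietly means something else for the hyphenated id:
// byId.main-menu parses as (byId.main) - menu, which is a subtraction.
var menu = 1;
var misread = byId.main - menu; // undefined minus 1
console.log(isNaN(misread));
```

The hyphen case is the nasty one, because it is not even a syntax error.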

4. End
"Any name has to start with a letter and consist of any combination of
letters, numbers and underscores to the total length of 255 characters.
(Where the "letter" means any character from the [a-z][A-Z] range)".

The ECMA Script specification places no restrictions on the character
set or sequence in property names, only upon characters used as
Identifiers in ECMA Script source code.

Always and everywhere follow this simple rule, and you will never
have to worry about boring standards and compatibility issues.
At least not in the naming.

Otherwise understanding those standards would be the best plan.

Richard.
 
Lasse Reichstein Nielsen

Good summary, except that the name attribute doesn't contain a NAME
token, and is not restricted at all.

The question remains who and why would want to assign identifiers
which cannot be used as identifiers?

Me, because it might make sense.

I always write
document.forms['formname'].elements['elemname']
instead of
document.forms.formname.elements.elemname

The two are equivalent if formname and elemname are legal identifiers.
I still prefer the bracket-notation with quoted strings because it
separates the namespaces (HTML vs Javascript). Keeping distinct things
separate is simple defensive programming. It avoids restrictions on
one from interfering with the other. In this case: Javascript identifier
restrictions don't interfere with HTML name attributes.

You seem to be advocating that the arbitrary restrictions on
Javascript identifiers should propagate back to the specification of
HTML. I disagree. Keep distinct things separate.

Another question: should browsers be so liberal? Maybe a better
practice would be to force authors to follow HTML specs?

YES! Well, at least a resounding "maybe"!
It won't happen, though, and if it did, 99% of the current web would
become unusable.


/L
 
Richard Cornford

PHP is the biggest culprit to the name[] convention. When a form
is submitted, any names with [] are associated as an array. It
can be dealt with manually (making all fields that start with
fieldname an array). Too much work I suppose
:-(

This was an odd decision on the part of the creators of PHP. In JSP (and
Java servlets) the method:-

request.getParameterValues(java.lang.String name)

- returns an array of all of the same-name values from the request (one
element long if there was only one value, or null if the name does not
appear) and works with any name. So there is nothing in the nature of
HTTP requests that makes it necessary to impose restrictions on the HTML
name attributes used on form elements in order to generate an array in
the server code.
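
The same point can be shown in JavaScript. This is a sketch only, assuming an environment that provides the standard URLSearchParams class (current browsers and Node.js); it stands in for the servlet call above and says nothing about how PHP itself parses requests. The query strings and field names are made up.

```javascript
// A query string as a form with two same-named checkboxes might submit it.
var plain = new URLSearchParams("colour=red&colour=blue&size=L");

// getAll returns every value for the name; no special naming is needed.
var colours = plain.getAll("colour");
console.log(colours.length, colours.join(","));

// A name that does not appear gives an empty array here
// (the servlet method described above gives null instead).
console.log(plain.getAll("missing").length);

// Square brackets are simply more characters in the name.
var bracketed = new URLSearchParams("colour[]=red&colour[]=blue");
console.log(bracketed.getAll("colour[]").join(","));
console.log(bracketed.getAll("colour").length);
```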

At least it turns out that the PHP authors weren't breaching the HTML
specs when they chose the square bracket characters as the facilitator
of their convenience method, even if it does cause headaches for PHP
authors who attempt client side validation (at least at first).

Richard.
 
