John said:
You've forgotten that there is a lexical layer of parsing that turns the
character stream into tokens. It must use lookahead to decide that the +
character in +3 is a + token and not part of a += token or ++ token.
ISTM you have a reading problem and are therefore preaching to the choir.
You've forgotten that the lexical layer of parsing also has a syntax
specification.
I do not think *he* has forgotten anything.
Explain where in
Identifier ::
IdentifierName but not ReservedWord
etc.
it says that an identifier ends with
[lookahead NotIn identifier character]
which is what you are implying it has. It doesn't. Instead, there is a
processing rule in section 7, Lexical Conventions, namely:
"The source text is scanned from left to right, repeatedly taking the
longest possible sequence of characters as the next input element."
My point is that without that processing rule, 'newb' can be parsed into
the tokens 'new' 'b', among many others.
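To make the quoted longest-match rule concrete, here is a minimal sketch of a scanner that always takes the longest possible input element. The `tokenize` function and the `PUNCTUATORS` table are hypothetical names made up for this illustration, and the token patterns are drastically simplified; this is not the full Lexical Grammar.

```javascript
// Hypothetical, heavily simplified punctuator table (an assumption for this
// sketch). Longer punctuators come first so that the first hit found by
// find() is also the longest one.
const PUNCTUATORS = ["++", "+=", "+", "=", "(", ")", ";"];

// Toy scanner: repeatedly takes the longest possible sequence of
// characters as the next token, skipping white space in between.
function tokenize(source) {
  const tokens = [];
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    if (/\s/.test(ch)) {
      i++; // white space separates tokens but is not a token itself
      continue;
    }
    if (/[A-Za-z_$]/.test(ch)) {
      // Identifier-like run: extend as long as identifier characters follow.
      let j = i + 1;
      while (j < source.length && /[A-Za-z0-9_$]/.test(source[j])) j++;
      tokens.push(source.slice(i, j));
      i = j;
      continue;
    }
    if (/[0-9]/.test(ch)) {
      // Decimal digits only; real numeric literals are more involved.
      let j = i + 1;
      while (j < source.length && /[0-9]/.test(source[j])) j++;
      tokens.push(source.slice(i, j));
      i = j;
      continue;
    }
    const punct = PUNCTUATORS.find((p) => source.startsWith(p, i));
    if (punct === undefined) {
      throw new SyntaxError("Unexpected character at index " + i);
    }
    tokens.push(punct);
    i += punct.length;
  }
  return tokens;
}

console.log(tokenize("a = newb();"));  // [ 'a', '=', 'newb', '(', ')', ';' ]
console.log(tokenize("a = new b();")); // [ 'a', '=', 'new', 'b', '(', ')', ';' ]
console.log(tokenize("x = +3"));       // [ 'x', '=', '+', '3' ]
console.log(tokenize("x += 3"));       // [ 'x', '+=', '3' ]
```

With this scanner, 'newb' can only come out as one token, and the + in +3 is only emitted as a + token after the scanner has checked that neither ++ nor += matches at that position.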
Your logic is flawed. Whatever you may have *meant*, you have *said*:
| But there are other processing rules that are hardly even noticed. For
| instance consider
| a = newb();
|
| According to the syntax specification 'newb' can be parsed as the
| keyword 'new' followed by the identifier 'b', or as just the identifier
| 'newb'. (And several less likely other possibilities).
This statement is simply wrong, and has received rightful criticism in the
process.
According to the Syntactic Grammar, the expression above is to be parsed as
follows:
0. AssignmentExpression
1. LeftHandSideExpression = AssignmentExpression
2. NewExpression = ConditionalExpression
3. MemberExpression = LogicalORExpression
4. PrimaryExpression = LogicalANDExpression
5. Identifier = BitwiseORExpression
6. a = BitwiseXORExpression
7. a = BitwiseANDExpression
8. a = EqualityExpression
9. a = RelationalExpression
10. a = ShiftExpression
11. a = AdditiveExpression
12. a = MultiplicativeExpression
13. a = UnaryExpression
14. a = PostfixExpression
15. a = LeftHandSideExpression
16. a = CallExpression
17. a = MemberExpression Arguments
18. a = PrimaryExpression Arguments
19. a = Identifier Arguments
20. a = newb ()
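As a quick sanity check of the derivation above, the following sketch can be run in any conforming implementation. The functions `b` and `newb` are hypothetical and exist only for this illustration; the point is merely that `newb()` calls the function named `newb`, while `new b()` needs the separating white space to construct a `b`.

```javascript
// Hypothetical constructor, only reachable via the two tokens `new` and `b`.
function b() {
  this.kind = "constructed b";
}

// Hypothetical plain function whose name happens to start with "new".
function newb() {
  return "called newb";
}

const a = newb();  // CallExpression: Identifier `newb` followed by Arguments
const c = new b(); // `new` applied to the MemberExpression `b` with Arguments

console.log(a);      // called newb
console.log(c.kind); // constructed b
```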
As Sean has stated correctly, there is no way this could be parsed into
anything else because tokens are separated by whitespace (among other
sequences). Cf. the Lexical Grammar:
  InputElementDiv ::
    WhiteSpace
    LineTerminator
    Comment
    Token
    DivPunctuator
So `newb' can *never* be parsed into two tokens, "new" and "b", which leaves
only CallExpression to be produced from
  LeftHandSideExpression :
    NewExpression
    CallExpression

  NewExpression :
    MemberExpression
    new NewExpression

  CallExpression :
    MemberExpression Arguments
    CallExpression Arguments
    CallExpression [ Expression ]
    CallExpression . IdentifierName
in step 15. And we do not need the prose's assertion that parsing will take
the longest possible string as the next input element, for that is how a
token is defined in the Lexical Grammar:
  Token ::
    IdentifierName
    Punctuator
    NumericLiteral
    StringLiteral
PointedEars