FAQ Topic - How can I create a Date object from a String? (2011-02-15)


John G Harris

The if-structure, including or not including the else substructure,
should "know" by itself when its statement is ended.

Depending on the start of the next statement [or the end of file],
is too weak to be called firm structure. Even when else is defined as
innermost-else, the structure could collapse and give erroneous results
when changed by an unaware programmer, who trusts the structuredness of
such language.
<snip>

Consider a statement like this :

if ... else if ... else if ... else if ... else last_statement

You would prefer to follow this with

fi fi fi fi

at the end, always. Does this make it any easier to read and understand?

Also, the statement ends at the point where last_statement ends (which
can only be at a ; or } symbol). There is nothing weak, unstructured, or
dependent on the start of the next statement about this.

John
 

Lasse Reichstein Nielsen

John G Harris said:
The if-structure, including or not including the else substructure,
should "know" by itself when its statement is ended.

Depending on the start of the next statement [or the end of file],
is too weak to be called firm structure. Even when else is defined as
innermost-else, the structure could collapse and give erroneous results
when changed by an unaware programmer, who trusts the structuredness of
such language.
<snip>

Consider a statement like this :

if ... else if ... else if ... else if ... else last_statement

You would prefer to follow this with

fi fi fi fi

at the end, always. Does this make it any easier to read and understand?

Languages with "fi" often also have an "elsif" keyword to handle this case.

/L
 

Evertjan.

John G Harris wrote on 27 feb 2011 in comp.lang.javascript:
The if-structure, including or not including the else substructure,
should "know" by itself when its statement is ended.

Depending on the start of the next statement [or the end of file],
is too weak to be called firm structure. Even when else is defined as
innermost-else, the structure could collapse and give erroneous results
when changed by an unaware programmer, who trusts the structuredness of
such language.
<snip>

Consider a statement like this :

if ... else if ... else if ... else if ... else last_statement

You would prefer to follow this with

fi fi fi fi

Not me, but you perhaps. ;-)


While the idea is sound, Javascript usually neither needs nor has a word
in such a position; typically, it does not need a "then" after an
"if (boolean)".

I agree that is just semantics, but it pleases the eye to be consistent.
at the end, always. Does this make it any easier to read and
understand?

Structure is not about ease of reading and understanding, but about
parsing consistency.

A consistent closing of a statement is part of a sound structure.
Also, the statement ends at the point where last_statement ends (which
can only be at a ; or } symbol). There is nothing weak, unstructured,
or dependent on the start of the next statement about this.

I showed earlier with an example that that is inconsistent,
since the statement end is only known by the start of the next statement
in that example.

Just saying that it is not so is not very helpful,
better show an error in my reasoning about that example..
 

John G Harris

John G Harris wrote on 27 feb 2011 in comp.lang.javascript:


I showed earlier with an example that that is inconsistent,
since the statement end is only known by the start of the next statement
in that example.

If you look at the syntax specification in ECMA 262 you will see that
every statement finishes in one of three ways :
a) with a nested, shorter, statement
b) with a semicolon
c) with a close curly bracket.
In case (a) the nested statement ends in one of three ways ... . As no
statement has infinite length then every statement finishes with case
(b) or (c), i.e with ; or }. QED

Just saying that it is not so is not very helpful,
better show an error in my reasoning about that example..

I'm not sure which example you are referring to. Presumably you are not
talking about automatic semicolon insertion, an additional job the
parser is asked to do because too many people loudly claim that a
'scripting' language does NOT have semicolons.


John
 

Evertjan.

John G Harris wrote on 28 feb 2011 in comp.lang.javascript:
If you look at the syntax specification in ECMA 262 you will see that
every statement finishes in one of three ways :
a) with a nested, shorter, statement
b) with a semicolon
c) with a close curly bracket.
In case (a) the nested statement ends in one of three ways ... . As no
statement has infinite length then every statement finishes with case
(b) or (c), i.e with ; or }. QED

Quoting the specs will not help much where they are overlooking the error.

Your QED does not do much demonstrandum here.
I'm not sure which example you are referring to.

That is bad, because you tried to prove my reasoning was wrong.
Presumably you are not
talking about automatic semicolon insertion, an additional job the
parser is asked to do because too many people loudly claim that a
'scripting' language does NOT have semicolons.

Presuming will not help here, looking back at my example will.
You will see that the insertion of an ; does not make the structural flaw
go away. The "fi" would, but is alas not available.

I wrote:

## Depending on the start of the next statement [or the end of file],
## is too weak to be called firm structure. Even when else is defined as
## innermost-else, the structure could collapse and give erroneous results
## when changed by an unaware programmer, who trusts the structuredness of
## such language.
##
## Example:
##
## if (boolean)
## if (boolean) statement1
## else statement2
## else statement3
## ;
##
## and:
##
## if (boolean)
## if (boolean) statement1
## // else statement2
## else statement3
## ;
##
## when you take out the line "else statement2"
## the final else [of "else statement3"] will suddenly be part of the
## inner if-construct.
##
## If the if-statement has a structurally properly designed end,
## this would not happen.
##
## Typically an effect of unsound structure.
##
## Adding the now necessary {}s is just patchwork, the primary structure
## should be sound in itself.
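
The example above can be made concrete with a minimal sketch. The function names and the 'statementN' result strings are hypothetical stand-ins for the abstract statements in the post; the point is only to show which branch runs before and after the inner else is commented out, and what the braces change.

```javascript
// Original form: each else pairs with the if it was written for.
function withInnerElse(a, b) {
  var result = 'none';
  if (a)
    if (b) result = 'statement1';
    else result = 'statement2';
  else result = 'statement3';
  return result;
}

// Inner else commented out: the remaining else now binds to the
// nearest if, which is the inner one, despite the indentation.
function innerElseRemoved(a, b) {
  var result = 'none';
  if (a)
    if (b) result = 'statement1';
    // else result = 'statement2';
  else result = 'statement3';
  return result;
}

// Braces close the inner if explicitly, so the else stays with the outer if.
function withBraces(a, b) {
  var result = 'none';
  if (a) {
    if (b) result = 'statement1';
  } else {
    result = 'statement3';
  }
  return result;
}
```

With a false outer condition, withInnerElse and withBraces select 'statement3', while innerElseRemoved selects nothing at all; with a true outer and false inner condition, innerElseRemoved selects 'statement3'.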
 

Lasse Reichstein Nielsen

Evertjan. said:
Structure is not about ease of reading and understanding, but about
parsing consistency.

I would claim that the two are the same.
If a language is consistent in its grammar, it is easier to read and
understand, because you don't need to consider whether this is an
exception or the rule.
A consistent closing of a statement is part of a sound structure.

I agree.
I also think C (and by inheritance Javascript) is consistent.
It's not structured the way you want it, so you can't see
whether an if statement ends or is continued by an else part
without checking for the else part. You are allowed one token
of lookahead (C can be parsed by a LALR(1) parser).
But it is consistent: All statements end in either ';' or '}'.
No statement needs to end in '};'.
I showed earlier with an example that that is inconsistent,
since the statement end is only known by the start of the next statement
in that example.

True. One token of lookahead is necessary to know if an if-statement
ends here.

Just like one token of lookahead is necessary to see where an
expression ends, e.g., "a = a + 2 * 3 - 4;". Parsing that as an
expression requires you to see the semicolon to know that the
expression "a + 2 * 3 - 4" has ended. The semicolon isn't part of the
expression. It's a property that most languages have (except perhaps
Lisp and Forth style languages).

Allowing this for expressions, but requiring more structure for
statements seems inconsistent.
Just saying that it is not so is not very helpful,
better show an error in my reasoning about that example..

No error. You just have an idea about how things should be that isn't
widely shared.
Claiming that a language that doesn't match your idea is in error is,
for lack of a better word, arrogant.
/L
 

John G Harris

True. One token of lookahead is necessary to know if an if-statement
ends here.

Just like one token of lookahead is necessary to see where an
expression ends, e.g., "a = a + 2 * 3 - 4;". Parsing that as an
expression requires you to see the semicolon to know that the
expression "a + 2 * 3 - 4" has ended. The semicolon isn't part of the
expression. It's a property that most languages have (except perhaps
Lisp and Forth style languages).
Allowing this for expressions, but requiring more structure for
statements seems inconsistent.

Another example : You need lookahead to understand the first '+' in +3,
+=, and ++.

No error. You just have an idea about how things should be that isn't
widely shared.
Claiming that a language that doesn't match your idea is in error is,
for lack of a better word, arrogant.

A related thing he is unhappy with is the use of a processing rule to
say that an else belongs to the nearest available if. This rule isn't
needed if the syntax uses fi or similar.

But there are other processing rules that are hardly even noticed. For
instance consider
a = newb();

According to the syntax specification 'newb' can be parsed as the
keyword 'new' followed by the identifier 'b', or as just the identifier
'newb'. (And several less likely other possibilities). It's a processing
rule that says to assume the second case. If every keyword and
identifier ended with a special ending character this rule wouldn't be
needed. ($ of course, so new$b$ is different from newb$ :).

John
 

Evertjan.

Lasse Reichstein Nielsen wrote on 01 mrt 2011 in comp.lang.javascript:
No error. You just have an idea about how things should be that isn't
widely shared.
Claiming that a language that doesn't match your idea is in error is,
for lack of a better word, arrogant.

What is the reason for your aggressive response, Lasse?

Do you lack reasoning power and therefore resort to impoliteness?

A supposed error in my reasoning [please read my sentence above again]
is not the same as an error in a language.

And reasoning that something is so,
because it is supposedly "widely shared",
is like trying to prove that god exists,
"because" that idea is also "widely shared".
Even if the latter were true,
that is not a valid way to prove it.
 

John G Harris

Actually both += and ++ are separate Punctuators and are passed to the
parser as tokens by the tokenizer.

You've forgotten that there is a lexical layer of parsing that turns the
character stream into tokens. It must use lookahead to decide that the +
character in +3 is a + token and not part of a += token or ++ token.

No it cannot - 'newb' is per the spec an Identifier and can under no
circumstances be interpreted as 'new b' - ecmascript separates tokens
based on whitespace (and the tokens valid characters), and so this
matches the Identifier production and nothing else.

You've forgotten that the lexical layer of parsing also has a syntax
specification. Explain where in

Identifier ::
IdentifierName but not ReservedWord

etc.

it says that an identifier ends with

[lookahead NotIn identifier character]

which is what you are implying it has. It doesn't. Instead there is a
processing rule in section 7, Lexical Conventions, namely :

"The source text is scanned from left to right, repeatedly taking the
longest possible sequence of characters as the next input element."

My point is that without that processing rule, 'newb' can be parsed into
the tokens 'new' 'b', among many others.
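
The disagreement can be illustrated with a toy scanner. This is a sketch under stated assumptions, not the ECMAScript lexical grammar: the names WORD, KEYWORDS, PUNCTUATORS, firstMatch and tokenize are hypothetical, only a handful of punctuators are known, and the useLongestMatch flag exists purely to show what a scanner could do if the "longest possible sequence" sentence were absent.

```javascript
// Toy scanner for illustration only; NOT the ECMAScript lexical grammar.
var WORD = /^[A-Za-z_$][A-Za-z0-9_$]*/;
var KEYWORDS = ['new'];
var PUNCTUATORS = ['++', '+=', '+', '=', '(', ')', ';']; // longest first

// Returns the first candidate that `text` starts with, or null.
function firstMatch(candidates, text) {
  for (var n = 0; n < candidates.length; n++) {
    var candidate = candidates[n];
    if (text.slice(0, candidate.length) === candidate) return candidate;
  }
  return null;
}

function tokenize(source, useLongestMatch) {
  var tokens = [];
  var pos = 0;
  while (pos < source.length) {
    var rest = source.slice(pos);
    var space = /^\s+/.exec(rest);
    if (space) { pos += space[0].length; continue; }

    var word = WORD.exec(rest);
    var text;
    if (word) {
      // Without the rule, a keyword prefix is accepted as a complete token.
      var prefix = useLongestMatch ? null : firstMatch(KEYWORDS, word[0]);
      text = prefix || word[0];
    } else {
      // Reversing the longest-first list tries the shortest punctuators first.
      var order = useLongestMatch ? PUNCTUATORS : PUNCTUATORS.slice().reverse();
      text = firstMatch(order, rest);
    }
    if (text === null) throw new SyntaxError('Unexpected character at index ' + pos);
    tokens.push(text);
    pos += text.length;
  }
  return tokens;
}
```

Calling tokenize('a = newb();', true) yields the tokens a, =, newb, (, ), ; whereas tokenize('a = newb();', false) splits the word into new and b. Both token sequences consist of individually valid tokens, which is the ambiguity the processing rule removes.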

John
 

Thomas 'PointedEars' Lahn

John said:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You've forgotten that there is a lexical layer of parsing that turns the
                                                               ^^^^^^^^^
character stream into tokens. It must use lookahead to decide that the +
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
character in +3 is a + token and not part of a += token or ++ token.

ISTM you have a reading problem and are therefore preaching to the choir.
You've forgotten that the lexical layer of parsing also has a syntax
specification.

I do not think *he* has forgotten anything.
Explain where in

Identifier ::
IdentifierName but not ReservedWord

etc.

it says that an identifier ends with

[lookahead NotIn identifier character]

which is what you are implying it has. It doesn't. Instead there is a
processing rule in section 7, Lexical Conventions, namely :

"The source text is scanned from left to right, repeatedly taking the
longest possible sequence of characters as the next input element."

My point is that without that processing rule, 'newb' can be parsed into
the tokens 'new' 'b', among many others.

Your logic is flawed. Whatever you may have *meant*, you have *said*:

| But there are other processing rules that are hardly even noticed. For
| instance consider
| a = newb();
|
| According to the syntax specification 'newb' can be parsed as the
| keyword 'new' followed by the identifier 'b', or as just the identifier
| 'newb'. (And several less likely other possibilities).

This statement is simply wrong, and has received rightful criticism in the
process.

According to the Syntactic Grammar, the expression above is to be parsed as
follows:

0. AssignmentExpression
1. LeftHandSideExpression = AssignmentExpression
2. NewExpression = ConditionalExpression
3. MemberExpression = LogicalORExpression
4. PrimaryExpression = LogicalANDExpression
5. Identifier = BitwiseORExpression
6. a = BitwiseXORExpression
7. a = BitwiseANDExpression
8. a = EqualityExpression
9. a = RelationalExpression
10. a = ShiftExpression
11. a = AdditiveExpression
12. a = MultiplicativeExpression
13. a = UnaryExpression
14. a = PostFixExpression
15. a = LeftHandSideExpression
16. a = CallExpression
17. a = MemberExpression Arguments
18. a = PrimaryExpression Arguments
19. a = Identifier Arguments
20. a = newb ()

As Sean has stated correctly, there is no way this could be parsed into
anything else because tokens are separated by whitespace (among other
sequences). Cf. the Lexical Grammar:

InputElementDiv ::
WhiteSpace
LineTerminator
Comment
Token
DivPunctuator

So `newb' can *never* be parsed into two tokens, "new" and "b", which leaves
only CallExpression to be produced from

LeftHandSideExpression :
NewExpression
CallExpression

NewExpression :
MemberExpression
new NewExpression

CallExpression :
MemberExpression Arguments
CallExpression Arguments
CallExpression [ Expression ]
CallExpression . IdentifierName

in step 15. And we do not need the prose's assertion that parsing will take
the longest possible string as the next input element, for that is how a
token is defined in the Lexical Grammar:

Token ::
IdentifierName
Punctuator
NumericLiteral
StringLiteral


PointedEars
 

John G Harris

ISTM you have a reading problem and are therefore preaching to the choir.

I certainly have a problem reading that sentence. What on earth were you
trying to say ? And what were the fancy ^ sequences about ?

I do not think *he* has forgotten anything.

Perhaps he never knew in the first place.

Explain where in

Identifier ::
IdentifierName but not ReservedWord

etc.

it says that an identifier ends with

[lookahead NotIn identifier character]

which is what you are implying it has. It doesn't. Instead there is a
processing rule in section 7, Lexical Conventions, namely :

"The source text is scanned from left to right, repeatedly taking the
longest possible sequence of characters as the next input element."

My point is that without that processing rule, 'newb' can be parsed into
the tokens 'new' 'b', among many others.

Your logic is flawed. Whatever you may have *meant*, you have *said*:

| But there are other processing rules that are hardly even noticed. For
| instance consider
| a = newb();
|
| According to the syntax specification 'newb' can be parsed as the
| keyword 'new' followed by the identifier 'b', or as just the identifier
| 'newb'. (And several less likely other possibilities).

I meant what I said and I said what I meant.

This statement is simply wrong, and has received rightful criticism in the
process.

But you have failed to explain why it is wrong.


-alpha-
According to the Syntactic Grammar, the expression above is to be parsed as
follows:

0. AssignmentExpression
1. LeftHandSideExpression = AssignmentExpression
2. NewExpression = ConditionalExpression
3. MemberExpression = LogicalORExpression
4. PrimaryExpression = LogicalANDExpression
5. Identifier = BitwiseORExpression
6. a = BitwiseXORExpression
7. a = BitwiseANDExpression
8. a = EqualityExpression
9. a = RelationalExpression
10. a = ShiftExpression
11. a = AdditiveExpression
12. a = MultiplicativeExpression
13. a = UnaryExpression
14. a = PostFixExpression
15. a = LeftHandSideExpression
16. a = CallExpression
17. a = MemberExpression Arguments
18. a = PrimaryExpression Arguments
19. a = Identifier Arguments
20. a = newb ()
-beta-

Your reply from alpha to beta is totally irrelevant. I am not talking
about the parsing of tokens, which is what you have quoted.

The parser for the syntactic grammar operates on tokens and knows
nothing of whitespace, comments, etc, though it does know about the line
terminator pseudo-token so it can handle the complications of missing
semicolons. If you doubt this try reading ECMA 262, e.g at sec 5.1.4.

As Sean has stated correctly, there is no way this could be parsed into
anything else because tokens are separated by whitespace (among other
sequences).


Cf. the Lexical Grammar:

InputElementDiv ::
WhiteSpace
LineTerminator
Comment
Token
DivPunctuator

So `newb' can *never* be parsed into two tokens, "new" and "b", which leaves
only CallExpression to be produced from


Here is another irrelevant reference to the wrong syntax spec :
LeftHandSideExpression :
NewExpression
CallExpression

NewExpression :
MemberExpression
new NewExpression

CallExpression :
MemberExpression Arguments
CallExpression Arguments
CallExpression [ Expression ]
CallExpression . IdentifierName

in step 15.


And we do not need the prose's assertion that parsing will take
the longest possible string as the next input element, for that is how a
token is defined in the Lexical Grammar:

Token ::
IdentifierName
Punctuator
NumericLiteral
StringLiteral

I have carefully searched both ECMA 262 v3 and v5, using the Search
facility, and nowhere does the lexical syntax say that two identifiers
must be separated by anything.

In other words, I'm right, you are wrong, and the "longest possible
string" wording is needed. By the way, the C++ standard has much the
same wording. Perhaps the C++ committee thought it was needed as well.


John

I enjoy writing things that are outrageous but true. People's reactions
are so informative : they separate the loudmouths from the thoughtful.
 

Thomas 'PointedEars' Lahn

John said:
I certainly have a problem reading that sentence. What on earth were you
trying to say ?

I have basically said that you have not read Sean's posting properly.
And what were the fancy ^ sequences about ?

Using the circumflex sign is a means in Usenet to mark important parts in
quotations. The parts that I have marked, quoted from Sean's posting and
from yours, are semantically identical, so there was no good reason for your
gripe at all.
Perhaps he never knew in the first place.

Perhaps you are trying to misunderstand (him).
Explain where in

Identifier ::
IdentifierName but not ReservedWord

etc.

it says that an identifier ends with

[lookahead NotIn identifier character]

which is what you are implying it has. It doesn't. Instead there is a
processing rule in section 7, Lexical Conventions, namely :

"The source text is scanned from left to right, repeatedly taking the
longest possible sequence of characters as the next input element."

My point is that without that processing rule, 'newb' can be parsed into
the tokens 'new' 'b', among many others.

Your logic is flawed. Whatever you may have *meant*, you have *said*:

| But there are other processing rules that are hardly even noticed. For
| instance consider
| a = newb();
|
| According to the syntax specification 'newb' can be parsed as the
| keyword 'new' followed by the identifier 'b', or as just the identifier
| 'newb'. (And several less likely other possibilities).

I meant what I said and I said what I meant.

Then you were/are wrong.
But you have failed to explain why it is wrong.

So now you are blaming me for your stupidity?
-alpha-
-beta-

Your reply from alpha to beta is totally irrelevant.

No, it is not.
I am not talking about the parsing of tokens, which is what you have
quoted.

The parser for the syntactic grammar operates on tokens and knows
nothing of whitespace, comments, etc, though it does know about the line
terminator pseudo-token so it can handle the complications of missing
semicolons. If you doubt this try reading ECMA 262, e.g at sec 5.1.4.

And "newb" cannot ever be parsed into two tokens.

Actually, I should have quoted the InputElementRegExp production. The
InputElementDiv production only applies in syntactic grammar contexts where
a leading `/' or `/=' is permitted.
So `newb' can *never* be parsed into two tokens, "new" and "b", which
leaves only CallExpression to be produced from

Here is another irrelevant reference to the wrong syntax spec :
LeftHandSideExpression :
NewExpression
CallExpression

NewExpression :
MemberExpression
new NewExpression

CallExpression :
MemberExpression Arguments
CallExpression Arguments
CallExpression [ Expression ]
CallExpression . IdentifierName

in step 15.

That is not irrelevant. It shows that NewExpression does not serve to
produce `newb'.
I have carefully searched both ECMA 262 v3 and v5, using the Search
facility, and nowhere does the lexical syntax say that two identifiers
must be separated by anything.

That follows from how (the) source code is parsed. The tokenizer (or:
scanner) sees

a = newb();

and applies

InputElementRegExp ::
WhiteSpace
LineTerminator
Comment
Token
RegularExpressionLiteral

repeatedly, resulting in the following tokens:

  "a"            " "        "="        " "        "newb" "("   ")"   ";"
   |              |          |          |          |      :     :     :
  Token          WhiteSpace Token      WhiteSpace Token Token Token Token
   |                         |                     |      :     :     :
  IdentifierName            Punctuator            IdentifierName:     :
                                                          :     :     :
                                                          Punctuator  :
                                                                :     :
                                                                Punctuator
                                                                      :
                                                                      Punctuator

In other words, I'm right, you are wrong,

I am right, and you are deluding yourself into being right because you do
not really understand how a parser works.
and the "longest possible string" wording is needed.

Yes, but not for *this* code.
I enjoy writing things that are outrageous but true.

Most of what you post is off-topic noise; most of the rest can be considered
trolling.
People's reactions are so informative : they separate the loudmouths from
the thoughtful.

Pot, kettle, black.


PointedEars
 

VK

That follows from how (the) source code is parsed.  The tokenizer (or:
scanner) sees

  a = newb();

and applies

  InputElementRegExp ::
    WhiteSpace
    LineTerminator
    Comment
    Token
    RegularExpressionLiteral

repeatedly, resulting in the following tokens:

  "a"            " "        "="        " "        "newb" "("   ")"   ";"
   |              |          |          |          |      :     :     :
  Token          WhiteSpace Token      WhiteSpace Token Token Token Token
   |                         |                     |      :     :     :
  IdentifierName            Punctuator            IdentifierName:     :
                                                          :     :     :
                                                          Punctuator  :
                                                                :     :
                                                                Punctuator
                                                                      :
                                                                      Punctuator


By going from the practical programming to a "bored mind hacking" :)
it is still easy to make the tokenizer dizzy at some point by using
StringTeminator as a part of a string. Try for instance
var obj = {'p\0rop' : 'abc'};
and its resulting property, especially Firefox gets puzzled and in
turn puzzling.

As a note to the main topic, Date object and its members are created
to *output* internal machine date values in human readable, locale or
UTC/GMT adjusted text forms. Their purpose never was to take random
input and try to interpret it as a meaningful date representation. It
was fairly stated in all JavaScript manuals since - I believe -
JavaScript 1.0
 

Lasse Reichstein Nielsen

John G Harris said:
I have carefully searched both ECMA 262 v3 and v5, using the Search
facility, and nowhere does the lexical syntax say that two identifiers
must be separated by anything.

ES5 Section 7 (Lexical Conventions):
"The source text is scanned from left to right, repeatedly taking the
longest possible sequence of characters as the next input element. "

Since "newb" is longer than "new", both being valid productions of the
lexical grammar, then "newb" is the next input element, and it's an
identifier.
In other words, I'm right, you are wrong, and the "longest possible
string" wording is needed.

It's needed and present.

/L 'Absence of evidence isn't evidence of absence.'
 

Thomas 'PointedEars' Lahn

Lasse said:
ES5 Section 7 (Lexical Conventions):
"The source text is scanned from left to right, repeatedly taking the
longest possible sequence of characters as the next input element. "

Since "newb" is longer than "new", both being valid productions of the
lexical grammar, then "newb" is the next input element, and it's an
identifier.

It does not make any sense for a parser to parse one word as a sequence of
two IdentifierName tokens.
It's needed and present.

AISB, it is _not_ needed for *this* code, because a scanner that would not
take the longest possible string for an IdentifierName would be ambiguous,
and defeat its own purpose. Nobody would implement a source code parser
like that.

However, the clarification is needed, for example, for `++', because both
`+' and `++' are Punctuators.
/L 'Absence of evidence isn't evidence of absence.'

Non sequitur.


PointedEars
 

Thomas 'PointedEars' Lahn

VK said:
Thomas 'PointedEars' Lahn wrote:
[…]

Why have you even quoted my posting? You have not referred to it.
By going from the practical programming to a "bored mind hacking" :)
it is still easy to make the tokenizer dizzy at some point by using
StringTeminator as a part of a string.

There is no such thing as "StringTe(r)minator". This is not C(++).
ECMAScript string literals are delimited by double or single quotes, nothing
else.
Try for instance var obj = {'p\0rop' : 'abc'};
and its resulting property, especially Firefox gets puzzled and in
turn puzzling.

Nothing whatsoever happens (in Iceweasel/Firefox 3.6.8, JavaScript 1.8.1).

A property named "p\0rop" is created on the Object instance referred to by
`obj', whereas "\0" is interpreted as an escape sequence for the character
U+0000. This complies with the specification of the production
`EscapeSequence : 0' in the ECMAScript Language Specification, Edition 3 and
5, section 7.8.4. As this character (NULL) has no associated glyph, Gecko-
based browsers display the glyph of U+FFFD (REPLACEMENT CHARACTER) — � — if
the property name is retrieved e.g. with Firebug.

The only thing that could be considered strange here is that the character
does not show at all with window.alert() but with document.write().
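
A short sketch of the behaviour described above, assuming an ES5-capable engine (Object.keys is used only for inspection and is not part of the original example). It checks that "\0" becomes the single character U+0000 inside the property name rather than ending the string.

```javascript
var obj = {'p\0rop': 'abc'};

// The object has exactly one own property.
var names = Object.keys(obj);
var name = names[0];

// "\0" is one character (U+0000) in the middle of the name.
var nameLength = name.length;          // 'p', U+0000, 'r', 'o', 'p'
var secondCharCode = name.charCodeAt(1);

// Lookup with the same escaped name finds the value; a name
// truncated at the NUL character does not.
var value = obj['p\0rop'];
var truncatedLookup = obj['p'];
```

Here nameLength is 5 and secondCharCode is 0, so the name is not cut off at the NUL character. How the character is displayed is a separate matter that depends on the output routine.
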
As a note to the main topic, Date object and its members are created
to *output* internal machine date values in human readable, locale or
UTC/GMT adjusted text forms. Their purpose never was to take random
input and try to interpret it as a meaningful date representation.

Nobody but you implied randomness. The format that should be supported is
specified, though. ES5 finally goes so far as to specify one date format
that must be supported by the constructor and Date.parse(), "a
simplification of the ISO 8601 Extended Format" (see ES5, section "15.9.1.15
Date Time String Format").
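
For the question in the subject line, a minimal sketch using the ES5 date-time string format might look as follows. The sample timestamp and the helper name parseDateOrNull are illustrative assumptions; strings in other formats fall back to implementation-specific parsing, so only the ISO-style form is relied on here.

```javascript
// ES5 format: YYYY-MM-DDTHH:mm:ss.sssZ ("Z" means UTC).
var isoText = '2011-02-15T12:00:00Z';

var fromConstructor = new Date(isoText);          // Date object
var fromParse = Date.parse(isoText);              // milliseconds since the epoch
var expected = Date.UTC(2011, 1, 15, 12, 0, 0);   // months are zero-based

// Hypothetical helper: returns a Date, or null when Date.parse yields NaN.
function parseDateOrNull(text) {
  var ms = Date.parse(text);
  return isNaN(ms) ? null : new Date(ms);
}

var roundTrip = parseDateOrNull(isoText).toISOString();
```

Both the constructor and Date.parse() produce the same time value as Date.UTC() for this input, and toISOString() returns the string in the same family of formats.
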
It was fairly stated in all JavaScript manuals since - I believe -
JavaScript 1.0

Please spare us your fairytales and misconceptions about what JavaScript is.


PointedEars
 

Thomas 'PointedEars' Lahn

Thomas said:
Nothing whatsoever happens (in Iceweasel/Firefox 3.6.8, JavaScript 1.8.1).

That's 1.8.2 (Gecko 1.9.2), of course.

Iceweasel/Firefox 3.5 (Gecko 1.9.1) supported JavaScript 1.8.1, Firefox 4
(Gecko 2) is going to support JavaScript 1.8.5.


PointedEars
 

Lasse Reichstein Nielsen

Thomas 'PointedEars' Lahn said:
It does not make any sense for a parser to parse one word as a sequence of
two IdentifierName tokens.

Good thing the specification doesn't allow it then, right?
AISB, it is _not_ needed for *this* code, because a scanner that would not
take the longest possible string for an IdentifierName would be ambiguous,
and defeat its own purpose.

That's exactly why the specification needs the text that makes its
lexical grammar unambiguous. Without it, you could write a frighteningly
stupid scanner *and still be spec compliant*!

An ambiguous specification can be useless. The specification needs
the rule to avoid being ambiguous, so yes, the rule is needed.
Nobody would implement a source code parser like that.

And therefore nobody would specify a language to be parsed like that.
Which is what they avoid by adding the sentence.
However, the clarification is needed, for example, for `++', because both
`+' and `++' are Punctuators.

Punctuators?
I don't see the difference. Using the longest valid sequence is there
to disambiguate different productions.
The difference might be that we think of "words" more readily than
of multi-character operators, so it's more natural to read "newb" as
one word than two, but it's not as obvious that "++" is an increment
operator, not two plus-operators.
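
The effect on "++" can be observed directly in any engine. A small sketch, with arbitrary variable names chosen for illustration:

```javascript
var a = 1, b = 2;

// Scanned as  a ++ + b  because "++" is the longest possible
// punctuator at that point: post-increment a, then add b.
var sum = a+++b;

var c = 1, d = 2;

// Whitespace forces a different split:  c + ++d
// (pre-increment d, then add).
var other = c+ ++d;
```

After this runs, sum is 3 and a is 2, while other is 4 and d is 3. The source characters differ only by one space, yet the token sequences differ.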

/L
 

VK

Why have you even quoted my posting?  You have not referred to it.

It was your graph illustration of the straight and unambiguous
tokenizer algorithm. Right below it I'm demonstrating one of the ways
to add a lick of stuff to its boring everyday life :) Com'on, I am
not an instructor in a school for special children for showing every
single time the purpose and the meaning of everything. Sometimes
either sapienti sat or where is the sapienti? ;-)
There is no such thing as "StringTe(r)minator".  This is not C(++).  
ECMAScript string literals are delimited by double or single quotes, nothing
else.

You've just got "D" for "Computer control code" :-(
Quickly to http://en.wikipedia.org/wiki/Null_character (the first
paragraph) or straight to http://en.wikipedia.org/wiki/C0_and_C1_control_codes
Second attempt tomorrow with working code samples.
Nothing whatsoever happens (in Iceweasel/Firefox 3.6.8, JavaScript 1.8.1)..  

That is what always amused me ever since you came to c.l.j. You seem
sincerely and honestly considering *nux/Gecko installation as the only
existing one, or the only one worth any attention, or some crystal ball
to read all other possible limitations and obstacles...

var obj = {'p\0rop' : 'abc'};
for (var x in obj) {
window.alert(i);
window.alert(obj['p\0rop']);
}

Windows Vista SP2 or higher, any not ridiculously old recognizable
browser (that means now 5-6 months old max). Find one matching outcome
for both alerts between any two browsers and I will not pick on 12 hrs
in the row :)

P.S. btw, who is now the "king" (FAQ maintainer)?
And what a hay is going with the trash all around? I never saw such a
spammed out group even in alt.* not talking about Big7. sci.math has 5
times more daily visitors and 10 times lesser spam. Are you under some
vengeance attack?
 

Thomas 'PointedEars' Lahn

VK said:
It was your graph illustration of the straight and unambiguous
tokenizer algorithm. Right below it I'm demonstrating one of the ways
to add a lick of stuff to its boring everyday life :) Com'on, I am
not an instructor in a school for special children for showing every
single time the purpose and the meaning of everything. Sometimes
either sapienti sat or where is the sapienti? ;-)

IOW, you do not know what you are talking about. What a surprise!
You've just got "D" for "Computer control code" :-(

As we all know, a "D" in VK-land is an A+ in the real world, so thank you.
Quickly to http://en.wikipedia.org/wiki/Null_character (the first
paragraph) or straight to
http://en.wikipedia.org/wiki/C0_and_C1_control_codes Second attempt
tomorrow with working code samples.

AISB, outside of VK-land, "\0" does not denote the end of a string value in
ECMAScript implementations. IOW, the character carries no significance
there whatsoever.

Your reference to C0 and C1 control codes of ISO/IEC 8859-1 is even more
hilarious as I just explained that ECMAScript strings are Unicode strings.
But please, do not hesitate to "try again". I can use the extra health.
That is what always amused me ever since you came to c.l.j. You seem
sincerely and honestly considering *nux/Gecko installation as the only
existing one, or the only one worth any attention, or some crystal ball
to read all other possible limitations and obstacles...

You want to skip all the bullshit babbling next time. Oh wait, it's you …
var obj = {'p\0rop' : 'abc'};
for (var x in obj) {
window.alert(i);
window.alert(obj['p\0rop']);
}

Windows Vista SP2 or higher, any not ridiculously old recognizable
browser (that means now 5-6 months old max). Find one matching outcome
for both alerts between any two browsers and I will not pick on 12 hrs
in the row :)

Often Wrong, `i' is undeclared in your example. The outcome of this code
will without the shadow of a doubt be a ReferenceError exception being
thrown in the very first loop, and since that exception is not handled,
execution will end. (It is rather typical for you that you would not have
noticed this since either you never test your allegations (nor specify the
exact conditions of your tests), or you manage to misinterpret that which
you are observing along the lines of your ongoing fantasies about how the
language(s) ought to be.)

And if you meant `x' instead of `i', then it should not surprise you that
both alerts show different values in *any* scriptable browser, because the
first one would show the property *name* (although, AISB, without the
replacement character for U+0000), and the second one shows the property
*value*, "abc". `for…in' is _not_ the same as `for each' (E4X).
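
Both points can be checked without a browser. In this sketch the show function is a hypothetical stand-in for window.alert that records its argument, and the try/catch exists only to capture the exception that would otherwise end execution.

```javascript
var obj = {'p\0rop': 'abc'};
var shown = [];
function show(value) { shown.push(value); } // stands in for window.alert

// The loop as posted: `i` is never declared, so the first
// call throws before anything is shown.
var error = null;
try {
  for (var x in obj) {
    show(i);
    show(obj['p\0rop']);
  }
} catch (e) {
  error = e;
}

// With the loop variable used consistently, for...in yields the
// property *name*; the value has to be looked up separately.
var names = [];
var values = [];
for (var name in obj) {
  names.push(name);
  values.push(obj[name]);
}
```

The first loop ends with a ReferenceError and an empty shown list; the second collects the name 'p\0rop' and the value 'abc', which are naturally different strings.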

That aside, it is hard to believe that Firefox for Vista SP2 or higher would
behave differently than Firefox for Linux/GTK as it is the same source code
there. At least Firefox 3.0 for Windows on Wine does exactly as I
described. But I will check this for Windows 7 at the next opportunity.
P.S. btw, who is now the "king" (FAQ maintainer)?

Fortunately, not you.
And what a hay is going with the trash all around? I never saw such a
spammed out group even in alt.* not talking about Big7. sci.math has 5
times more daily visitors and 10 times lesser spam. Are you under some
vengeance attack?

Please spare us your delusions. TIA.


PointedEars
 
