Remove trailing comments exercise

Csaba Gabor

Thomas 'PointedEars' Lahn said:
Lasse Reichstein Nielsen wrote:
var re = /('(?:[^']|\\')*')/g;
alert(re.exec(code)[0]);

It alerts the string "'abc\\'", i.e., it does end at the first
"'", even if the quote is escaped.

The above recognizes from one single quote to
the final single quote in the string. One may
just as well write var re = /('.*?')/g
The reason it does so is that [^'] matches backslash as well, and
with a higher priority than what comes after, so it matches the
backslash as well.
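
A minimal sketch of that behaviour. The sample input is an assumption (the original value of code is not shown in this thread), and console.log stands in for alert:

```javascript
// Assumed sample input. The source text it represents is:
//   x = 'abc\'def'; y = 1;
var code = "x = 'abc\\'def'; y = 1;";

// The pattern under discussion: [^'] is tried first and happily
// consumes the backslash, so \\' never gets a chance to match.
var re = /('(?:[^']|\\')*')/g;

var match = re.exec(code)[0];
console.log(match); // 'abc\'
```

The match stops at the escaped quote, so the rest of the literal (def') is treated as code.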

The immediate fix of swapping the alternatives:
var re = /('(?:\\'|[^'])*')/g;

The above recognizes from a single quote to either the next
single quote not preceded by a backslash if such a single
quote exists; else to the last single quote. To observe:

var code = "abc'def\\'ghi'jkl\\\\'mno\\\\'pqr";
var re = /'(?:\\'|[^'])*'/g
alert(code.replace(re, "XXX"));

and giving \\' priority over [^'], will match "\\'" as a non-string-ender,
but will also ignore "\\\\'". It's necessary to know whether there is an
even number of backslashes before the quote in order to know whether it's
escaped or not. The RegExp below is the simplest one I have found to do that.

Lasse, to me, the RegExp below looks identical to the first
one above. So in the absence of me seeing it, here is a
regular expression that recognizes single quoted strings.
It will match from a single quote to the next single quote
not preceded by an odd number of backslashes.

var re = /'(?:\\.|[^\\'])*'/g
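
To compare the swapped-alternatives pattern with the one just above, here is a small sketch. The sample inputs and the variable names (swapped, fixed, and so on) are made up for illustration:

```javascript
// Assumed sample input. The source text it represents is:
//   a = 'x\\'; b = 'y';
// i.e. the first literal ends with an escaped backslash.
var code = "a = 'x\\\\'; b = 'y';";

// Swapped alternatives: \\' still wins even when the backslash
// is itself escaped, so the first literal runs into the second.
var swapped = code.match(/'(?:\\'|[^'])*'/g);

// Consume a backslash together with whatever follows it, so a quote
// only ends the string after an even number of backslashes.
var fixed = code.match(/'(?:\\.|[^\\'])*'/g);

// An escaped quote inside a literal is still handled.
var escaped = "s = 'it\\'s';".match(/'(?:\\.|[^\\'])*'/g);

console.log(swapped); // one match:   'x\\'; b = '
console.log(fixed);   // two matches: 'x\\'  and  'y'
console.log(escaped); // one match:   'it\'s'
```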
/* 'foo \\' */
var code = "'foo \\\\' '";
/* ["'foo \\'", "'foo \\'"] */
/('(?:[^']|\\')*')/.exec(code)
Glad to be of service :)
ECMAScript syntax is ... interesting. Context depending lexing combined
with semicolon-insertion gives ample room to make mistakes :)

var b=2,g=1;
var a = 84
/b/g; // <- it's division :)

This is highly interesting, where the interpretation of that
final line also depends on what comes before it. For example:

var b=2,g=1;
var a = 84;
/b/g; // <- it's a regular expression :)

or

whole(truth) /b+c/g; // division
vs.
while(truth) /b+c/g; // RegExp
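
The two readings of /b/g can be checked directly. This is only a sketch: the use of eval to get at the value of the last statement is my own choice, not something from the posts above.

```javascript
var b = 2, g = 1;

// No semicolon after 84: the next line continues the expression,
// so this is 84 / b / g.
var a = 84
/b/g;

// With the semicolon, the second line is a statement of its own and
// the slash starts a RegExp literal. eval returns the value of the
// last statement it evaluated.
var last = eval("var x = 84;\n/b/g;");

console.log(a);                      // 42
console.log(last instanceof RegExp); // true
```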

I wonder about other examples of (non embedded) code being
interpreted differently depending on what precedes it.


Also, while your example of [^] works on my FF1.5, it does
not compile on my IE 6. I.e., adding
var re=/[^]/;
results in an error message from IE.
 
Csaba Gabor

Thomas 'PointedEars' Lahn said:
Lasse Reichstein Nielsen wrote:
 var re = /('(?:[^']|\\')*')/g;
 alert(re.exec(code)[0]);
It alerts the string  "'abc\\'", i.e., it does end at the first
"'", even if the quote is escaped.

The above recognizes from one single quote to
the final single quote in the string. One may
just as well write var re = /('.*?')/g

final => next. Sorry about that

If the ? in the RegExp I supplied is omitted, then
it captures till the final single quote.
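
A quick sketch of the difference, using a sample line of my own (not from the thread). Note that . does not match a newline, so both forms stay within one line:

```javascript
// Assumed sample input with two quoted strings on one line.
var code = "a = 'one'; b = 'two';";

// Lazy .*? stops at the next single quote.
var lazy = code.match(/('.*?')/g);

// Greedy .* runs on to the final single quote on the line.
var greedy = code.match(/('.*')/g);

console.log(lazy);   // [ "'one'", "'two'" ]
console.log(greedy); // [ "'one'; b = 'two'" ]
```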
 
VK

Thomas said:
It is really merely an issue to recognize and ignore string literals first,
then to recognize and ignore RegExp initializers outside of them.  My
replace function already implements the former; adapting it to also take
care of the latter is left as an exercise to the reader.

Your replace function so far converts a syntactically correct source
into syntactically incorrect one:
/foobar//foobar
comes to
/foobar
which is "unterminated regular expression literal"
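
The failure mode can be reproduced with any stripper that treats every // as the start of a comment. The function below is a hypothetical stand-in, not the replace function under discussion (which is not shown in this thread), and parses is a helper invented for the check:

```javascript
// Hypothetical naive stripper: removes everything from // to end of line.
function stripLineComments(source) {
  return source.replace(/\/\/.*$/gm, "");
}

// Helper (also an assumption): does the text parse as an expression?
// foobar is declared as a parameter so the division has an operand.
function parses(expression) {
  try {
    new Function("foobar", "return " + expression + ";");
    return true;
  } catch (e) {
    return false;
  }
}

var source = "/foobar//foobar";          // RegExp /foobar/ divided by foobar
var stripped = stripLineComments(source);

console.log(stripped);         // /foobar
console.log(parses(source));   // true
console.log(parses(stripped)); // false
```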

P.S. It is a bit of fun to watch people making a robust parser
algorithm for an algorithmically unparseable matter. But keep going, I
have more...
 
Thomas 'PointedEars' Lahn

VK said:
Your replace function so far converts a syntactically correct source
into syntactically incorrect one:
/foobar//foobar
comes to
/foobar
which is "unterminated regular expression literal"

If you had paid attention, you would have known that I am aware of the
RegExp issue.
P.S. It is a bit of fun to watch people making a robust parser
algorithm for an algorithmically unparseable matter.

It is not algorithmically unparseable. Otherwise there would be no script
engine that accepts RegExp initializers, would there? The context in which
`/' is not recognized as the start of a RegExp initializer is grammatically
well-defined, and if you had cared to read the Specification you would have
known.
But keep going, I have more...

You would.


PointedEars
 
Lasse Reichstein Nielsen

[correct description of how the regexps work]
and giving \\' priority over [^'], will match "\\'" as a non-string-ender,
but will also ignore "\\\\'". It's necessary to know whether there is an
even number of backslashes before the quote in order to know whether it's
escaped or not. The RegExp below is the simplest one I have found to do that.

Lasse, to me, the RegExp below looks identical to the first
one above. So in the absence of me seeing it, here is a
regular expression that recognizes single quoted strings.
It will match from a single quote to the next single quote
not preceded by an odd number of backslashes.

var re = /'(?:\\.|[^\\'])*'/g

My mistake. The "RegExp below" that I was referring to was one that I
had written in a double-quoted message, but I managed to remove that
quote before posting.

It was indeed equivalent to the one you wrote here (I think it had the
alternatives in the opposite order, but that's not important since they
are mutually exclusive).
This is highly interesting, where the interpretation of that
final line also depends on what comes before it. For example:

var b=2,g=1;
var a = 84;
/b/g; // <- it's a regular expression :)

or

whole(truth) /b+c/g; // division
vs.
while(truth) /b+c/g; // RegExp

I wonder about other examples of (non embedded) code being
interpreted differently depending on what precedes it.

There are a few:
An object literal, {foo: 42}, is also a valid statement block
with a labeled expression statement. In an expression context,
it can only be the object literal, in a statement context, it
can only be the statement block, and since expressions can be
statements (ExpressionStatement) there is a rule that says that
an ExpressionStatement cannot begin with "{" (or "function").
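
A small sketch of the two readings. Using eval to observe the result of a statement is my own choice for the demonstration, as are the variable names:

```javascript
// Statement context: { starts a block, foo: is a label, and 42 is an
// expression statement. eval returns the value of that statement.
var asStatement = eval("{foo: 42}");

// Expression context: the parentheses force an expression, so the
// same text is an object literal.
var asExpression = eval("({foo: 42})");

console.log(asStatement);         // 42
console.log(typeof asExpression); // object
console.log(asExpression.foo);    // 42
```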

Also, while your example of [^] works on my FF1.5, it does
not compile on my IE 6. I.e., adding
var re=/[^]/;
results in an error message from IE.

Tsk, tsk. :)

/L
 
