Remove trailing comments exercise

Csaba Gabor · Nov 4, 2009

I'm looking for a
function stripEndComments(code) {
// remove trailing comments and whitespace from
/* the end of code, which is presumed to be valid
// javascript */
... }

My previous post at
http://groups.google.com/group/comp.lang.javascript/browse_frm/thread/2aa9a60623eb5883/
may amount to more than just an exercise, so I am
slicing off part of it into an independent exercise
(and this one IS just an exercise).

Assume the use of the function
function checkSyntax(code) {
// returns false if code is not syntactically OK
// returns browser's (string) interpretation of the code if it's OK,
// encapsulated in an anonymous function
try {
var f = new Function(code);
return f.toString(); }
catch (err) { return false; } } // syntax error
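
A quick way to see the two kinds of return value, as a minimal sketch: it restates checkSyntax with conventional indentation and calls it on three inputs made up for this illustration. The exact text returned for valid code varies by engine, so only its type is inspected.

```javascript
// Same logic as checkSyntax above, restated so the snippet runs on its own.
function checkSyntax(code) {
  try {
    var f = new Function(code); // parses the code but never runs it
    return f.toString();
  } catch (err) {
    return false; // syntax error
  }
}

// Illustrative inputs (not from the original post).
var valid = checkSyntax("foo + bar");            // parses even though foo is undefined
var withComment = checkSyntax("foo + bar // c"); // a trailing comment is fine
var broken = checkSyntax("foo +");               // incomplete expression

console.log(typeof valid, typeof withComment, broken);
```

Because the code is only compiled, undefined names such as foo and bar do not matter; only syntax errors produce false.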

Some examples:
foo + bar // two comments /* or one? *//
=> foo + bar

"Foo" + "bar" /* three */ // lines
// of comments /* should all be
/* stripped off *////
=> "Foo" + "bar"

For the rambunctious: remove trailing empty statements, too:
code = "baz/* junk */+borf; fubar ; /* more junk */ ; ;; ;"
=> baz/* junk */+borf; fubar

Csaba Gabor from Vienna

SAM · Nov 4, 2009

On 11/4/09 12:51 PM, Csaba Gabor wrote:

I'm looking for a
function stripEndComments(code) {
// remove trailing comments and whitespace from
/* the end of code, which is presumed to be valid
// javascript */
... } (...)
For the rambunctious: remove trailing empty statements, too:
code = "baz/* junk */+borf; fubar ; /* more junk */ ; ;; ;"
=> baz/* junk */+borf; fubar

I get,
Firefox.3 :
baz + borf;
fubar;
IE.5, 6 and 7 :
baz/* junk */+borf; fubar ; /* more junk */ ; ;; ;

not yet finished ?

Csaba Gabor · Nov 4, 2009

On 11/4/09 12:51 PM, Csaba Gabor wrote:

I get,
Firefox.3 :
baz + borf;
fubar;
IE.5, 6 and 7 :
baz/* junk */+borf; fubar ; /* more junk */ ; ;; ;

not yet finished ?

Hi SAM, what you have shown is what FF/IE returns if
you put the mentioned strings into a function and then
do a .toString() on it. FF cleans all comments
whereas IE leaves them in.

However, in this exercise, I'd like to strip the TRAILING
comments only, in as browser-independent a fashion
as possible (without recasting the code string into
a different form). The part to the right of the
=> above indicates the string that the desired
function, stripEndComments, should return.
Therefore, you can use checkSyntax as a false vs.
nonempty-string check, but I don't think you'll find
the actual nonempty string return values useful for the
purposes of this exercise.

Richard Cornford · Nov 4, 2009

I'm looking for a
function stripEndComments(code) {
// remove trailing comments and whitespace from
/* the end of code, which is presumed to be valid
// javascript */
... }

My previous post at ...
may amount to more than just an exercise, so I am
slicing off part of it into an independent exercise
(and this one IS just an exercise).

Assume the use of the function
function checkSyntax(code) {
// returns false if code is not syntactically OK
// returns browser's (string) interpretation of the code if it's OK,
// encapsulated in an anonymous function
try {
var f = new Function(code);
return f.toString(); }
catch (err) { return false; } } // syntax error

Some examples:
foo + bar // two comments /* or one? *//
=> foo + bar

"Foo" + "bar" /* three */ // lines
// of comments /* should all be
/* stripped off *////
=> "Foo" + "bar"

For the rambunctious: remove trailing empty statements, too:
code = "baz/* junk */+borf; fubar ; /* more junk */ ; ;; ;"
=> baz/* junk */+borf; fubar

This problem includes the problem of not reacting to comment
delimiters whenever they appear in strings in the source code. For
example, stripping everything from the // to the end of the line in
the following would be disastrous:-

var prefixToIRI = {
'xsd':'http://www.w3.org/2001/XMLSchema',
'env':'http://schemas.xmlsoap.org/soap/envelope/',
'xsi':'http://www.w3.org/2001/XMLSchema-instance',
'xml':'http://www.w3.org/XML/1998/namespace',
'xmlns':'http://www.w3.org/2000/xmlns'
};

So for this task it seems necessary to identify the string literals
within the source, which is getting towards tokenising the source.
Tokenising the source was already implied in the task of verifying the
syntax of the code (along with identifying comments) so maybe this
stage should not be separated from the previous task if you genuinely
want all the comments removed.
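
To make the hazard concrete, here is a small sketch with a naive line-comment stripper. The variable names and the trailing comment are invented for the illustration; only the URL comes from the object literal above.

```javascript
// One line of source with a URL inside a string and a real comment after it.
var source = "var iri = 'http://www.w3.org/2001/XMLSchema'; // namespace";

// Naive approach: delete everything from the first // to the end of each line.
var naive = source.replace(/\/\/.*$/gm, "");

// The first // is the one inside the string, so the literal is cut in half.
var stillParses = true;
try {
  new Function(naive);
} catch (err) {
  stillParses = false; // unterminated string literal
}

console.log(naive);       // var iri = 'http:
console.log(stillParses); // false
```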

Richard.

Csaba Gabor · Nov 4, 2009

This problem includes the problem of not reacting to comment
delimiters whenever they appear in strings in the source code. For
example, stripping everything from the // to the end of the line in
the following would be disastrous:-

I don't want to strip all comments, just those at the
very tail end of the code string (as the 3rd example suggests).
For example:
foo(); // comment1
bar(); // comment2
=>
foo(); // comment1
bar()

var prefixToIRI = {
'xsd':'http://www.w3.org/2001/XMLSchema',
'env':'http://schemas.xmlsoap.org/soap/envelope/',
'xsi':'http://www.w3.org/2001/XMLSchema-instance',
'xml':'http://www.w3.org/XML/1998/namespace',
'xmlns':'http://www.w3.org/2000/xmlns'

};

So for this task it seems necessary to identify the string literals
within the source, which is getting towards tokenising the source.

Hopefully, we can stay away from tokenising. If we do have to
enter the business of tokenising (in any substantive way) to
solve this problem, it would no longer be an exercise.
Perhaps it is better to use the browser's embedded parser to help out.

Tokenising the source was already implied in the task of verifying the
syntax of the code (along with identifying comments) so maybe this
stage should not be separated from the previous task if you genuinely
want all the comments removed.

Removing all the comments would seem to be a messier
problem (which I haven't thought about in this context).
I've done this (removed all comments) in the past for
PHP code, and it was around 60 lines of somewhat
intricate code (in parsing the original code string).
But I do not advocate such an approach for this exercise.

Stevo · Nov 4, 2009

Csaba said:
I'm looking for a
function stripEndComments(code) {
// remove trailing comments and whitespace from
/* the end of code, which is presumed to be valid
// javascript */
... }

My previous post at
http://groups.google.com/group/comp.lang.javascript/browse_frm/thread/2aa9a60623eb5883/
may amount to more than just an exercise, so I am
slicing off part of it into an independent exercise
(and this one IS just an exercise).

Why are you talking about this as an exercise all the time? Is that your
way of getting people to write your code for you? Pretend it's just
an abstract exercise for fun?

SAM · Nov 4, 2009

On 11/4/09 1:37 PM, Csaba Gabor wrote:

Hi SAM, what you have shown is what FF/IE returns if
you put the mentioned strings into a function and then
do a .toString() on it. FF cleans all comments
whereas IE leaves them in.

Yes (the function checkSyntax() you've given).

However, in this exercise, I'd like to strip the TRAILING
comments only, in as browser-independent a fashion
as possible (without recasting the code string into
a different form).

javascript:alert("baz/* junk */+borf; fubar ; /* more junk */ ; ;; ;
;;".replace(/\/[/\*][^\*]+\*\/|\s+|\s*;(?=\s*;)/g,''))

==> baz+borf;fubar;;

can't remove the last ';'

The part to the right of the
=> above indicates the string that the desired
function, stripEndComments, should return.
Therefore, you can use checkSyntax as a false vs.
nonempty-string check, but I don't think you'll find
the actual nonempty string return values useful for the
purposes of this exercise.

(not yet understood what is "the" purpose ... comments no ... but yes)

javascript:alert("baz/* junk */+borf; fubar ; /* more junk */ ; ;; ;
;;".replace(/\/[/\*][^\*]+\*\/(?=\s*;)|\s+|;(?=\s*;)/g,''))

==> baz/*junk*/+borf;fubar;;

abozhilov · Nov 4, 2009

I'm looking for a
function stripEndComments(code) {
// remove trailing comments and whitespace from
/* the end of code, which is presumed to be valid
// javascript */
... }

Something like this?

code.replace(/\s*;[\s;]*/g, ';\n').replace(/^\/(?:\/[^\n]+|\*[^\/*]*?\*\/)/gm, '');

Csaba Gabor · Nov 4, 2009

I'm looking for a
function stripEndComments(code) {
// remove trailing comments and whitespace from
/* the end of code, which is presumed to be valid
// javascript */
... }

Something like this?

code.replace(/\s*;[\s;]*/g, ';\n').replace(/^\/(?:\/[^\n]+|\*[^\/*]*?\*\/)/gm, '');

You might be able to figure out a way to do this
with regular expressions, but I'm thinking that
it will be VERY messy because you will have to
account for strings and regular expressions such as:
var code = "var messy='it was windy/*sunny*'+" and */cold/*"

The first part of your code fails on:
var code = "var semi=' ; ; ; '";

While the second replace fails on
var code = "var k=i + j /* // */";
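
Both failures can be reproduced with a short sketch. The wrapper name tryStrip is made up here; its body is the two-step replace quoted above.

```javascript
// The two chained replaces from the suggestion, wrapped for testing.
// (tryStrip is a hypothetical name used only in this sketch.)
function tryStrip(code) {
  return code
    .replace(/\s*;[\s;]*/g, ';\n')
    .replace(/^\/(?:\/[^\n]+|\*[^\/*]*?\*\/)/gm, '');
}

// Failure 1: semicolons inside a string literal are rewritten,
// which also puts a raw newline inside the string.
var semi = tryStrip("var semi=' ; ; ; '");

// Failure 2: the comment pattern is anchored with ^, so a comment
// that does not start a line is left untouched.
var input = "var k=i + j /* // */";
var untouched = tryStrip(input);

console.log(JSON.stringify(semi)); // "var semi=';\n'"
console.log(untouched === input);  // true
```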

Thomas 'PointedEars' Lahn · Nov 4, 2009

Csaba said:
abozhilov said:

Csaba said:

// remove trailing comments and whitespace from
/* the end of code, which is presumed to be valid
// javascript */
... }

Something like this?

code.replace(/\s*;[\s;]*/g, ';\n').replace(/^\/(?:\/[^\n]+|\*[^\/*]*?\*\/)/gm, '');

You might be able to figure out a way to do this
with regular expressions, but I'm thinking that
it will be VERY messy

How fortunate then that you don't know what you are talking about.
It is rather easy to do if you do it properly. For example:

code = code.replace(
/('(?:[^']|\\')*')|("(?:[^"]|\\")*")|(\/\/.*)|(\s+$)/gm,
function(m, p1, p2, p3, p4) {
return (p3 || p4) ? "" : m;
});

because you will have to
account for strings and regular expressions such as:
var code = "var messy='it was windy/*sunny*'+" and */cold/*"

The concatenation here is rather pointless. Any tokenizer or parser will
see this equivalent to

var code = "var messy='it was windy/*sunny*' and */cold/*"

And

var messy='it was windy/*sunny*' and */cold/*

is not syntactically correct to begin with. Which also points out that
there is no Regular Expression here.

PointedEars
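
For readers who want to try the replace callback from the post above, here is a small sketch. The wrapper name stripLineComments and the sample input are invented for the illustration. Note that the pattern removes // comments on every line, not only the trailing one, and does not handle /* */ comments.

```javascript
// The posted replace call, wrapped in a function for experimentation.
// (stripLineComments is a hypothetical name.)
function stripLineComments(code) {
  return code.replace(
    /('(?:[^']|\\')*')|("(?:[^"]|\\")*")|(\/\/.*)|(\s+$)/gm,
    function (m, p1, p2, p3, p4) {
      // Drop line comments (p3) and end-of-line whitespace (p4); keep strings.
      return (p3 || p4) ? "" : m;
    });
}

var sample = "var u = 'http://example.org/'; // first\nfoo(); // second";
var result = stripLineComments(sample);

// The // inside the string survives; both line comments are removed.
console.log(JSON.stringify(result)); // "var u = 'http://example.org/'; \nfoo(); "
```

The space in front of each removed comment stays behind, because the whitespace alternative is tried against the original text, where that space is not yet at the end of a line.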

Thomas 'PointedEars' Lahn · Nov 4, 2009

Thomas said:
Csaba said:

[...] you will have to account for strings and regular expressions such
as:
var code = "var messy='it was windy/*sunny*'+" and */cold/*"


^ ^ ^

The concatenation here is rather pointless. [...]

In fact, there is no concatenation here because it ...

is not syntactically correct to begin with. Which also points out that
there is no Regular Expression here.

PointedEars

Dr J R Stockton · Nov 4, 2009

In comp.lang.javascript message <7766145b-786d-478a-8a6e-08f2e27826ba@l2g2000yqd.googlegroups.com>,
Wed, 4 Nov 2009 03:51:10, Csaba Gabor wrote:

I'm looking for a
function stripEndComments(code) {
// remove trailing comments and whitespace from
/* the end of code, which is presumed to be valid
// javascript */
... }

Whitespace is trivial.

You must recognise strings, and not count // or /* within them.
You must allow for RegExp literals such as /slash=\//.
Remove all /* ... */ comments; or only if last on one line?

Csaba Gabor · Nov 5, 2009

You might be able to figure out a way to do this
with regular expressions, but I'm thinking that
it will be VERY messy because you will have to
account for strings and regular expressions such as:

var code = "var messy='it was windy/*sunny*'+" and */cold/*"

Oops, I see I've made a transcription error. It should read:
var code = "var messy='it was windy/*sunny*'+' and */cold/*'"

But the following may be slightly more interesting:
var code =
"var mess='it\\'s windy//*sunny*'+' & */cold/*' //asdf"

Thomas 'PointedEars' Lahn · Nov 5, 2009

Csaba said:
Oops, I see I've made a transcription error. It should read:
var code = "var messy='it was windy/*sunny*'+' and */cold/*'"

Still no RegExp here:

var messy='it was windy/*sunny* and */cold/*'
^ ^

But the following may be slightly more interesting:
var code =
"var mess='it\\'s windy//*sunny*'+' & */cold/*' //asdf"

You are still on the wrong track.

var mess='it\\'s windy//*sunny* & */cold/*' //asdf
^ ^

It is really merely an issue to recognize and ignore string literals first,
then to recognize and ignore RegExp initializers outside of them. My
replace function already implements the former; adapting it to also take
care of the latter is left as an exercise to the reader.

PointedEars

Lasse Reichstein Nielsen · Nov 5, 2009

Thomas 'PointedEars' Lahn said:
....
How fortunate then that you don't know what you are talking about.
It is rather easy to do if you do it properly. For example:

code = code.replace(
/('(?:[^']|\\')*')|("(?:[^"]|\\")*")|(\/\/.*)|(\s+$)/gm,
function(m, p1, p2, p3, p4) {
return (p3 || p4) ? "" : m;
});

The ('(?:[^']|\\')*') part fails to recognize the end of the following
string literal:
'foo \\'
and will match up to the next "'". Ditto for double-quoted strings.
Try
('(?:[^'\\]|\\[^])*')
(Here I'm also allowing backslash-newline in string literals, even
though it's not in the standard, otherwise replace "[^]" with ".").

And it's easy to add standard (not-single-line) comments as well:
(\/\*(?:[^*]*\*+)*\/)

This only works in the absence of regexp literals.
RegExps are harder to recognize, because it's the syntactic starting
point that distinguishes the starting slash from a division.
E.g.,
/foo + 42/g
might be a RegExp, if occurring in an expression context, but not
if it occurs where an operator is expected:
bar/foo + 42/g
(I.e., it's not tokenizable without context information).

And if you can't recognize regexps, you can mess up the recognition
of comments and strings as well.
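
The block-comment pattern above is worth testing on input that holds more than one comment. In the sketch below, the posted pattern is compared with a non-greedy form; the non-greedy form is an assumption of this example (a common alternative), not something proposed in the thread, and the sample string is made up.

```javascript
var greedy = /\/\*(?:[^*]*\*+)*\//g; // pattern as posted above
var lazy = /\/\*[^]*?\*\//g;         // assumed alternative: stop at the first */

var src = "a /* one */ + b /* two */";

// [^*]* may also consume the "/" that closes a comment, so the posted
// pattern runs from the first /* to the last */ and swallows " + b ".
var greedyResult = src.replace(greedy, "");

// The non-greedy form removes each comment separately.
var lazyResult = src.replace(lazy, "");

console.log(JSON.stringify(greedyResult)); // "a "
console.log(JSON.stringify(lazyResult));   // "a  + b "
```

Neither form knows about string or RegExp literals, so the caveat in the post still applies to both.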

/L

Csaba Gabor · Nov 5, 2009

Thomas 'PointedEars' Lahn said:
Thomas 'PointedEars' Lahn said:

Csaba Gabor wrote: ...
How fortunate then that you don't know what you are talking about.
It is rather easy to do if you do it properly. For example:


code = code.replace(
/('(?:[^']|\\')*')|("(?:[^"]|\\")*")|(\/\/.*)|(\s+$)/gm,
function(m, p1, p2, p3, p4) {
return (p3 || p4) ? "" : m;
});


The ('(?:[^']|\\')*') part fails to recognize the end of the following
string literal:
'foo \\'
and will match up to the next "'". Ditto for double-quoted strings.
Try
('(?:[^'\\]|\\[^])*')
(Here I'm also allowing backslash-newline in string literals, even
though it's not in the standard, otherwise replace "[^]" with ".").

Very interesting. I've not seen that [^] construct in
javascript before. With a PHP regular expression if ] is
the first character following the ^ in a character class,
it means to exclude the right closing bracket ]. Evidently,
PHP's [^]] translates to [^\]] in JS

And it's easy to add standard (not-single-line) comments as well:
(\/\*(?:[^*]*\*+)*\/)

Or: (\/\*.*?(?=\*\/)..)
though I have not extensively tested it

This only works in the absence of regexp literals.
RegExps are harder to recognize, because it's the syntactic starting
point that distinguishes the starting slash from a division.
E.g.,
/foo + 42/g
might be a RegExp, if occurring in an expression context, but not
if it occurs where an operator is expected:
bar/foo + 42/g
(I.e., it's not tokenizable without context information).

And if you can't recognize regexps, you can mess up the
recognition of comments and strings as well.

Indeed. Thanks for that nice reply Lasse. I would be highly
curious to see a reg exp variant developed to completion.
Perhaps there should be a separate 'Remove all comments' thread.

My solution to the 'Remove trailing comments' exercise follows.
My reason in posing the exercise was to highlight that in the
best spirit of programming, one may use the browser's syntax
checking capabilities to do the heavy lifting, rather than
having to parse the entire code string manually.

Reminder, I only want to remove the final comments at the end of
the code, and not at the end of each line. In short, I want to
be able to get at the last code that actually "does something"
(or might be doing something).

After getting rid of trailing whitespace and vacuous lines,
we consider that there are exactly three situations. The final
characters are either:
1) Part of a comment started by //
2) The end of a comment started by /*
3) Not a comment

How to test for this (and what to do when we know which case)?

checkSyntax(code + ' x y') will pass iff case 1 holds
and we have a // style comment. In that situation find
the previous //, strip the final / and perform the test
(on the stripped version). If it passes, recurse (since
we're still in the comment). If it fails, strip off one
more character from the end (the first / of the // pair),
and recurse on that. We can't be too greedy in the
passes case because we may have situations like ///

If case 1, above, does not hold, and the code does not
end with */, then it is evidently not part of a comment,
so it is case 3, and we are done.

Otherwise, find the prior /*. It is either the start
of the comment or in the middle of it. To test for
this, replace the /*...*/ with */
If this passes the syntax check, then we are still
in the middle of a comment, so we recurse on the just
tested string. Otherwise, we're at the start of a
comment so recurse on the just tested string less the
final two characters.

Here's the code:
function stripEndComments(code) {
// Trim trailing comments from code
// First trim whitespace and vacuous statements
code = code.replace(/(\s*;)*\s*$/,"");

// Next check for double slash type of comment at end
if (checkSyntax(code + ' x y')) {
var pos=code.lastIndexOf("//"),
cS = checkSyntax(code.substr(0,pos+1) + ' x y');
return stripEndComments(code.substr(0,pos+!!cS)); }

// In this next case there are no more trailing comments
if (code.substr(-2)!="*/") return code;

// Here deal with /* ... /* ... */ comments
var c = code.substr(0,code.lastIndexOf("/*")) + "*/";
return stripEndComments(c.substr(0,c.length-2*!checkSyntax(c)));
}

Csaba Gabor from Vienna

Csaba Gabor · Nov 5, 2009

My solution to the 'Remove trailing comments' exercise follows.
My reason in posing the exercise was to highlight that in the
best spirit of programming, one may use the browser's syntax
checking capabilities to do the heavy lifting, rather than
having to parse the entire code string manually.

Reminder, I only want to remove the final comments at the end of
the code, and not at the end of each line. In short, I want to
be able to get at the last code that actually "does something"
(or might be doing something).

After getting rid of trailing whitespace and vacuous lines,
we consider that there are exactly three situations. The final
characters are either:
1) Part of a comment started by //
2) The end of a comment started by /*
3) Not a comment

Slightly revised code:

function stripEndComments(code) {
// Trim trailing comments from code
// First trim whitespace and vacuous statements
code = code.replace(/[\s;]*\s*$/,"");

// Next check for double slash type of comment at end
if (checkSyntax(code + ' x y')) {
var pos=code.lastIndexOf("//"),
cS = checkSyntax(code.substr(0,pos+1) + ' x y');
return stripEndComments(code.substr(0,pos+!!cS)); }

// In this next case there are no more trailing comments
if (code.substr(code.length-2)!="*/") return code;

// Here deal with /* ... /* ... */ comments
var c = code.substr(0,code.lastIndexOf("/*")) + "*/";
return stripEndComments(c.substr(0,c.length-2*!checkSyntax(c)));
}

What changed:
code.substr(-2) => code.substr(code.length-2)
since some IEs do not like a negative argument to .substr()
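
For anyone who wants to run the idea outside a browser, here is a self-contained sketch of the three-case procedure. It follows the prose description, in particular the step of replacing the final /*...*/ with */ before testing; the variable names stillInComment, prefix and closed are introduced for this sketch, and the last test input (a /* nested inside a comment) is made up.

```javascript
// Returns the engine's string form of the code if it parses, else false.
function checkSyntax(code) {
  try {
    return new Function(code).toString();
  } catch (err) {
    return false; // syntax error
  }
}

function stripEndComments(code) {
  // Trim trailing whitespace and empty statements.
  code = code.replace(/[\s;]*\s*$/, "");

  // Case 1: the code ends inside a // comment, so appended junk is swallowed.
  if (checkSyntax(code + " x y")) {
    var pos = code.lastIndexOf("//");
    // Keep the first slash only if we would still be inside the comment.
    var stillInComment = checkSyntax(code.substr(0, pos + 1) + " x y");
    return stripEndComments(code.substr(0, stillInComment ? pos + 1 : pos));
  }

  // Case 3: no trailing comment left.
  if (code.substr(code.length - 2) !== "*/") return code;

  // Case 2: replace the final /*...*/ with */ and test. If that parses,
  // the last /* was in the middle of a comment; otherwise it started one.
  var prefix = code.substr(0, code.lastIndexOf("/*"));
  var closed = prefix + "*/";
  return stripEndComments(checkSyntax(closed) ? closed : prefix);
}

// Examples from the start of the thread, plus one nested case.
var r1 = stripEndComments("foo + bar // two comments /* or one? *//");
var r2 = stripEndComments('"Foo" + "bar" /* three */ // lines\n' +
                          "// of comments /* should all be\n" +
                          "/* stripped off *////");
var r3 = stripEndComments("baz/* junk */+borf; fubar ; /* more junk */ ; ;; ;");
var r4 = stripEndComments("x /* a /* b */");

console.log([r1, r2, r3, r4]);
```

Each recursive call works on a strictly shorter string, so the recursion terminates. The input is still presumed to be valid javascript, as in the original statement of the exercise.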

SAM · Nov 5, 2009

On 11/5/09 11:20 AM, Csaba Gabor wrote:

Very interesting. I've not seen that [^] construct in
javascript before. With a PHP regular expression if ] is
the first character following the ^ in a character class,
it means to exclude the right closing bracket ]. Evidently,
PHP's [^]] translates to [^\]] in JS

The characters '(' and '[' do not have to be backslash-escaped
when they are between [ ] or ( );
only the closers ']' and ')' have to be.

Other characters that may have to be escaped:
o '-' except if it is at the very end
(i.e. [m-s-] : one character from m to s, or the sign -)
o '+' except if it is at the beginning
(i.e. [+ms] : character m or s or +)

And it's easy to add standard (not-single-line) comments as well:
(\/\*(?:[^*]*\*+)*\/)


Or: (\/\*.*?(?=\*\/)..)
though I have not extensively tested it

It all depends on the way you code ...

var reg = /(\/\*.*?(?=\*\/))/g;
var reg = new RegExp('(/\\*.*?(?=\\*/))','g');

<https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Global_Objects/RegExp>

var myString = 'some blah /* comment ?!; comment-2 /|\ */ + no comment';
myString = myString.replace(reg, '');
alert(myString);

But that Regexp doesn't work ...
This one is a little better :
var reg = new RegExp('(/\\*[^*]*\\*/)','g');

alert(myString.replace(/(\/\*[^*]*\*\/)/g,''));
or :
alert(myString.replace(/\/\*[^*]*\*\//g,''));

Of course, this RegExp doesn't work with :
myString = 'some blah /* comment?!; comment-2* /|\ */ + no comment';
where one '*' is introduced in the comment.

alert(myString.replace(/\/\*([^*]|\*(?!\/))+\*\//g,''));
OK (for "that" string !)

Reminder, I only want to remove the final comments at the end of
the code,

$ : to tell it's the end

and not at the end of each line. In short, I want to
be able to get at the last code that actually "does something"
(or might be doing something).

After getting rid of trailing whitespace and vacuous lines,
we consider that there are exactly three situations. The final
characters are either:
1) Part of a comment started by //
2) The end of a comment started by /*
3) Not a comment

var reg = /[\/\s][\/*][^};]*(?![};])$/g;

var strg = 'var f = function(){ foo(); /* comment */} //no se';
alert(strg.replace(reg,''));

var strg = 'var f = function(){ foo(); /* comment */} /*no se*/';
alert(strg.replace(reg,''));

both ==> var f = function(){ foo(); /* comment */}

var strg = 'var f = function(){ foo(); // comment \n} /*no se*/';
alert(strg.replace(reg,''));
==>
var f = function(){ foo(); // comment
}

var strg = 'var f = function(){ foo(); // comment **\n} /*no se*/';
alert(strg.replace(reg,''));
==>
var f = function(){ foo(); // comment **
}

Not tested with IE ...

can try your reg exps and your strings here:
<http://www.regextester.com/>
<http://www.google.com/search?q=tester+regex>
<http://stephane.moriaux.pagesperso-orange.fr/truc/js_regexp_testeur>

Thomas 'PointedEars' Lahn · Nov 5, 2009

Lasse said:
Thomas 'PointedEars' Lahn said:

Csaba said:

abozhilov wrote:
Csaba Gabor wrote:
// remove trailing comments and whitespace from
/* the end of code, which is presumed to be valid
// javascript */
... }

...
How fortunate then that you don't know what you are talking about.
It is rather easy to do if you do it properly. For example:

code = code.replace(
/('(?:[^']|\\')*')|("(?:[^"]|\\")*")|(\/\/.*)|(\s+$)/gm,
function(m, p1, p2, p3, p4) {
return (p3 || p4) ? "" : m;
});


The ('(?:[^']|\\')*') part fails to recognize the end of the following
string literal:
'foo \\'
and will match up to the next "'". Ditto for double-quoted strings.

Not here (Iceweasel 3.5.4, JavaScript 1.8.1). Have you used "'foo \\'" or
"'foo \\\\'" for the test? Because the latter is the representation of
'foo \\' in a string value, while "'foo \\'" as a string value represents
the syntactically invalid 'foo \' (which is why it must be matched up to the
next apostrophe to be a string literal).

/* 'foo \\' */
var code = "'foo \\\\' '";

/* ["'foo \\'", "'foo \\'"] */
/('(?:[^']|\\')*')/.exec(code)

If I am overlooking something, can you explain why the recognition of this
string literal should fail?

[...]
And it's easy to add standard (not-single-line) comments as well:
(\/\*(?:[^*]*\*+)*\/)

This only works in the absence of regexp literals.
RegExps are harder to recognize, because it's the syntactic starting
point that distinguishes the starting slash from a division.
E.g.,
/foo + 42/g
might be a RegExp, if occurring in an expression context, but not
if it occurs where an operator is expected:
bar/foo + 42/g
(I.e., it's not tokenizable without context information).

And if you can't recognize regexps, you can mess up the recognition
of comments and strings as well.

Thank you. I am working on an ECMAScript-compliant source code parser and
you have given me quite something to think about.

PointedEars

Lasse Reichstein Nielsen · Nov 6, 2009

Thomas 'PointedEars' Lahn said:
Lasse said:

The ('(?:[^']|\\')*') part fails to recognize the end of the following
string literal:
'foo \\'
and will match up to the next "'". Ditto for double-quoted strings.


Not here (Iceweasel 3.5.4, JavaScript 1.8.1). Have you used "'foo \\'" or
"'foo \\\\'" for the test? Because the latter is the representation of
'foo \\' in a string value, while "'foo \\'" as a string value represents
the syntactically invalid 'foo \' (which is why it must be matched up to the
next apostrophe to be a string literal).

(I'll write all strings as string literals from here, to (try to) avoid
confusion).

To be honest, I didn't test it, and the argument for why it didn't
work was wrong because of that.
It still doesn't work, but for the opposite reason of initial guess:
it doesn't exclude "\\'" from ending the string literal, whereas I had
guessed that it wouldn't correctly recognize "\\\\'" as ending it.

Try:

var code = "'abc\\'def'";
// I.e., code contains two strings literals
var re = /('(?:[^']|\\')*')/g;
alert(re.exec(code)[0]);

It alerts the string "'abc\\'", i.e., it does end at the first
"'", even if the quote is escaped.

The reason it does so is that [^'] matches backslash as well, and
with a higher priority than what comes after, so it matches the
backslash as well.

The immediate fix of swapping the alternatives:
var re = /('(?:\\'|[^'])*')/g;
and giving \\' priority over [^'], will match "\\'" as a non-string-ender,
but will also ignore "\\\\'". It's necessary to know whether there is an
even number of backslashes before the quote in order to know whether it's
escaped or not. The RegExp below is the simplest one I have found to do that.
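
The three string-literal patterns discussed in this exchange can be compared side by side. This is only a sketch: the names posted, swapped, paired and first are invented here, and the capture groups are dropped since only the overall match matters.

```javascript
var posted  = /'(?:[^']|\\')*'/;     // original alternative order
var swapped = /'(?:\\'|[^'])*'/;     // \\' tried before [^']
var paired  = /'(?:[^'\\]|\\[^])*'/; // a backslash always consumes the next char

var escapedQuote = "'abc\\'def'";            // source text: 'abc\'def'
var escapedBackslash = "'foo \\\\' + 'bar'"; // source text: 'foo \\' + 'bar'

function first(re, s) { return re.exec(s)[0]; }

// posted: [^'] eats the backslash, so the escaped quote ends the match early.
console.log(first(posted, escapedQuote));      // 'abc\'
// swapped: fixes the escaped quote ...
console.log(first(swapped, escapedQuote));     // 'abc\'def'
// ... but treats the second half of \\ as escaping the closing quote.
console.log(first(swapped, escapedBackslash)); // 'foo \\' + '
// paired: handles both inputs.
console.log(first(paired, escapedQuote));      // 'abc\'def'
console.log(first(paired, escapedBackslash));  // 'foo \\'
```

Consuming each backslash together with the character after it is what keeps track of whether the number of backslashes before a quote is even or odd.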

/* 'foo \\' */
var code = "'foo \\\\' '";

/* ["'foo \\'", "'foo \\'"] */
/('(?:[^']|\\')*')/.exec(code)

If I am overlooking something, can you explain why the recognition of this
string literal should fail?

It works. It's the escaped backslash before a quote that fails:
"'foo \\\\' + 'bar'"

....

Thank you. I am working on an ECMAScript-compliant source code parser and
you have given me quite something to think about.

Glad to be of service

ECMAScript syntax is ... interesting. Context-dependent lexing combined
with semicolon insertion gives ample room to make mistakes

var b=2,g=1;
var a = 84
/b/g; // <- it's division
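
The snippet above can be checked directly. In this sketch the second declaration (re) is added for contrast and is not part of the original post.

```javascript
var b = 2, g = 1;

// No semicolon is inserted after 84, because the next line can continue
// the expression: this parses as 84 / b / g.
var a = 84
/b/g;

// Directly after "=", an expression is expected, so the same characters
// form a RegExp literal instead.
var re = /b/g;

console.log(a);                    // 42
console.log(re instanceof RegExp); // true
```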

/L
