Determining the last statement exercise

Csaba Gabor

Suppose you have some javascript statements in a string.
Can you determine whether the entire string is syntactically
valid, and if so, the starting position of the last statement?
In other words,

function lastStatementPos(code) {
  // returns the starting position within code of the last
  // javascript statement, and -1 if code is not syntactically
  // valid.
}

This came up in a different context today, and I thought
it would make an interesting exercise.

 Csaba Gabor from Vienna

My solution for determining the position of the
last javascript statement runs along the lines outlined
in my Nov. 5 response to Lasse's first post at
http://groups.google.com/group/comp.lang.javascript/browse_frm/thread/91262ad01ca356bc/
and also in my response to VK within this thread:
http://groups.google.com/group/comp.lang.javascript/browse_frm/thread/2aa9a60623eb5883/

In what follows, the term inject will mean to insert
code at a given point within a larger block of code
such that the result is syntactically valid.
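
To make the idea of injection concrete, here is a minimal sketch. The post refers to checkSyntax without showing it, so the version below is an assumption: it relies on the Function constructor parsing its argument (and throwing SyntaxError on failure) without running it. canInject is a hypothetical helper name, not something from the original code.

```javascript
// Assumed stand-in for checkSyntax: compile the code as a function
// body without executing it; a SyntaxError means it did not parse.
function checkSyntax(code) {
  try {
    new Function(code);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything else (e.g. a blocked Function constructor) is not a verdict
  }
}

// Hypothetical helper: true if snippet can be inserted at index pos
// and the whole string still compiles.
function canInject(code, pos, snippet) {
  return checkSyntax(code.slice(0, pos) + snippet + code.slice(pos));
}
```

One caveat of this sketch: since the code is compiled as a function body, a top level return is accepted where a plain script would reject it.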

Unless I'm forgetting some cases, javascript statements
end with either ; or } or a newline. The idea will
be to go backwards (KSB = keep searching backwards)
through each viable one of these characters (call the
current character being checked cP; see the
description of KSB below for how to determine the
previous viable terminating character) and make 9
checks to decide whether or not we have a last
statement:

1. First, we determine that the character after cP
is not part of a comment (or in other "inert" code
such as a string or regular expression) by injection
of non-compiling code: if the result still compiles,
the position is inert, so KSB.

2. Find the next non-whitespace character, call it
chNext. If chNext is a semicolon or any closing
delimiter such as ], ), or } then we don't actually
have a statement (or it's empty), so KSB.

3. If we cannot inject an empty loop at this point
(and have the result be syntactically correct), then
cP does not represent an end of statement, so KSB.

4. Starting with cP, find the first non-whitespace
character going backwards, call it chPrev. If chPrev
is a semicolon or closing curly brace then we are done,
since those will mark the end of a statement. See
the description of KSB below for why we don't have
to worry about embeddings.

5. If we can inject a "break; " at this point,
we're in a loop, so KSB.

6. If we can inject an " else; if (x) " at this point,
then we're in an if, so KSB.

7. If the final part of the code through chPrev is
an else, then we're in an if statement, so KSB.

8. If chNext is the first character of an autoincrement
or an autodecrement (++ or --) then it is independent
of the previous statement (since we passed point 3),
so we are done.

9. If chNext is not one of the five
characters +, -, (, [, or / then we are likewise done.
The other characters are not ambiguous in their
desire to connect to a previous item (such as .),
but those five are. If we have gotten here and one
of those five appears, then the prior portion of
code is an expression and chNext will be connected
to it, so KSB.
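
The injection-based checks above (1, 3, 5 and 6) can be sketched as small probes. This is only an illustration: checkSyntax is assumed to be built on the Function constructor, the helper names are hypothetical, and the exact snippets for checks 1 and 3 (" q1 q2 " and " while (0) {} ") are my own choices, since the post does not say which were used.

```javascript
// Assumed stand-in for the checkSyntax helper the post mentions.
function checkSyntax(code) {
  try { new Function(code); return true; }
  catch (e) { if (e instanceof SyntaxError) return false; throw e; }
}

// Hypothetical helper: does code still compile with snippet inserted at pos?
function canInject(code, pos, snippet) {
  return checkSyntax(code.slice(0, pos) + snippet + code.slice(pos));
}

// Check 1: two adjacent identifiers do not compile as live code, so if
// the result still compiles, pos lies in a string, comment, or regex.
function isInert(code, pos) {
  return canInject(code, pos, " q1 q2 ");
}

// Check 3: an empty loop fits only where a new statement may begin.
function acceptsStatement(code, pos) {
  return canInject(code, pos, " while (0) {} ");
}

// Check 5: break is accepted only inside a loop (or a switch).
function acceptsBreak(code, pos) {
  return canInject(code, pos, " break; ");
}

// Check 6: an else is accepted only directly after an if statement.
function acceptsElse(code, pos) {
  return canInject(code, pos, " else; if (x) ");
}
```

For example, with code = "while (a)\nfoo();" the position just after the newline accepts both the empty loop and the break, which is the situation check 5 is meant to catch.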

KSB. To finish off the algorithm description, some
words on determining the prior newline, semicolon,
or right curly brace: If a right curly brace is
encountered, then anything within it cannot be a
last statement. Therefore, in that situation, find
the corresponding opening brace, and continue the
search backwards from there. Finding the matching
(opening) brace is straightforward: one searches
backwards, replacing the text from each candidate
opening brace through the closing one with {x} or
{x:y}, and then applies checkSyntax. The first
opening brace to pass such a syntax check is the
matching one. While this does not guarantee that
we will be at the top level (eg. if / nested
loops), at least we can be sure that we are not
embedded within {...}.
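
The brace matching just described might look roughly like this. It is a sketch under assumptions: checkSyntax is again taken to be a thin wrapper around the Function constructor, and findOpeningBrace is a hypothetical name. Constructs whose braces accept neither {x} nor {x:y} (a switch body, for instance) are not handled here.

```javascript
// Assumed stand-in for the checkSyntax helper the post mentions.
function checkSyntax(code) {
  try { new Function(code); return true; }
  catch (e) { if (e instanceof SyntaxError) return false; throw e; }
}

// Hypothetical helper: closePos is the index of a "}" in code.
// Returns the index of its matching "{", or -1 if none is found.
function findOpeningBrace(code, closePos) {
  const tail = code.slice(closePos + 1);
  for (let open = closePos - 1; open >= 0; open--) {
    if (code[open] !== "{") continue;
    const head = code.slice(0, open);
    // Swap the whole candidate span for a trivial block or object
    // literal; only the true opening brace leaves valid code behind.
    if (checkSyntax(head + "{x}" + tail) ||
        checkSyntax(head + "{x:y}" + tail)) {
      return open;
    }
  }
  return -1;
}
```

Note that a { inside a string is rejected automatically: replacing from it would cut off the string's closing quote, so the syntax check fails and the search moves on.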


I have implemented the above description into
working code, though there are some parts still
unexercised. Since this is already a long post, I
won't make it longer by posting the actual code.
However, one can piggyback off this to break up a
piece of code into all its component statements,
even within loops, conditionals, and blocks. That,
however, is a bit trickier since one can't simply
recurse with lastStatementPos at the lower levels,
because the code within a block won't necessarily
pass a syntax check. Eg. if you try to work with:
{ var foo=7;
break; }
without including the relevant loop, checkSyntax
will be displeased (ie. it won't pass a syntax
check). try/catch also requires a bit of care.
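
The block example can be tried directly. In this small sketch checkSyntax is assumed to wrap the Function constructor, and the "while (1) " prefix is just one possible way of supplying the missing loop context.

```javascript
// Assumed stand-in for the checkSyntax helper the post mentions.
function checkSyntax(code) {
  try { new Function(code); return true; }
  catch (e) { if (e instanceof SyntaxError) return false; throw e; }
}

const block = "{ var foo=7;\nbreak; }";

// On its own the block fails: break has no enclosing loop.
const aloneOk = checkSyntax(block);

// With the loop header put back in front, the same text compiles.
const wrappedOk = checkSyntax("while (1) " + block);

console.log(aloneOk, wrappedOk);
```

So a recursive version would need to carry the enclosing header along with the block text rather than checking the block in isolation.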


In any case, direct syntax parsing is more
efficient, but the method outlined here and in
removeTrailingComments, that of using javascript's
own syntax checking upon injection of a judicious
code snippet, is sometimes far simpler than having
to write a parser from scratch. For example,
the method is easily adapted to removing all
comments from a given block of code. In addition,
it also solves the problem of injecting a return,
when possible, before the last statement under
either of the two possible definitions mentioned
(since if the final statement is a block, one can
just recurse into it).
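
As a rough illustration of the return injection, assume the start of the last statement is already known (say, from lastStatementPos). The helper name injectReturn is hypothetical, checkSyntax is assumed to wrap the Function constructor, and the sketch does not recurse into a final block.

```javascript
// Assumed stand-in for the checkSyntax helper the post mentions.
function checkSyntax(code) {
  try { new Function(code); return true; }
  catch (e) { if (e instanceof SyntaxError) return false; throw e; }
}

// Hypothetical helper: pos is assumed to be the start of the last
// statement. Returns the code with "return " inserted there, or null
// when that does not compile (e.g. the last statement is an if).
function injectReturn(code, pos) {
  const candidate = code.slice(0, pos) + "return " + code.slice(pos);
  return checkSyntax(candidate) ? candidate : null;
}
```

The check works here only because the code is compiled as a function body, where return is allowed.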

Csaba Gabor from Vienna
 
