I think that you don't have to enforce any particular indentation style or
amount of space on each line - only that it is consistent between begin and
end.
If we define the 'nesting depth' as the number of module / class / def / do
/ if sections we are within (i.e. the number of matching 'end's we expect to
see), then:
- at the start of each line, count the number of spaces. Ignore lines which
consist entirely of whitespace.
R1: if the nesting depth is the same as the previous line, then raise a
warning if the number of spaces is not the same as the previous line
R2: if the nesting depth is greater than the previous line, then remember
the indentation of this line associated with this nesting depth (e.g. on a
stack)
R3: if the nesting depth is less than the previous line, then raise a
warning if the number of spaces is not the same as the last line with the
same nesting depth
*ROFL* Yes! What you have described here is salient structure
(well, missing, IIRC, the line continuation rule and the tab-size
rule). It's a wonderful system, and everybody thinks they use it. But
back in the early nineteen eighties or so someone who didn't understand
the system decided it would be easier to "find pairs" if the closing
token were indented as if it were a member of the enclosing block.
Thus, the algorithm as you stated it would complain
# hello [] R1: check indentation == 0
class Foo [] R1: check indentation == 0
def m [2] R2: push 2
wibble [2,6] R2: push 2
bibble [2,6] R1: check indentation == 6
end [2] R3: check indentation == 2
^
at the point marked "^" because (at the start of that line) the nesting
is the same as at was for the previous two lines. The same problem
occurs throughout the example. There are, as always, at least two ways
to fix this.
One would be to implement the rules as you stated them, adding the
half-step rule (indentations intermediate between the two are used for
syntactic continuation, e.g. else or your chained addition example). I
would love this, since that's the style I use (I was already out of
school before the sea change) but I suspect most if not all of you would
hate it.
Another way would be to do what python did and avoid the problem by
eliminating closing tokens. I suspect more people would like this, but
matz and the majority (so two majorities? yikes!?) would be against
them.
A third solution would be to do what I suggested a few hours ago,
the essence of which was to look not for the closing but the occurrence
of something "outer" (< indented, not <= indented), and only telling
about the most recent one when the problem is detected:
There may be heuristics to get a reasonable "hint" by making
some assumptions; e.g., warn if there is a line less indented
than the first line of an outstanding (open) construct,
excluding here-docs, %_{ constructs, etc., if (and only if)
there is a missing end at eof [or when the error is detected].
This could (I think) be implemented fairly easily by
* caching the location and indentation of a each class,
def, etc. on a stack
* popping from the stack on end
* noting when the first token is lexed from a line if it
was less indented than the most recent outstanding
def/class, etc., and if so noting the fact in a global
* including the information in the global (if any) when
generating the missing end message
But this is only a heuristic, based on the observation that even
people who don't like salient structure tend to use it to some
extent. It would not solve the problem in general, and perhaps
not even in a typical case, for anyone but me and the python
expatriates.
This is still the best of the options I've heard or thought of.
Instead of using a stack (as we both suggested) it might be easier to
use the parser's tree, which (IIRC) has adequate support for all sorts
of tagging.
One other thing, you'd need to ignore subsequent lines of multi-line
statements:
a,b,c =
1,2,3
foo (
...
)
but I guess the parser must already have a mechanism which infers "the next
line is a continuation of the previous" anyway.
Yes, it does. I'm not sure I grok its fullness, but it is there
and would be accessible. It wouldn't be needed in the soft-checking
third option (or at least, I don't think it would be needed), but it
wouldn't be too hard to watch for if it was.
-- Markus