Edward C. Jones said:
Not kidding. Nothing can be parsed without a grammar. I think parsing
the standard computer languages is a common need. I am sporadically
developing software to automatically generate Pyrex code for wrapping
C libraries in Python. I use ANTLR because it comes with a good C
grammar.
And then there is HTML. I wonder how Mozilla parses all the ill-formed
HTML that is on the web.
Yes, things can be parsed without a grammar, or at least without a
conventional CFG. Ad hoc parsers are so messy, of course, that we try
to avoid them in modern languages. But I've parsed textual documents
at times with context-sensitive RR(2) approaches and other oddities.
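For what it's worth, here is a tiny sketch (mine, and the little
"key = value" config format in it is made up) of what such an ad hoc,
hand-written parser looks like in Python: no grammar file, no parser
generator, just string handling and a bit of lookahead for
continuation lines.

    def parse_config(text):
        """Parse 'key = value' lines; a trailing backslash continues the value."""
        result = {}
        lines = text.splitlines()
        i = 0
        while i < len(lines):
            line = lines[i].strip()
            i += 1
            if not line or line.startswith('#'):
                continue                      # skip blanks and comments
            key, _, value = line.partition('=')
            # The context-sensitive, ad hoc bit: peek ahead for continuation lines.
            while value.rstrip().endswith('\\') and i < len(lines):
                value = value.rstrip()[:-1].rstrip() + ' ' + lines[i].strip()
                i += 1
            result[key.strip()] = value.strip()
        return result

    print(parse_config("name = parser \\\n  demo\n# a comment\nmode = adhoc"))
    # => {'name': 'parser demo', 'mode': 'adhoc'}

It works, but every rule lives in the control flow rather than in a
grammar, which is exactly why this style gets messy as the format grows.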
The point is that FORTRAN predates a clear understanding of
line-independent lexing and context-free grammars (CFGs). It uses
constructs that are not handled by the classic
scanner/lexer/parser/AST tools. I don't know how the pros handle
this, but when I run into a non-standard grammar, I preprocess to tag
it with additional tokens and then run it through a standard
lexer/parser. Basically a tree-rewriter approach.
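A rough illustration of that pre-tagging idea, using the classic
FORTRAN ambiguity: blanks are insignificant, so "DO 10 I = 1,5" opens
a loop while "DO 10 I = 1.5" assigns to a variable named DO10I. The
regex and the "@DO@" marker token below are just my invention for the
sketch; a real pass would have to cover far more of the language.

    import re

    # Tag genuine DO statements with a synthetic @DO@ token so that an
    # ordinary line-independent lexer can handle everything downstream.
    DO_LOOP = re.compile(r'^\s*DO\s*\d+\s*[A-Z][A-Z0-9]*\s*=\s*[^,]+,',
                         re.IGNORECASE)

    def pre_tag(line):
        """Prefix real DO-loop headers with a synthetic @DO@ marker."""
        if DO_LOOP.match(line):
            return '@DO@ ' + line
        return line

    for stmt in ('      DO 10 I = 1,5', '      DO 10 I = 1.5'):
        print(pre_tag(stmt))
    # The first statement comes out prefixed with @DO@; the second
    # (an assignment to DO10I) is left untouched.

After a pass like this, the lexer only needs to recognize the @DO@
marker, so it can stay simple and line-independent.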
C++ is (I think) classically lexable, but the semantics are so complex
that parsing (or understanding what to do with the parse) is a pain.
I wasn't in that business, but I understand compiler vendors who tried
to just upgrade their C compilers to C++ bombed out and had to start
fresh with a much richer type model. SWIG also ran into this.
For parsing of "bad HTML", see "tidy". Its lexer/parser is ad hoc
(not generated by parser toolkits).
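Not tidy itself, but Python's html.parser module is in the same
spirit: a hand-written, tolerant parser rather than the output of a
parser generator. A quick sketch of feeding it deliberately sloppy
markup (unclosed <li>, unquoted attribute, missing </b>):

    from html.parser import HTMLParser

    class TagLogger(HTMLParser):
        # Just print the events; a real cleaner would rebuild a tree here.
        def handle_starttag(self, tag, attrs):
            print('start', tag, attrs)

        def handle_endtag(self, tag):
            print('end  ', tag)

        def handle_data(self, data):
            if data.strip():
                print('text ', data.strip())

    TagLogger().feed('<ul><li>one<li><b>two</ul><a href=x>link</a>')

It keeps going on the mis-nested and unclosed tags instead of
rejecting them, which is roughly what browsers and tidy have to do.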