Jorgen Grahn said:
And if I recall correctly there are people experimenting with the gcc
source code in this area. People are interested in using gcc as a C++
parser for use in static analysis, because it's so hard to write one
from scratch. (This might not apply to the C compiler; I don't know
much about this.)
parsing C is not particularly difficult...
a few kloc of code can do the trick, although it may take a little work to
figure out how to write one (it helps to first have experience with simpler
languages, like Scheme and JavaScript, as each gives useful experience and a
foundation to build on).
(the real evils are deeper in the compiler internals...).
if my server were up right now (it has been down recently because internet
bandwidth here is too limited, and others complain if I "waste" the bandwidth
on something as trivial as keeping a webserver running...), I could post a
link to my parser, which can parse C (and also Java and C#) and emits an
XML-based AST (not a token-tree / CST though, if that is what the OP
wanted).
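
just to give the flavor of the output side, dumping a tree as XML is only a
few lines of walking the nodes. a rough sketch (the node layout and tag names
here are made up for illustration, not my tool's actual schema):

    /* hypothetical AST node and XML dump, purely illustrative */
    #include <stdio.h>

    struct node {
        const char  *tag;        /* e.g. "add", "number", "ident"      */
        const char  *text;       /* leaf payload, or NULL              */
        struct node *kids[4];    /* child nodes, NULL-terminated       */
    };

    static void emit_xml(struct node *n, int depth)
    {
        int i;
        for (i = 0; i < depth; i++) printf("  ");
        if (n->text) { printf("<%s value=\"%s\"/>\n", n->tag, n->text); return; }
        printf("<%s>\n", n->tag);
        for (i = 0; n->kids[i]; i++) emit_xml(n->kids[i], depth + 1);
        for (i = 0; i < depth; i++) printf("  ");
        printf("</%s>\n", n->tag);
    }

    int main(void)
    {
        struct node x   = { "ident",  "x",  { NULL } };
        struct node one = { "number", "1",  { NULL } };
        struct node add = { "add",    NULL, { &x, &one, NULL } };
        emit_xml(&add, 0);   /* prints the tree for "x + 1" as nested XML */
        return 0;
    }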
personally, my bias is to avoid things like parser generators, as to me they
seem like more of a trick to make people *think* they are making the task
easier for themselves, while actually setting themselves up for much pain once
they get past simple languages and into languages with all sorts of bizarre
stuff going on (such as tokens which may or may not exist, or may be parsed
differently depending on context, as in languages like C++ or C#, or syntax
which is ambiguous without knowing prior declarations, as in C and C++, ...).
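
the declaration-vs-expression case in C is easy to show: the same token
sequence gets a different parse depending on what was declared earlier, which
is why a C parser has to keep track of typedef names (toy example, names made
up):

    typedef int A;

    void f(void)
    {
        A * B;      /* A is a typedef here, so this declares B as "int *B;" */
        (void)B;
    }

    /* had A instead been declared as a variable ("int A;"), then "A * B;"
       would have been parsed as an expression: A multiplied by some B.     */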
personally, I am a fan of hand-written recursive descent, as IME it seems to
work fairly well, and I just haven't really run into cases where a parser
generator seemed like the right tool for the job.
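
the basic shape is easy enough to sketch; here is a toy recursive descent
parser for a made-up expression grammar (it evaluates as it goes rather than
building an AST, just to show the structure; obviously not my actual parser):

    /* expr   -> term { '+' term }
       term   -> factor { '*' factor }
       factor -> digit | '(' expr ')'                                  */
    #include <stdio.h>
    #include <ctype.h>

    static const char *p;                 /* cursor into the input     */

    static int expr(void);

    static int factor(void)
    {
        if (*p == '(') { p++; int v = expr(); if (*p == ')') p++; return v; }
        if (isdigit((unsigned char)*p)) return *p++ - '0';
        return 0;                         /* error handling omitted    */
    }

    static int term(void)
    {
        int v = factor();
        while (*p == '*') { p++; v *= factor(); }
        return v;
    }

    static int expr(void)
    {
        int v = term();
        while (*p == '+') { p++; v += term(); }
        return v;
    }

    int main(void)
    {
        p = "(1+2)*3+4";
        printf("%d\n", expr());           /* prints 13                 */
        return 0;
    }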
generating the lexer with a tool makes somewhat more sense, although
personally I don't really think even that is necessary.
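
the core of a hand-written lexer is basically just a loop and a few
character-class checks; a rough sketch (made-up token names, no handling of
strings, comments, keywords, or multi-char operators):

    #include <stdio.h>
    #include <ctype.h>

    enum { T_EOF, T_IDENT, T_NUMBER, T_PUNCT };

    /* pull one token from *src, copy its text into buf, advance *src */
    static int next_token(const char **src, char *buf, size_t n)
    {
        const char *s = *src;
        size_t i = 0;

        while (isspace((unsigned char)*s)) s++;        /* skip whitespace */

        if (*s == '\0') { *src = s; buf[0] = '\0'; return T_EOF; }

        if (isalpha((unsigned char)*s) || *s == '_') { /* identifier      */
            while ((isalnum((unsigned char)*s) || *s == '_') && i + 1 < n)
                buf[i++] = *s++;
            buf[i] = '\0'; *src = s; return T_IDENT;
        }
        if (isdigit((unsigned char)*s)) {              /* number          */
            while (isdigit((unsigned char)*s) && i + 1 < n)
                buf[i++] = *s++;
            buf[i] = '\0'; *src = s; return T_NUMBER;
        }
        buf[0] = *s++; buf[1] = '\0';                  /* one-char punct  */
        *src = s; return T_PUNCT;
    }

    int main(void)
    {
        const char *src = "x1 = y + 42;";
        char buf[64];
        while (next_token(&src, buf, sizeof buf) != T_EOF)
            printf("token: %s\n", buf);
        return 0;
    }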
or such...