Building Static Analysis Tools for C++

Scott Meyers

C++ is a very tough language to parse, where by "parse" I mean not just
dealing with its syntax, but also doing the semantic analysis to perform
template instantiation and overload resolution. But I get the sense
that there are now several tools that can address this challenge. My
goal in this post is to find out about tools I may not know about so
that I can look into them.

To clarify what I'm looking for, suppose I'd like to write an "autoizer"
program that, given a C++98 program, inserts use of C++0x's "auto" in
all the places where it doesn't change the program's semantics. For
example, given

int x = 10;

the statement would be rewritten as

auto x = 10;

More interestingly, given

int x = f(a, b, c);

it would be rewritten as

auto x = f(a, b, c);

only if the result of f(a, b, c) is int. If f(a, b, c) returned a type
other than int, the statement would remain unchanged.
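
To make the condition concrete, here is a minimal sketch of the check an autoizer would have to make for each declaration. The names f_int, f_long, and safe_to_autoize are hypothetical, invented for illustration. std::decay is used only as an approximation of what auto deduces from an initializer (it drops references and top-level cv-qualifiers). A real tool would get these types from a compiler front end rather than from decltype.

```cpp
#include <type_traits>

// Hypothetical functions standing in for f(a, b, c) in the text above.
int  f_int(int a, int b, int c)  { return a + b + c; }
long f_long(int a, int b, int c) { return static_cast<long>(a) + b + c; }

// Rewriting "Declared x = init;" as "auto x = init;" keeps the type of x
// only if the type auto would deduce from init is exactly Declared.
// std::decay approximates that deduction (assumption of this sketch).
template <typename Declared, typename Init>
constexpr bool safe_to_autoize() {
    return std::is_same<Declared, typename std::decay<Init>::type>::value;
}

// int x = f_int(a, b, c);   auto would deduce int, so the rewrite is safe.
static_assert(safe_to_autoize<int, decltype(f_int(1, 2, 3))>(),
              "declared type matches deduced type");

// int x = f_long(a, b, c);  auto would deduce long, so leave it unchanged.
static_assert(!safe_to_autoize<int, decltype(f_long(1, 2, 3))>(),
              "declared type differs from deduced type");
```

The check is deliberately conservative: a declaration such as "const int x = f_int(a, b, c);" is rejected too, because plain auto would drop the const.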

Whether a program like autoizer is a good idea is beside the point.
What's interesting is that such a program would require the ability to
parse a C++ program, and, ideally, modify the parsed representation in
place. For autoizer, a good tool would make it easy to find statements
in C++ that define a new named variable and initialize it with an
expression.

Here are candidate tools I know of:
- ROSE ( http://www.rosecompiler.org/ )
- Vivacore ( http://www.viva64.com/en/vivacore-library/ )
- gcc plugins ( http://gcc.gnu.org/wiki/plugins )
- Dehydra ( https://developer.mozilla.org/en/Dehydra - based on gcc plugins)
- Clang ( http://clang.llvm.org/features.html )

Are there others I should be aware of? If you wanted to write something
like autoizer, but you didn't want to write a C++ parser, where would
you start?

Thanks,

Scott
 
Ebenezer

C++ is a very tough language to parse, where by "parse" I mean not just
dealing with its syntax, but also doing the semantic analysis to perform
template instantiation and overload resolution.  But I get the sense
that there are now several tools that can address this challenge.  My
goal in this post is to find out about tools I may not know about so
that I can look into them.

To clarify what I'm looking for, suppose I'd like to write an "autoizer"
program that, given a C++98 program, inserts use of C++0x's "auto" in
all the places where it doesn't change the program's semantics.  For
example, given

   int x = 10;

the statement would be rewritten as

   auto x = 10;

It is often tempting to do something just because it is
easy to do. In the case above, it's hard to find much
value.

More interestingly, given

   int x = f(a, b, c);

it would be rewritten as

   auto x = f(a, b, c);

only if the result of f(a, b, c) is int. If f(a, b, c) returned a type
other than int, the statement would remain unchanged.

Whether a program like autoizer is a good idea is beside the point.
What's interesting is that such a program would require the ability to
parse a C++ program, and, ideally, modify the parsed representation in
place.  For autoizer, a good tool would make it easy to find statements
in C++ that define a new named variable and initialize it with an
expression.

Here are candidate tools I know of:
- ROSE (http://www.rosecompiler.org/)
- Vivacore (http://www.viva64.com/en/vivacore-library/)
- gcc plugins (http://gcc.gnu.org/wiki/plugins)
- Dehydra (https://developer.mozilla.org/en/Dehydra - based on gcc plugins)
- Clang (http://clang.llvm.org/features.html)

I'm surprised you didn't mention Antlr -- http://www.antlr.org/ .
I'm not using any of these in my work, but it is an interesting
list.


Brian Wood
Ebenezer Enterprises
http://webEbenezer.net
 
Scott Meyers

It is often tempting to do something just because it is
easy to do. In the case above, it's hard to find much
value.

As I said, that's not my point. My point is that it would be really
nice if it were easy to write such a tool. Traditionally, it has been
well nigh impossible, because the difficulty of developing a parser
(including semantic analysis, as I mentioned in my original message) was
too high.

I'm surprised you didn't mention Antlr -- http://www.antlr.org/ .

From what I can tell from
http://www.antlr.org/wiki/display/ANTLR3/Code+Generation+Targets , there
does not seem to be any active work on using ANTLR to parse C++, and
the high-level description of ANTLR ("ANTLR provides excellent support
for tree construction, tree walking, translation, error recovery, and
error reporting.") doesn't make any mention of support for semantic
analysis (e.g., overload resolution in C++), so I'm concerned that
this may not really be a viable tool. A StackOverflow post from
December 2009 (
http://stackoverflow.com/questions/1831529/is-c-code-generation-in-antlr-3-2-ready
) says that this is the case. Does anybody have more recent information?

Scott
 
Alf P. Steinbach /Usenet

* Scott Meyers, on 25.06.2011 06:46:
If you wanted to write something like
autoizer, but you didn't want to write a C++ parser, where would you start?

I think I'd start by nosing around in the technology surrounding similar tools,
such as syntax tooltips in editors (like Visual C++ IntelliSense). Surely they
must be leveraging some C++ parsing utility. Perhaps part of the compiler?

If you find out, please keep us informed! :)


Cheers & TIA.,

- Alf
 
Scott Meyers

I think I'd start by nosing around in the technology surrounding similar
tools, such as syntax tooltips in editors (like Visual C++
IntelliSense). Surely they must be leveraging some C++ parsing utility.
Perhaps part of the compiler?

In the case of Visual Studio, MS has made known that they actually use
the EDG front end for their intellisense, even though they use their own
front end for the compiler itself. Which means I should add EDG to the
list. Thanks for the suggestion.

Scott
 
Balog Pal

Scott Meyers said:
Here are candidate tools I know of:
- ROSE ( http://www.rosecompiler.org/ )
- Vivacore ( http://www.viva64.com/en/vivacore-library/ )
- gcc plugins ( http://gcc.gnu.org/wiki/plugins )
- Dehydra ( https://developer.mozilla.org/en/Dehydra - based on gcc
plugins)
- Clang ( http://clang.llvm.org/features.html )

Are there others I should be aware of? If you wanted to write something
like autoizer, but you didn't want to write a C++ parser, where would you
start?

IntelliSense in VS provides plenty of info, and plugins like Visual Assist
have access to it. That looks like the easiest path. The problem is that
IntelliSense info, while usable interactively, is imprecise, so it would
hardly be fit for a real tool.

The next thing would be MS browse info. It is created during compilation and
is supposed to contain everything. However, it seems to have been abandoned
long ago, and I know of no tools that can read it. If you could obtain info
from MS, even just a description of the format, that would be ideal.

And of course I'd expect gcc to be usable to emit types and cooked call
lists instead of object code, with a properly made backend. Dunno if that
is the same thing you listed as gcc plugins or not.
 
Scott Meyers

IntelliSense in VS provides plenty of info, and plugins like Visual
Assist have access to it. That looks like the easiest path. The problem
is that IntelliSense info, while usable interactively, is imprecise, so
it would hardly be fit for a real tool.

And it's not cross-platform, which would be a desirable goal for a real
tool.

Scott
 
Öö Tiib

Here are candidate tools I know of:
- ROSE (http://www.rosecompiler.org/)
- Vivacore (http://www.viva64.com/en/vivacore-library/)
- gcc plugins (http://gcc.gnu.org/wiki/plugins)
- Dehydra (https://developer.mozilla.org/en/Dehydra - based on gcc plugins)
- Clang (http://clang.llvm.org/features.html)

Are there others I should be aware of?  If you wanted to write something
like autoizer, but you didn't want to write a C++ parser, where would
you start?

I would start with tools that are based on the gcc or clang C++
parsers. Clang feels better for static analysis; gcc is more
conforming. There is also the Elsa parser, which as I understand it was
made purely with static analysis in mind, but I am not sure how conforming
it is or how well maintained: http://scottmcpeak.com/elkhound/sources/elsa/
 
Ira Baxter

Scott Meyers said:
C++ is a very tough language to parse, where by "parse" I mean not just
dealing with its syntax, but also doing the semantic analysis to perform
template instantiation and overload resolution. [snip]
Whether a program like autoizer is a good idea is beside the point. What's
interesting is that such a program would require the ability to parse a
C++ program, and, ideally, modify the parsed representation in place. For
autoizer, a good tool would make it easy to find statements in C++ that
define a new named variable and initialize it with an expression.

Here are candidate tools I know of:

Are there others I should be aware of? If you wanted to write something
like autoizer, but you didn't want to write a C++ parser, where would you
start?

Consider our DMS Software Reengineering Toolkit, and its C++ Front End.
See http://www.semanticdesigns.com/Products/DMS/DMSToolkit.html
and http://www.semanticdesigns.com/Products/FrontEnds/CppFrontEnd.html.

DMS provides generic parsing, tree building, symbol table support,
anti-parsing (regeneration of compilable source text including comments),
pattern matching and source-to-source transformation capability,
including various kinds of flow analysis (control-, data-, call graph,
points-to-, range-analysis), configured by a language definition module.

DMS's C++ front end defines C++ to DMS, and gives it the ability to carry
out full C++ parsing including complete name and type resolution.
Presently the flow analysis isn't connected to the front end; we can only
pedal so fast :-}
However, it has already been used for a variety of automated analysis and
mass change activities for C++ systems. (The flow analysis machinery has
been used on C systems of 18,000 compilation units.)

The front end handles a variety of C++ dialects: ANSI, GCC 3/4, MSVS.
We are putting polishing touches on our C++0X dialect as I write this
message.
 
Scott Meyers

Consider our DMS Software Reengineering Toolkit, and its C++ Front
End.

Thanks for reminding me of this; I'd forgotten.

It's not clear from your description whether your toolkit does template
instantiation and overload resolution out of the box, i.e., if I write
"f(x, y, z)" in my source code, your toolkit will tell me which f is
being invoked (including the case where f is instantiated from a
template). Can you please clarify this point?
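
As an illustration of what "which f is being invoked" involves, the sketch below sets up a small overload set and uses static_assert to record which overload a conforming compiler picks for a few calls. The overloads and the Chosen tag type are hypothetical, invented for this example; they are not taken from any of the tools discussed here. The point is only that an analysis tool has to reproduce these decisions, including template argument deduction, to answer the question.

```cpp
// Each overload returns a tag saying which one it is, so the choice made
// by overload resolution can be checked at compile time.
enum class Chosen { NonTemplateInt, NonTemplateDouble, Template };

constexpr Chosen f(int, int, int)          { return Chosen::NonTemplateInt; }
constexpr Chosen f(double, double, double) { return Chosen::NonTemplateDouble; }

template <typename T>
constexpr Chosen f(T, T, T)                { return Chosen::Template; }

// Both f(int, int, int) and f<int> are exact matches; the non-template
// function wins the tie.
static_assert(f(1, 2, 3) == Chosen::NonTemplateInt, "non-template preferred");

// Same tie-break for double arguments.
static_assert(f(1.0, 2.0, 3.0) == Chosen::NonTemplateDouble, "non-template preferred");

// For char arguments, f<char> is an exact match, while f(int, int, int)
// needs a promotion, so the template specialization is the better match.
static_assert(f('a', 'b', 'c') == Chosen::Template, "exact match beats promotion");

// For long arguments, f<long> is exact and both non-templates need conversions.
static_assert(f(1L, 2L, 3L) == Chosen::Template, "exact match beats conversion");
```

Mixed calls such as f(1, 2.0, 3) are left out on purpose: deduction of T fails and neither non-template overload is better for every argument, so the call is ambiguous and does not compile.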

Thanks,

Scott
 
Noah Roberts

In the case of Visual Studio, MS has made known that they actually use
the EDG front end for their intellisense, even though they use their own
front end for the compiler itself. Which means I should add EDG to the
list. Thanks for the suggestion.

I don't know if it's a symptom of intellisense or the front end it uses,
but I've found it of limited use in actual C++ development. My C++
projects are riddled with underlined bits and "errors" in the report
view from intellisense, yet they compile cleanly with warnings on full. It
seems to be especially confused by templates and overloads, which you
seem interested in.

YMMV but at least with my experience I have my doubts that it'll serve
your needs.
 
Scott Meyers

I don't know if it's a symptom of intellisense or the front end it uses,
but I've found it of limited use in actual C++ development. My C++
projects are riddled with underlined bits and "errors" in the report
view from intellisense, yet they compile cleanly with warnings on full. It
seems to be especially confused by templates and overloads, which you
seem interested in.

Well, they are kind of core to C++, and they're also the kind of thing
that makes analyzing C++ a difficult problem.

YMMV but at least with my experience I have my doubts that it'll serve
your needs.

Thanks for the warning.

Scott
 
Balog Pal

Ira Baxter said:
It will do the overload on the template instantiation and tell you
precisely which f is being invoked. It doesn't actually instantiate the
template.

Then how does it take SFINAE into consideration, which IMO is needed for
the decision?
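
A minimal sketch of why SFINAE bears on the decision, using hypothetical overloads named g and a made-up Picked tag type. It says nothing about how DMS or any other tool works. It only shows that whether a function template stays in the overload set depends on substituting the deduced type into its declaration.

```cpp
#include <type_traits>

enum class Picked { Integral, FloatingPoint, Fallback };

// Viable only when T is an integral type; otherwise substitution into the
// return type fails and this candidate is silently discarded (SFINAE).
template <typename T>
constexpr typename std::enable_if<std::is_integral<T>::value, Picked>::type
g(T) { return Picked::Integral; }

// Viable only when T is a floating-point type.
template <typename T>
constexpr typename std::enable_if<std::is_floating_point<T>::value, Picked>::type
g(T) { return Picked::FloatingPoint; }

// Non-template fallback, reachable through a pointer conversion.
constexpr Picked g(const void*) { return Picked::Fallback; }

// T = int: the first template survives substitution, the second does not.
static_assert(g(42) == Picked::Integral, "integral overload");

// T = double: only the second template survives.
static_assert(g(1.5) == Picked::FloatingPoint, "floating-point overload");

// T = const char*: both templates are removed by SFINAE, leaving g(const void*).
static_assert(g("text") == Picked::Fallback, "fallback overload");
```

In each call the answer depends on evaluating std::enable_if for the deduced T, which is a substitution into the template's declaration even though no function body needs to be instantiated to make the choice.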
 
