Writing a C++ Style Checker

ids

Hi,

I'm new to Perl. I'm trying to use Perl to write a C++ Style Checker
to validate various coding standards followed in our organization.

Some of the things I need to do in this tool include:
- verifying whether identifier naming conventions have been followed
- differentiating between member variables and local variables
(because their naming conventions are different)
- determining method/function boundaries
- identifying control structures such as 'if', 'while' etc to see
whether they are written with code blocks (i.e. { }) all the time
- checking whether statements exceed a given width (say 100
columns)
- etc. etc.

This need not go into the semantics of the program; what I need is a
basic style checker.

What I see is that parsing line by line independently is not going to
help. This parser needs to build context and remember stuff across
lines to satisfy the above goals.

Are there any existing Perl-based style checkers? If not, can you give
some advice on how best to structure this program? Or else can you
give some good references on *design* aspects of Perl?

(I have a C/C++ background. So I'm familiar with OO design. I'm trying
to develop a *similar mental model* for Perl programs.)

Thanks in advance,
Ishan.
 
Ben Bullock

> I'm new to Perl. I'm trying to use Perl to write a C++ Style Checker
> to validate various coding standards followed in our organization.
>
> Some of the things I need to do in this tool include:
> - verifying whether identifier naming conventions have been followed
> - differentiating between member variables and local variables
> (because their naming conventions are different)
> - determining method/function boundaries
> - identifying control structures such as 'if', 'while' etc to see
> whether they are written with code blocks (i.e. { }) all the time

Well, as a start (probably misses some cases):

print "Aw shucks\n" if $mycode =~ /\b(?:if|while)\b[^{}]*?;/s;

> - checking whether statements exceed a given width (say 100
> columns)

That doesn't sound hard:

#!/usr/bin/perl
use warnings; use strict;
while (<>) {
    # Flag lines longer than 100 columns (101 or more characters).
    print "Line $.: Oops! Too long!\n" if /^.{101,}$/;
}

> - etc. etc.
>
> This need not go into the semantics of the program; what I need is a
> basic style checker.
>
> What I see is that parsing line by line independently is not going to
> help.

Well, it can do some of this stuff very rapidly.

> This parser needs to build context and remember stuff across
> lines to satisfy the above goals.
>
> Are there any existing Perl-based style checkers? If not, can you give
> some advice on how best to structure this program? Or else can you
> give some good references on *design* aspects of Perl?
>
> (I have a C/C++ background. So I'm familiar with OO design. I'm trying
> to develop a *similar mental model* for Perl programs.)

Are you sure your problem is difficult enough to warrant developing a
mental model? Sounds like a relatively simple job for Perl to me. Why not
just code something up and see how it goes?
 
Ben Morrow

Quoth ids:
> Hi,
>
> I'm new to Perl. I'm trying to use Perl to write a C++ Style Checker
> to validate various coding standards followed in our organization.
>
> Some of the things I need to do in this tool include:
> - verifying whether identifier naming conventions have been followed
> - differentiating between member variables and local variables
> (because their naming conventions are different)
> - determining method/function boundaries
> - identifying control structures such as 'if', 'while' etc to see
> whether they are written with code blocks (i.e. { }) all the time
> - checking whether statements exceed a given width (say 100
> columns)
> - etc. etc.
>
> This need not go into the semantics of the program; what I need is a
> basic style checker.
>
> What I see is that parsing line by line independently is not going to
> help. This parser needs to build context and remember stuff across
> lines to satisfy the above goals.

This is going to be seriously hard work. What you need is a parser for
C++, and as C++ is a *very* complex language this is not going to be
easy to get right, unless you are content to only recognize simple
constructions without parsing the code properly.

There is a Parse::RecDescent grammar for some subset of C++ included in
the Inline::CPP distribution. You may find it useful to start there.

Alternatively, you may be able to persuade your compiler to do the
parsing for you. While some of your criteria above (such as blocks on
ifs) will be lost by such an approach, you may be able to handle these
with a relatively simple parser, leaving the hard work of 'is this
identifier a local or member variable' to the compiler.

What you need to do is either persuade your compiler to produce some
intermediate parsed form of output (such as from gcc's -fdump-* and -d*
options), or compile objects with debugging info and then parse that.
One example of a (now very old) program that does this is c2ph in the
Perl distribution, which was intended to allow access to C structures by
parsing stabs debugging information.

> Are there any existing Perl-based style checkers?

There is Perl::Critic, for style-checking Perl, but that is based on the
excellent PPI, a Perl module that parses Perl, which took a *lot* of
work to produce.

Basically: you have set yourself an *extremely* hard problem :(. You can
either produce a very 'shallow' and rather incomplete solution, or
produce a proper solution only after a lot of work. OTOH, a proper C++
parser for Perl would probably be a good thing... :)

Ben
 
Ben Bullock

> This is going to be seriously hard work. What you need is a parser for
> C++, and as C++ is a *very* complex language this is not going to be
> easy to get right, unless you are content to only recognize simple
> constructions without parsing the code properly.

Similarly, one also needs to completely parse the English language in order
to write a spelling checker. For example to check for "right" misspelt
as "write" requires one to comprehensively parse all possible English
sentences, distinguish between verbs and adjectives, and detect the word
"right" where a verb should be. Making such a spelling checker is going to
be seriously hard work too - maybe it will take the rest of your life. But
a simple system which catches 99% of errors is a few lines of code.
 
Ben Morrow

Quoth Ben Bullock:
> Similarly, one also needs to completely parse the English language in order
> to write a spelling checker. For example to check for "right" misspelt
> as "write" requires one to comprehensively parse all possible English
> sentences, distinguish between verbs and adjectives, and detect the word
> "right" where a verb should be. Making such a spelling checker is going to
> be seriously hard work too - maybe it will take the rest of your life. But
> a simple system which catches 99% of errors is a few lines of code.

Heh, yes, of course. However, what 'ids' is looking for is more akin to
a grammar checker than a spellchecker; in fact, it's similar in spirit
to the MSWord 'grammar' checker (many of its admonitions are more
matters of style than incorrect grammar per se), which does actually
have a pretty good grasp of English grammar nowadays.

In any case, distinguishing a local variable from a class member (or,
indeed, identifying a declaration at all in C++) is going to require a
good deal more than a bit of pattern matching, which was really my
point.

Ben
 
ids

> This is going to be seriously hard work. What you need is a parser for
> C++, and as C++ is a *very* complex language this is not going to be
> easy to get right, unless you are content to only recognize simple
> constructions without parsing the code properly.
>
> There is a Parse::RecDescent grammar for some subset of C++ included in
> the Inline::CPP distribution. You may find it useful to start there.
>
> Alternatively, you may be able to persuade your compiler to do the
> parsing for you. While some of your criteria above (such as blocks on
> ifs) will be lost by such an approach, you may be able to handle these
> with a relatively simple parser, leaving the hard work of 'is this
> identifier a local or member variable' to the compiler.
>
> What you need to do is either persuade your compiler to produce some
> intermediate parsed form of output (such as from gcc's -fdump-* and -d*
> options), or compile objects with debugging info and then parse that.
> One example of a (now very old) program that does this is c2ph in the
> Perl distribution, which was intended to allow access to C structures by
> parsing stabs debugging information.
>
> There is Perl::Critic, for style-checking Perl, but that is based on the
> excellent PPI, a Perl module that parses Perl, which took a *lot* of
> work to produce.
>
> Basically: you have set yourself an *extremely* hard problem :(. You can
> either produce a very 'shallow' and rather incomplete solution, or
> produce a proper solution only after a lot of work. OTOH, a proper C++
> parser for Perl would probably be a good thing... :)

Well, I guess what I need is something in between the two extremes that
the two of you proposed.

What BenB suggested is not sufficient. Applying pattern matching on a
line-by-line basis is not going to help. For example, it won't allow me
to recognize a function implementation.

In a BNF grammar we can define a function using something similar to
the following.

func_impl : type_spec IDENTIFIER '(' opt_param_list ')' block ;

block : '{' opt_statement_list '}' ;

// define the other non terminals here

In order to do this, you need to *build state* as and when you scan
through, i.e. you need to remember that you saw the opening brace in a
previous line and you find the matching end brace in another line
below. This gets even more difficult because of nested blocks. So you
need to keep pushing and popping braces.
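
To make the pushing and popping concrete, here is a minimal Perl sketch of that bookkeeping. It is not a C++ parser. The sub name find_blocks and the sample input are made up for illustration, and it assumes comments and string literals have already been removed, so that every brace it sees is a real block delimiter.

```perl
use strict;
use warnings;

# Track '{' ... '}' pairs across lines with an explicit stack.
# Returns one [open_line, close_line, depth] triple per block.
# Assumption: the input contains no comments or string literals.
sub find_blocks {
    my @lines = @_;
    my (@stack, @blocks);
    my $line_no = 0;
    for my $line (@lines) {
        $line_no++;
        while ($line =~ /([{}])/g) {
            if ($1 eq '{') {
                push @stack, $line_no;        # remember where the block opened
            }
            elsif (@stack) {
                my $open = pop @stack;        # the innermost open block ends here
                push @blocks, [$open, $line_no, scalar @stack];
            }
            else {
                warn "Line $line_no: unmatched '}'\n";
            }
        }
    }
    warn "Line $_: unmatched '{'\n" for @stack;
    return @blocks;
}

my @code = (
    'int f(int x)',
    '{',
    '    if (x > 0) {',
    '        return 1;',
    '    }',
    '    return 0;',
    '}',
);
for my $block (find_blocks(@code)) {
    printf "block at depth %d: lines %d-%d\n", $block->[2], $block->[0], $block->[1];
}
```

Blocks are reported innermost first, because a block is only recorded when its closing brace is seen. A Perl array used with push and pop is all the "stack class" this needs.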

Building state requires some well-organized data structures. I can
visualize how this is to be done with a language like C++. What you
need is a set of classes with a suitable inheritance structure. I don't
know how to do it with Perl. That's why I asked about a mental model. I
suppose Perl does support the OO paradigm, but I didn't find any
material to read about it. I mean, I would like to read an "OOA/D with
Perl" sort of thing.
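
For comparison with the C++ picture, here is a minimal sketch of what a small class hierarchy looks like in plain Perl: a package is a class, an object is a blessed reference, and @ISA declares the base class. The Rule classes, the run_rules helper and the "class names start uppercase" convention are all hypothetical, chosen only to illustrate the shape.

```perl
use strict;
use warnings;

# Base class. An object is just a hash reference blessed into a package.
package Rule;
sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}
sub name  { my ($self) = @_; return ref $self }   # class name, used in reports
sub check { return () }                            # subclasses override this

# Flags lines wider than a configurable limit.
package Rule::LineWidth;
our @ISA = ('Rule');                 # inheritance is declared through @ISA
sub check {
    my ($self, $line) = @_;
    my $max = $self->{max} // 100;
    return length($line) > $max ? ("longer than $max columns") : ();
}

# Hypothetical convention: class names start with an uppercase letter.
package Rule::ClassName;
our @ISA = ('Rule');
sub check {
    my ($self, $line) = @_;
    return () unless $line =~ /^\s*class\s+([A-Za-z_]\w*)/;
    my $name = $1;
    return $name =~ /^[A-Z]/
        ? ()
        : ("class name '$name' should start with an uppercase letter");
}

package main;

# Apply every rule to every line; method calls dispatch on the object's class.
sub run_rules {
    my ($rules, @lines) = @_;
    my @report;
    my $n = 0;
    for my $line (@lines) {
        $n++;
        for my $rule (@$rules) {
            push @report, "$n: " . $rule->name . ": $_" for $rule->check($line);
        }
    }
    return @report;
}

my @rules = (Rule::LineWidth->new(max => 20), Rule::ClassName->new);
print "$_\n" for run_rules(\@rules, 'class widget {', 'int a_rather_long_variable_name;');
```

Each new check becomes one more small package with its own check method, much like adding a subclass with a virtual function in C++.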

The alternative here is to use Flex/Bison with C++. The problem is the
complexity of the grammar needed to handle C++. The reason I thought I
would try Perl is its powerful pattern matching. But whether I use
Flex/Bison or Perl, the need to parse the grammar is still there.
That's what BenM has said.

I will try starting off with a very simple thing and then expanding
it.

Thanks for your help.

Cheers,
Ishan.
 
Helmut Wollmersdorfer

ids said:
> What I see is that parsing line by line independently is not going to
> help. This parser needs to build context and remember stuff across
> lines to satisfy the above goals.

You can start with simple "one-liners" to catch the "low hanging fruits".

My first Perl script was something like

if ($line =~ m/$regex_pattern/) {
    print "$line\n";
}

to find errors in a very large XML file.

> Are there any existing Perl-based style checkers? If not, can you give
> some advice on how best to structure this program?

You should look into the source code and documentation of PPI and
Perl::Critic. Both have a very nice architecture.

Helmut Wollmersdorfer
 
Ben Bullock

> What BenB suggested is not sufficient. Applying pattern matching on a
> line-by-line basis is not going to help. For example, it won't allow me
> to recognize a function implementation.

Well, if it gets you 50% of the result for 1% of the effort ...

> In a BNF grammar we can define a function using something similar to
> the following.
>
> func_impl : type_spec IDENTIFIER '(' opt_param_list ')' block ;
>
> block : '{' opt_statement_list '}' ;
>
> // define the other non terminals here
>
> In order to do this, you need to *build state* as and when you scan
> through, i.e. you need to remember that you saw the opening brace in a
> previous line and you find the matching end brace in another line
> below. This gets even more difficult because of nested blocks. So you
> need to keep pushing and popping braces.

How much state do you really care about? I don't think you need to "parse
C++" to do this. Incidentally I once wrote a program to create C header
files from ANSI C code:

http://sourceforge.net/projects/cfunctions

This gets out function declarations from the C source based on a lex (flex)
parser and one stack written in C by basically discarding anything inside
a function. (If I had done it in Perl it would have been easier. I don't
recommend writing it in C.)

> Building state requires some well-organized data structures. I can
> visualize how this is to be done with a language like C++. What you
> need is a set of classes with a suitable inheritance structure. I don't
> know how to do it with Perl. That's why I asked about a mental model. I
> suppose Perl does support the OO paradigm, but I didn't find any
> material to read about it. I mean, I would like to read an "OOA/D with
> Perl" sort of thing.

I don't know if it's what you want, but there are two O'Reilly books on
Perl objects, "Learning Perl Objects, References and Modules" and
"Intermediate Perl".

> The alternative here is to use Flex/Bison with C++. The problem is the
> complexity of the grammar needed to handle C++. The reason I thought I
> would try Perl is its powerful pattern matching. But whether I use
> Flex/Bison or Perl, the need to parse the grammar is still there.
> That's what BenM has said.

But I expect you can just throw away most of the grammar - you don't need
to parse it but rather just discard most of it. If you're just
using the program to automatically check for style mistakes, where I
assume that it is not a fatal problem if you turn up a false positive or
miss one or two badly named variables, "lazy" methods can do most of the
job.
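
As a rough illustration of the "discard most of it" idea, here is a sketch under stated assumptions rather than a finished tool. It blanks out comments and string literals, then looks for 'if', 'while' or 'for' whose parenthesised condition is not followed by '{'. The sub names strip_noise and unbraced_controls are invented for this example. It is deliberately lazy: it ignores 'else', raw strings, digit separators and the preprocessor (so '#if (X)' would be reported), and it skips 'while (...);' so that do-while loops are not flagged.

```perl
use strict;
use warnings;

# Blank out comments and string/character literals, keeping newlines
# so that line numbers stay valid afterwards.
sub strip_noise {
    my ($src) = @_;
    $src =~ s{
        ( /\* .*? \*/                 # block comment
        | // [^\n]*                   # line comment
        | " (?: \\. | [^"\\] )* "     # string literal
        | ' (?: \\. | [^'\\] )* '     # character literal
        )
    }{ (my $blank = $1) =~ tr/\n/ /c; $blank }gsex;
    return $src;
}

# Return [line, keyword] pairs for control structures without a block.
sub unbraced_controls {
    my ($src) = @_;
    my $clean = strip_noise($src);
    my @hits;
    # Group 2 matches balanced parentheses via the recursive (?2).
    while ($clean =~ /\b(if|while|for)\s*(\((?:[^()]++|(?2))*+\))\s*(?=[^\s{;])/g) {
        my ($keyword, $offset) = ($1, $-[1]);
        my $before = substr($clean, 0, $offset);
        my $line   = 1 + ($before =~ tr/\n//);   # count newlines before the keyword
        push @hits, [$line, $keyword];
    }
    return @hits;
}

my $code = <<'END';
// if (x) y;  <- inside a comment, ignored
void f(int n) {
    if (n > 0) {
        puts("while (1) ;");
    }
    while (n--)
        g(n);
    if (h(n, (n + 1))) return;
}
END

printf "line %d: '%s' without a block\n", @$_ for unbraced_controls($code);
```

On the sample input this reports the 'while' on line 6 and the 'if' on line 8, and ignores the commented-out 'if' and the 'while' inside the string. The recursive pattern needs Perl 5.10 or later.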
 
Ted Zlatanov

i> What BenB suggested is not sufficient. Applying pattern matching on a
i> line-by-line basis is not going to help. For example, it won't allow me
i> to recognize a function implementation.

i> In a BNF grammar we can define a function using something similar to
i> the following.

i> func_impl : type_spec IDENTIFIER '(' opt_param_list ')' block ;
i> block : '{' opt_statement_list '}' ;
i> // define the other non terminals here

If you feel comfortable with this kind of grammar definition, definitely
look at the existing Parse::RecDescent solutions and try to extend one
of them.

I would suggest that coding standards are much easier to enforce by peer
review, especially since peer review will catch many errors (bugs and
inefficiencies) that an automatic checker won't. So if that's your
goal, please consider what you'll accomplish versus what you are
actually trying to do.

Ted
 
Ben Morrow

Quoth ids:
> In a BNF grammar we can define a function using something similar to
> the following.
>
> func_impl : type_spec IDENTIFIER '(' opt_param_list ')' block ;
>
> block : '{' opt_statement_list '}' ;
>
> // define the other non terminals here
>
> In order to do this, you need to *build state* as and when you scan
> through, i.e. you need to remember that you saw the opening brace in a
> previous line and you find the matching end brace in another line
> below. This gets even more difficult because of nested blocks. So you
> need to keep pushing and popping braces.

There are several parser modules on CPAN. Parse::RecDescent is very
flexible but rather slow; Parse::Yapp is a direct clone of yacc, which
it sounds like you're familiar with. The grammar in Inline::CPP is
written with P::RD, and may well be sufficient for your needs.

Ben
 
