Hi guys,
I wrote a program to strip excess whitespace and comments from C
source code. Please check it out. Any comments would be appreciated.
https://github.com/fangfufu/C-unformatter

I took only a brief look, so my remarks may be incomplete.
In no particular order:
- Interchange lines 219-221 with 222-224.
- The tests at line 246 are wrong because of the `char' type.
- I think check_preprocessor_statements() will fail if the
source starts with white space (e.g., newlines) followed by '#'.
- I think lines 168-173 will mess up source constructs like
`x = y / *ptr;', turning them into (unterminated) comments.
- Trigraph sequences and "digraphs" aren't handled properly.
(This could be considered a feature rather than a bug.)
- I don't think white space before what you call "tokens" will
be removed. For example, it looks like `puts ( "Hello" ) ;' will
become `puts ("Hello");' rather than `puts("Hello");'.
- Lines that end with a backslash-newline pair aren't handled
properly.
- Lines 119-141 are a *terrible* idea! One crummy little
I/O error (or bug!), and you can kiss your source code good-bye!
- Speaking of I/O errors, rip_file() is careful to detect
them but not so careful about closing FILE streams afterward.
(In fact, it never closes the overwritten input, which is its
principal output, so it never gets a chance to detect errors in
closing; but since the original source is already trashed by
then, it may not make much difference. Even if all goes well,
though, rip_file() leaks an open FILE stream for each source it
processes; feed it enough sources and it may well run out of streams.)
(Hmmm: I wonder what happens if you mention the same source
file name twice on the command line ...)
- Higher-level remark: I think the program might be simpler
if re-cast as a state machine, instead of spreading the logic
across a whole bunch of brittle-looking functions. ("Brittle"
because there's always this question about whether the function
has or has not swallowed the current character, and perhaps more;
that's the sort of thing that's easy to lose track of.) This looks
more like a job for one simple loop surrounding a big `switch'
statement, with cases corresponding to the current context.
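
To make that last remark concrete, here is a rough sketch of the
loop-plus-switch shape I have in mind. It is not taken from your code:
squeeze(), put_code(), needs_space() and the state names are all made up
for illustration, and it works on an in-memory string rather than on
files. It deliberately ignores preprocessor lines, trigraphs, digraphs
and backslash-newline, so treat it as a starting point under those
assumptions, not a finished tool.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical contexts the scanner can be in. */
enum state { CODE, SLASH, LINE_COMMENT, BLOCK_COMMENT, BLOCK_STAR,
             QUOTED, ESCAPED };

struct sink { char *buf; size_t n; int pending; };  /* pending: blank seen */

static int is_ident(char c) { return c == '_' || isalnum((unsigned char)c); }
static int is_quote(char c) { return c == '"' || c == '\''; }
static int is_op(char c)
{
    return c != '\0' && strchr("+-*/%&|^<>=!.:#", c) != NULL;
}

/* Conservative guess: must a blank survive between prev and next? */
static int needs_space(char prev, char next)
{
    if (is_ident(prev) && (is_ident(next) || is_quote(next)))
        return 1;                       /* `int x', `L "s"' */
    if (is_op(prev) && is_op(next))
        return 1;                       /* `/ *', `+ +', `- -' */
    if (prev != '\0' && strchr("eEpP", prev) != NULL
        && (next == '+' || next == '-'))
        return 1;                       /* `0x1e + 1' */
    return 0;
}

/* Emit one character of ordinary code, keeping a blank only if needed. */
static void put_code(struct sink *s, char c)
{
    if (s->pending && s->n > 0 && needs_space(s->buf[s->n - 1], c))
        s->buf[s->n++] = ' ';
    s->pending = 0;
    s->buf[s->n++] = c;
}

/* Copy src to dst without comments or excess blanks; dst needs room for
 * strlen(src) + 1 bytes. Returns the length of the result. */
size_t squeeze(const char *src, char *dst)
{
    struct sink s = { dst, 0, 0 };
    enum state st = CODE;
    char quote = '\0';
    size_t i = 0;

    while (src[i] != '\0') {
        char c = src[i];
        switch (st) {
        case CODE:
            if (isspace((unsigned char)c)) {
                s.pending = 1;
            } else if (c == '/') {
                st = SLASH;             /* divide or comment? decide later */
            } else {
                put_code(&s, c);
                if (is_quote(c)) { quote = c; st = QUOTED; }
            }
            break;
        case SLASH:
            if (c == '*') { st = BLOCK_COMMENT; }
            else if (c == '/') { st = LINE_COMMENT; }
            else { put_code(&s, '/'); st = CODE; continue; } /* redo c */
            break;
        case LINE_COMMENT:
            if (c == '\n') { s.pending = 1; st = CODE; }
            break;
        case BLOCK_COMMENT:
            if (c == '*') st = BLOCK_STAR;
            break;
        case BLOCK_STAR:
            if (c == '/') { s.pending = 1; st = CODE; } /* comment = blank */
            else if (c != '*') st = BLOCK_COMMENT;
            break;
        case QUOTED:                    /* literals are copied verbatim */
            s.buf[s.n++] = c;
            if (c == '\\') st = ESCAPED;
            else if (c == quote) st = CODE;
            break;
        case ESCAPED:
            s.buf[s.n++] = c;
            st = QUOTED;
            break;
        }
        i++;
    }
    if (st == SLASH)
        put_code(&s, '/');              /* input ended right after a `/' */
    s.buf[s.n] = '\0';
    return s.n;
}
```

With this, `x = y / *ptr;' comes out as `x=y/ *ptr;' (the blank between
`/' and `*' survives), and `puts ( "Hello" ) ;' comes out as
`puts("Hello");'. The point of the shape is that exactly one place
advances the input index, so the "has this character been swallowed yet?"
question has only one answer; the single `continue' in the SLASH case is
the one spot where a character is looked at twice.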