Remove the comments and excess white space in C source code

F.F. · Sep 20, 2013

Hi guys,
I wrote a program to strip excess white spaces and comments from C
source code. Please check it out. Any comments would be appreciated.

https://github.com/fangfufu/C-unformatter

Cheers,
F.F.

Eric Sosman · Sep 20, 2013

F.F. said:
Hi guys,
I wrote a program to strip excess white spaces and comments from C
source code. Please check it out. Any comments would be appreciated.

https://github.com/fangfufu/C-unformatter

I took only a brief look, so my remarks may be incomplete.
In no particular order:

- Interchange lines 219-221 with 222-224.

- The tests at line 246 are wrong because of the `char' type.

- I think check_preprocessor_statements() will fail if the
source starts with white space (e.g., newlines) followed by '#'.

- I think lines 168-173 will mess up source constructs like
`x = y / *ptr;', turning them into (unterminated) comments.

- Trigraph sequences and "digraphs" aren't handled properly.
(This could be considered a feature rather than a bug.)

- I don't think white space before what you call "tokens" will
be removed. For example, it looks like `puts ( "Hello" ) ;' will
become `puts ("Hello");' rather than `puts("Hello");'.

- Lines that end with a backslash-newline pair aren't handled
properly.

- Lines 119-141 are a *terrible* idea! One crummy little
I/O error (or bug!), and you can kiss your source code good-bye!

- Speaking of I/O errors, rip_file() is careful to detect
them but not so careful about closing FILE streams afterward.
(In fact, it never closes the overwritten input which is its
principal output, so never gets a chance to detect errors in
closing -- but since the original source is already trashed by
then it may not make much difference. Even if all goes well,
though, rip_file() leaks an open FILE stream for each source it
processes; feed it enough sources and it may well run out.)
(Hmmm: I wonder what happens if you mention the same source
file name twice on the command line ...)

- Higher-level remark: I think the program might be simpler
if re-cast as a state machine, instead of spreading the logic
across a whole bunch of brittle-looking functions. ("Brittle"
because there's always this question about whether the function
has or has not swallowed the current character, and perhaps more;
that's the sort of thing that's easy to lose track of.) This looks
more like a job for one simple loop surrounding a big `switch'
statement, with cases corresponding to the current context.
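
A minimal sketch of the loop-around-a-`switch' idea, not taken from the posted program. The names strip_comments() and after_code_char() and the state names are made up for illustration. It only removes comments (replacing each with one space, so `return/**/ 0;' stays valid) and leaves white space alone; it assumes trigraphs and backslash-newline pairs have already been dealt with, and it makes no attempt to diagnose malformed input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum state {
    CODE, SLASH, LINE_COMMENT, BLOCK_COMMENT, BLOCK_STAR,
    STRING, STRING_ESC, CHARLIT, CHARLIT_ESC
};

/* State to enter after copying character c while in ordinary code. */
static enum state after_code_char(char c)
{
    if (c == '"')  return STRING;
    if (c == '\'') return CHARLIT;
    return CODE;
}

/*
 * Copy in to out, replacing every comment with one space.
 * out must have room for strlen(in) + 1 bytes; returns the output length.
 */
size_t strip_comments(const char *in, char *out)
{
    const char *start = in;
    enum state st = CODE;
    size_t n = 0;

    assert(in != NULL && out != NULL);
    for (; *in != '\0'; ++in) {
        char c = *in;
        switch (st) {
        case CODE:
            if (c == '/') {
                st = SLASH;               /* hold the slash until we know more */
            } else {
                out[n++] = c;
                st = after_code_char(c);
            }
            break;
        case SLASH:
            if (c == '/') {
                out[n++] = ' ';           /* the comment becomes one space */
                st = LINE_COMMENT;
            } else if (c == '*') {
                out[n++] = ' ';
                st = BLOCK_COMMENT;
            } else {
                out[n++] = '/';           /* it was a plain division sign */
                out[n++] = c;
                st = after_code_char(c);
            }
            break;
        case LINE_COMMENT:
            if (c == '\n') {
                out[n++] = c;             /* keep the newline itself */
                st = CODE;
            }
            break;
        case BLOCK_COMMENT:
            if (c == '*') st = BLOCK_STAR;
            break;
        case BLOCK_STAR:
            if (c == '/') st = CODE;
            else if (c != '*') st = BLOCK_COMMENT;
            break;
        case STRING:
            out[n++] = c;
            if (c == '\\') st = STRING_ESC;
            else if (c == '"') st = CODE;
            break;
        case CHARLIT:
            out[n++] = c;
            if (c == '\\') st = CHARLIT_ESC;
            else if (c == '\'') st = CODE;
            break;
        case STRING_ESC:
            out[n++] = c;                 /* escaped character, copied as-is */
            st = STRING;
            break;
        case CHARLIT_ESC:
            out[n++] = c;
            st = CHARLIT;
            break;
        }
    }
    if (st == SLASH)
        out[n++] = '/';                   /* input ended right after a slash */
    out[n] = '\0';
    assert(n <= strlen(start));           /* the output never grows */
    return n;
}
```

Because the slash is held back in its own state, `x = y / *ptr;' passes through unchanged: only a `/' immediately followed by `*' or `/' opens a comment. Every case consumes exactly one character, which sidesteps the "who swallowed the current character" question.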

Tim Rentsch · Sep 22, 2013

F.F. said:
Hi guys,
I wrote a program to strip excess white spaces and comments from C
source code. Please check it out. Any comments would be appreciated.

In addition to Eric Sosman's list (and overlap in some
cases), I would list these problems:

1. Some spaces that can be taken out aren't.

2. In some cases spaces that must be left in are
removed, e.g., return/**/ 0;

3. Comments are not removed from preprocessor
directives.

4. Line boundaries are ignored when deciding whether
a '#' starts a preprocessor directive.

5. Preprocessor directives after regular program
text don't have a newline inserted before them.
Or apparently only sometimes don't, e.g.,

int main(){
#define FOO 1

misbehaves.

6. There needs to be a final newline added if the
last output line is non-empty (which it almost
always will be in real programs).

7. The formatting program generally assumes its
input is well-formed C source, with little or
no effort to detect bad input.

8. Approach is generally too simplistic to be
completely effective, especially if it matters
what happens with spaces in macro expansions,
which it does in some programs because of how
the stringizing operator works.
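
To make point 8 concrete, here is a small sketch of how white space inside a macro argument can change a program's behavior. The macro STR and the function spacing_changes_string() are hypothetical names chosen for this example. The `#' operator keeps one space wherever the argument had white space between tokens, so deleting those spaces changes the resulting string literal.

```c
#include <assert.h>
#include <string.h>

#define STR(x) #x   /* hypothetical helper macro: stringize the argument */

/* Returns nonzero if removing the spaces around '+' changes the string. */
int spacing_changes_string(void)
{
    const char *spaced = STR(a + b);   /* yields "a + b" */
    const char *packed = STR(a+b);     /* yields "a+b"   */

    /* Runs of white space inside the argument collapse to one space. */
    assert(strcmp(STR(a    +    b), spaced) == 0);
    return strcmp(spaced, packed) != 0;
}
```

A stripper can therefore squeeze a run of spaces in a macro argument down to one, but it cannot safely remove that last space unless it knows the argument is never stringized.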

Tim Rentsch · Sep 22, 2013

Eric Sosman said:
[how to process C source to remove spaces, comments, etc]

- Higher-level remark: I think the program might be simpler
if re-cast as a state machine, instead of spreading the logic
across a whole bunch of brittle-looking functions. ("Brittle"
because there's always this question about whether the function
has or has not swallowed the current character, and perhaps more;
that's the sort of thing that's easy to lose track of.) This looks
more like a job for one simple loop surrounding a big `switch'
statement, with cases corresponding to the current context.

That turns out to be a lot harder than it might seem, because of
interactions between the different levels of textual processing
(trigraphs, line splicing, comments, preprocessor lines, etc),
not to mention the question of when adjacent tokens can be
safely agglutinated.
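
One of those interactions can be sketched briefly. Line splicing (deleting each backslash-newline pair) happens before comments are recognized, so a comment opener can be split across two physical lines, and a `//' comment that ends in a backslash continues onto the next line. The function below, with the made-up name splice_lines(), is only an illustration of doing the splice as a separate earlier pass. It ignores trigraphs (where `??/' also stands for a backslash), and it discards the original line structure, which a real tool might prefer to keep.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy in to out with every backslash-newline pair deleted.
 * out must have room for strlen(in) + 1 bytes; returns the output length.
 */
size_t splice_lines(const char *in, char *out)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        if (strncmp(in, "\\\n", 2) == 0)
            in += 2;                 /* drop the backslash and the newline */
        else
            out[n++] = *in++;
    }
    out[n] = '\0';
    return n;
}
```

For example, the input "/\\\n* hidden */x" becomes "/* hidden */x" after splicing, so a scanner that looks for comments on the raw text would miss that comment entirely.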
