reformat to tool/editor-compliant C style?

  • Thread starter Hallvard B Furuseth

Hallvard B Furuseth

I'd like to reformat an open source package (OpenLDAP) to a new C style.
8M code, 0.3M lines. With some currently hopeless formatting rules like
tab-width=4. I'm sure that made sense once to save disk space for the
source code, or something:-( The project's normal rule is "don't
reformat unnecessarily" since it makes source control merge/diff
difficult. That hasn't helped the style over time either.

Anyway, can someone recommend a "tool-friendly" style, which Indent,
Emacs, Vi or whatever all can be easily configured to produce? Doesn't
hurt if whatever is used on Windows can do it too. Any clever ideas to
keep future code more or less conformant to the chosen style, without
getting too anal about it?

I thought I'd just pick something close to Emacs "stroustrup" style and
give Indent the options for that, but at least Gnu Indent isn't willing
to produce just the same. In particular, it aligns variable
declarations like this:
    int       foo;
    int     **bar;
with the declared identifier aligned. Which is cool, but it's not what
one naturally produces with an editor. Have I missed some Indent option
to prevent it? Or some other indent program, maybe? Or are there
"active" Emacs and Vi modes that help produce this?

I don't see much point in re-indenting to some indentation style only to
see newly written code immediately deviate from it. Though I guess one
could just say "follow The Style everyone" and be anal about it.

Or maybe we should just not indent variable and struct member names:-(
Looks far less readable to me though.


We'll likely keep some quirks of the current formatting anyway, like
    foo_bar_baz(one, two,
        three);
instead of
    foo_bar_baz(one, two,
                three);
due to long functions with long parameter lists passed long argument
names.


How much work am I looking at? Has anyone here done a similar job? I
don't normally put too much work into OpenLDAP over a short time. Even
with Indent it looks like it'll be a fair amount of work to re-indent
and then prettify what Indent produces. In particular comments and the
too many deeply nested functions. On the other hand, the job of
splitting up big functions and making sure the change really didn't
affect program logic looks tedious too. I can't test all the changes
either, and I have no idea how some of the code works.

And the headers will likely have to be done by hand anyway, due to macro
magic in public declarations.
 
Nick Keighley

I'd like to reformat an open source package (OpenLDAP) to a new C style.
8M code, 0.3M lines. With some currently hopeless formatting rules like
tab-width=4.

what's wrong with that?!

I'm sure that made sense once to save disk space for the
source code, or something:-(

wouldn't replacing spaces with tabs do that? Were disks ever
*that* small? I've worked on old mini-computers and even
then we didn't try to save space at the *source* level.

The project's normal rule is "don't
reformat unnecessarily" since it makes source control merge/diff
difficult.

use a decent configuration control system and make sure
you don't mix source code changes with reformats. Clearly label
the reformat versions.

can't help you with indent

<snip>
 
Hallvard B Furuseth

Nick said:
what's wrong with that?!

Tab width 8 is the norm elsewhere, at least in the Unix world and with
published ASCII text. So some code gets written with tab-width 4, some
with 8. Even if everyone sets indentation = tab-width, code still gets
misaligned. This written with tab-width 4 (I've substituted spaces):

int         foo;            /* hi there */
const char  *barbaz[256];   /* and here */

becomes this when displayed with tab-width 8:

int                     foo;                    /* hi there */
const char      *barbaz[256];   /* and here */

wouldn't replacing spaces with tabs do that?

Not if tab width > indentation level (e.g. 8 and 4).

Were disks ever *that* small? I've worked on old mini-computers and
even then we didn't try to save space at the *source* level.

Who knows. If the point was not to save space, I have no idea at all
what the point was.

use a decent configuration control system and make sure
you don't mix source code changes with reformats. Clearly label
the reformat versions.

Actually I'm not sure what a configuration control system is, as opposed
to source control. This project is still using CVS, and I'm not going
to fight about that. In any case the reformatting changes will
certainly not pay attention to the "don't reformat" rule:)
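
To make the column arithmetic behind the misalignment example concrete, here is a minimal sketch in C. It assumes the two declaration lines were typed with tabs in the positions shown in the strings below (a guess, since the example above was posted with spaces substituted), and `comment_column` is a hypothetical helper name, not part of any tool mentioned in this thread. It walks a line, jumps to the next tab stop on each tab, and reports the column where the comment starts.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the 0-based display column at which the first comment opener
 * appears in `line`, with tab stops every `tabwidth` columns.
 * Returns -1 if the line contains no comment opener.
 */
static long comment_column(const char *line, size_t tabwidth)
{
    size_t col = 0;

    for (; *line != '\0'; line++) {
        if (line[0] == '/' && line[1] == '*')
            return (long)col;
        if (*line == '\t')
            col = (col / tabwidth + 1) * tabwidth;  /* jump to next tab stop */
        else
            col++;
    }
    return -1;
}

/* Assumed tab placement for the two example lines (not from the source). */
static const char *const line_a = "int\t\t\tfoo;\t\t\t/* hi there */";
static const char *const line_b = "const char\t*barbaz[256];\t/* and here */";

static void check_example(void)
{
    /* Tab stops every 4 columns: both comments line up. */
    assert(comment_column(line_a, 4) == comment_column(line_b, 4));
    /* Tab stops every 8 columns: they no longer line up. */
    assert(comment_column(line_a, 8) != comment_column(line_b, 8));
}
```

Under those assumptions both comments start at column 28 with tab stops every 4 columns, but at columns 48 and 32 with tab stops every 8.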
 
Johannes Bauer

Hallvard said:
Who knows. If the point was not to save space, I have no idea at all
what the point was.

The point of using tabs is that everyone can easily convert to something
convenient. I find a value of 4 spaces perfect, some prefer 2, some
prefer 8. I've also seen 3.

You suggest that you switch to 8 spaces (i.e. no tabs) which seems to me
like a really not that good idea. After all everyone should be able to
display it the way he/she wants - not the way the programmer liked it.
Code indented by spaces is an annoying pest, IMHO.

Regards,
Johannes
 
zoot51

You suggest that you switch to 8 spaces (i.e. no tabs) which seems to me
like a really not that good idea. After all everyone should be able to
display it the way he/she wants - not the way the programmer liked it.
Code indented by spaces is an annoying pest, IMHO.

If you wish to maintain vertical alignment for whatever reason, then
you have to use spaces.

Of course having vertical alignment in source code is a matter of
personal taste.

Reducing indent (tab) size increases the number of nested blocks that
don't require line wrapping, or allows the use of longer variable
names.

Increasing the indent size reduces the risk of vertical alignment
being upset if the tab size is changed, as possibly only a single tab
is required to maintain alignment.
 
CBFalconer

Hallvard said:
.... snip ...

Anyway, can someone recommend a "tool-friendly" style, which
Indent, Emacs, Vi or whatever all can be easily configured to
produce? Doesn't hurt if whatever is used on Windows can do it too.
Any clever ideas to keep future code more or less conformant to
the chosen style, without getting too anal about it?

I use indent v2.9 with the following configuration file (indent.pro
on Windoze). Indent is available to run on anything. The
following is a single line:

-kr -l66 -i3 -bad -di16 -lc66 -nce -ncs -cbi0 -bbo -pmt -psl -ts1
-cdw -ppi 3
 
Ian Collins

Jack said:
Apparently you never coded C under CP/M 80 on single density 8" floppy
disks.

Forget the disks, getting the source in memory was enough of a challenge!
 
Hallvard B Furuseth

Mark said:
Hallvard said:
So some code gets written with tab-width 4, some
with 8. Even if everyone sets indentation = tab-width, code still gets
misaligned. This written with tab-width 4 (I've substituted spaces):
int         foo;            /* hi there */
const char  *barbaz[256];   /* and here */
becomes this when displayed with tab-width 8:
int                     foo;                    /* hi there */
const char      *barbaz[256];   /* and here */

So reset your editor to use tab widths of 4. Job done.

I should have clarified: The code was _originally_ written with tab
width 4. But by now it's sprinkled with quite a bit written with tab
width 8. Open-source project, several authors... So it comes out wrong
either way.
 
Hallvard B Furuseth

I said:
I'd like to reformat an open source package (OpenLDAP) to a new C style.
8M code, 0.3M lines. With some currently hopeless formatting rules like
tab-width=4. (...)

Okay, I've seriously miscommunicated here...

If I do this, I'll reformat the code to what will become the project's
new style, and commit the results. Assuming it gets OKed. When the
changes quiet down we'll presumably go back to the "don't reformat
unnecessarily" rule.

The project probably had a consistent style once. Who knows what it was
exactly. Currently it's not consistent. The code looks best with
tab-width 4, but some of it has tab-width 8. There are strange spacing
conventions around parens - often space inside parens, except when there
aren't. Function names start at first column, except when they don't...

I suppose a new project rule could be "always use indent before
committing, except for .h files."

Anyway, tab-width 4 seems to be the major reason the style grows
inconsistent. Code written with tab-width 8 does get committed every
now and then. And we're mainly Unix people. So -

Mark said:
Er no, but that's a religious war so don't go there.

Actually, please go there. If I can _not_ expect consistency to improve
if I reformat with width 8, I'd certainly like to know. I know Windows
is different, but I thought that was with richer formats than ASCII.

Still, yeah, I was afraid "this is a waste of time" would be the correct
answer.


Johannes said:
The point of using tabs is that everyone can easily convert to something
convenient. I find a value of 4 spaces perfect, some prefer 2, some
prefer 8. I've also seen 3.

That's a good point, and it'd have worked if the code had not followed
non-whitespace with tabs for alignment.

I notice uncrustify which Roland posted can help with that - it has
options to indent with tabs but align with spaces. Thanks!

Also it might make the job more manageable to get rid of lines that
depend on tab-width - e.g. move comments from the right of code to the
previous line.

You suggest that you switch to 8 spaces (i.e. no tabs) which seems to me
like a really not that good idea.

That's not what I meant, I just wanted to illustrate what some of the
code looked like. Whatever tab-width I choose.

After all everyone should be able to display it the way he/she wants -
not the way the programmer liked it. Code indented by spaces is an
annoying pest, IMHO.

Well, after I was done it'd be indented by both spaces and tabs. Unless
I rearrange it to primarily have indent-width = tab-width = 8 and
hopefully work otherwise. Would need to split functions so they don't
run past 80 chars though...
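
The "indent with tabs but align with spaces" idea can be sketched roughly as follows in C. This is not how uncrustify implements it; `align_with_spaces` is a hypothetical helper, and the sketch assumes each line was authored with a known tab width. Tabs in the leading indentation are kept, while every tab after the first non-blank character is replaced by the number of spaces it stood for at that width.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy `src` to `dst` (at most dstsize - 1 characters plus a NUL).
 * Leading tabs are kept as tabs; tabs that follow non-blank text are
 * expanded to spaces, assuming tab stops every `tabwidth` columns.
 */
static void align_with_spaces(const char *src, size_t tabwidth,
                              char *dst, size_t dstsize)
{
    size_t col = 0;     /* display column at the author's tab width */
    size_t out = 0;     /* write position in dst */
    int in_indent = 1;  /* still inside the leading whitespace? */

    for (; *src != '\0' && out + 1 < dstsize; src++) {
        if (*src == '\t') {
            size_t next = (col / tabwidth + 1) * tabwidth;
            if (in_indent) {
                dst[out++] = '\t';          /* indentation: keep the tab */
            } else {
                size_t n = next - col;
                while (n-- > 0 && out + 1 < dstsize)
                    dst[out++] = ' ';       /* alignment: use spaces */
            }
            col = next;
        } else {
            if (*src != ' ')
                in_indent = 0;
            dst[out++] = *src;
            col++;
        }
    }
    dst[out] = '\0';
}

static void check_alignment_survives(void)
{
    char a[128], b[128];

    /* Two lines written with tab-width 4 (tab placement is assumed). */
    align_with_spaces("\tint\t\t\tfoo;\t\t\t/* ... */", 4, a, sizeof a);
    align_with_spaces("\tstruct Bar\tbar[256];\t\t/* ... */", 4, b, sizeof b);

    /* After conversion the comments sit at the same character offset,
     * so they stay aligned whatever tab width the reader displays. */
    assert(strstr(a, "/*") - a == strstr(b, "/*") - b);
}
```

Because only the single leading tab remains in each line, changing the display tab width shifts both lines by the same amount and the alignment is preserved.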
 
Richard

Hallvard B Furuseth said:
Well, after I was done it'd be indented by both spaces and tabs. Unless
I rearrange it to primarily have indent-width = tab-width = 8 and
hopefully work otherwise. Would need to split functions so they don't
run past 80 chars though...

I strongly advise against using tabs. Any editor worth its salt treats
spaces like tabs.
 
CBFalconer

Mark said:
That doesn't make sense to me. A Tab is a tab - your editor should
convert 0x09 into whatever tabspace you've defined in your editor.

Presumably you mean that some moron has tab-to-spaced your code
twice, once at 4s/t and once at 8s/t.

In which case why not just run it through indent, setting
appropriate tabs (not spaces)?

Your software seems to be absorbing the message attributions.
Please fix.
 
Hallvard B Furuseth

Mark said:
That doesn't make sense to me. A Tab is a tab - your editor should
convert 0x09 into whatever tabspace you've defined in your editor.

Yes, and either way existing code gets displayed misaligned.

Presumably you mean that some moron has tab-to-spaced your code twice,
once at 4s/t and once at 8s/t.

Maybe some places, but mostly "some moron" has written code which looks
fine with tab-width 8 but not 4. Maybe because it wasn't documented
anywhere that the code was written with tab-width 4.

If someone writes this with tab-width = indentation = 4, it gets aligned
fine:
    int         foo;            /* ... */
    struct Bar  bar[256];       /* ... */
And if someone else writes this with tab-width = indentation = 8, it
gets aligned fine:
        int             baz;            /* ... */
        struct Bar      quux[256];      /* ... */
but when you have both variants mixed up in the code base, there is no
tab-width which will align all of the code fine. With tab-width 8, the
first example looks like
        int                     foo;                    /* ... */
        struct Bar      bar[256];               /* ... */
With tab-width 4, the 2nd example looks like
    int     baz;        /* ... */
    struct Bar  quux[256];  /* ... */

Also, of course, occasional lines are indented with tabs+spaces
(tab-width 8, indentation 4).

In which case why not just run it through indent, setting appropriate
tabs (not spaces)?

First, I don't know how other people _write_ code, in particular with
non-Emacs. Thus the question of what kind of style is easy to produce
with other editors.

Second, because indent isn't smart enough to cope with macro magic,
multiline macros, overly creative formatting, etc. So I'll have to walk
through the result and clean up. Like this piece of formatting which
is just semantically wrong:

        /* try foo */
        if (foo) {
                handle foo;

        /* otherwise try bar */
        } else if (bar) {
                handle bar;

        /* oh dear */
        } else {
                error();
        }

Also indent doesn't really understand C, so I suppose it could make
semantic changes. Not sure that's worth worrying about though.
Probably more likely that I'd make a typo when cleaning up.
 
Nick Keighley

Mark McIntyre writes:
That doesn't make sense to me. A Tab is a tab - your editor should
convert 0x09 into whatever tabspace you've defined in your editor.

Yes, and either way existing code gets displayed misaligned.

Presumably you mean that some moron has tab-to-spaced your code twice,
once at 4s/t and once at 8s/t.

Maybe some places, but mostly "some moron" has written code which looks
fine with tab-width 8 but not 4.  Maybe because it wasn't documented
anywhere that the code was written with tab-width 4.

If someone writes this with tab-width = indentation = 4, it gets aligned
fine:
    int         foo;            /* ... */
    struct Bar  bar[256];       /* ... */
And if someone else writes this with tab-width = indentation = 8, it
gets aligned fine:
        int             baz;            /* ... */
        struct Bar      quux[256];      /* ... */
but when you have both variants mixed up in the code base, there is no
tab-width which will align all of the code fine.  With tab-width 8, the
first example looks like
        int                     foo;                    /* ... */
        struct Bar      bar[256];               /* ... */

I think I could live with that...


With tab-width 4, the 2nd example looks like
    int     baz;        /* ... */
    struct Bar  quux[256];  /* ... */

Also, of course, occasional lines are indented with tabs+spaces
(tab-width 8, indentation 4).

In which case why not just run it through indent, setting appropriate
tabs (not spaces)?

no. *remove* all the tabs. It's the tabs that cause the problem.

First, I don't know how other people _write_ code, in particular with
non-Emacs.  Thus the question of what kind of style is easy to produce
with other editors.

Does emacs format your code that much? The editor I use fiddles
with the indentation a bit but otherwise leaves things alone
(if it tried to do anything else the feature would be rapidly
disabled- or the editor dumped).

Second, because indent isn't smart enough to cope with macro magic,
multiline macros, overly creative formatting, etc. So I'll have to walk
through the result and clean up.

I haven't used indent for a while but generally I thought it was pretty
good. It tended to mess up some of my comment blocks.

 Like this piece of formatting which
is just semantically wrong:

*semantically* wrong? Why? I hate K&R style { on the same
line as the code and comments outside the block. But the semantics
seem plain enough.

        /* try foo */
        if (foo) {
                handle foo;

        /* otherwise try bar */
        } else if (bar) {
                handle bar;

        /* oh dear */
        } else {
                error();
        }

with my layout this would look like this.

    if (foo)
    {
        handle foo;
    }
    else if (bar)
    {
        handle bar;
    }
    else
    {
        error();
    }

or even

    if (foo)
        handle foo;
    else if (bar)
        handle bar;
    else
        error();

is *that* *semantically* wrong? I'm beginning to wonder
if one of us has a non-standard meaning for "semantic"...

Also indent doesn't really understand C,

ok...

so I suppose it could make
semantic changes.  

not by *my* definition of "semantic". I've never seen
indent screw up a correct piece of code. The layout may
be strange but in my experience the code did the same thing
before and after.

Not sure that's worth worrying about though.

it would sure worry me if it was happening!

Probably more likely that I'd make a typo when cleaning up.

yes


--
Nick Keighley

"To every complex problem there is a simple solution... and it is
wrong."
-- Turski
 
Hallvard B Furuseth

Nick said:
With tab-width 8, the
first example looks like
        int                     foo;                    /* ... */
        struct Bar      bar[256];               /* ... */

I think I could live with that...

Well, two lines are hardly a problem. 20 gets a bit more annoying.

no. *remove* all the tabs. It's the tabs that cause the problem.

Well, that's one option.
If people refrain from re-inserting tabs...

Does emacs format your code that much? The editor I use fiddles
with the indentation a bit but otherwise leaves things alone
(if it tried to do anything else the feature would be rapidly
disabled- or the editor dumped).

Emacs is programmable and very (too?) configurable, there is e.g. a
key binding to re-indent a block of code according to my chosen style.
In C files, the Tab key by default re-indents the line. It can be
configured to not do that, or to only do it when there is only
whitespace to the left of the cursor. Or I could write a function
which inserts spaces following non-tab characters but tab otherwise.
There is a key which inserts /* */ at the configured comment column
or moves an existing comment to that column. Etc.

I've hardly used other editors for C files, so I don't know how much
help they give and what kind of styles are natural to produce with
those. (E.g. if "indent with tab, align to a configured comment column
with spaces" is easy.)

*semantically* wrong? Why? I hate K&R style { on the same line as the
code and comments outside the block. But the semantics seem plain
enough.

I meant the comment for one if-test is inside the previous {}, so there
is little reason to expect any program to be able to indent it right.
Indentation gives (editing the quoted text a bit):

        /* try foo */
        if (foo) {
                handle foo;

                /* otherwise try bar */   <<< whoops <<<
        } else if (bar) {
                handle bar;
        (...)

Yet it would have been so easy to write it "indent-friendly":

        /* try foo */
        if (foo) {
                handle foo;
        }

        /* otherwise try bar */
        else if (bar) {
                handle bar;
        (...)

not by *my* definition of "semantic". I've never seen
indent screw up a correct piece of code. The layout may
be strange but in my experience the code did the same thing
before and after.

I've seen old documentation that warned that it could happen.
Getting confused and inserting/removing whitespace in strings or macro
arguments, or something. However I expect the keyword here is "old".
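
One rough way to gain some confidence that a reformatting pass only touched layout is to compare the source before and after while ignoring whitespace outside string and character literals. The following is a minimal sketch in C under that assumption; `same_ignoring_layout` and `struct cursor` are hypothetical names, not part of indent or any other tool. It is only a crude first filter: it does not understand comments (an apostrophe inside a comment will confuse it), and it cannot tell `a + +b` from `a ++b`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

struct cursor {
    const char *p;  /* next unread character */
    char quote;     /* '"' or '\'' while inside a literal, else 0 */
    int escaped;    /* previous character in the literal was a backslash */
};

/* Return the next character that matters, or 0 at end of input.
 * Whitespace is skipped only when outside string/char literals. */
static int next_significant(struct cursor *c)
{
    for (;;) {
        unsigned char ch = (unsigned char)*c->p;

        if (ch == '\0')
            return 0;
        c->p++;
        if (c->quote) {
            if (c->escaped)
                c->escaped = 0;
            else if (ch == '\\')
                c->escaped = 1;
            else if (ch == (unsigned char)c->quote)
                c->quote = 0;           /* closing quote */
            return ch;
        }
        if (isspace(ch))
            continue;                   /* layout only: ignore */
        if (ch == '"' || ch == '\'')
            c->quote = (char)ch;        /* opening quote */
        return ch;
    }
}

/* Nonzero if `a` and `b` differ only in whitespace outside literals. */
static int same_ignoring_layout(const char *a, const char *b)
{
    struct cursor ca = { a, 0, 0 }, cb = { b, 0, 0 };
    int x, y;

    do {
        x = next_significant(&ca);
        y = next_significant(&cb);
    } while (x == y && x != 0);
    return x == y;
}

static void check_layout_compare(void)
{
    /* Brace placement and indentation changed: still the same. */
    assert(same_ignoring_layout("if (foo) {\n\tbar();\n}",
                                "if(foo)\n{\n    bar();\n}"));
    /* Whitespace inside a string literal changed: flagged. */
    assert(!same_ignoring_layout("puts(\"a b\");", "puts(\"a  b\");"));
}
```

A mismatch reported by such a check would point at exactly the kind of damage the old documentation warned about, such as whitespace inserted into or removed from strings.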
 
August Karlstrom

Johannes said:
The point of using tabs is that everyone can easily convert to something
convenient. I find a value of 4 spaces perfect, some prefer 2, some
prefer 8. I've also seen 3.

You suggest that you switch to 8 spaces (i.e. no tabs) which seems to me
like a really not that good idea. After all everyone should be able to
display it the way he/she wants - not the way the programmer liked it.
Code indented by spaces is an annoying pest, IMHO.

I agree wholeheartedly. It is more flexible to use logical indentation
steps (tabs). Moreover, tabs are easier to edit than sequences of spaces
when you do not rely on automatic indentation. To make editing with a
simple editor easier I also never insert newlines in comment paragraphs
but use line wrapping instead.


August
 
August Karlstrom

Richard said:
I strongly advise against using tabs. Any editor worth its salt treats
spaces like tabs.

Well, I disagree. In my opinion ease of editing with the simplest editor
should be the goal here (to make the code "editor independent"). In this
case tabs are preferable.


August
 
