Stylistic questions on UNIX C coding.

Jerry Friedman

They're both equally clear, but they are communicating different things.
One of them is telling us something we might not already know, about an
entity already known to be your brother.  The other is telling us something
we might not already know, about an entity already known to be Fred Thompson.

Similarly, there's a big difference between "the person who committed the
murder was the butler" and "the butler committed the murder".  The former
is giving you information about a murder you already know about, the other
is giving you information about a butler you already know about.
....

For those interested in what linguists have said about this, the terms
seem to be "topic" and "comment" or "theme" and "rheme". In English,
the topic usually goes before the comment. Of course even in a single
language the matter is complicated, and different linguists use
different approaches and different terminology.
 
Keith Thompson

Tim Rentsch said:
Keith Thompson said:
Ian Collins said:
Tim Rentsch wrote:
Anand Hariharan wrote:
<snip>
Haven't seen anyone point this out:

Rather than -

#define MAXNUMFILES 1024

- prefer -

const int MaxNumFiles = 1024;


That way your preprocessor won't do as much damage.
Fine in C99, I think, but an issue in C90 if he's using it to define
an array size.
It's a problem in C99 too, if the array is defined at file scope or it
has internal linkage. There are other reasons why it's not a great
idea in C99. They stem from the fact that MaxNumFiles is not
permitted as part of a constant expression. [snip elaboration]
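
A minimal sketch of one portable alternative, assuming hypothetical names (MAX_NUM_FILES, open_files and the two helper functions are not from the thread). An enumeration constant is an integer constant expression in both C90 and C99, so it can size a file-scope array where a const-qualified int object cannot, and it still avoids the preprocessor:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name. An enumeration constant is an integer constant
   expression, unlike "const int MaxNumFiles = 1024;". */
enum { MAX_NUM_FILES = 1024 };

/* File-scope array: its size must be an integer constant expression. */
int open_files[MAX_NUM_FILES];

/* Number of elements in the table. */
size_t open_files_capacity(void)
{
    return sizeof open_files / sizeof open_files[0];
}

/* Pointer to slot i; the index must be within the table. */
int *open_file_slot(size_t i)
{
    assert(i < (size_t)MAX_NUM_FILES);
    return &open_files[i];
}
```

The trade-off is that an enumeration constant always has type int, so this only works for values that fit in an int.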

Minor clarification -- MaxNumFiles is _permitted_ as part of a constant
expression, albeit an implementation-specific constant expression;
it just isn't _required_ to be a portable constant expression.

What? You could say just about any nonsense is permitted as part of
an implementation-specific expression.

That doesn't alter the fact that in C90 or C99, MaxNumFiles is not
permitted as part of a constant expression.

I think Tim is referring to C99 6.6p10:

An implementation may accept other forms of constant expressions.

Quite so.
(I just noticed that this doesn't use the term
"implementation-defined" implying, I think, that an implementation
can accept other forms of constant expressions but isn't required
to document them.)

My belief is that such forms of constant expressions still count
as language extensions. If they are they must be documented,
because extensions are required to be documented.

Then the only purpose of 6.6p10 is to permit implementations *not* to
issue diagnostics on the use of "other forms of constant expressions".
For example:

const int x = 42;
switch (...) {
case x:
...

Normally "case x:" is a constraint violation, but if the
implementation accepts it under 6.6p10, no diagnostic is required.
Plausible, but I'm not 100% convinced either that this is what the
standard says or that it's a good idea. For one thing, the lack
of a diagnostic would make it more difficult to write portable code.
Yes, diagnostics are still required for any use of an extension that
is a syntax error or a constraint violation, but not if it isn't,
and that also includes "other forms of constant expressions".
(In other words my assessment here agrees with Keith's.)

I'm not sure mine does. Another reasonable interpretation of 6.6p10
is that it permits implementations to support other forms of constants;
for example:

const int x = 0b10_1010;
/* 0b denotes binary, underscores are ignored */

This is obviously a syntax error, requiring a diagnostic, for an
implementation that doesn't support it. Can an implementation not
emit a diagnostic and point to 6.6p10 to justify it? Or must it
issue a diagnostic and make it an extension under 4p6?

I'd be happy to accept the committee's intent here if I knew what
it was.

[snip]
It seems clear that the point is to allow additional forms of
constant expression without absolutely insisting on generating a
diagnostic; in other words to leave the question of diagnostics
up to the discretion of the implementation. Without 6.6p10 any
other forms of constant expression wouldn't meet the Standard's
definition, and if used in places that need constant expressions
would cause constraint violations.

Personally, I'd prefer to drop the permission in 6.6p10 and allow
other forms of constant expression, like any other extension, under
4p6. A conforming implementation would have to issue a diagnostic
for anything that would be a constraint violation or syntax error
without the extension. If you don't want the diagnostic, invoke
the compiler in a not-quite-conforming mode.
 
Ersek, Laszlo

Ben Bacarisse said:
(e-mail address removed) (Ersek, Laszlo) writes:
Either way, it's not really at the heart of the question unless you
think the switch *caused* the typo.

Yes, I thought (and think) that that was not unpossible. If my memory
serves, both Keith and Seebs have pointed out that they need to reorder
reverse-ordered relational operators in their heads (sorry if I'm making
this up now) or that it causes extra mental load or something to that
effect. So the typo may be related to the reversing process.

Cheers,
lacos
 
Ersek, Laszlo

Richard Heathfield said:
Ersek, Laszlo wrote:

Again, your comment would only be relevant if this were English, which
it isn't.

Exactly -- I'm a staunch member of the "7 == x" camp. I just tried to
explain why, as I perceive, Seebs thinks what he thinks. (Sorry if this
qualifies as bad etiquette.)

Cheers,
lacos
 
ImpalerCore

Exactly -- I'm a staunch member of the "7 == x" camp. I just tried to
explain why, as I perceive, Seebs thinks what he thinks. (Sorry if this
qualifies as bad etiquette.)

Are you a staunch member of the '7 == x' because of style, or because
of the possibility of catching '7 = x' errors? I personally don't
like the style simply because it *is* more mind taxing, and it's a
result of not seeing code in that style, not writing code in that
style, and not thinking about code in that style, for about 8 years
between C++ and C, that my brain has been conditioned that way. I
would be bold enough to say that a large majority of C programmers are
conditioned that way, simply from other people's code and books that
I've seen.
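
For readers who have not met the convention, here is a minimal sketch of what it guards against. The function names are hypothetical. Both functions are correct and equivalent as written; the comments describe what would happen if one '=' were dropped:

```c
#include <assert.h>

/* Conventional order. If "==" were mistyped as "=" inside an if, as in
   "if (x = 7)", the code would still compile: it assigns 7 to x and the
   condition is always true. Many compilers warn, but none has to. */
int is_seven_conventional(int x)
{
    return x == 7;
}

/* Constant-first order. The same typo would give "7 = x", which is a
   constraint violation because 7 is not a modifiable lvalue, so a
   conforming compiler must diagnose it. */
int is_seven_constant_first(int x)
{
    return 7 == x;
}

/* Both spellings give the same answer for every input tried here. */
void check_orders_agree(void)
{
    int x;
    for (x = 0; x < 15; ++x) {
        assert(is_seven_conventional(x) == is_seven_constant_first(x));
    }
}
```

The protection only exists when one operand is not an lvalue; it does nothing for a comparison such as 'x == y'.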

Perhaps you would allow me to scramble the characters on your keyboard
in a way of my choosing that is better? Let's take 'asdfjkl;' and
make it 'arstenil' since those d, f, j, k, and ';' characters are used
much less often, and it requires more finger movement, causing more
cases of carpal tunnel syndrome. Would you be willing to give that a
try? The example is a bit hyperbolish, but that's how it feels to me
looking at '7 == x'.

And the thing is that you may actually like that keyboard better, but
everyone else will still use the standard keyboard, and still complain
if they have to use your keyboard. The inertia of the original style
'x == 7' is too large to overcome unless that style has
generated so many problems that I'm motivated to try out '7 == x'. If
you suffered from carpal tunnel syndrome, you may be tempted to give
the new keyboard a try.

I actually went through this process with single line if statements.

if ( condition )
statement;

I have been bitten by this syntax (spending hours poring over code
where my brain is not registering the syntax error) so many times that
the bracket style

if ( condition ) {
statement;
}

has become my convention. It has saved me time because for whatever
reason, my brain can deal with the syntax errors better.
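
One common way the unbraced form bites is a second statement added later under the if. That is an assumption about the kind of bug meant here; the post does not say. A minimal sketch with hypothetical function names:

```c
#include <assert.h>

/* Unbraced form after a second statement was added. */
int run_without_braces(int condition)
{
    int steps = 0;
    /* As typed, with both statements indented under the if:
           if (condition)
               steps += 1;
               steps += 10;
       As the compiler reads it: */
    if (condition)
        steps += 1;
    steps += 10;            /* not controlled by the if: always runs */
    return steps;
}

/* Braced form: both statements really are inside the block. */
int run_with_braces(int condition)
{
    int steps = 0;
    if (condition) {
        steps += 1;
        steps += 10;
    }
    return steps;
}

/* The two functions differ only when the condition is false. */
void check_brace_examples(void)
{
    assert(run_without_braces(1) == run_with_braces(1));
    assert(run_without_braces(0) != run_with_braces(0));
}
```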

Lastly, I have no problem with someone wanting to use '7 == x' if it
really does help them from a style perspective. I have an issue when
people use the style just to catch '7 = x' errors when 'x == 7' would
be easier to understand. Hey, I still use 'char* p' simply because I
started in C++ and moved to C, and it's easier for me to use. I'm
sure there are a host of people that will belittle me for still using
that syntax, and I realize I may be in the minority, but I'll still
keep using it since it's more work to relearn and restyle the code
I've already written than to appease the majority.

Again, if I have to work with a team, and it's decided to use 'char
*p', I will gladly acquiesce, but unless I'm forced to code that way
for a long time, I don't foresee myself changing my ways.
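
As background, the usual argument made for 'char *p' (not stated in this post) is that the '*' belongs to the declarator, not to the type, which matters as soon as one declaration names two variables. A minimal sketch with hypothetical function names:

```c
#include <assert.h>
#include <stddef.h>

/* Star written next to the type: only p is a pointer. */
size_t second_size_star_left(void)
{
    char* p = 0, q = 0;     /* p is a char *, but q is a plain char */
    (void)p;
    return sizeof q;        /* sizeof(char), which is always 1 */
}

/* Star written next to each name: both are pointers. */
size_t second_size_star_right(void)
{
    char *p = 0, *q = 0;    /* p and q are both char * */
    (void)p;
    return sizeof q;        /* sizeof(char *) */
}

/* The second variable has a different type in the two declarations. */
void check_declarator_binding(void)
{
    assert(second_size_star_left() == sizeof(char));
    assert(second_size_star_right() == sizeof(char *));
}
```

Declaring one variable per line avoids the issue with either spelling.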

Best regards,
John D.
 
Seebs

Not entirely convinced. In each case, we wish to know whether [Fred] and
[the man who shot my chicken] are the same person.

No.

In one case, we want to know what Fred did. In the other, we want to know
what happened to the chicken.

For another example: Consider the difference between...
That bottle contains a liter of water.
A liter is the amount of water in that bottle.

One is telling you something about the bottle; the other is offering a
definition of a liter.

Think of it in terms of definitions, and the relationship is clearer; when
you say "let x be any positive integer", that's not the same thing as saying
"any positive integer is x". Descriptions don't have quite as strong a
directionality, but they still do have a noticeable polarity.

In some cases, it's not just the relationship that's being expressed, but
also something about which participants in the relationship are of interest
to us. There is a substantial difference in focus between "the capacitor
grounded through Jim" and "Jim grounded the capacitor". One is a statement
about the failure of an electrical device to continue working, one is a
statement about the failure of a biological device to continue working. That
they describe the same circumstance doesn't mean they describe the same
*focus*.

-s
 
Seebs

But you must be careful not to assign additional meaning if it was
unintended by the author.

You can never avoid both Type 1 and Type 2 errors completely. For the most
part, the benefits of assuming that idioms are intentional massively outweigh
the costs of perceiving an idiom that was not intended.
If you read my code, it would be unfortunate if you would conclude, from the
bare fact that I wrote the addition as (a+b), that a is the larger of
the two.

And I wouldn't draw any such conclusion. I might, however, think that a
was in some way the "base" value.

a[i] is in theory equivalent to both *(a+i) and *(i+a). However,
it is EXTREMELY unusual to write i+a -- because we know that we are adding
an offset to a pointer, not adding a pointer to an offset. That they do
the same thing doesn't matter. The pointer is the one which tells us what
the operation is doing.
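
A minimal sketch of that equivalence, with a hypothetical function name and a made-up four-element array. It compares addresses, so it checks that every spelling designates the same object:

```c
#include <assert.h>

/* Returns 1 if a[i], *(a + i), *(i + a) and i[a] all designate the same
   element of a small local array, 0 otherwise. */
int subscript_forms_agree(int i)
{
    int a[4] = { 10, 20, 30, 40 };

    assert(i >= 0 && i < 4);        /* stay inside the array */

    return &a[i] == a + i           /* pointer plus offset */
        && &a[i] == i + a           /* offset plus pointer */
        && &a[i] == &i[a];          /* legal, but rarely written */
}
```

The compiler accepts all of them; the choice between them only affects the human reader.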
But that perception could be misleading.

Yes, it could.

Heuristics don't have to be 100% reliable to be worth using.
Another reason might be that "i < 10" and "10 >= i" are not the same thing.

Whoops. >, not >=. I'm bad at symmetry sometimes. Or alternatively,
probably, "9 >= i".

-s
 
Seebs

I don't see how you get that y can't vary, just from seeing if(x != y).

I don't.

I just get the impression that it is much less likely to be the member
whose value is in question.

And I'd guess that if you took a random sample of C code (say, a hundred lines
from each of a couple thousand projects), you'd find that this was true
in the VAST majority of cases. Well over 90%.

Note that a limit like "max" is invariant *in context*, normally.

while (i < len)

usually means that, for the duration of the loop, i will be changing and len
won't. Not always, but it's true often enough that exceptions are surprising.
It's been done both ways before.

But one of them much, much, more often.
Then, much of the time, you expect the wrong thing, since much of the
time there is no invariant involved. If you were reading my code, you
would very often find that, where an invariant /is/ involved in a
comparison for equality, it's on the left. So your preconceptions would
mislead you.

With your code, yes. So I'd realize you were one of the people who does
it the other way. Same as, if I went to England and noticed people driving
on the wrong side of the road, after a couple of hours I'd be used to that.
I am always suspicious of "we do it this way because most people do it
this way". Sturgeon's Law and all that. I'm not saying we should do Y
*because* most people do X; but following the crowd *just* for the sake
of following the crowd is very often a bad idea, because the crowd is
very often heading in the wrong direction.

There is no significant objective advantage to either... But, as with the
side of the road, there is a substantial advantage to having everyone agree.
Sure. So which one is the constant you were expecting to see?

Neither. But I'm fine with a heuristic being imperfect. They are.

It's just that, IF there is one that's more changeable than the other, I
expect it to be on the left.
That's a rather weak basis for claiming a "standard idiom".

It works for 1TBS.

And again, consider why it is that every test in K&R, and in King's book (I
believe), and anywhere else I've looked, uses the same idiom. One of the
reasons is that, having seen it that way, people tend to expect it and write
it that way, and the net result is additional communication.

It's like indentation. We indent code to make it reasonable. We do not,
to borrow your phrase, "keep in mind what the brackets do with their contained
blocks" and ignore indentation conventions; we use them because presenting
things with consistent structure makes it easier for people to chunk the
information and process it effectively.

Imagine that you have two code bases, which are precisely identical, except
that in one of them, this idiom is used consistently wherever it applies, and
in the other, the first code base has been copied, and then half of the
comparisons have had their operands flipped (and the operator changed if
needed). Now have a large number of people try to find a number of difficult
bugs in them.

I would bet you that you would notice a measurable performance cost to the
reliability and speed of debugging from losing the idiom, especially if the
tests were significant to the bugs. It's only a few cycles here and there, or
one slot of short term memory, but sometimes that's what makes the difference
between catching something and not catching it.

-s
 
Seebs

... It's either me, or now two of you not noticing (or ignoring) that
"10 == i" satisfies the second but not the first.

I often introduce fencepost errors when I swap relationships. No idea
why. It's the same as my mysterious inability to learn to reliably add
single-digit numbers. (I can do it about 95% of the time, give or take.)
Because programmatically, "i" changes over time, 10 does not, and when
one reads out loud the controlling expression in English, the subject of
that sentence ("the actor") should be the entity that is acting.

That's pretty much it.

-s
 
Seebs

Yes, I thought (and think) that that was not unpossible. If my memory
serves, both Keith and Seebs have pointed out that they need to reorder
reverse-ordered relational operators in their heads (sorry if I'm making
this up now) or that it causes extra mental load or something to that
effect. So the typo may be related to the reversing process.

It probably is. Actually, that's clarified it. Here's the thing.

(x < 10)
!(x >= 10)

That kind of reversal requires adding/removing an =. So if I'm flipping
a comparison relationship, I'm likely to add/remove an =. It turns out
that merely changing the direction isn't the same thing as flipping.

(x < 10)
(10 > x)
!(x >= 10)
!(10 <= x)

.... Yup. I got the >< in the second pair wrong on the first try, had to
go back and think about them.

I can do one flip or the other, usually, but if two of them are on offer,
I tend to screw them up.
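
A minimal sketch that checks the four spellings above against each other over a range of values. The function names are hypothetical:

```c
#include <assert.h>

int lt10_plain(int x)           { return x < 10; }      /* original       */
int lt10_swapped(int x)         { return 10 > x; }      /* operands moved */
int lt10_negated(int x)         { return !(x >= 10); }  /* logic negated  */
int lt10_negated_swapped(int x) { return !(10 <= x); }  /* both at once   */

/* Returns 1 if all four forms agree for every x in [lo, hi]. */
int all_forms_agree(int lo, int hi)
{
    int x;
    for (x = lo; x <= hi; ++x) {
        int expected = lt10_plain(x);
        if (lt10_swapped(x) != expected
            || lt10_negated(x) != expected
            || lt10_negated_swapped(x) != expected) {
            return 0;
        }
    }
    return 1;
}

/* The boundary values 9 and 10 are where a slipped '=' would show up. */
void check_flips(void)
{
    assert(all_forms_agree(-20, 20));
    assert(lt10_swapped(9) && !lt10_swapped(10));
}
```

Swapping the operands mirrors the operator and keeps the '='; negating the test mirrors the operator and toggles the '='.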

-s
 
Seebs

This is C we're discussing, not English. It is folly to pretend that the
rules of English apply to C.

Actually, I'm not exactly talking about English. I'm talking about the
underlying cognitive structures English (and every other language) maps to.
So far as I know, regardless of language, humans distinguish between the
topic and the comment (thanks to another poster for providing the
terminology).
Ignoring. Probably a minor thinko on his part, no big deal.

Not a big deal, but oddly, somewhat related -- it's the kind of mistake that
shows up as a side-effect of added complexity. If you add enough parentheses
to an expression, people will start mismatching them or putting them in the
wrong places because they can't track them automatically anymore, or because
the automatic tracking fails.

For me, swapping the "natural" order of a comparison (I expect the "topic"
to be first) is one extra layer, similar to an indirection, extra set of
parentheses, or whatever. I don't know how common that is, but I'm pretty
sure it's not going to change in the foreseeable future.

-s
 
Keith Thompson

ImpalerCore said:
Are you a staunch member of the '7 == x' because of style, or because
of the possibility of catching '7 = x' errors? I personally don't
like the style simply because it *is* more mind taxing, and it's a
result of not seeing code in that style, not writing code in that
style, and not thinking about code in that style, for about 8 years
between C++ and C, that my brain has been conditioned that way. I
would be bold enough to say that a large majority of C programmers are
conditioned that way, simply from other people's code and books that
I've seen.
[...]

I have a question for those who like and/or use the '7 == x;' style.

The usual rationale for the '7 == x' style is that it makes it
easier to catch errors where you type "=" rather than "==".

Would you even consider writing '7 == x' rather than 'x == 7'
if C's equality and assignment operators were more distinct?

For example, consider a hypothetical C-like language in which the
equality operator is spelled "=" and the assignment operator is
"<-", and "==" is a syntax error. (Yes, that would quietly break
"x<-1"; let's ignore that.)

In such a language, would you ever write "if (7 = x)" in preference
to "if (x = 7)"? If so, why?
 
Ian Collins

Keith said:
I have a question for those who like and/or use the '7 == x;' style.

The usual rationale for the '7 == x' style is that it makes it
easier to catch errors where you type "=" rather than "==".

Good development practices (unit testing) will catch that typo.
Would you even consider writing '7 == x' rather than 'x == 7'
if C's equality and assignment operators were more distinct?

For example, consider a hypothetical C-like language in which the
equality operator is spelled "=" and the assignment operator is
"<-", and "==" is a syntax error. (Yes, that would quietly break
"x<-1"; let's ignore that.)

In such a language, would you ever write "if (7 = x)" in preference
to "if (x = 7)"? If so, why?

While not a programming language, most unit test harnesses (at least all
those I've used) express test assertions as ASSERT( expected, actual )
so after using them for a while, placing the invariant on the left
doesn't look so bad. Despite many years exposure to this testing style,
I still place the invariant on the right in conditionals....
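
A minimal sketch of that expected-first convention. CHECK_EQUAL_INT, add and run_checks are hypothetical names made up for illustration, not taken from any real test harness:

```c
#include <assert.h>
#include <stdio.h>

static int failures = 0;    /* number of failed checks in the current run */

/* Hypothetical macro: expected value first, actual value second. */
#define CHECK_EQUAL_INT(expected, actual)                          \
    do {                                                           \
        int e_ = (expected);                                       \
        int a_ = (actual);                                         \
        if (e_ != a_) {                                            \
            ++failures;                                            \
            fprintf(stderr, "%s:%d: expected %d, got %d\n",        \
                    __FILE__, __LINE__, e_, a_);                   \
        }                                                          \
    } while (0)

/* Function under test (also hypothetical). */
int add(int a, int b)
{
    return a + b;
}

/* Runs two sample checks and returns how many of them failed. */
int run_checks(void)
{
    failures = 0;
    CHECK_EQUAL_INT(7, add(3, 4));  /* passes */
    CHECK_EQUAL_INT(7, add(3, 5));  /* fails on purpose: 8 is not 7 */
    return failures;
}

/* Self-test of the macro: exactly one of the two sample checks fails. */
void self_test(void)
{
    assert(run_checks() == 1);
}
```

Putting the expected value first keeps the failure message unambiguous about which side was the invariant.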
 
Andrew Poelstra

Yes, I thought (and think) that that was not unpossible. If my memory
serves, both Keith and Seebs have pointed out that they need to reorder
reverse-ordered relational operators in their heads (sorry if I'm making
this up now) or that it causes extra mental load or something to that
effect. So the typo may be related to the reversing process.

(I hope I haven't munged up the attributions; they were formatted
funnily and I was having a tough time reading them, so I shuffled
some >'s around.)

As Seebs points out in his reply to this article,
(i > 10) == !(i <= 10),
while
(i > 10) == (10 < i),
which is quite a different beast.

Also, the idiomatic "count up to 10" loop is:
for(i = 0; i < 10; ++i)

while "count down to 10" is:
for(i = 10; i >= 10; ++i)

For these reasons I didn't see the typo, even after lacos said that
there /was/ one, a fact that initially escaped me completely. Like
Keith and Seebs, I need to mentally reorder these kinds of expressions
to understand them.

I do that simply by reading the words backwards, since reading and
writing backwards is much easier for me than re-ordering and then
"reading" different text from what I see. So I find it interesting
that I made the same mistake.
 
Rick Jones

In comp.unix.programmer Ike Naar said:
It doesn't matter, but anybody can fool themselves that it does.
And then Alice convinces herself that 3+x is ugly and unreadable,
Bob opts for x+3 being error-prone and unreadable, and now what
should Carol write?

The ISO, or perhaps her elected representative - there ought to be a
law right?-)

rick jones
 
Keith Thompson

Andrew Poelstra said:
Also, the idiomatic "count up to 10" loop is:
for(i = 0; i < 10; ++i)

while "count down to 10" is:
for(i = 10; i >= 10; ++i)

I don't think that's what you meant.

[...]
 
Keith Thompson

Andrew Poelstra said:
:)

for(i = 10; i >= 0; ++i)

vi makes copy/paste too easy to do without looking at the
screen.

So that's counting down *from* 10, not *to* 10 (speaking of
copy/paste).

Also, the "up to 10" loop executes 10 times, for values from 0 to 9
inclusive, while the "down from 10" loop executes 11 times, for values
from 10 down to 0 inclusive.

Finally, it's an infinite loop if i is of an unsigned type.
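
A minimal sketch of the three cases, with hypothetical function names and with the decrement a count-down loop needs. Each function returns its iteration count:

```c
#include <assert.h>

/* Counts up: visits 0 through 9. */
int count_up_iterations(void)
{
    int i, n = 0;
    for (i = 0; i < 10; ++i) {
        ++n;
    }
    return n;
}

/* Counts down with a signed index: visits 10 through 0 inclusive. */
int count_down_signed_iterations(void)
{
    int i, n = 0;
    for (i = 10; i >= 0; --i) {
        ++n;
    }
    return n;
}

/* With an unsigned index "i >= 0" is always true, so test before
   decrementing instead. Visits 9 down through 0. */
int count_down_unsigned_iterations(void)
{
    unsigned i;
    int n = 0;
    for (i = 10; i-- > 0; ) {
        ++n;
    }
    return n;
}

/* The signed count-down runs one more time than the other two. */
void check_loop_counts(void)
{
    assert(count_down_signed_iterations() == count_up_iterations() + 1);
    assert(count_down_unsigned_iterations() == count_up_iterations());
}
```

In the unsigned version i wraps around once when the loop exits, which is well defined for unsigned types.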
 
Andrew Poelstra

Andrew Poelstra said:
[...]
Also, the idiomatic "count up to 10" loop is:
for(i = 0; i < 10; ++i)

while "count down to 10" is:
for(i = 10; i >= 10; ++i)

I don't think that's what you meant.

:)

for(i = 10; i >= 0; ++i)

vi makes copy/paste too easy to do without looking at the
screen.

So that's counting down *from* 10, not *to* 10 (speaking of
copy/paste).

Also, the "up to 10" loop executes 10 times, for values from 0 to 9
inclusive, while the "down from 10" loop executes 11 times, for values
from 10 down to 0 inclusive.

Finally, it's an infinite loop if i is of an unsigned type.

Plus, I used ++i instead of --i. My goodness.

In real life I always do:

i = 10; /* or max, or probably max - 1 */
while(--i)
...
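
A minimal sketch of which values that idiom actually visits. The struct and function names are hypothetical, and the sketch assumes the starting value is at least 1:

```c
#include <assert.h>

/* What the loop body saw: how many values, and the first and last one. */
struct visit_summary {
    int count;
    int first;
    int last;
};

/* Runs "i = start; while (--i) ..." and records the values of i that the
   body sees. Assumes start >= 1; for start == 1 the body never runs. */
struct visit_summary summarize_predecrement_loop(int start)
{
    struct visit_summary s = { 0, 0, 0 };
    int i = start;

    assert(start >= 1);     /* otherwise i would never reach zero safely */

    while (--i) {
        if (s.count == 0) {
            s.first = i;
        }
        s.last = i;
        ++s.count;
    }
    return s;
}
```

Starting from 10, the body sees 9 down to 1. It never sees the starting value or 0, which is worth keeping in mind when the loop indexes an array.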
 
Ersek, Laszlo

ImpalerCore said:
Are you a staunch member of the '7 == x' because of style, or because
of the possibility of catching '7 = x' errors?

It became my style after using it consciously for a while. The original
reason was the one you mention as second.

I personally don't like the style simply because it *is* more mind taxing,
and it's a result of not seeing code in that style, not writing code in
that style, and not thinking about code in that style, for about 8 years
between C++ and C, that my brain has been conditioned that way. I would
be bold enough to say that a large majority of C programmers are
conditioned that way, simply from other people's code and books that I've
seen.

I think I agree. I also think I don't care about how other people, with whom
I don't share a team, write their code :) I will not change my style just to
follow an unrelated majority.

And the thing is that you may actually like that keyboard better, but
everyone else will still use the standard keyboard, and still complain
if they have to use your keyboard.

They have every reason to complain *now*, if they try to use my keyboard
:) Under Linux, I have my own keymap, under Windows, I have my own
keyboard layout dll, created with Microsoft's Keyboard Layout Creator
(available at no charge on their site). My layout dates back to DOS,
where I modified / programmed a TSR in Turbo Pascal so I could input
accented characters *and* source code. I touch-type (with completely
unofficial finger patterns), so when people try to look at the keyboard
for help, they are even more baffled, because I never care what "skin"
the keyboard has, as my layout is independent from that skin.

The inertia of the original style 'x == 7' is too large to overcome
unless that style has generated so many problems that I'm motivated to try
out '7 == x'. If you suffered from carpal tunnel syndrome, you may be
tempted to give the new keyboard a try.

Usually, I adapt quickly to new styles if I'm convinced about their benefits
(or when I realize that I have no choice). I didn't start out with "7 ==
x", I trained myself to write that way. It was inconvenient first, but then
it became natural. I picked up brace style, initialization style etc etc
the same way, gradually. I chose to adopt them.

My touch-typing patterns are completely unofficial and QWERTY dependent
(or more precisely, dependent on my own QWERTY-derivative), and I
luckily never suffered from RSI, so I have no incentive to try out other
keyboard styles.

I actually went through this process with single line if statements.

if ( condition )
statement;

I have been bitten by this syntax (spending hours poring over code
where my brain is not registering the syntax error) so many times that
the bracket style

if ( condition ) {
statement;
}

has become my convention. It has saved me time because for whatever
reason, my brain can deal with the syntax errors better.

Same here (except no spaces around "condition" between the parentheses),
but for other reasons: I like to be able to jump to the end of the block
in my editor, and also to put a comment on the closing brace
occasionally.

Again, if I have to work with a team, and it's decided to use 'char
*p', I will gladly acquiesce, but unless I'm forced to code that way
for a long time, I don't foresee myself changing my ways.

Same here. If we formalize an in-house standard (= coding style) that
prescribes "x == 7", I'll adapt. Consistency is more important than my
idiosyncrasies.

Cheers,
lacos
 
