compare a large number of variables

Michael Wojcik · Aug 20, 2005

No that doesn't work. In the case of one variable with a==1, value==1
then a^value is false.

As it should be. Try reading for comprehension.

This really calls for a multi-way equality comparison:

if ( ! (a == b == c == d == ... == z == value) )

Obviously the semantics are wrong the way the equality operator is
currently defined, but I've sometimes wanted to use an expression like
this.

Try COBOL, which does support it, and perhaps you'll see why many
COBOL experts avoid it. (Actually, the COBOL syntax would be "if
value not = a or b or c".)

Except that the logical or can't be replaced with a bitwise or.

Who said that it could be?

Its purpose is not immediately clear and it depends on unsigned integers.

The former sufficies for lousiness. The latter may as well,
depending on other constraints which are not specified here.

Both issues can be dealt with by appropriate commenting. More typical
expressions would avoid the need for commenting but I don't think that
makes it "lousy" code - it's perfectly serviceable, just a little atypical
and obscure.

I note your opinion on the matter but continue to disagree. It's
*not* "perfectly serviceable": it imposes unnecessary restrictions
on domain and comprehensibility for no clear gain.

That's quite a sweeping claim (as to likelihood of needing a multi-way
test) and I haven't given much thought to its merits but I'm curious by
what reasoning you make it.

This is a more ambitious claim than, say, your claim about efficiency
in your previous post, which did not account for, say, branch
misprediction penalties, branching overhead, and code expansion due
to the additional tests? (I'm not claiming that performing all of
the operations is necessarily faster, or even proposing any
probability that it would be, but if *I* were going to make a claim
about efficiency I'd certainly want to qualify it with comments about
the mechanisms at play in various processor architectures and the
like.)

My claim is based on experience, and I believe it's plausible. Feel
free to present a counterargument.

Netocrat · Aug 21, 2005

As it should be. Try reading for comprehension.

Fair enough, I miscalled it - there was nothing wrong with your code.

Try COBOL, which does support it, and perhaps you'll see why many
COBOL experts avoid it. (Actually, the COBOL syntax would be "if
value not = a or b or c".)

I don't know COBOL. Why do COBOL experts avoid it?

[...]

This is a more ambitious claim than, say, your claim about efficiency in
your previous post, which did not account for, say, branch misprediction
penalties, branching overhead, and code expansion due to the additional
tests? (I'm not claiming that performing all of the operations is
necessarily faster, or even proposing any probability that it would be,
but if *I* were going to make a claim about efficiency I'd certainly
want to qualify it with comments about the mechanisms at play in various
processor architectures and the like.)

I did consider those things (although I can't prove it to you), however I
didn't feel it necessary to mention each one specifically as you would
have done. I did qualify my claim with a generic statement intended to
cover issues like that: "You never know though - on some hardware it might
be; or the compiler might rewrite it for you."

My claim is based on experience, and I believe it's plausible. Feel
free to present a counterargument.

I don't want to argue about it. My question was based on curiosity, not
disagreement. If your answer is "experience", that ends the discussion on
my side.

Michael Wojcik · Aug 22, 2005

I don't know COBOL. Why do COBOL experts avoid it?

There are various reasons, most of which are grounded in either
issues with language syntax or potential for producing unclear
code. Here are some I've read:

1. Parentheses are forbidden in abbreviated-condition expressions,
because they can introduce ambiguities. (So the standard says. I
don't have an example handy, but it's probably not hard to come up
with one.)

2. Many implementations allow parentheses in abbreviated conditions
despite the standard, but that can lead to code that silently behaves
differently on different implementations.

3. COBOL's use of operators that are composed of multiple words can
lead to confusion between some operators and abbreviated conditions.
For example:

if a equals b or a greater or equal 5

looks like it might contain one or two abbreviated conditions, but
in fact it contains none. Thus some people avoid both abbreviated
conditions and operators like "greater or equal" in favor of their
symbolic equivalents (for implementations that support the latter).

4. Similarly, because the COBOL-85 standard was ambiguous about the
behavior of "greater than or equal to" and the like in abbreviated
conditions, some didn't handle them correctly at all. There's a
comment in the 2002 standard regarding this problem.

5. They can easily be abused. Here's an example from a post to
comp.lang.cobol:

if 1 greater or equal 0 or -1 and 2 equal 0

This evaluates to true, in the implementation the poster used,
because it's parsed as:

if (1 >= 0) or (1 >= -1 and 2 = 0)

Now something like the former expression is clearly confusing for
the maintainer.

6. At least one member of the standards committee, Chuck Stevens, has
written "I believe the simple and straightforward rules of COBOL
syntax in the area of abbreviated combined conditions have actually
been broken for over forty years, and it's way too late to change
them now". I don't recall Chuck's reasons for that statement, off
the top of my head, but I believe he made a good case. One of his
objections, I think, is that abbreviating the condition in English
seems to imply (for most readers) a degree of association which COBOL
does not enforce; that's why the example in #5 is confusing.

7. COBOL already provides enough ways to obscure the evaluation of a
conditional statement, such as testing if a variable contains any of
a number of values, the set of which is represented by an identifier
declared in the data division. (And those have special behavior in
abbreviated conditions, too.) And when you get to the EVALUATE
statement and its WHEN clauses, well, discarding some of the syntactic
sugar starts to look like a good idea.

8. Part of the design of COBOL is its verbosity: data formats and
program logic are spelled out, in part so they can be compared to
specifications written in a natural language. Abbreviated conditions
often work counter to that.

Anyway, you get the idea. It's been discussed various times in
comp.lang.cobol.

[re whether a multiway test is likely not the best design]
My claim is based on experience, and I believe it's plausible. Feel
free to present a counterargument.

Click to expand...

I don't want to argue about it. My question was based on curiosity, not
disagreement. If your answer is "experience", that ends the discussion on
my side.

I was probably a bit too short. I tried to consider various cases
where this logic might be necessary, and in all of the ones that
occurred to me it seemed there was an easier solution. For example,
when assigning a value to c,d,e..., update a counter of the number of
variables in that list which have a value unequal to the value of a.
(Then all the values are equal iff the value of the counter is 0.)
Obviously there are concurrency issues with that, if the program is
not single-threaded, and it's possible that overhead (including
possibly poor locality of reference) makes it a suboptimal solution
for the OP; but we don't have sufficient information to do more than
guess.

In short, it seemed to me that only under a fairly unlikely set of
constraints would all of the following be true:

1. This test would have to be performed frequently.
2. This test would be performed in a time-critical portion of the
program.
3. The outcome of the test would often vary (ie, that it wouldn't
usually be either true or false). (If this is false, then the
precomputation can generally be structured to be more efficient for
the likelier case.)

And if they're not all true, then it should be possible to compute
the outcome of the test outside the time-critical portion of the
program.

--
Michael Wojcik (e-mail address removed)

O sometimes, nevertheless,
The labourer at his instrument or tractor,
Bending into a state of merge with objects,
Finds the same love that, from a machine of sex,
Steps down as Venus to her invoker. -- George Barker

Netocrat · Aug 22, 2005

Netocrat said:
Netocrat said:

I don't know COBOL. Why do COBOL experts avoid it?

Click to expand...

There are various reasons, most of which are grounded in either issues
with language syntax or potential for producing unclear code. Here are
some I've read:
[...]
Anyway, you get the idea. It's been discussed various times in
comp.lang.cobol.

I do get the idea - that was some list. The main theme seems to be
confusion.

Putting aside the utility of such an operator, which I guess you would
argue against, do you think that this confusion might be avoided by
limiting a multi-way C comparison to those cases where parentheses were
absent? In other words, I'm proposing that (a == b == c) would be a
multi-way comparison whereas ((a == b) == c) would be a comparison of (a
== b) against c.

The only potential issue I can see is when a, b and c are complex
expressions rather than variables.

The alternative would be a new operator for multi-way comparisons, such as
===, which would be used similarly to the (a?b:c) construct except that
the ? and : would both be replaced by === and it could be used for more
than three operands.

Obviously the former suggestion is never going to happen because it would
break too much existing code, but the second seems workable to me - and it
would allow the compiler to generate the most efficient code - deciding
between assembly versions of the bitwise constructs suggested in this
thread vs successive "test and branch"es, taking into account branch
misprediction etc.

[re whether a multiway test is likely not the best design]

Click to expand...

Click to expand...

[...]
I was probably a bit too short. I tried to consider various cases where
this logic might be necessary, and in all of the ones that occurred to
me it seemed there was an easier solution. For example,

[snip rest of reasoning]

That was very cogent reasoning of the style I was hoping to elicit.

Michael Wojcik · Aug 23, 2005

I do get the idea - that was some list. The main theme seems to be
confusion.

Putting aside the utility of such an operator, which I guess you would
argue against, do you think that this confusion might be avoided by
limiting a multi-way C comparison to those cases where parentheses were
absent? In other words, I'm proposing that (a == b == c) would be a
multi-way comparison whereas ((a == b) == c) would be a comparison of (a
== b) against c.

Ouch. I wouldn't want parentheses to change the behavior of the
operator.

The alternative would be a new operator for multi-way comparisons, such as
===, which would be used similarly to the (a?b:c) construct except that
the ? and : would both be replaced by === and it could be used for more
than three operands.

How would this work? The implementation can't process it as a series
of independent comparisons, as in:

op1 === op2 evaluates to the value of op2 if op1 and op2 have
the same value, or 0 otherwise

so that you have the value to compare with the next expression when
you have a chain of === expressions (which is the only time you'd
want to use them). The problem is that you can't distinguish between
the "false" value and the case where both operands have that value.
If "false" is 0 - and for consistency and correct operation of
conditional expressions it has to be - then the following doesn't
work:

int a = 0, b = 0;

if (a === b) puts("it works");

For === to work, the grammar has to expand the phrase to include all
the "===" operations in the chain, so all the operands can be
evaluated, then their values compared, then the result of all the
chained === comparisons computed. (That result will have to be 1 or
0, both for consistency with other relational operators and to
sensibly handle the case where all operands have the value 0.)

Nothing else in C works that way. It doesn't fit with the rest of
the grammar. Nor is it clear what happens with parenthesization or
complex operands that include sequence points (eg function calls),
or how it interacts with operators like comma and the short-
circuiting logicals.

If C had a notion of "false" that was distinct from any possible
operand value, then the definition of === as an operator that
returned either "false" or the value of its operands could work
nicely. But C doesn't, and it's hard to see how any language
would have such a construct - because programmers would want to
be able to store that "false" result in some kind of variable!

So while it's easy to grasp intuitively how === should work, at
least for the obvious cases, I think it's a poor fit for the formal
definition of the language. That's the problem COBOL ran into with
abbreviated conditionals (and some other features): the effort to
formalize the definition of the language came very late, and many
features proved impossible to formalize consistently and accurately
while maintaining the behavior that users intuitively expected. In
some cases that only became apparent after a feature was incorporated
into the specification, requiring revisions and clarifications in
later versions of the spec.

In APL and its relatives, you can apply the equality operator across
a vector, but that still doesn't do what you want (it does consecu-
tive evaluations, so you end up with the same result you have in C).
You can take an "outer product" of equality, though, and then if any
elements of that are 0 then at least two elements are not equal; and
it's easy to collapse that into a single boolean value.

The "outer equality product" produces a matrix of comparisons. In
the case of {1 2 2 2}, for example, it'd look like:

1 0 0 0
0 1 1 1
0 1 1 1
0 1 1 1

Then applying logical-and to the rows of that array yields a vector:

0 0 0 0

and applying it to that vector yields 1 iff all elements of the
original were equal.

Of course, we really only need one row of the outer product, since
each row compares one element in the original vector with itself and
each of the others; so I could just drop the other rows and apply
logical-and to it. That would be faster and more elegant, but I just
thought of it as I was writing this. (Another way would be to take
the minimum and maximum value in the original vector and compare
them. Those are basic APL operations, too.)

So, if you just add some APL to C, you'd be able to do what you want.

The Wikipedia article on APL conveys the flavor of the language well.

Netocrat · Aug 23, 2005

Ouch. I wouldn't want parentheses to change the behavior of the
operator.

Fair enough - it was a somewhat inelegant hack. Existing use precludes
this change anyway.

How would this work?

My wording was apparently unclear. By "similarly to the (a?b:c)
construct" I meant to indicate how parentheses would affect the
expression, not how its value would be determined.

[...]

For === to work, the grammar has to expand the phrase to include all the
"===" operations in the chain, so all the operands can be evaluated,
then their values compared, then the result of all the chained ===
comparisons computed. (That result will have to be 1 or 0, both for
consistency with other relational operators and to sensibly handle the
case where all operands have the value 0.)

Yes, that's exactly what I intended.

Nothing else in C works that way.

On its own - which admittedly I don't think is how you intended the
statement to be read - that's not reason enough to reject the proposal.
I'd consider the question "does it contradict the spirit of C?" more
relevant in isolation. Arguably the ability to fairly easily construct a
similar statement using existing operators is cause to answer yes.

On the other hand I think that the advantage of a new operator is that it
would make it clearer to a compiler and reader what is intended and allow
for more concise code, which *is* in keeping with C's spirit.

It doesn't fit with the rest of the grammar. Nor is it clear what
happens with parenthesization

I see something of a precedent in the ?: construct - at least for
parentheses.

or complex operands that include sequence
points (eg function calls),

Is "identically as for the single-way equality operator" a sufficient
definition? I haven't given it much thought.

or how it interacts with operators like comma

Probably the comma operator would have higher precedence than the ===
operator. So, for example, the expression:
a === b, c === d, e === f

would mean:
a === (b, c) === (d, e) === f

Whereas:
a === b, c == d, e === f

would mean:
a === (b, c == d, e) === f

There may be other ways to define the precedence/rules but again I haven't
thought deeply about it.

and the short- circuiting logicals.

Probably these operators would have higher precedence. So:
a === b && c || d === e && f || g === h
would mean:
a === (b && c || d) === (e && f || g) === h

If C had a notion of "false" that was distinct from any possible operand
value, then the definition of === as an operator that returned either
"false" or the value of its operands could work nicely.

Right, as I wrote above my wording was unclear and that's not what I
intended.

[...]

So while it's easy to grasp intuitively how === should work, at least
for the obvious cases, I think it's a poor fit for the formal definition
of the language.

Do my above attempts to clarify and better define the semantics affect
your opinion? Granted they're not very formal and may be missing some
cases, but I can't see any glaring deficiencies.

[...]

In APL and its relatives, you can apply the equality operator across a
vector, [...]
The "outer equality product" produces a matrix of comparisons. [...]
Then applying logical-and to the rows of that array yields a vector [...]
and applying it to that vector yields 1 iff all elements of the
original were equal.

That part of the language sounds a little similar to MatLab where
variables are by default arrays, although having last used it several
years ago I can't recall how the equality operator worked.

[...]

So, if you just add some APL to C, you'd be able to do what you want.

Judging from the Wikipedia article, there's only one way that any APL
could be added to C...

Who's got the gaffer tape?

Richard Bos · Aug 24, 2005

[ Guys! Learn to snip... ]

Probably the comma operator would have higher precedence than the ===
operator. So, for example, the expression:
a === b, c === d, e === f

(BTW: ew, Icon...)

would mean:
a === (b, c) === (d, e) === f

Right now, the comma operator has the lowest precedence of any operator.
I believe that this is for a reason, and should remain that way.

Richard

Netocrat · Aug 24, 2005

[ Guys! Learn to snip... ]

I thought we kept a reasonable balance between minimal yet sufficient
content for each post to stand alone. I'll take your comment as need to
reconsider. I rather expected an OT call before a more snippage call.

(BTW: ew, Icon...)

Indeed - I did try to come up with something more useful but the
choice of symbols is limited and the only other decent choice I could
see at short notice (~=) is afaik already used by C++.

Right now, the comma operator has the lowest precedence of any operator.
I believe that this is for a reason, and should remain that way.

Yes you're right, perhaps precedence isn't the right word. I was
modelling the operator after the behaviour of ?: which in this respect
behaves the same way. i.e.
a ? b, c : d, e

means:
a ? (b, c) : (d, e)

(or at least that's how gcc treats it, I haven't consulted the standard to
confirm)

The alternative is that the comma operator instead delimits a multi-way
equality comparison. So now:
a === b, c === d, e === f

would mean:
(a === b), (c === d), (e === f)

Both are different to the semantics of the single equality operator, but
given that it takes more operands I don't see a way around that.

Michael Wojcik · Aug 25, 2005

On its own - which admittedly I don't think is how you intended the
statement to be read - that's not reason enough to reject the proposal.

True. Let me rephrase: the C standard defines the language using a
formal grammar, which decomposes into simple expressions. In my
opinion, the semantics of the proposed operator prevents it from
fitting comfortably into this grammar, because it can't be reduced
to a series of applications of itself.

That is, you can't have a rule like:

multiway-equality-expression:
equality-expression
multiway-equality-expression === equality-expression

(see the grammar in Annex B of 9899-1990) because the chain of
multiway-equality-expressions has to be evaluated as a whole.

That's really my objection - I don't like what it does to the
grammar.

On the other hand I think that the advantage of a new operator is that it
would make it clearer to a compiler and reader what is intended and allow
for more concise code, which *is* in keeping with C's spirit.

I'll agree concision is a design goal for C (albeit in moderation -
there are more concise languages), though it's a concision of the
language itself, and not necessarily of programs written in it. C
generally avoids providing syntactic sugar, preferring to make most
things explicit; thus we have malloc rather than something like
new, and case fall-through rather than syntax for multiple cases,
and so forth.

I could be persuaded that multiway comparison is sufficiently
explicit to be C-like, though, if other problems were solved.

Probably the comma operator would have higher precedence than the ===
operator. So, for example, the expression:
a === b, c === d, e === f

would mean:
a === (b, c) === (d, e) === f

I suspect that won't work well. There's a sequence point after
evaluating the LHS of the comma operator. I suspect that will
lead to some ugly confusion.

Probably these operators would have higher precedence.

They'd have to, if the comma operator has higher precedence.
Currently comma has the lowest precedence (in effect - see various
discussions about what "precedence" really means in C), so if
multiway comparison is lower than comma, it's lower than everything
else, too.

But again that means you've got sequence points in the middle of
evaluating your operator. That's not unusual in itself; for
example there are two sequence points in:

strlen(x) == strlen(y)

before the equality can be evaluated. However, my feeling is that
the "chaining" nature of === makes handling sequence points more
complicated.

Maybe what you really want is an operator that's like the comma
operator (including having a sequence point after the LHS is
evaluated), except that rather than discarding the result of the
LHS, it compares it with the RHS. I'd give this precedence right
above the comma operator. However, this still has the chaining
problem - the implementation has to treat the result of the
comparison differently depending on whether it is itself an
operand to the new operator.

Really, I think this is best handled as a variadic function:

#include <stdarg.h>
int AllEqual(int *Op, ...)
{
int Result;
int Value;
va_list Args;

if (!Op) return 1; /* all members of empty set are equal */

va_start(Args, Op);
Value = *Op;
for (Result = 1; Op && Result; Op && (Result = (*Op == Value)))
Op = va_arg(Args, int *);
va_end(Args);

return Result;
}

Then you have:

if (! AllEqual(&v, &a, &b, &c, (int *)0))
/* one or more of a, b, c is not equal to v */

It doesn't give you the automatic type conversions that you have with
==, but on the other hand it does enforce type safety, which may be
more valuable in a case like this - since if you're comparing a whole
series of variables to a single value, they're likely to all be of
the same type anyway.

Judging from the Wikipedia article, there's only one way that any APL
could be added to C...

Well, there are ASCII-only APL derivatives, notably J. J uses many
of the punctuation characters that C uses, but it would be possible
to bolt J onto C by delimiting the J expressions. Not that I'd
recommend it.

Netocrat · Aug 25, 2005

That is, you can't have a rule like:

multiway-equality-expression:
equality-expression
multiway-equality-expression === equality-expression

(see the grammar in Annex B of 9899-1990) because the chain of
multiway-equality-expressions has to be evaluated as a whole.

I only have access to the drafts, and Annex B does not occur in the
ANSI C89 draft and it's a library summary in the C99 draft. Did you mean
the Annex/Appendix titled "Language Syntax Summary"?

At any rate I'm not sure what your objection is. It seems to me that the
grammar is not intended to specify such things as order of evaluation -
surely that's up to the semantic and constraint rules.

That's really my objection - I don't like what it does to the grammar.

Perhaps you could explain what I'm missing - how is it any different to
any other compound expression that the grammar describes but whose
behaviour is more completely specified by other descriptive rules?

[...]

I could be persuaded that multiway comparison is sufficiently explicit
to be C-like, though, if other problems were solved.

Keep pointing them out and let's see if any turn out to be intractable.

I suspect that won't work well. There's a sequence point after
evaluating the LHS of the comma operator. I suspect that will lead to
some ugly confusion.

There's a precedent in the ?: construct as I've written elsewhere, in
that:
a ? b, c : d, e
means:
a ? (b, c) : (d, e)

A lot of things in C seem confusing until you understand how they are
defined; I believe that could apply to this situation. Point out any
specific undefinedness that you can see and I'll try to come up with
appropriate semantics.

[...]

However, my feeling is that
the "chaining" nature of === makes handling sequence points more
complicated.

Perhaps that the order of evaluation could be specified as left-to-right,
shortcircuiting if a mismatch is found, which seems to be similar to what
you suggest below.

Maybe what you really want is an operator that's like the comma operator
(including having a sequence point after the LHS is evaluated), except
that rather than discarding the result of the LHS, it compares it with
the RHS. I'd give this precedence right above the comma operator.
However, this still has the chaining problem - the implementation has to
treat the result of the comparison differently depending on whether it
is itself an operand to the new operator.

You're presumably again referring to whether this could be expressed by
the Standard's grammatical rules. Again, I can't see where it's specified
in the standard that this is inconsistent with the current grammar - why
can an intermediary result not be part of the process of creating the
final result of the expression?

Really, I think this is best handled as a variadic function:

#include <stdarg.h>
int AllEqual(int *Op, ...)
{
int Result;
int Value;
va_list Args;

if (!Op) return 1; /* all members of empty set are equal */

va_start(Args, Op);
Value = *Op;
for (Result = 1; Op && Result; Op && (Result = (*Op == Value)))
Op = va_arg(Args, int *);
va_end(Args);

return Result;
}

Then you have:

if (! AllEqual(&v, &a, &b, &c, (int *)0))
/* one or more of a, b, c is not equal to v */

It doesn't give you the automatic type conversions that you have with
==, but on the other hand it does enforce type safety, which may be more
valuable in a case like this - since if you're comparing a whole series
of variables to a single value, they're likely to all be of the same
type anyway.

That depends on what you mean by type safety. Your function expects that
all of the arguments are pointers to int but there will be no compiler
warning if only one of them is instead e.g. a pointer to short (this
falsely returns unequal on little-endian implementations where short is
smaller than int) or pointer to long (this potentially does likewise on
big-endian implementations where long is larger than int). Agreed, they
will typically be the same type but the function's expectation of this
makes it unsafe. Perhaps it would be better named AllIntEqual.

So that's another approach to add to the bitwise comparisons. I'd still
like to see if there are any insoluble problems with a multi-way equality
operator.

[...]

Tim Rentsch · Aug 26, 2005

Mark said:
Tim Rentsch said:

Mark said:

[snip]
As far as I understand, overflow/underflow invokes behaviour not
covered
by the standard, which may include setting all bits to zero.

I thought it was covered... no?

H.2.2 Integer types
[#1] The signed C integer types int, long int, long long
int, and the corresponding unsigned types are compatible
with LIA-1. If an implementation adds support for the LIA-1
exceptional values integer_overflow and undefined, then
those types are LIA-1 conformant types. C's unsigned
integer types are ``modulo'' in the LIA-1 sense in that
overflows or out-of-bounds results silently wrap. An
implementation that defines signed integer types as also
being modulo need not detect integer overflow, in which
case, only integer divide-by-zero need be detected.

Maybe I'm misreading something...

First: Appendix H is informative rather than normative.

Second: The paragraph above only says that signed integer
types *may* behave in a certain way
No, it also states that unsigned integer types DO behave a certain way.

Click to expand...

The subject in question is overflow (or underflow), which
usually is understood to refer to signed integer types rather
than unsigned integer types.

Click to expand...

Then why are we having this discussion?

Apparently, because you haven't been able to express
yourself clearly, because you respond to what you imagine
people are saying rather than what they actually do say, and
because you make statements that either are poorly worded or
just plain wrong; and also because I've been optimistic
enough to think that one or more of those might change
sometime soon.

Then I was correct in my initial statement that in my example
'Overflow is not possible.' - I was concerned after seeing H.2.2 that
my statement was incorrect, so I ammended it to 'not an issue'.
I now retract my ammended statement and stand by my original claim.
Thanks for the citation.

The comment I think you're referring to is one you made in
an earlier post that isn't quoted above; namely:

>
> Overflow is not possible, stare at it for a while and
> see if you can figure out why

As written this statement is wrong. The expression in the
'if' certainly can overflow. It also can result in
undefined behavior even if the subtraction operators don't
result in overflow.

If you meant to say, "this expression can't result in
overflow, because the operands are 'unsigned'", there's
nothing to support that statement. In no posting in the
thread were any variables declared having any unsigned
integer type. On the other hand, several postings did
declare variables of *signed* integer type, including
variables that participated in the comparison (as might be
represented by a,b,c, etc, in the 'if' above).

If you meant to say, "this expression can't result in
overflow, because the '|' operators mean the values must be
unsigned", that's wrong.

If you meant to say, "as long as the variables are of some
unsigned integer type, overflow can't happen", that may be
true, but not really very relevant, since there still is
the possibility of undefined behavior.

If you meant to say, "*if* the operands are all of type
'unsigned int', then no overflow or undefined behavior can
occur", that's true, but then why didn't you just say that?
There is nothing in your posting (the one that has the
excerpt above) that says anything about any unsigned type,
let along 'unsigned int', either in your text or what was
included from other posters. Nor does your earlier posting
(the original one that has the 'if(a-b|...' in it) say
anything about any unsigned type.

But we have been dealing with bit operands the entire time,
so considering the context you should assume we are
dealing with unsigned integer types, no? I have been... which again
is why I earlier stated that 'overflow is not possible'.

You may imagine that the operands were "bit operands", but
there is nothing in what you wrote to indicate that; and,
to the contrary in earlier postings when variables were
declared as 'int' rather than 'unsigned int'. Maybe you
think "bit operators" (such as '|') imply "bit operands"
(meaning unsigned), but of course they don't.

[snip]>

In my document [ISO/IEC 9899:1999 (E)], 3.18 is a definition of
the ceiling function.

Click to expand...

Impossible, there is no such function in the C language. [snip]

It's not only possible, if you get a copy of the C standard
document mentioned above, you'll see that it's true.

Nor do I know what you are reading if it truly defines
a 'ceiling' function.

You *should* know what I'm reading; the reference number
for the document is given above. Presumably what you meant
to say is that you don't know what text is contained in what
I'm reading, and I expect that's true. But you could find
out by getting a copy of the document and reading it yourself.

Incidentally, note that I said 3.18 is a definition of the
ceiling function, not the 'ceiling' function.

Michael Wojcik · Aug 26, 2005

I only have access to the drafts, and Annex B does not occur in the
ANSI C89 draft and it's a library summary in the C99 draft.

Well, yes, it wouldn't be the same thing in C99, since that's a
different standard (9899-1999). I have the real 9899-1990, not
the final draft, so I'm not sure what's in the latter.

Did you mean the Annex/Appendix titled "Language Syntax Summary"?

That's the one.

At any rate I'm not sure what your objection is. It seems to me that the
grammar is not intended to specify such things as order of evaluation -
surely that's up to the semantic and constraint rules.

The grammar determines precedence - see any number of discussions
on that topic that have appeared here. And while Annex B itself is
informative, it merely duplicates normative material from the Syntax
sections in the main body of the standard. If you add a new
operator, the grammar has to be extended to accomodate it.

Perhaps you could explain what I'm missing - how is it any different to
any other compound expression that the grammar describes but whose
behaviour is more completely specified by other descriptive rules?

I believe that if you try to add it to the grammar you'll see what
the problem is. I can't find any simple change to the grammar that
accomodates it and expresses the precedence that you want.

If you believe otherwise, show me what it'd be.

A lot of things in C seem confusing until you understand how they are
defined; I believe that could apply to this situation. Point out any
specific undefinedness that you can see and I'll try to come up with
appropriate semantics.

You're conflating confusion and undefinedness here; they're two
different sources of difficulty. And "seems confusing" is redundant;
confusion is subjective - it's always and only a matter of "seeming".
If a significant number of readers find a construct confusing, then
it is confusing.

I'm not particularly interested in identifying specific situations
where the mooted operator is problematic. I don't like it; I believe
its "chaining" behavior violates the underlying phrase grammar of C.
Unless you get the committee to consider including it, though, I
needn't worry about objecting in any more specific terms, since until
then it's entirely speculative.

But since you asked:

int a, b, c, d, e;
a = b = c = d = e = 2;
if (a === b === (c === d === e))
puts("a parenthesized === evaluates to the value of its operands");
else
puts("a parenthesized === evaluates to its logical value");

Which is the correct behavior, and why? What happens if you have a
multiway comparison in a macro and you substitute it into a
multiway-comparison expression, or if you use a multiway-comparison
expression as the argument to a macro? What will programmers expect?

#define BOTH_TRUE(x, y) (x)===(y)===1
if (BOTH_TRUE(a==b, c===d===e)) ...

Perhaps that the order of evaluation could be specified as left-to-right,
shortcircuiting if a mismatch is found, which seems to be similar to what
you suggest below.

No, I didn't suggest short-circuiting. I don't believe short-
circuiting solves anything here.

You're presumably again referring to whether this could be expressed by
the Standard's grammatical rules. Again, I can't see where it's specified
in the standard that this is inconsistent with the current grammar - why
can an intermediary result not be part of the process of creating the
final result of the expression?

That's not the problem; the problem is that the "final result" depends
on whether the expression is itself the operand of your new operator.

Think of it from the parser's point of view. An equality-expression
(that isn't a relational-expression), for example, always evaluates
to either a 1 or a 0. The parser always generates the same code (or
intermediate representation) for the result of the equality-
expression. For your operator, though, the parser needs to look back
up the parse tree to see whether the result of a === comparison will
be a logical value (ie 0 or 1), or the value of the two operands (if
they're equal and the expression is itself the operand of another
===), or some other magic token (if they're inequal and the expression
is the operand of another ===).

Now, if you put a sequence point after evaluating the LHS of a ===,
and make it short-circuiting, that does simplify parsing: you can
just decompose into a series of nested if's and a temporary for
holding the value of the first operand. However, I think at that
point you've imposed so many restrictions on your operator that it's
will only rarely be useful. (Frankly, I don't see that it'd be all
that useful in the first place. I'd have to see some empirical
analysis of how often it'd likely be useful before I could advocate
it.)

That depends on what you mean by type safety.

Nothing. Ignore that bit. I don't know what I was thinking of
there; of course there's no type safety with a variadic function.

Perhaps it would be better named AllIntEqual.

Actually, in any real use, I expect it would have some much more
sensible name referring to its actual purpose. I still don't believe
it has much general utility.

So that's another approach to add to the bitwise comparisons. I'd still
like to see if there are any insoluble problems with a multi-way equality
operator.

There's a great gulf between "no insoluble problems" and "worth
having", of course.

In Scheme, you could easily create your operator using extend-syntax
(which is a pattern-matching recursive macro facility), by converting
it into a series of cascading if's. Maybe you should try it there
just to see whether it's really worth it. Scheme's a fun language.

--
Michael Wojcik (e-mail address removed)

Therefore, it is possible to enjoy further by using under the
Netscape 2.0. However, Netscape will hangup at sometimes. You
should give it up. -- roro

Netocrat · Aug 27, 2005

]

The grammar determines precedence - see any number of discussions on
that topic that have appeared here.

I was under the misapprehension that the grammar notation itself
determined only syntax, not precedence.

[regarding a multi-way equality operator]

I believe that if you try to add it to the grammar you'll see what the
problem is. I can't find any simple change to the grammar that
accomodates it and expresses the precedence that you want.

Neither can I, partly because I haven't examined closely how the notation
works and is complemented by other parts of the standard. If I come
across a solution I'll post it.

[...]

int a, b, c, d, e;
a = b = c = d = e = 2;
if (a === b === (c === d === e))
puts("a parenthesized === evaluates to the value of its
operands");
else
puts("a parenthesized === evaluates to its logical value");

Which is the correct behavior, and why?

The second option could cause programmer errors due to macros as you later
pointed out, although aside from that I think its behaviour is more
consistent with the behaviour of parentheses in other situations than the
first option.

Similar problems already exist with macros in general for other situations
(e.g. calling a macro with a single expression for which side-effects
unintendedly are evaluated twice). The potential for unintended behaviour
seems to me to lie more with the nature of macros than the proposed
operator.

Errors could be avoided by using a guideline that one should never call a
macro directly with a multi-way comparison expression unless one was
guaranteed of specific macro behaviour.

[...]

[T]he problem is that the "final result" depends on whether the
expression is itself the operand of your new operator.

Think of it from the parser's point of view.

[...]

OK, it's harder for the parser. But given that most modern compilers are
written with optimisation features in mind, and that this construct may
facilitate optimisation, the extra complexity in the parser may be
worthwhile.

Now, if you put a sequence point after evaluating the LHS of a ===, and
make it short-circuiting, that does simplify parsing: you can just
decompose into a series of nested if's and a temporary for holding the
value of the first operand. However, I think at that point you've
imposed so many restrictions on your operator that it's will only rarely
be useful.

For complex expressions you are probably right. For simpler expressions,
the sequence points needn't affect the compiler's ability to optimise and
rewrite the code.

(Frankly, I don't see that it'd be all that useful in the first place.
I'd have to see some empirical analysis of how often it'd likely be
useful before I could advocate it.) [...]
There's a great gulf between "no insoluble problems" and "worth having",
of course.

The only problem that I have no answer for is that it is a bad match for
the standard's grammar description.

As to its general utility, I agree that it isn't clear, although I know
that there have been times in the past when I would have used it had it
been available. Offhand I don't know how frequent they have been.

[...]

Dave Thompson · Aug 27, 2005

If the values are in an array A[N] then you can use:

for (unequal = 0, i = 1; i < N; i++) {
if (A |= A[i-1]) {

unequal = 1;
break;
}
}
if (unequal) different();
else same();

and this will function for ints as well as unsigned.

Click to expand...

- David.Thompson1 at worldnet.att.net

Dave Thompson · Aug 27, 2005

Tim Rentsch said:
Tim Rentsch said:

Mark said:

Indeed, what happens on integer overflow is cited as an
example (3.4.3 p3) in the definition of undefined behavior.
3.18.3 - no?

Click to expand...

In my document [ISO/IEC 9899:1999 (E)], 3.18 is a definition of
the ceiling function.

Click to expand...

Impossible, there is no such function in the C language.
I assume you are referring to the 'ceil' functions?
7.12.9.1 in my text... I am using the last version of the committee draft

Section 3.4 defines different kinds of
behaviors, with 3.4.3 defining "undefined behavior". I don't
know what document you have.

Click to expand...

Nor do I know what you are reading if it truly defines a 'ceiling' function.

This was changed between n869 and actual C99.

n869 has the definition of the term undefined behavior at 3.18. C99
has definitions of impl-def, locale-spec, undefined, and unspecified
behaviors grouped under 3.4, and 3.18 and 3.19 are definitions of the
usual math notation for ceiling and floor -- vertical lines with
inward serifs at top or bottom -- which are used for ceil() and
floor() in n869 7.12.9.1,2 without definition, presumably understood
as standard math, just like + for addition (still) isn't defined; and
used in the same section of C99 for what are now {ceil,floor}{,f,l}();
and also a few other places the same in both documents.

The actual definition of undefined behavior, including the
(nonnormative) note and example, is unchanged.

- David.Thompson1 at worldnet.att.net

websnarf · Aug 27, 2005

Einar said:
Yes, your suggestions work perfectly, but I was mor looking for a nice
bit operator to operate on all my variables and then to compare it with
the value. something like (a|b|c|d...z) != value (this wont work
however... ). This is just to get rid of a for-loop or lots of
if-statements, so it is nothing necesary, I'm just convinced that it is
possible to solve in another way, and I can't forget about it....
Ahhhrg.

#define X(var) if ((var) == value) { /* do something */ }

X(a) X(b) X(c) ... etc.

Chris Torek · Aug 28, 2005

There's a precedent in the ?: construct as I've written elsewhere, in
that:
a ? b, c : d, e
means:
a ? (b, c) : (d, e)

Actually, this is wrong:

a ? b, c : d, e

parses the same as:

(a ? b, c : d), e

so that the value of the entire expression is that of "e".

(No parentheses are required inside the ?: portion for the same
reason that no parentheses are required in subscript brackets: the
syntax offers no way to terminate the sub-expression other than
the "closing bracket".)

As for the original subject (of comparing expr0 against a sequence
of expressions expr1,expr2,...,exprN to see if they are all equal,
or perhaps if any are equal): someone else mentioned APL elsethread.
APL does have a nice way to do this, provided the N expressions
are contained in a vector.

If variable X contains a vector, the expression "rho X" (where rho
is a Greek rho character) produces, as a scalar, the number of
elements in the vector. (Applied to a matrix, rho X is a vector
giving the size of the matrix, and so on.)

Used as a dyadic (two-operand) operator with two scalars, A rho B
will produce A a vector containing A copies of the value B. Hence
if E contains the expression expr0 to be compared, (rho X) rho E
will produce a vector consisting of N copies of E, which can now
be compared to the vector X.

When applied to two vectors with the same number of elements, the
dyadic "=" operator produces as its result a vector of integers
whose values are 0 where the two elements differ, and 1 where the
two elements are the same. Hence:

X = (rho X) rho E

will produce, e.g.:

0 0 1 1 0 1

when X is 3 4 5 5 6 5 and E is 5.

Now all we have to do is reduce this vector of 0-or-1 values into
the final desired value. To do this, we use the "reduce" operator
"/", which inserts some other operator -- such as "+" or "*" --
between all the various elements of a vector. In effect, +/ 0 1
2 means 0 + 1 + 2 (which is of course 3). If we want to know
whether *all* elements of X are equal to E, we simply use the "*"
operator on all the values, so that any zeros multiply out to cause
the entire result to be zero; otherwise we use the "+" operator,
so that any nonzero inputs cause the sum to be nonzero.

Hence:

*/ X = (rho X) rho E

is an APL one-liner that tells you whether all the elements of X
are equal to E.

In C, of course, I would prefer to just write a little function:

int alleq(int e, int *x, size_t nx) {
size_t i;

for (i = 0; i < nx; i++)
if (e != x)
return 0;
return 1; /* note that alleq(any, ptr, 0) is always 1 */
}

This does, of course, require the values to compare to be in a
vector (as does the APL one-liner).

Netocrat · Aug 29, 2005

Actually, this is wrong:

a ? b, c : d, e

parses the same as:

(a ? b, c : d), e

so that the value of the entire expression is that of "e".

I appreciate you pointing that out and it clearly makes sense given the
precedence of the comma operator.

(No parentheses are required inside the ?: portion for the same
reason that no parentheses are required in subscript brackets: the
syntax offers no way to terminate the sub-expression other than
the "closing bracket".)

Granted, but my point in including them was to make the operands of the
first comma explicit because I wanted the same semantics for the multi-way
equality operator.

As for the original subject (of comparing expr0 against a sequence
of expressions expr1,expr2,...,exprN to see if they are all equal,
or perhaps if any are equal): someone else mentioned APL elsethread.
APL does have a nice way to do this, provided the N expressions
are contained in a vector. [...]
*/ X = (rho X) rho E

is an APL one-liner that tells you whether all the elements of X
are equal to E.

That's quite neat and concise. To do that in C we would need to add the
reduce operator to the syntax. rho X in C would be sizeof X / sizeof *X.
It's not so simple in C to extend E into a vector - although C99's
variable arrays help. We can declare:

type Evec[sizeof X / sizeof *X];

In comparison to APL we need to know the type of X. A typeof operator
would be useful there.

Then to set all the vector elements we might use a function or macro
(whose implementation should be obvious):

setvec(Evec, sizeof X / sizeof *X, E);

Finally the missing link is something that you partly canvassed in a
separate thread - operators that work on entire arrays. In this case we
need it to be the equality operator, however that already has different
semantics. If the array types were unsigned we could use an array
subtraction operator without overflow concerns, for a final "one-liner":

*/ (X - Evec)

In C, of course, I would prefer to just write a little function:

int alleq(int e, int *x, size_t nx) {
size_t i;

for (i = 0; i < nx; i++)
if (e != x)
return 0;
return 1; /* note that alleq(any, ptr, 0) is always 1 */
}
}
This does, of course, require the values to compare to be in a vector
(as does the APL one-liner).

If you have an array that's fine, but it isn't very flexible to require
an array. A variadic function would be flexible at the expense of type
safety. The multi-way equality operator I've proposed (===) would be
flexible and type-safe but not as concise as a variadic function and
difficult to fit into the current grammar.

Here's an alternative that is concise, flexible and type safe (but
probably an even worse fit for the grammar...). It's a multi-way
equality operator named after your alleq function (the ellipsis is not
intended to imply that this is a variadic function, rather that it takes
an arbitrary number of operands):

alleq(a, b, ...)

The sizeof operator provides a precedent for the parentheses, but there is
no precedent for the variable-sized operand list separated by commas.

Dietmar Schindler · Aug 29, 2005

Netocrat said:
a ? b, c : d, e

means:
a ? (b, c) : (d, e)

(or at least that's how gcc treats it, I haven't consulted the standard to
confirm)

No, it means

(a ? (b, c) : d), e

(and I strongly believe that gcc treats it so).

Suman · Aug 29, 2005

Dietmar said:
No, it means

(a ? (b, c) : d), e

(and I strongly believe that gcc treats it so).

Yes.

#include <stdio.h>
int main(void)
{
int a = 0;
int b = 3, c = 5, d = 2, e = 6;
printf("%d\n", a ? b, c : d, e);
printf("%d\n", a ? (b, c) : (d, e));
printf("%d\n", (a ? (b, c) : d), e);
return 0;
}
gcc version 4.0.0
gcc -W -Wall -std=c99 cond.c
and a few warnings about `left-hand operand of comma expression has no
effect' later ...
<output>
2
6
2
</output>

Making a non zero hides this fact nicely, though!

How works with large integers ?	0	Aug 16, 2022
Sort by number of characters	1	Nov 2, 2023
Refactoring/Renaming only some variables (of a particular name) in Pycharm (or another IDE)?	5	Dec 31, 2023
A number everyday of the month "and" a different number depending on the day of the month´s day time	2	Mar 16, 2021
Storing a very large number	1	Jul 17, 2013
New to VHDL... Trying to convert a 2-bytes number into an decimal	0	Dec 9, 2022
Question about function failing with large number	0	Aug 13, 2013
Just started coding and im stuck on a lesson?	1	Oct 30, 2022

compare a large number of variables

Michael Wojcik

Netocrat

Michael Wojcik

Netocrat

Michael Wojcik

Netocrat

Richard Bos

Netocrat

Michael Wojcik

Netocrat

Tim Rentsch

Michael Wojcik

Netocrat

Dave Thompson

Dave Thompson

websnarf

Chris Torek

Netocrat

Dietmar Schindler

Suman

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads