The undefinedness of a common expression.

S

sfuerst

Hello, back in 2002 there was a long discussion in these newsgroups
about the undefinedness of the expression:

a[a[0]] = 1; when a[0] begins with the value 0.

The general opinion was that the above invokes undefined behaviour due
to the fact that there is no sequence point in the expression, and
that the value of a[0] is used in a "value computation" unsequenced
with respect to the side-effect of the assignment operator. This is
slightly surprising, with a naive viewpoint that the above "should" be
the same as t = a[0], a[t] = 1; However, code like the above
expression is rare (I only found a single instance in the Linux kernel
source) - so this obscure case can probably be ignored to maintain the
simplicity of the sequence point rules in the standard.

However, there is another, much more common case similar to the above
which appears in real-world code:

node->next->next = something; when node->next begins with the value
'node'. (A circularly linked list one link long.)

Using the offsetof() macro, and the a <---> *(a + b) identity, this
new expression may be transformed into the first undefined one. This
is a problem. Should such common practice be undefined? If not, then
there may be a defect in the standard.

One possible solution is to make pointer dereferencing (or accessing
an array) contain a sequence point between the calculation of the
pointer value and its dereference. Most users of C probably have the
mistaken impression that such a sequence point already exists. (The
abstract machine "obviously" needs to evaluate the value of an address
before that address may be dereferenced.) However, an argument can be
made that the current undefined behaviour allows compiler
optimizations that otherwise wouldn't be possible due to the knowledge
that the undefined situation cannot occur.

Steven
 
S

Stefan Ram

sfuerst said:
However, there is another, much more common case similar to
the above which appears in real-world code:
node->next->next = something;

Can you show at least one source code file containing

node->next->next = something;

?
 
S

sfuerst

  Can you show at least one source code file containing

      node->next->next = something;

  ?

A regular expression that finds them (along with a few false
positives) is
grep -Er "\->(.*)\->\1 .* ?= " *

Here's what that finds (removing the != , ==, >= etc. cases) in the
Linux kernel source

arch/sparc/kernel/pci_psycho.c: pbm->sibling->sibling = pbm;
arch/sparc/kernel/pci_schizo.c: pbm->sibling->sibling = pbm;
net/tipc/bcast.h: item->next->next = NULL;
drivers/message/fusion/mptbase.c: ioc->alt_ioc->alt_ioc =
NULL;
drivers/message/fusion/mptbase.c: ioc->alt_ioc->alt_ioc =
NULL;
drivers/scsi/aic7xxx/aic79xx_core.c: next_scb->col_scb-
col_scb = next_scb;
drivers/scsi/ipr.c: ipr_cmd->sibling->sibling = NULL;
drivers/scsi/aic7xxx_old.c: prev_p->next->next =
current_p;
drivers/scsi/aic7xxx_old.c: prev_p->next->next =
current_p;

Some of the above are false-positives because the pointer chain is
known to never to loop back to the first pointer.

Steven
 
R

Richard Tobin

sfuerst said:
However, there is another, much more common case similar to the above
which appears in real-world code:

node->next->next = something; when node->next begins with the value
'node'. (A circularly linked list one link long.)

This seems like a very odd thing to do with a circular list. Can you
suggest why it might happen in a real program?

I conjecture that such code probably has a bug in it anyway.

-- Richard
 
D

Dag-Erling Smørgrav

This seems like a very odd thing to do with a circular list. Can you
suggest why it might happen in a real program?

It reminds me of a simple loop detection algorithm (use two pointers;
for every iteration, one pointer advances one step and the other two
steps; if there is a loop, the "fast one" will eventually catch up with
the "slow one" from behind)

However, you wouldn't advance the "fast pointer" two steps without an
intermediate NULL check - unless you already knew that the list
looped... which of course a circular linked list does, by design.

BTW, the a[a[0]] construct the OP mentioned earlier reminds me of RC4...

DES
 
R

Richard Tobin

This seems like a very odd thing to do with a circular list. Can you
suggest why it might happen in a real program?
[/QUOTE]
It reminds me of a simple loop detection algorithm (use two pointers;
for every iteration, one pointer advances one step and the other two
steps; if there is a loop, the "fast one" will eventually catch up with
the "slow one" from behind)

However, you wouldn't advance the "fast pointer" two steps without an
intermediate NULL check - unless you already knew that the list
looped... which of course a circular linked list does, by design.

Remember that this problem - if it in fact exists - only arises if you
*assign* to node->next->next when it's circular. Merely examining it
is certainly ok.

-- Richard
 
M

Michael Foukarakis

However, there is another, much more common case similar to the above
which appears in real-world code:

node->next->next = something; when node->next begins with the value
'node'.  (A circularly linked list one link long.)

I don't see how the quoted statement can invoke undefined behaviour,
assuming next is a pointer to the next node in an ADT..(if it were a
flexible array member, then yes, it is potentially dangerous, but why
would anyone dereference a flexible array member is beyond me). It is
not at all equivalent with the a[a[0]] = foo; statement, either. There
are no side effects with the use of the arrow operator, and there is
nothing obscure about it either.

Overall, nice attempt to troll.
 
D

Dag-Erling Smørgrav

Michael Foukarakis said:
I don't see how the quoted statement can invoke undefined behaviour,
assuming next is a pointer to the next node in an ADT.

Did you read what sfuerst wrote? Let me spell it out for you:

#include <stdlib.h>

void list(void)
{
struct node { int data; struct node *next; };
struct node *node;

node = calloc(1, sizeof *node);
node->next = node;
node->next->next = NULL;
}

void array(void)
{
int a[5] = { -1, -1, -1, -1, -1, };

a[0] = 0;
a[a[0]] = -1;
}

The first example is a classical singly-linked list with only one
element. The second can also be viewed as a singly-linked list where
each element in the array is the index of the next element in the list,
and -1 is the equivalent of NULL - although it's not very useful, since
there is nowhere to store any actual data, unless you add a second array
b of the same size, such that a is the "next pointer" for the element
stored in b.

On a related note, a novice programmer could be excused for thinking
that the first two assignments in list() could be collapsed into one:

node->next = node = calloc(1, sizeof *node);

It seems logical, but invokes undefined behavior.
Overall, nice attempt to troll.

Not very constructive. And people wonder why USENET is going the way of
the dodo...

DES
 
D

Dag-Erling Smørgrav

Dag-Erling Smørgrav said:
The first example is a classical singly-linked list with only one
element.

a classical *circular* singly-linked list, that is. Never liked them,
myself. It's like spelling "banana" - you never know when to stop.

DES
 
S

Stefan Ram

=?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= said:
struct node { int data; struct node *next; };
struct node *node;
node = calloc(1, sizeof *node);

Just for curiosity: Does this has any benefits compared to

struct node { int data; struct node *next; } * node =
malloc( sizeof * node );

? Especially, I refer to the use of »calloc«.
 
D

Dag-Erling Smørgrav

Just for curiosity: Does this has any benefits compared to

struct node { int data; struct node *next; } * node =
malloc( sizeof * node );

? Especially, I refer to the use of »calloc«.

Check the standard, your system documentation, or any good book about C
(e.g. K&R 2) for an explanation of the difference between malloc() and
calloc().

DES
 
A

Antoine Leca

Michael said:
I don't see how the quoted statement can invoke undefined behaviour,

C99 6.5p2: between two sequence points (here the whole assignment
expression), an assigned-to object (here, the "final" /next/ pointer)
shall have its value read only to determine the value to be stored
(here, clearly restricted to 'something', perhaps casted.)

And one may argue that in the above expression, the value is _also_ read
as part of the evaluation of the "intermediary" /next/ pointer, in order
to determine the object to be assigned.

It is not at all equivalent with the a[a[0]] = foo; statement, either.

I believe Dag-Erling addressed this one.

There are no side effects with the use of the arrow operator,

I fail to see any side effect with the [] operator either. Can you
elaborate?

and there is nothing obscure about it either.

There is nothing obscure, just that we are in the grey areas of the
legal terms of the Standard.

Overall, nice attempt to troll.

The thread is cross-posted to comp.lang.c and comp.std.c; in the latter
group (which is the one I read), I believe it is really on-topic.
Please take my post in this context, it might help you to understand my
point.


Antoine
 
K

Kaz Kylheku

This seems like a very odd thing to do with a circular list. Can you
suggest why it might happen in a real program?

An empty circular list with a sentinel node is exactly such a list.

Let's suppose ``node'' to be a pointer to such a list (specifically, to the
sentinel node).

Thus node->next is a pointer to the list head (the condition
(node->next == node indicating that the list is empty).

We might use node->next->next to insert a new item just after the first
node.

newnode->next = node;

node->next->next = newnode;

If the list is empty, something meaningful happens: the item simply
becomes the only item the list. That might be a bug, since
in that case the usual postcondition ``newnode is the second item in the
list'' isn't true. Or it might not be a bug. It depends on whether
a dependency on such a postcondition is coded into the program
elsewhere.
 
T

Tim Rentsch

sfuerst said:
Hello, back in 2002 there was a long discussion in these newsgroups
about the undefinedness of the expression:

a[a[0]] = 1; when a[0] begins with the value 0.

The general opinion was that the above invokes undefined behaviour due
to the fact that there is no sequence point in the expression, and
that the value of a[0] is used in a "value computation" unsequenced
with respect to the side-effect of the assignment operator.

These terms are used in the newer C1X drafts, N1336 and N1401.
Looking at N1401 under 6.5.16p3, it says that the side effect of
updating the left operand (of an assignment) is sequenced after
the value computations of the left and right operands. In other
words, under N1401 this assignment is well-defined.

So, you might want to reconsider the discussion in that thread
in light of Larry Jones's recent comment that the new "sequencing"
language was put in for threading support and doesn't change
the single thread semantics, which would mean both the 'a[a[0]] = 1;'
and the other case discussed below are well-defined, not undefined.

[my apologies if this got posted twice... newsreader confusion]

This is
slightly surprising, with a naive viewpoint that the above "should" be
the same as t = a[0], a[t] = 1; However, code like the above
expression is rare (I only found a single instance in the Linux kernel
source) - so this obscure case can probably be ignored to maintain the
simplicity of the sequence point rules in the standard.

However, there is another, much more common case similar to the above
which appears in real-world code:

node->next->next = something; when node->next begins with the value
'node'. (A circularly linked list one link long.)

Using the offsetof() macro, and the a <---> *(a + b) identity, this
new expression may be transformed into the first undefined one. This
is a problem. Should such common practice be undefined? If not, then
there may be a defect in the standard.

One possible solution is to make pointer dereferencing (or accessing
an array) contain a sequence point between the calculation of the
pointer value and its dereference. Most users of C probably have the
mistaken impression that such a sequence point already exists. (The
abstract machine "obviously" needs to evaluate the value of an address
before that address may be dereferenced.) However, an argument can be
made that the current undefined behaviour allows compiler
optimizations that otherwise wouldn't be possible due to the knowledge
that the undefined situation cannot occur.

Steven
 
S

sfuerst

However, there is another, much more common case similar to the above
which appears in real-world code:
node->next->next = something; when node->next begins with the value
'node'.  (A circularly linked list one link long.)

I don't see how the quoted statement can invoke undefined behaviour,
assuming next is a pointer to the next node in an ADT..(if it were a
flexible array member, then yes, it is potentially dangerous, but why
would anyone dereference a flexible array member is beyond me). It is
not at all equivalent with the a[a[0]] = foo; statement, either. There
are no side effects with the use of the arrow operator, and there is
nothing obscure about it either.

Overall, nice attempt to troll.

It is not me contending that undefined behaviour is invoked. That
answer seemed to be the consensus of the thread in 2002. It's start
is at:
http://groups.google.com/group/comp.std.c/browse_thread/thread/cffd61637f5520d/5ce672676f5ab8ef
and it is a particularly interesting read. The pointer-chain example
is introduced about halfway along.

I'd like to believe that most programmers assume that there is some
sort of temporal ordering between the calculation of the value of a
pointer, and that pointer's eventual dereference. Unfortunately, it
seems no such ordering is mandated by the standard itself. In the
thread in 2002, there were hypothetical C implementations discussed
that could crash, hang, or do other crazy things as a result of merely
executing compiled code corresponding to inserting something into a
circularly linked list. That such implementations could exist is a
rather obscure trap imho.

Steven
 
S

sfuerst

sfuerst said:
Hello, back in 2002 there was a long discussion in these newsgroups
about the undefinedness of the expression:
a[a[0]] = 1; when a[0] begins with the value 0.
The general opinion was that the above invokes undefined behaviour due
to the fact that there is no sequence point in the expression, and
that the value of a[0] is used in a "value computation" unsequenced
with respect to the side-effect of the assignment operator.

These terms are used in the newer C1X drafts, N1336 and N1401.
Looking at N1401 under 6.5.16p3, it says that the side effect of
updating the left operand (of an assignment) is sequenced after
the value computations of the left and right operands.  In other
words, under N1401 this assignment is well-defined.

So, you might want to reconsider the discussion in that thread
in light of Larry Jones's recent comment that the new "sequencing"
language was put in for threading support and doesn't change
the single thread semantics, which would mean both the 'a[a[0]] = 1;'
and the other case discussed below are well-defined, not undefined.

[my apologies if this got posted twice...  newsreader confusion]

Thanks for this. I was reading N1336 (not N1404).

Of course the above still disallows tricks like:
a[a[0]=0]=1;
but I could live with that. :)

Steven
 
L

lawrence.jones

In comp.std.c sfuerst said:
Hello, back in 2002 there was a long discussion in these newsgroups
about the undefinedness of the expression:

a[a[0]] = 1; when a[0] begins with the value 0.

The general opinion was that the above invokes undefined behaviour due
to the fact that there is no sequence point in the expression, and
that the value of a[0] is used in a "value computation" unsequenced
with respect to the side-effect of the assignment operator.

I believe this is corrected by the new sequencing language in the C1X
draft (N1425 is the latest version).
 
K

Kaz Kylheku

Hello, back in 2002 there was a long discussion in these newsgroups
about the undefinedness of the expression:

a[a[0]] = 1; when a[0] begins with the value 0.

The assignment cannot modify a[0] until that assignment actually happens;
i.e. the modification of a[0] is the culmination of that assignment
(except for the part that the assignment expression also returns a value,
which is not relevant in the above).

The assignment cannot happen until the value being assigned is known (which it
always is since it is the constant 1), and it also cannot happen until the
location of the destination operand is established; if you don't yet know
/where/ to store the value, you cannot do the assignment.

To know where the assignment is going to be stored, we must evaluate
a[0], and so this must happen before the assignment occurs, which means
that it it uses the prior value of a[0].

The access to a[0] is therefore properly sequenced with regard to the
assignment: not by explicit sequence points but by a data flow dependency which
follows the rules of causality in our observable universe.

Outside of science fiction fantasies involving time travel, parallel universes,
or time going backwards in localized regions of time-space, the determination
of where an assignment is going to happen cannot occur after the assignment has
already happened.
The general opinion was that the above invokes undefined behaviour due

General or not, that is not a particularly well-informed opinion.

C expression evaluation (oustide of the sequencing operators like comma or ||)
is not totally unordered. The standard says explicitly that the order
of evaluation of the /subexpressions/ of an operator is unspecified
and may even be parallel, and the order in which side effects take place
is unspecified:

6.5 Expresions

3 The grouping of operators and operands is indicated by the syntax.71)
Except as specified later (for the function-call (), &&, ||, ?:, and comma
operators), the order of evaluation of subexpressions and the order in which
side effects take place are both unspecified.

Note that this paragraph does not insist that there is never any
order among side effects and subexpressions. There are situations where
logic requires order.

And note that 6.5 doesn't say that the evaluation of operators may be reordered
with respect to their own subexpressions (that is illogical)! Only among the
subexpressions is the order unspecified. To evaluate an operator, you must
fetch the values of its subexpressions (except where short-circuiting permits
elision). Only after that can the operator be evaluated.

So what happens if the evaluation of a parent operator invokes a side effect,
which depends on the constituent subexpressions? In that case we cannot
conclude that the order is unspecified between that side effect and
the subexpressions.

The = operator in a[a[0]] = 1 operator has two subexpressions: a[a[0]] and 1.
The implementation may evalute these in either order, in accordance with 6.5p3.
It also has a side effect: that of storing to the lvalue. Certainly, that side
effect might be ordered arbitrarily with regard to some side effects elsewhere
in the expression, in accordance with 6.5.

However, the two subexpressions of the assignment cannot be reordered with
regard to the store side effect /of that same assignment/ operator. That is a
logical impossibility, since the assignment must evaluate its two operands
before producing that side effect.

This doesn't need to be spelled out in the standard, because it's not the sort
of document for readers who cannot recognize logical impossibilities.
 
B

Ben Bacarisse

Kaz Kylheku said:
Hello, back in 2002 there was a long discussion in these newsgroups
about the undefinedness of the expression:

a[a[0]] = 1; when a[0] begins with the value 0.
The general opinion was that the above invokes undefined behaviour due

General or not, that is not a particularly well-informed opinion.
<snip more about sequencing>

I don't disagree with your arguments about sequencing, logic, and
causality, but I don't see how you can dismiss the conclusion above so
lightly -- particularly without any reference to the text that, in my
view, renders it undefined.

The oft-quoted 6.5 paragraph 2 reads:

"Between the previous and next sequence point an object shall have
its stored value modified at most once by the evaluation of an
expression. Furthermore, the prior value shall be read only to
determine the value to be stored." [footnote numbers removed]

Surely the prior value of an object (a[0] -- which is having it's
stored value modified exactly once) is being read for some purpose
other than to determine the value to be stored?

You could argue that this shouldn't be undefined, or that the new
sequencing wording in n1401.pdf makes it no longer undefined; but that
it was undefined (and still is undefined) looks like a perfectly
reasonable conclusion.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,994
Messages
2,570,223
Members
46,812
Latest member
GracielaWa

Latest Threads

Top