Dave Vandervies wrote: [snippage]
--------
/*Mentioned in my post - have the VM be aware of sequence points so it
can catch this
*/
i=i++;
--------
C or C++ undefined behavior because of multiple updates between
sequence points is a *language specification conformance* issue, not
an execution one. This is detected by static analysis before or
during translation. Once converted to a sequence of instructions,
such as
ld i
inc i
st i
or
ld i
st i
inc i
the results are well defined for the virtual machine.
Static analysis can't catch all problems of this sort. Consider:
--------
/*Somewhere*/
void foo(int *a,int *b)
{
    *a=(*b)++;
}
/*Somewhere else*/
void bar(int x)
{
    /*Do some stuff, including:*/
    foo(&x,&x);
}
--------
If your static analyzer is smart enough to recognize that you're calling
foo() with equal pointers, then wrap a few more levels of indirection
around it until you've got enough to confuse it. Being able to
(especially unintentionally) construct arbitrarily complex code that can
still lead to this case makes static checking Highly Impractical at best.
On the other hand, a sequence-point-aware VM handling this dynamically
would trap when foo() gets two pointers to the same int, and at that
point the debugger can be invoked to work out what led to that:
--------
seq_pt ;beginning of foo()
ld a0,arg2
ld a1,arg1
ld i0,(a0)
st i0,(a1) ;VM notes that *a has been modified since last sequence point
inc i0
st i0,(a0) ;traps if a==b: object modified twice between sequence points
seq_pt ;end of *a=(*b)++. Clear modified-object list.
--------
Keep in mind that the VM's purpose is to check for poorly-defined (or
otherwise bad) C code, even if that C code can be compiled to a set of
instructions that are well-defined in the VM.
Since it's constructed as a dynamic code checker for a language that
prohibits multiple updates (and some cases of both access and update)
between sequence points, the VM knows that even though the sequence of
instructions it sees is well-defined, it could only have been generated
from C code that isn't well-defined, so it can trap on that.
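The bookkeeping this asks of the VM is small. Here is a minimal sketch in C,
assuming the VM calls one hook before every store and another at every
seq_pt. The names vm_note_store, vm_seq_pt and vm_run_foo and the
MAX_TRACKED limit are made up for illustration, and the sketch only
compares exact addresses, so it would miss partially overlapping objects.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TRACKED 64  /* hypothetical limit, enough for this sketch */

/* Addresses stored to since the last sequence point. */
static const void *modified[MAX_TRACKED];
static size_t n_modified;

/* Hook for seq_pt: clear the modified-object list. */
void vm_seq_pt(void)
{
    n_modified = 0;
}

/* Hook called before each store.  Returns 1 if the object has already
   been modified since the last sequence point (the VM would trap),
   0 after recording the address otherwise. */
int vm_note_store(const void *addr)
{
    size_t i;
    for (i = 0; i < n_modified; i++)
        if (modified[i] == addr)
            return 1;
    assert(n_modified < MAX_TRACKED);
    modified[n_modified++] = addr;
    return 0;
}

/* Replays the loads and stores of *a=(*b)++ in the same order as the
   listing above.  Returns 1 if the VM would have trapped, 0 if not. */
int vm_run_foo(int *a, int *b)
{
    int i0;
    vm_seq_pt();              /* beginning of foo() */
    i0 = *b;                  /* ld i0,(a0) */
    if (vm_note_store(a))     /* st i0,(a1) */
        return 1;
    *a = i0;
    i0++;                     /* inc i0 */
    if (vm_note_store(b))     /* st i0,(a0): traps if a==b */
        return 1;
    *b = i0;
    vm_seq_pt();              /* end of the full expression */
    return 0;
}
```

Calling vm_run_foo(&x,&y) with distinct objects runs to completion, while
vm_run_foo(&x,&x) reports the trap at the second store, which is where the
debugger would be invoked.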
--------
char buf[10];
char *str=some_string_pointer;
buf[0]='\0';
/*We want to warn about this (claimed buffer size larger than actual
destination buffer) even if str fits into buf
*/
strncat(buf,str,20);
--------
There are different levels of warnings.
Keep in mind that I introduced this with:
}Create a VM that allows aggressive testing for bad (especially not-
}well-defined) code,
I'm assuming that you wouldn't be using such a thing if you didn't want
something approaching the "pathologically paranoid" level of warnings.
In this example, if strlen(str) is short enough, the behavior is
well-defined, of course,
even though the construct isn't safe for arbitrarily long str
arguments. We can use static analysis in this case to determine that
sizeof(buf) < 20, indicating a questionable construct.
But, once again, static analysis is only enough for the trivial examples
that illustrate the point without confusing the reader, and is unlikely
to be enough to catch the cases where a similar problem shows up in
real code.
If a function gets a buffer size argument larger than the real buffer
size, that's a bug, even if what ends up being written into that buffer
does fit; we want to catch that bug as soon as possible even if the
behavior is actually well-defined until a user's cat starts sleeping
on the keyboard. (If the programmer knows that what's getting written
into the buffer won't overflow it, that's what the non-counted variants
of the functions, strcat in this case, are for.)
To protect against overflow, we really want
strncat(buf, str, sizeof(buf)-strlen(buf)-1);
That can be detected with static analysis, in some cases, as well. To
check dynamically for potential errors, we would verify that
len <= sizeof(buf)-strlen(buf)-1,
where len is the count argument, assuming that debug_strncat() has
access to sizeof(buf).
If we're storing pointers as segment-offset-size, then a little bit of
implementation magic will give it the appropriate size.
Note that the real buffer size isn't directly available to the code the
programmer sees if (as is likely) the buffer isn't a local or global
array; buffers passed in (as a pointer) from elsewhere or obtained from
malloc are the ones most likely to have mismatched sizes, and sizeof
won't give the size of the buffer in those cases.
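To make the sizeof point concrete, here is a small sketch. The function
names are made up for illustration; the only thing it relies on is that
sizeof applied to a pointer yields the size of the pointer, not of
whatever it points into.

```c
#include <assert.h>
#include <stddef.h>

/* Inside a callee the buffer is just a char *, so sizeof reports the
   size of the pointer itself, whatever buffer it points into. */
size_t size_seen_by_callee(char *dst)
{
    return sizeof dst;
}

/* Where the array is defined, its type is still visible and sizeof
   gives the real buffer size. */
size_t size_seen_at_definition(void)
{
    char buf[10];
    return sizeof buf;
}

/* Usage: the same 10-byte buffer looks different from the two places. */
void show_difference(void)
{
    char buf[10];
    assert(size_seen_at_definition() == sizeof buf);
    assert(size_seen_by_callee(buf) == sizeof(char *));
}
```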
Once you're doing aggressive dynamic checking in the implementation's
runtime environment anyway, it's much simpler for all concerned to let
the library function check the sizes; it knows how buffer size and size
arguments are related, and has access to implementation magic to get at
the information it needs to check them.
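A rough sketch of what such a library-side check might look like, in C.
The struct checked_ptr below is a hypothetical stand-in for a
segment-offset-size pointer, and debug_strncat's signature and its -1
"trap" return are assumptions for illustration; a real checking
implementation would get the size from its own pointer representation
and invoke the debugger rather than return a code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fat pointer standing in for segment-offset-size. */
struct checked_ptr {
    char  *base;    /* start of the underlying object */
    size_t offset;  /* current position within the object */
    size_t size;    /* total size of the object in bytes */
};

/* Appends at most n characters of src, but only if n cannot overflow
   the destination.  Returns 0 on success, -1 where the VM would trap.
   The check is on the claimed count n, not on strlen(src), so a
   too-large n is caught even when src happens to fit. */
int debug_strncat(struct checked_ptr dst, const char *src, size_t n)
{
    char *d;
    const char *nul;
    size_t avail, used;

    assert(dst.base != NULL);
    if (dst.offset > dst.size)
        return -1;                     /* pointer outside its object */
    d = dst.base + dst.offset;
    avail = dst.size - dst.offset;
    nul = memchr(d, '\0', avail);      /* existing string must end in bounds */
    if (nul == NULL)
        return -1;
    used = (size_t)(nul - d);
    if (n > avail - used - 1)          /* len <= size - strlen - 1 */
        return -1;
    strncat(d, src, n);
    return 0;
}
```

With a 10-byte buffer holding an empty string, a count of 20 is rejected
even for a two-character source, while a count of
sizeof(buf)-strlen(buf)-1 goes through.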
dave