C Is Not Assembly


eps

http://james-iry.blogspot.com/2010/04/c-is-not-assembly.html

C Is Not Assembly

In my last article I dug down into some of the joys of undefined pointer
behavior in C. But I did it in the context of an architecture that most
developers aren't too likely to ever see again, the Intel 8086. I wanted
to show that this stuff matters even with more mainstream architectures
because compilers are free to do a lot of non-obvious things. C is not
assembly.

The United States Computer Emergency Readiness Team (US-CERT) "is
charged with providing response support and defense against cyber
attacks for the Federal Civil Executive Branch (.gov) and information
sharing and collaboration with state and local government, industry and
international partners."

With a U.S. Department of Homeland Security badge on their page you know
they're serious. When you overflow your buffers the terrorists win. So
you'd think they'd take C seriously.

I found a real WTF gem caused by programmers treating C like assembly.
Vulnerability Note VU#162289

"In the C language, given the following types:

char *buf;
int len;

some C compilers will assume that buf+len >= buf. As a result, code that
performs wrapping checks similar to the following:

len = 1<<30;
[...]
if(buf+len < buf) /* wrap check */
[...overflow occurred...]

are optimized out by these compilers; no object code to perform the
check will appear in the resulting executable program. In the case where
the wrap test expression is optimized out, a subsequent manipulation of
len could cause an overflow. As a result, applications that perform such
checks may be vulnerable to buffer overflows."

The advisory is careful to admit that compilers are free to do just
that. Here's why: greatly simplified, the C standard says a pointer must
point at a valid object or just past the end. Any pointer arithmetic
that might cause a pointer to step outside those bounds yields undefined
behavior. So by definition either buf + len >= buf or the program is
free to do anything, up to and including launching shoulder-mounted
kitten missiles at Capitol Hill.
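
One way to get the intended check without ever forming an out-of-bounds
pointer is to compare len against the space that remains in the buffer.
A minimal sketch, assuming the caller knows where the buffer ends; the
names fits and buf_end are hypothetical and do not come from the
advisory:

```c
#include <stddef.h>

/* Returns 1 if len bytes starting at buf fit before buf_end, else 0.
   Assumes buf and buf_end point into (or one past the end of) the same
   array, with buf <= buf_end. Both names are illustrative. */
int fits(const char *buf, const char *buf_end, int len)
{
    if (len < 0)
        return 0;   /* a negative length is never acceptable here */

    /* buf_end - buf is well defined under the assumption above;
       buf + len is never computed, so nothing can wrap. */
    return (size_t)len <= (size_t)(buf_end - buf);
}
```

With a 16-byte array a, fits(a, a + 16, 16) accepts the request and
fits(a, a + 16, 1 << 30) rejects it, and the test is ordinary integer
arithmetic that leaves the compiler nothing to assume away.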

Still, there is a WTF to lay on the compiler writer here. If a
programmer writes an "if" test then presumably he or she had some reason
to believe that sometimes the test might be true. Before optimizing away
the conditional the compiler really should have issued a warning.

In order for this code to have any hope of working, a few assumptions
must hold: sizeof(int) <= sizeof(char *), overflow must happen
"silently", etc. But there's another major assumption here: the buffer
pointed to by buf must be located at the very end of its address space.
With a check like this, if there are any objects located higher in the
same segment then an overflow runs into them without ever wrapping, and
those objects are getting some kitten missiles. So another WTF is a
developer making an assumption about how a compiler works in the face of
undefined behavior.

Now, there are a few scenarios where all these assumptions might be
justified. For instance, if you're targeting some special purpose
embedded device then memory layout might be well understood. In such
situations, the optimization performed by the compiler might be shocking
indeed, even if technically permissible.

The problem is that the developer is thinking at the assembly code level
but the C standard says the compiler doesn't have to "think" the same
way. In assembly the distance between what you write and the object code
generated is pretty small (barring some kind of complicated macro
magic). In C the distance is quite a bit larger.

Repeat after me: C is not assembly.
 

Stefan Ram

eps said:
char *buf;
int len;
some C compilers will assume that buf+len >= buf. (...)
Here's why: greatly simplified, the C standard says a pointer must
point at a valid object or just past the end. Any pointer arithmetic
that might cause a pointer to step outside those bounds yields undefined
behavior. So by definition either buf + len >= buf or the program is
free to do anything

char a[ 10 ]; buf = a + 5; len = -2;

Now, evaluating *( buf + len ) is perfectly permissible, therefore,
a C implementation is not entitled to assume that buf+len >= buf.
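
The same case can be written as a small compilable function. This is
only a sketch; the function name is made up for illustration:

```c
/* Returns 1 if buf + len compares below buf for a pointer into the
   middle of an array, where the negative offset is well defined. */
int negative_offset_is_lower(void)
{
    char a[10] = {0};
    char *buf = a + 5;
    int len = -2;

    /* buf + len points at a[3], which is still inside a, so forming
       and comparing the pointer involves no undefined behavior. */
    return buf + len < buf && buf + len == &a[3];
}
```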
 

Ersek, Laszlo

I found a real WTF gem caused by programmers treating C like assembly.
Vulnerability Note VU#162289

"In the C language, given the following types:

char *buf;
int len;

some C compilers will assume that buf+len >= buf.

That would be an invalid assumption.

char buffer[2],
*buf = &buffer[1];
int ofs = -1;

assert(buf + ofs < buf);

And the whole thing is defined. I think the wording of VU#162289 is not
precise enough here; a conformant compiler must not make that assumption
*in general*. I'd fix the wording as "some *broken* C compilers will
assume", or add a reference to further circumstances where such a
derivation is appropriate.

As a result, code that performs wrapping checks similar to the
following:

len = 1<<30;
[...]
if(buf+len < buf) /* wrap check */
[...overflow occurred...]

[snip]

The advisory is careful to admit that compilers are free to do just
that.

[snip]

Still, there is a WTF to lay on the compiler writer here. If a
programmer writes an "if" test then presumably he or she had some reason
to believe that sometimes the test might be true. Before optimizing away
the conditional the compiler really should have issued a warning.

I think gcc does this, and it sure drives me mad sometimes. I am perfectly
free to do valid tests in my code that evaluate *always* to true on some
platform, and *always* to false on some other platform. I wish for a gcc
pragma, referring *only* to the related spot, which says "yes, I know,
don't complain". "Sometimes" includes "on some implementations".
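
GCC does document diagnostic pragmas that can be scoped to a single
spot: "#pragma GCC diagnostic" with push, ignored and pop. A hedged
sketch follows. The function fits_in_int is a hypothetical example, and
whether this particular comparison triggers -Wtype-limits at all depends
on the compiler version and the target:

```c
#include <limits.h>

/* Returns 1 if v is representable as an int. Where long and int have
   the same range the test is always true; where long is wider it is
   not. fits_in_int is an illustrative name only. */
int fits_in_int(long v)
{
    int ok;

#if defined(__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wtype-limits"
#endif
    /* Only this statement has the warning suppressed. */
    ok = (v >= INT_MIN && v <= INT_MAX);
#if defined(__GNUC__)
#pragma GCC diagnostic pop
#endif

    return ok;
}
```

The push/pop pair restores the previous warning state right after the
comparison, so the rest of the file is still checked as usual.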

lacos
 

Andrew Poelstra

I found a real WTF gem caused by programmers treating C like assembly.
Vulnerability Note VU#162289

"In the C language, given the following types:

char *buf;
int len;

some C compilers will assume that buf+len >= buf.

That would be an invalid assumption.

char buffer[2],
*buf = &buffer[1];
int ofs = -1;

assert(buf + ofs < buf);

And the whole thing is defined. I think the wording of VU#162289 is not
precise enough here; a conformant compiler must not make that assumption
*in general*. I'd fix the wording as "some *broken* C compilers will
assume", or add a reference to further circumstances where such a
derivation is appropriate.

I think that if you take the address of any object (or the return
value of malloc), and add an offset to it, that offset is /always/
guaranteed to be in a positive direction, according to the standard.

So if you don't take a pointer /into/ something, but rather one /to/
something, the compiler can make such "optimizations".

(I hope I'm being clear here; English is a horrible language for this
sort of thing.)
 

Ben Pfaff

Andrew Poelstra said:
I think that if you take the address of any object (or the return
value of malloc), and add an offset to it, that offset is /always/
guaranteed to be in a positive direction, according to the standard.

If you change that to "..., and add a *positive* offset to it,
...." then I think you are correct. C pointer arithmetic allows
negative offsets too.
 

Keith Thompson

Andrew Poelstra said:
I found a real WTF gem caused by programmers treating C like assembly.
Vulnerability Note VU#162289

"In the C language, given the following types:

char *buf;
int len;

some C compilers will assume that buf+len >= buf.

That would be an invalid assumption.

char buffer[2],
*buf = &buffer[1];
int ofs = -1;

assert(buf + ofs < buf);

And the whole thing is defined. I think the wording of VU#162289 is not
precise enough here; a conformant compiler must not make that assumption
*in general*. I'd fix the wording as "some *broken* C compilers will
assume", or add a reference to further circumstances where such a
derivation is appropriate.

I think that if you take the address of any object (or the return
value of malloc), and add an offset to it, that offset is /always/
guaranteed to be in a positive direction, according to the standard.

Not quite, but I understand what you mean.

If you take the address of any object that isn't part of some larger
object (such as a single declared object or an object created by
malloc()), then adding a negative offset to that address invokes
undefined behavior. If you add an offset to such an address, the
compiler may legitimately assume that the offset is non-negative,
since if it's negative, the behavior is undefined anyway.
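
To make the rule concrete, here is a rough sketch of which offsets may
be added to a pointer. The helper offset_is_defined and its parameters
are hypothetical; it assumes buf points into, or one past the end of,
the n-element char array starting at base, and it treats a single
object as an array of one element:

```c
#include <stddef.h>

/* Returns 1 if buf + ofs stays within [base, base + n], i.e. inside
   the array or one past its end; returns 0 if forming that pointer
   would be undefined. All names here are illustrative. */
int offset_is_defined(const char *base, size_t n,
                      const char *buf, ptrdiff_t ofs)
{
    ptrdiff_t pos = buf - base;   /* index of buf in the array, 0..n */

    return ofs >= -pos && ofs <= (ptrdiff_t)n - pos;
}
```

For a lone char c, offset_is_defined(&c, 1, &c, -1) is 0, so a compiler
may assume the offset is non-negative there. For a pointer to a[5] in a
10-element array, an offset of -2 is fine and no such assumption is
justified.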

My quibble is that the phrase "any object" can refer to an object
that happens to be an array element.

I'm not aware of any compilers that get this wrong.
 

Ersek, Laszlo

Andrew Poelstra said:
On Tue, 13 Apr 2010, eps wrote:

I found a real WTF gem caused by programmers treating C like
assembly. Vulnerability Note VU#162289

"In the C language, given the following types:

char *buf;
int len;

some C compilers will assume that buf+len >= buf.

That would be an invalid assumption.

[snip]

[...] I'd fix the wording as "some *broken* C compilers will assume",
or add a reference to further circumstances where such a derivation is
appropriate.

I think that if you take the address of any object (or the return value
of malloc), and add an offset to it, that offset is /always/ guaranteed
to be in a positive direction, according to the standard.

Not quite, but I understand what you mean.

If you take the address of any object that isn't part of some larger
object (such as a single declared object or an object created by
malloc()), then adding a negative offset to that address invokes
undefined behavior. If you add an offset to such an address, the
compiler may legitimately assume that the offset is non-negative, since
if it's negative, the behavior is undefined anyway.

Yes. However, the original text of the vulnerability note, at least as
quoted by the blog post, says only "given the following types, so and so".
That condition is not sufficient.

I did consider the situation that you and Andrew describe: "add a
reference to further circumstances where such a derivation is
appropriate".

Cheers,
lacos
 
