On the development of C

Dik T. Winter · Mar 11, 2009

> All minor details... Ideally alloc_size or alloc_end would only be used in
> libraries that know they are available and give useful results.

This can only be done if you are going to fat pointers. The point is that
realloc is only required to work if the pointer is allocated by malloc.
So if we have:
p = malloc(some_stuff);
the following must work:
p1 = realloc(p, other_stuff);
but not:
p1 = realloc(p + 1, other_stuff);
But other functions working with pointers do not have that requirement.
So given:
p = malloc(some_stuff);
what is
alloc_size(p + 1);
to return? How do the libraries know that the pointer is one that is allocated
by malloc?

user923005 · Mar 11, 2009

> On Mar 10, 9:51 am, (e-mail address removed) wrote:
...
> > That statement isn't in dispute, but other factors come into play,
> > such as budget, staffing, and schedule. First you have to find the
> > affected code, then you have to update and test it. Not all
> > organizations have the resources or motivation to do that.
>
> I would argue that in view of the seriousness of the defect (it is a
> well known exploit) any use of gets() in commercial code *must* be
> fixed or the code vendor is negligent.

And that depends entirely on how gets is used and in what kind of programs.
If gets is *only* used in an interface between two programs I see no
problem in using it. It only becomes a problem when the program using
gets can also be used independent from other programs.

What would be the exploit in a program that reads a log-file, uses gets
to read the lines and displays statistics about the input? I see *no*
way for an exploit in such a program.

Damage could vary depending upon distribution of this program. If it
is sent to hundreds of users, then the scope of damage can be much
greater.

It is clear that a malicious log file can cause any sort of bad
behavior as desired by the person of evil intent. It may be possible
to generate the evil intent by sending carefully designed input to the
program that writes the log (e.g. a Web server).

As I said before, gets() is probably never going to cause problems by
accident. Someone has to know (or guess) that gets() is used and then
inject bad code via malevolent input. The more they know about the
program using gets() the easier it will be for them to generate an
exploit.

Nate Eldredge · Mar 11, 2009

Bartc said:
Let's say the return from such a function is NULL for a value not part
of any malloc memory, such as addresses of static and frame data.

(Nitpick: you don't mean NULL, since this function isn't supposed to
return a pointer. Let's call it some special value, SIZE_ERR.)

That might require a lot of work to implement, or be impossible,
depending on the implementation.

What if the pointer points within a malloc'ed block, but not to the
beginning? That's harder to recognize.

The gets() implementation which gets a non-NULL value from alloc_size
can then test against the limit (I had in mind the address of the end
of the allocated block), and prevent buffer overflows.

If it gets a NULL value, then it can work as before. So gets() would
be safe only when used with allocated memory.

This means your alloc_size must never have a false positive; it must
always return either the correct size or SIZE_ERR. This probably means
that somewhere there needs to be a list of all malloc'ed blocks. Many
implementations don't maintain such a thing; requiring it would be
expensive in terms of memory, and expensive to search when alloc_size is
called.

Let's imagine a malloc implementation that stores the size of the block
as a size_t immediately preceding the pointer returned by malloc. It
sounds like you're imagining an alloc_size function that might work as
follows:

size_t alloc_size(void *p) {
    if (p < heap_start || p >= heap_end)
        return SIZE_ERR;
    return ((size_t *)p)[-1];
}

Now what happens if the pointer points within a block, and the data just
before it coincidentally looks like a size_t?

char *p = malloc(1000);
size_t s = 3;
memcpy(p, &s, sizeof(s));
gets(p + sizeof(s));

If the line is more than 3 characters, gets() will truncate the line
unnecessarily, or fail in some other way. Truncation would be
especially dangerous because it will result in the program thinking
it's reading data that is different from what's in the file.
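
The false positive described above can be reproduced with a toy allocator. What follows is only a sketch, not any real malloc: `toy_malloc`, `naive_alloc_size`, the `arena` array and the value chosen for `SIZE_ERR` are all hypothetical names made up for the illustration. It assumes, as in the example, that the block size is stored as a size_t immediately before the pointer handed out, and it uses memcpy for the header so that alignment does not get in the way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIZE_ERR ((size_t)-1)   /* hypothetical "not an allocated block" value */

/* Toy bump allocator over a static arena (illustration only, never frees).
   The returned pointers are only meant to be used as char buffers. */
static unsigned char arena[4096];
static size_t arena_used;

void *toy_malloc(size_t n)
{
    assert(n > 0);
    if (n > sizeof arena)
        return NULL;
    size_t need = sizeof(size_t) + n;          /* header plus payload */
    if (need > sizeof arena - arena_used)
        return NULL;
    unsigned char *hdr = arena + arena_used;
    memcpy(hdr, &n, sizeof n);                 /* size stored just before the block */
    arena_used += need;
    return hdr + sizeof n;
}

/* The naive lookup: range-check against the heap, then trust whatever
   bytes happen to sit just before the pointer. */
size_t naive_alloc_size(const void *p)
{
    uintptr_t a  = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)arena + sizeof(size_t);
    uintptr_t hi = (uintptr_t)arena + arena_used;
    if (a < lo || a >= hi)
        return SIZE_ERR;
    size_t n;
    memcpy(&n, (const unsigned char *)p - sizeof n, sizeof n);
    return n;
}
```

With `char *p = toy_malloc(1000);` the call `naive_alloc_size(p)` gives 1000 as hoped. After copying a size_t holding 3 to the start of the block, `naive_alloc_size(p + sizeof(size_t))` reports 3: the pointer passes the range check, and the user's own data is taken for a block header.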

All minor details... Ideally alloc_size or alloc_end would only be
used in libraries that know they are available and give useful
results.

Are you proposing that this feature be standard or not? If not, there's
nothing to prevent implementations from providing it of their own
accord, where feasible. But some applications won't find it feasible,
or won't have the guaranteed semantics that you would need to use it in
gets().

Phil Carmody · Mar 11, 2009

Richard Heathfield said:
jacob navia said:

Well, why not just do it once, and then re-use it over and over?

Here, you make a reasonable point, which is worth discussion.

Thanks to modern features of the language, I can write client
code that can use either list type without explicitly knowing
which list type it is.

Having said that, using only the K&R features of the language
I can do that too.

Why does the client care how the list is built anyway? That
sounds like a bad design from the outset. All the client
should care about is the interface, which can be an abstract
pointer and a bunch of functions or macros using such pointers.

Just because you're using C rather than C++ doesn't mean you
have to _avoid_ encapsulation.
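
As a rough sketch of that kind of interface (all names here, `list`, `list_create`, `list_push` and so on, are hypothetical and chosen only for the illustration), the client sees an incomplete type and a set of functions, while the representation stays private to the implementation. A growable array is used below, but it could be swapped for a linked list without touching client code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* ---- what would go in list.h: all the client ever sees ---- */
typedef struct list list;                /* incomplete type, layout hidden */

list  *list_create(void);
int    list_push(list *l, int value);    /* 0 on success, -1 on failure */
size_t list_length(const list *l);
int    list_get(const list *l, size_t index);
void   list_destroy(list *l);

/* ---- what would go in list.c: one possible representation ---- */
struct list {
    int   *items;
    size_t length;
    size_t capacity;
};

list *list_create(void)
{
    list *l = malloc(sizeof *l);
    if (l != NULL) {
        l->items = NULL;
        l->length = 0;
        l->capacity = 0;
    }
    return l;
}

int list_push(list *l, int value)
{
    assert(l != NULL);
    if (l->length == l->capacity) {
        /* Grow geometrically; overflow of the multiplication is ignored
           in this sketch. */
        size_t new_cap = l->capacity ? l->capacity * 2 : 8;
        int *grown = realloc(l->items, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;
        l->items = grown;
        l->capacity = new_cap;
    }
    l->items[l->length++] = value;
    return 0;
}

size_t list_length(const list *l)
{
    assert(l != NULL);
    return l->length;
}

int list_get(const list *l, size_t index)
{
    assert(l != NULL && index < l->length);
    return l->items[index];
}

void list_destroy(list *l)
{
    if (l != NULL) {
        free(l->items);
        free(l);
    }
}
```

A client holding only a `list *` cannot reach `items` or `capacity` when the struct definition lives in a separate source file, so it cannot come to depend on how the list is built.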

Phil

Keith Thompson · Mar 11, 2009

Dik T. Winter said:
What would be the exploit in a program that reads a log-file, uses gets
to read the lines and displays statistics about the input? I see *no*
way for an exploit in such a program.

"Hey, I've got this log file; can you take a look at it with your
analysis program?"

Keith Thompson · Mar 11, 2009

Richard Heathfield said:
CBFalconer said: [...]

I suspect that is because they can no longer charge excessive
dollars and other currency for new versions, so there is no real
return. The only noticeable changes are to the library, and are
designed to make it incompatible with the ISO standard.

ITYM "incompatible" - note the recent disabling of %n in printf.

Um, Richard, you might want to re-read the above two lines.

Guest · Mar 11, 2009

Keith Thompson said:

jacob navia <[email protected]> writes:

[...] I said that it is a lie. And YOU say that I am "offensive"
because I tell the truth, and get angry with me obviously.

You don't even consider the possibility of an honest mistake. You
assume that any false statement made by someone you personally
dislike must be a deliberate lie.

He also often assumes, and claims, that a statement WITH WHICH HE
DISAGREES is a lie, whether or not the claim is false.

Despite having been told, I don't think Jacob quite uses the word
"lie" in a standard fashion, and he also seems to be unaware (though
quite how is difficult to understand as he's been told *this* several
times as well) that it is an offensive term to English speakers (I
guess the French word he thinks it corresponds to isn't quite as rude).

I have made a number of true statements which Jacob Navia has
described as lies, and I doubt very much whether I'm the only one.

Case in point, sortakinda: Mark McIntyre's statement about MSVC
supporting 99% of C99 is broadly true (although the exact
percentage is obviously quibblable), because C99 is basically C90
with a few tiny bits chopped off and a few slightly bigger bits
nailed on. So any C90 compiler supports most of C99.

I suspect that Jacob Navia's meaning of "C99 support" means "support
for features INTRODUCED FOR THE FIRST TIME in C99", rather than
"all C99 features".

I think Jacob's meaning of the term "supports C99" makes much more
sense than Mark McIntyre's. In context Mark's version is quite...
odd.

In other words, he and Mark McIntyre have
failed to agree definitions before beginning the discussion.
Consequently, from Jacob Navia's perspective Mark McIntyre is
mistaken (calling him a liar was indefensible, but he could have
honestly, if unwisely, claimed that Mark was wrong). And of course
the reverse is also true - from Mark McIntyre's perspective, Jacob
Navia is wrong. These views can never be reconciled until the two
parties agree on a common dictionary.

Again, I agree with Jacob here.
Though he might get on better if he didn't throw a temper tantrum
every time someone said something he disagreed with...

Guest · Mar 11, 2009

Even that would be hard to justify since there wasn't much development
of C happening before that, either. Development of the language itself
was pretty much "done" in 1979, with the standard library following not
far behind. C has *never* undergone the kind of rapid (dare I say,
willy-nilly) development that C++ has, simply because that isn't
Ritchie's style. Some people consider stability to be a virtue, not a
shortcoming.

being the proud owner of three radically different versions
of Stroustrup's C++ book and a book on the C++ standard library
by Plauger that is mostly useful as a doorstop I can see their
point...

Oh and a largish C++ source base that cannot be recompiled with
a modern compiler.

James Kuyper · Mar 11, 2009

Despite having been told, I don't think Jacob quite uses the word
"lie" in a standard fashion, and he also seems to be unaware (though
quite how is difficult to understand as he's been told *this* several
times as well) that it is an offensive term to English speakers (I
guess the French word he thinks it corresponds to isn't quite as rude).

You seem to be assuming that he's not trying to be offensive when he
uses the word 'lie'. Do you have any good reason for that assumption?
I've always assumed that he used the term 'lie' with the deliberate
intent of expressing how angry he feels about the "fact" (as he
perceives it) that a lie was committed.

There's nothing inappropriate about offending someone by calling them a
liar, if they are indeed lying. Jacob's fault lies not in his choice of
an offensive term to describe the act, but in his mistaken belief that
the act occurred.

Chris McDonald · Mar 11, 2009

Not necessarily, the code may not run with any kind of elevated
privileges or only be used internally, in which case there's little to
no benefit to fixing the vulnerability.

Many surveys, many available via the internet, state that 30-40% of
computer crime is performed by a company's own employees, presumably using
software "only be used internally". Even if an attack against gets()
"only" crashes the program, and doesn't gain privilege, that's still a
denial of service.

Bartc · Mar 11, 2009

Nate Eldredge said:
(Nitpick: you don't mean NULL, since this function isn't supposed to
return a pointer. Let's call it some special value, SIZE_ERR.)

I was thinking of a pointer return rather than a size. Or both. See my other
reply to Dik.

That might require a lot of work to implement, or be impossible,
depending on the implementation.

The implementation might need to change to accommodate. Either that or use
some complicated wrappers.

What if the pointer points within a malloc'ed block, but not to the
beginning? That's harder to recognize.

Yes, but I'm not offering to code this thing...

This means your alloc_size must never have a false positive; it must
always return either the correct size or SIZE_ERR. This probably means
that somewhere there needs to be a list of all malloc'ed blocks. Many
implementations don't maintain such a thing; requiring it would be
expensive in terms of memory, and expensive to search when alloc_size is
called.

You mean these blocks exist but are not necessarily linked to each other? I
think if someone wants alloc_size and similar support, some sacrifices are
needed. That's why I said this feature might be better off in a library
which can use the feature or not depending on availability.

Bartc · Mar 11, 2009

Dik T. Winter said:
This can only be done if you are going to fat pointers. The point is that
realloc is only required to work if the pointer is allocated by malloc.
So if we have:
p = malloc(some_stuff);
the following must work:
p1 = realloc(p, other_stuff);
but not:
p1 = realloc(p + 1, other_stuff);

OK. That doesn't change.

But other functions working with pointers do not have that requirement.
So given:
p = malloc(some_stuff);
what is
alloc_size(p + 1);
to return? How do the libraries know that the pointer is one that is
allocated by malloc?

This wasn't my proposal, but seems a good idea. Let's say there are these
functions:

alloc_size(p)
    Bytes in allocation, 0 if p was not allocated
alloc_start(p)
    Pointer to start char of allocated block
alloc_end(p)
    Pointer to last char of allocated block
alloc_valid(p)
    Return 1 if p points inside an allocated block

So p can point anywhere in a block:

q = malloc(N);

Then: p>=q and p<(q+N), when using char pointers

I'm not saying these are going to be efficient to implement, or that there
might not be problems, for example N might be rounded up by malloc and not
stored in its original form, or the alloc_end return might have alignment
issues.

It might be necessary to revise the malloc implementation to make these
possible.

The problem would be in insisting these are available on every system if
they find their way into the standard.
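
To make the cost concrete, here is a minimal sketch of the four functions built as wrappers around malloc rather than inside it. The wrapper names `tracked_malloc` and `tracked_free`, the fixed-size table and the `MAX_BLOCKS` limit are assumptions made for the illustration; a real implementation would need something much better than a linear search, and only blocks obtained through the wrapper are known to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_BLOCKS 256              /* arbitrary limit for the sketch */

struct block { char *start; size_t size; };
static struct block blocks[MAX_BLOCKS];   /* start == NULL marks a free slot */

void *tracked_malloc(size_t n)
{
    assert(n > 0);
    for (size_t i = 0; i < MAX_BLOCKS; i++) {
        if (blocks[i].start == NULL) {
            char *p = malloc(n);
            if (p != NULL) {
                blocks[i].start = p;
                blocks[i].size = n;    /* the requested size, not rounded up */
            }
            return p;
        }
    }
    return NULL;                       /* table full */
}

void tracked_free(void *p)
{
    if (p == NULL)
        return;
    for (size_t i = 0; i < MAX_BLOCKS; i++) {
        if (blocks[i].start == p) {
            blocks[i].start = NULL;
            blocks[i].size = 0;
            free(p);
            return;
        }
    }
}

/* Linear search for the block containing p. Addresses are compared as
   integers, since relational comparison of unrelated pointers is undefined. */
static const struct block *find_block(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    for (size_t i = 0; i < MAX_BLOCKS; i++) {
        if (blocks[i].start != NULL) {
            uintptr_t lo = (uintptr_t)blocks[i].start;
            if (a >= lo && a - lo < blocks[i].size)
                return &blocks[i];
        }
    }
    return NULL;
}

size_t alloc_size(const void *p)
{
    const struct block *b = find_block(p);
    return b ? b->size : 0;
}

char *alloc_start(const void *p)
{
    const struct block *b = find_block(p);
    return b ? b->start : NULL;
}

char *alloc_end(const void *p)
{
    const struct block *b = find_block(p);
    return b ? b->start + b->size - 1 : NULL;   /* last char of the block */
}

int alloc_valid(const void *p)
{
    return find_block(p) != NULL;
}
```

With `q = tracked_malloc(100)`, both `alloc_size(q)` and `alloc_size(q + 1)` give 100, `alloc_end(q + 50)` gives `q + 99`, and a pointer to a static array gives 0. Every query walks the whole table, which is the memory and search cost raised earlier in the thread.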

Guest · Mar 11, 2009

jacob navia wrote:

And when somebody says: "lcc-win32 supports C99" I answer that it
is not true and that the person telling that is telling lies.

ah, pouring oil on troubled flames!

Guest · Mar 11, 2009

(e-mail address removed) wrote:

...

You seem to be assuming that he's not trying to be offensive when he
uses the word 'lie'. Do you have any good reason for that assumption?

belief in the fundamental goodness of human kind

<snip>

Guest · Mar 11, 2009

The fact that no work has been done in essential parts of the library
like malloc, for instance, where a function to know the size of an
allocated object would be a big step forward to ensure the absence
of buffer overflows.

interestingly C++ didn't see the need for this either

void pippo()
{
    char* buffer = new char [100];

    // at this point we can't find out how big buffer is
}

<snip>

Guest · Mar 11, 2009

How would that make gets() safer? Should this new size function work
on pointers that weren't returned by malloc()?

char *p = malloc(42);
/* alloc_size(p) == 42 */

char *q = "Not allocated by malloc";
/* alloc_size(q) == ??? */

I would assume that, if such a function were defined, calling with a
pointer that wasn't returned by malloc and friends would invoke
undefined behavior.

And there are some other questions that would have to be answered.
Does it give you the size you requested (which a current allocator
might not remember), or the size that was actually allocated (which is
likely to be larger than what was requested)?

An implementation has to maintain *some* size information, so
realloc() knows how much to copy. But it's likely that adding an
alloc_size() function would require some implementations to maintain
information that they don't currently store.
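
One way to picture the extra bookkeeping is a wrapper that records the requested size in a header in front of each block. This is a sketch only: `sized_malloc`, `sized_size` and `sized_free` are hypothetical names, the code assumes C11 for `max_align_t`, and it answers the question above by returning the size requested rather than the size actually allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header padded to the strictest alignment so that the user's data,
   which follows it directly, stays suitably aligned for any type. */
union header {
    size_t      requested;
    max_align_t align;
};

void *sized_malloc(size_t n)
{
    if (n > (size_t)-1 - sizeof(union header))
        return NULL;                       /* header + n would overflow */
    union header *h = malloc(sizeof *h + n);
    if (h == NULL)
        return NULL;
    h->requested = n;
    return h + 1;                          /* user data starts after the header */
}

/* Undefined behavior unless p came from sized_malloc and has not been
   freed; nothing here can detect any other pointer. */
size_t sized_size(const void *p)
{
    assert(p != NULL);
    return ((const union header *)p - 1)->requested;
}

void sized_free(void *p)
{
    if (p != NULL)
        free((union header *)p - 1);
}
```

After `char *p = sized_malloc(42);`, `sized_size(p)` is 42. Each block now costs an extra `sizeof(union header)` bytes, which is the kind of overhead an implementation that does not already keep this information would have to take on.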

and how would this work

void pippo(void)
{
    char *buffer = malloc (100);
    buffer += 1;
    msize (buffer);
}

assuming msize() returns the size of the malloc()ed buffer
what does it return here? 99? 100? an error?

Ben Bacarisse · Mar 11, 2009

Richard Heathfield said:
(e-mail address removed) said:

Right again - such an assumption is not only unjustified but
unjustifiable.

Nevertheless, the audit argument misses the point. Whether gets()
stays or goes is not a matter for statistical reasoning about code
bases. Nor is it a matter for debate. The gets() function must go.

I agree. The only argument cited seems to be (rightly) that existing
code should not be broken without good reason. I am of the opinion
that there is good reason, but it is worth considering the cost. How
expensive would it be, in the most costly case, to link an existing
system with a hand-written gets? How much testing and integration
does a simple function like gets need? I know that testing and
integration are not trivial, but the cost would only be borne when
re-compiling with a new compiler/library when there would have to be
considerable testing anyway. I.e. the cost will always be an
incremental one.
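
For code that gets updated rather than relinked, the usual bounded alternative is fgets(). A function with the exact gets() signature never learns the buffer size, so the sketch below takes it as a parameter. The name `read_line` and its return codes are hypothetical, and the sketch assumes the input contains no embedded null characters. Over-long lines are reported rather than silently truncated, since quiet truncation was flagged earlier in the thread as a danger of its own.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Reads one line from in into buf (capacity size) and strips the newline.
   Returns  1 if a whole line was stored,
            0 on end of file or read error with nothing stored,
           -1 if the line did not fit: buf holds the first size-1
              characters and the rest of the line is discarded. */
int read_line(char *buf, size_t size, FILE *in)
{
    assert(buf != NULL && in != NULL && size >= 2);
    int n = size > INT_MAX ? INT_MAX : (int)size;   /* fgets takes an int */

    if (fgets(buf, n, in) == NULL)
        return 0;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 1;
    }

    /* No newline stored: either the line fits exactly, the input ended
       without a final newline, or the line is too long. */
    int c = getc(in);
    if (c == EOF || c == '\n')
        return 1;
    while ((c = getc(in)) != EOF && c != '\n')
        ;                                           /* discard the excess */
    return -1;
}
```

A caller would loop on `read_line(buf, sizeof buf, stdin)` and decide what to do with a -1 result (reject the input, log it, or skip the line) instead of overrunning the buffer.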

Ben Bacarisse · Mar 11, 2009

Phil Carmody said:
Thanks to modern features of the language, I can write client
code that can use either list type without explicitly knowing
which list type it is.

Yes, but Jacob's point was about the need to convert. When you join
systems that have different lists, it does not matter how clean the
interface is, you can't use (usually) one where the other is expected.

Having said that, using only the K&R features of the language
I can do that too.

Why does the client care how the list is built anyway? That
sounds like a bad design from the outset. All the client
should care about is the interface, which can be an abstract
pointer and a bunch of functions or macros using such pointers.

Just because you're using C rather than C++ doesn't mean you
have to _avoid_ encapsulation.

Yes, but I don't think Jacob was rejecting (or avoiding)
encapsulation.

Keith Thompson · Mar 11, 2009

and how would this work

void pippo(void)
{
    char *buffer = malloc (100);
    buffer += 1;
    msize (buffer);
}

assuming msize() returns the size of the malloc()ed buffer
what does it return here? 99? 100? an error?

The most straightforward answer is that it's undefined behavior, just
as free(buffer) would be.

It's already entirely possible for an implementation to provide this
kind of thing as an extension. (Do any implementations do so? If
not, why don't they?)

The problem with the idea of adding it to the standard is that it
would then have to be added to *all* conforming implementations, and
on some of them it would make the existing functions (malloc, calloc,
realloc, free) less efficient because of the extra information that
would have to be maintained. This would adversely affect the
performance of existing code.

Richard Bos · Mar 11, 2009

Not necessarily, the code may not run with any kind of elevated
privileges or only be used internally, in which case there's little to
no benefit to fixing the vulnerability.

Tell me... have you ever been a sysadmin, or only a programmer? 'cause I
have, and I assure you, even broken code which is only used internally
can create real problems for the company when its internal luser has an
attack of the stupid-cough.

Richard
