Bounds checked string library

jacob navia · Feb 14, 2004

A function like strcpy currently takes two unbounded pointers.

Unbounded pointers, i.e. pointers that carry no
range information, have catastrophic failure modes,
especially when *writing* to main memory.

A better string library would accept *bounded* pointers.
We would have then:
char *strcpyN(char *destination, size_t bound1,
char *src,size_t bound2);
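
A minimal sketch of how such a function might behave, assuming bound1 is the capacity of the destination and bound2 is the number of bytes that may be read from src. The failure convention (return NULL and leave the destination untouched) and the const qualifier on src are assumptions, not part of the proposal:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of the proposed strcpyN. Assumed convention:
   returns destination on success, NULL if the copy cannot be done
   safely; on failure the destination is not modified. */
char *strcpyN(char *destination, size_t bound1,
              const char *src, size_t bound2)
{
    const char *end;
    size_t len;

    if (destination == NULL || src == NULL || bound1 == 0)
        return NULL;

    /* Look for the terminator without reading past src's own bound. */
    end = memchr(src, '\0', bound2);
    if (end == NULL)
        return NULL;            /* src is not terminated within bound2 */

    len = (size_t)(end - src);
    if (len >= bound1)
        return NULL;            /* no room for the string plus '\0' */

    memcpy(destination, src, len + 1);   /* copy including the '\0' */
    return destination;
}
```

A caller with real arrays could pass sizeof for both bounds, e.g. strcpyN(dst, sizeof dst, src, sizeof src).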

Bounded pointers are used in C in many interfaces.
This is absolutely nothing new.

Their use could become more general if the
functions in the C library dropped the obsession
with unbounded pointers and accepted this type too.

Of course, clever compilers could automatically pass
size information to the called function, but that would
be just an improvement. What is needed is a standard
that allows general use of this type of
pointer in applications that need it.

Because in many applications security is more
important than sparing a few cycles.

Of course there exist many string libraries that do
this, but each has its own syntax. Much better
would be if standard C would encourage the use
of bounded pointers with a string library
that uses them.

jacob

Mark McIntyre · Feb 14, 2004

A function like strcpy currently takes two unbounded pointers.

Unbounded pointers, i.e. pointers that carry no
range information, have catastrophic failure modes,
especially when *writing* to main memory.

Jacob, in this and your other post, I personally feel you're solving problems that
don't exist. A good compiler will have its own way to check them during
debugging, and careful programming will avoid them in production.

Plus frankly, I don't see this as a problem anyway. I'm passing an array
to a function and doing something with it - in your model I need to know
how big it is when I write the fn, which is a serious problem. Imagine I'm
reading in data from a file, and mallocing the memory for it, I don't know
at compile time how much memory, i.e. how large an array, I need.

I understand what you're trying to do but I do genuinely think that it's a
problem that's already solvable by adequate quality programming.

A better string library would accept *bounded* pointers.
We would have then:
char *strcpyN(char *destination, size_t bound1,
char *src,size_t bound2);

strncpy does most of this already. The bit it doesn't do, checking the size
of the destination, is trivial to check yourself before calling it.

Their use could become more general if the
functions in the C library dropped the obsession
with unbounded pointers and accepted this type too.

better to create a new type, say "string", which contains the size info
within the type. What does that remind me of...

jacob navia · Feb 15, 2004

Mark McIntyre said:
Jacob
in this and your other post, I personally feel you're solving problems that
don't exist. A good compiler will have its own way to check them during
debugging, and careful programming will avoid them in production.

No. There is no way to write the strcpy function without
provoking a catastrophic failure with unbounded pointers.

Plus frankly, I don't see this as a problem anyway. I'm passing an array
to a function and doing something with it - in your model I need to know
how big it is when I write the fn, which is a serious problem.

This is precisely my point. This *is* a serious problem. You
*must* check the bounds of the array when writing to it.

Most C programmers do not do it because it is incredibly
tedious:

if (strlen(src) < sizeof(dst))
    strcpy(dst, src);

You see a lot of code like that?

Imagine I'm
reading in data from a file, and mallocing the memory for it, I don't know
at compile time how much memory ie how large an array I need.

fread accepts bounded pointers, since the input buffer is bounded
by the size to read in!!!

I understand what you're trying to do but I do genuinely think that its a
problem thats already solvable by adequate quality programming.

Yes, but it is VERY TEDIOUS so most people (me included)
do not do it!!!

This is precisely the problem. The interface of those functions
is plain wrong.

strncpy does most of this already. The bit it doesn't do, checking the size
of the destination, is trivial to check yourself before calling it.

At EACH CALL ???

This is of course possible but it is BAD DESIGN!

You are doing what a machine could do much faster.
What's the use of computers if we are going to waste
time and effort doing their job?

better to create a new type, say "string", which contains the size info
within the type. What does that remind me of...

Yes. The other solution is to overload the [] operator
and use bounded strings. This is much easier but
probably would provoke such an outcry that a smaller
but still useful solution is better.

jacob

Martin Johansen · Feb 15, 2004

The concept of the C language is to give the programmer the power of
assembly language, but with increased visual comprehension.

This is the point you are missing.

If you want specialized languages, try the languages PERL or D, for example,
which have the specialized features you are referring to.

Mark McIntyre · Feb 15, 2004

No. There is no way to write the strcpy function without
provoking a catastrophic failure with unbounded pointers.

That's not what I said. The compiler writer can implement bounds checking in
debug mode without changing the syntax of strcpy. And you, the user of
strcpy, can avoid buffer overflows by writing careful code.

This is precisely my point. This *is* a serious problem. You
*must* check the bounds of the array when writing to it.

Rubbish. When you read 4 bytes from a file into a 12-byte array, do you
need to check that the destination is big enough first? When you copy those
4 bytes into a 128 byte array, do you need to check again?

When you need to check you should check. When you don't, you don't.

Most C programmers do not do it because it is incredibly
tedious:

If you do it like this, it's tedious. However, if your code requires such
checking, then you /should/ put it in.

if (strlen(src) < sizeof(dst))
    strcpy(dst, src);

This is not how to do it though.

You see a lot of code like that?

I see a lot of code that makes sure its array bounds can't be overflowed.
But not by daft checks like the above.

fread accepts bounded pointers, since the input buffer is bounded
by the size to read in!!!

Who said I was using fread? And what's to stop the simple mistake

int somedata[128];
fread(somedata, sizeof somedata[0], 129, file); /* asks for 129 elements; the array holds 128 */
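
One way to make this kind of slip harder is to derive the element count from the array declaration instead of typing it a second time. The sketch below is only an illustration; ARRAY_LEN and read_ints are hypothetical names, not standard facilities:

```c
#include <stddef.h>
#include <stdio.h>

/* Number of elements in a true array (not valid for a pointer). */
#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

/* Reads at most capacity ints from fp into out.
   Returns the number of ints actually read. */
size_t read_ints(int *out, size_t capacity, FILE *fp)
{
    if (out == NULL || fp == NULL)
        return 0;
    return fread(out, sizeof out[0], capacity, fp);
}
```

With int somedata[128], the call becomes read_ints(somedata, ARRAY_LEN(somedata), file), so the size is written once, at the declaration.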

Yes, but it is VERY TEDIOUS so most people (me included)
do not do it!!!

Then you're doing it wrong.

At EACH CALL ???

No, at each declaration.

better to create a new type, say "string", which contains the size info
within the type. What does that remind me of...

Yes. The other solution is to overload the [] operator
and use bounded strings. This is much easier but
probably would provoke such an outcry that a smaller
but still useful solution is better.

I believe the phrase I'm searching for is "you know where C++ can be
found...".

Nick Landsberg · Feb 15, 2004

jacob said:
No. There is no way to write the strcpy function without
provoking a catastrophic failure with unbounded pointers.

This is precisely my point. This *is* a serious problem. You
*must* check the bounds of the array when writing to it.

Most C programmers do not do it because it is incredibly
tedious:

if (strlen(src) < sizeof(dst))
    strcpy(dst, src);

You see a lot of code like that?

No, I don't, and that's because where I work we
consider the use of strcpy() to be bad practice
and use strncpy() instead. Unadorned strcat(),
strcpy(), gets(), sprintf(), etc. don't pass code inspections.
They are treated as historical curiosities
which should not, in general, be used in production
code. And yes, checking the size of the destination
is standard coding practice in our shop.

fread accepts bounded pointers, since the input buffer is bounded
by the size to read in!!!

Yes, but it is VERY TEDIOUS so most people (me included)
do not do it!!!

This is precisely the problem. The interface of those functions
is plain wrong.

strcat(), et al., were the first of the "strxxx" functions
to be issued and put into the library. After the problems
were identified, the "strnxxx" functions were issued.
(This was before ANY standard, I believe.) I would have loved
to see the former retired from the libraries, but that did
not happen, probably because too much existing code would break.
Yes, the functions have an interface which is dangerous when
used without discipline, but that, IMO, is not a reason
to change (or augment) the language specification.

Patient: "Doctor, it hurts when I do this."
Doctor: "Then don't do it."

At EACH CALL ???

This is of course possible but it is BAD DESIGN!

Why is it bad design? Please explain.

You are doing what a machine could do much faster.
What's the use of computers if we are going to waste
time and effort doing their job?

As far as I know, there is nothing in the standard
which specifies the contents of a pointer. (Please
correct me if I am wrong.) Thus an *implementation*
may choose to somehow store the limits (bounds) of the valid area
for that pointer to operate on as part of the pointer
representation and do something implementation-dependent
when the pointer is out of bounds.
This would, of course, break code which relies
on pointer arithmetic, but that is bad practice
anyway. I know of no implementation which has
chosen to do so.
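
As a rough illustration of the idea, a pointer that carries its own limits can be imitated in user code with a small struct. This is a sketch only; struct bounded_ptr and bounded_store are hypothetical names, and a real implementation would do this inside the compiler rather than through a helper function:

```c
#include <stddef.h>

/* A "fat" pointer: the address plus the size of the object it points into. */
struct bounded_ptr {
    char  *base;   /* start of the valid area */
    size_t size;   /* number of bytes that may be accessed */
};

/* Stores c at index i.
   Returns 0 on success, -1 if the access would be out of bounds. */
int bounded_store(struct bounded_ptr p, size_t i, char c)
{
    if (p.base == NULL || i >= p.size)
        return -1;
    p.base[i] = c;
    return 0;
}
```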

The implementation may also choose to implement
the standard libraries in a way in which the size
of the array_of_whatever is stored in the size_t
bytes just before the array and perform bounds
checking that way. (Much like some malloc implementations
keep track of the size of an allocated area.)
Again, I know of no *C* implementation
which has chosen to do so. (Although some Fortran
string-handling packages do exactly that.)
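
The length-prefix layout described above can be sketched in portable C99 with a flexible array member. The names sized_alloc, sized_capacity and sized_free are hypothetical, and this only covers heap blocks obtained through the helper, not arbitrary arrays:

```c
#include <stddef.h>
#include <stdlib.h>

/* The size is stored immediately before the data the caller sees. */
struct sized_block {
    size_t size;
    char   data[];   /* C99 flexible array member */
};

/* Allocates size bytes and remembers the size in the hidden header. */
char *sized_alloc(size_t size)
{
    struct sized_block *b;

    if (size > (size_t)-1 - sizeof *b)
        return NULL;                    /* header + size would overflow */
    b = malloc(sizeof *b + size);
    if (b == NULL)
        return NULL;
    b->size = size;
    return b->data;
}

/* Recovers the stored size; data must come from sized_alloc. */
size_t sized_capacity(const char *data)
{
    const struct sized_block *b = (const struct sized_block *)
        (data - offsetof(struct sized_block, data));
    return b->size;
}

/* Releases a block obtained from sized_alloc. */
void sized_free(char *data)
{
    if (data != NULL)
        free(data - offsetof(struct sized_block, data));
}
```

A bounds-checking copy routine could then call sized_capacity on its destination instead of taking the size as a separate argument.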

Thus there appear, at least on the face of it,
a couple of possible ways to do this without changing
the language definition.

better to create a new type, say "string", which contains the size info
within the type. What does that remind me of...

Yes. The other solution is to overload the [] operator
and use bounded strings. This is much easier but
probably would provoke such an outcry that a smaller
but still useful solution is better.

jacob

Malcolm · Feb 15, 2004

Nick Landsberg said:
No, I don't, and that's because where I work we
consider the use of strcpy() to be bad practice
and use strncpy() instead.

The problem is that src will be longer than dst for a reason. Maybe someone
has an unusually long address, for example. Simply putting in a check
removes the undefined behaviour, but substitutes wrong behaviour, which is
no benefit at all. So you need either to exit the program with an error
message or code some sort of intelligent address truncator.
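
A small sketch of the first option, reporting the problem to the caller instead of silently truncating. The names copy_or_report and enum copy_status are hypothetical, and what the caller does with the status (error message, exit, smarter truncation) is left open:

```c
#include <stddef.h>
#include <string.h>

enum copy_status { COPY_OK, COPY_TOO_LONG, COPY_BAD_ARGS };

/* Copies src into dst only if it fits completely.
   On COPY_TOO_LONG, dst is set to the empty string so that a
   truncated value can never be mistaken for the real one. */
enum copy_status copy_or_report(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    if (dst == NULL || src == NULL || dst_size == 0)
        return COPY_BAD_ARGS;

    len = strlen(src);
    if (len >= dst_size) {
        dst[0] = '\0';
        return COPY_TOO_LONG;
    }
    memcpy(dst, src, len + 1);   /* copy including the '\0' */
    return COPY_OK;
}
```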

As far as I know, there is nothing in the standard
which specifies the contents of a pointer. (Please
correct me if I am wrong.) Thus an *implementation*
may choose to somehow store the limits (bounds) of the valid area
for that pointer to operate on as part of the pointer
representation and do something implementation-dependent
when the pointer is out of bounds.
This would, of course, break code which relies
on pointer arithmetic, but that is bad practice
anyway. I know of no implementation which has
chosen to do so.

It is actually illegal to calculate an invalid address, unless it is one
beyond the end of a valid range, so strictly conforming code would not
break.
The AS400 implementation actually does this (see the size of a
sizeof(pointer) thread).

Richard Heathfield · Feb 15, 2004

jacob said:
A function like strcpy currently takes two unbounded pointers.

No, it takes two pointers. They are bounded by common sense programming
safeguards (such as Everything Must Be Somewhere).

Unbounded pointers, i.e. pointers that carry no
range information, have catastrophic failure modes,
especially when *writing* to main memory.

Not if you don't let them fail.

A better string library would accept *bounded* pointers.

A /different/ string library would accept bounded pointers.

We would have then:
char *strcpyN(char *destination, size_t bound1,
char *src,size_t bound2);

Bounded pointers are used in C in many interfaces.
This is absolutely nothing new.

Indeed. Lots of people have written string libraries. (Me too!)

Their use could become more general if the
functions in the C library dropped the obsession
with unbounded pointers and accepted this type too.

That would be a mistake. C is fast right now, because it assumes the
programmer knows what he's doing. When bounds-checking makes sense, the C
programmer puts it in (and, if he doesn't, on his own head be it). If it
doesn't make sense, why bother? You'll just slow everything down.

Of course, clever compilers could pass automatically
size information to the called function, but that would
be just an improvement. What is needed is a standard
that would allow generalized use of this type of
pointers in applications that need them.

Because in many applications security is more
important than sparing a few cycles.

If you want C++'s std::string, you know where to find it. And if you don't,
well, here is C - it's lean and it's mean and it's very, very keen, so
please don't slow it down for the rest of us just because some people don't
know when to bounds-check and when not to.

Of course there exist many string libraries that do
this, but each has its own syntax. Much better
would be if standard C would encourage the use
of bounded pointers with a string library
that uses them.

If it's all the same to you, I think I prefer it the way it is.

Emmanuel Delahaye · Feb 15, 2004

jacob navia said:
Most C programmers do not do it because it is incredibly
tedious:

if (strlen(src) < sizeof(dst))
    strcpy(dst, src);

You see a lot of code like that?

I have this in my personal standard C library in my "STR" module :

#include <string.h>

/* ---------------------------------------------------------------------
   STR_safecopy()
   ---------------------------------------------------------------------
   safe string copy
   If the source does not fit, the copy is truncated.
   ---------------------------------------------------------------------
   I: destination address
   I: destination size
   I: source address
   O: destination address (NULL if an argument is invalid)
   --------------------------------------------------------------------- */
char *STR_safecopy (char *const des
                   ,size_t const size
                   ,char const *const src)
{
   char *s_out = NULL;
   if (des && size && src)
   {
      /* strncpy stops at the end of src, so a short source is never
         read past its terminator (memcpy would always read size - 1
         bytes from src) */
      strncpy (des, src, size - 1);
      des[size - 1] = 0;
      s_out = des;
   }

   return s_out;
}

On my "str.h", I could add if I dared:

#undef strcpy()
#define strcpy(a, b) assert (0)

to prevent the use of strcpy() and encourage the use of my function
instead...

jacob navia · Feb 15, 2004

Martin Johansen said:
The concept of the C language is to give the programmer the power of
assembly language, but with increased visual comprehension.

What about security then?

This idea that "good" programmers never make a single
mistake is plain wrong.

Never had a trap in your life Martin?

Now come on...

This is the point you are missing.

I have programmed and still go on programming in assembler.
But I have been programming in C for so many years that
the problems young people find cool to solve have become
unbearable to me after 20+ years, you see?

And they happen to me again and again, as they do to
everyone else. I have an "off by one" bug in the
very proposal about strings.

I am fed up with it and would like to get rid of it.

If you want specialized languages, try the languages PERL or D, for example,
which have the specialized features you are referring to.

C is very good. But without those bugs please.

Peter Nilsson · Feb 16, 2004

Emmanuel Delahaye said:
I have this in my personal standard C library in my "STR" module :

On my "str.h", I could add if I dared:

#undef strcpy()

#undef strcpy

#define strcpy(a, b) assert (0)

I tend to do...

assert(!"strcpy() disabled: use foobar()");

...because it makes the assert print something more useful than say...

Assertion failed at blah.c line 6: 0

Severian · Feb 16, 2004

#undef strcpy

I tend to do...

assert(!"strcpy() disabled: use foobar()");

...because it makes the assert print something more useful than say...

Assertion failed at blah.c line 6: 0

I would do:

#define strcpy(a,b) NeverFuckingUseThisFucntionPleaseKTHX(a,b)

Dan Pop · Feb 16, 2004

Severian said:
I would do:

#define strcpy(a,b) NeverFuckingUseThisFucntionPleaseKTHX(a,b)

With few exceptions, all the standard library functions can be *easily*
misused.

Do you adopt a similar attitude toward them? If not, why single out
strcpy(), a perfectly good library function?

Dan

CBFalconer · Feb 16, 2004

Mark said:
.... snip ...

Plus frankly, I don't see this as a problem anyway. I'm passing
an array to a function and doing something with it - in your model
I need to know how big it is when I write the fn, which is a
serious problem. ... snip ...

This was a serious problem in the original (level 0) Pascal, and
was solved by the introduction of conformant arrays in level 1 ISO
Pascal. The existence of level 1 produced much disagreement,
especially from American users, at the time of standardization.

C went in the opposite direction, towards no checking whatsoever.
Both approaches have their places.

CBFalconer · Feb 16, 2004

jacob said:
Mark McIntyre wrote:
.... snip ...

This is precisely my point. This *is* a serious problem. You
*must* check the bounds of the array when writing to it.

Most C programmers do not do it because it is incredibly
tedious:

if (strlen(src) < sizeof(dst))
    strcpy(dst, src);

You see a lot of code like that?

So use the (non-standard) BSD functions strlcpy and strlcat. An
implementation can be found on my page, below, download section,
which contains links to the original BSD rationale etc. They are
designed to minimize programmer error.
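
For readers who have not met these functions, here is a minimal sketch of the strlcpy contract as documented by BSD: copy at most size - 1 characters, always terminate when size is nonzero, and return the length of the source so the caller can detect truncation. It is named my_strlcpy (a hypothetical name) to avoid clashing with a platform-provided strlcpy, and it is not the implementation referred to above:

```c
#include <stddef.h>
#include <string.h>

/* Sketch of strlcpy-style semantics.
   Returns strlen(src); truncation occurred if the result >= size. */
size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);

    if (size != 0) {
        /* Copy as much as fits, leaving room for the terminator. */
        size_t n = (len < size - 1) ? len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}
```

A typical call site tests the return value: if (my_strlcpy(buf, s, sizeof buf) >= sizeof buf), the string was truncated.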

CBFalconer · Feb 16, 2004

Mark said:
.... snip ...

Yes. The other solution is to overload the [] operator
and use bounded strings. This is much easier but
probably would provoke such an outcry that a smaller
but still useful solution is better.

I believe the phrase I'm searching for is "you know where C++
can be found...".

Not a solution. Virtually anything you can foul in C you can foul
in C++. C++ was (mistakenly IMNSHO) designed to present no shocks
to C programmers. Pascal and Ada have the great advantages of
having been designed to reduce errors, and having been designed by
virtually single people (a Swiss and a Frenchman respectively).

The example of C++ shows what can happen when trying to graft
things onto a simple language. Use a language where it is
appropriate. There is no universal language, although there are
more or less popular languages.

Emmanuel Delahaye · Feb 16, 2004

Peter Nilsson said:
#undef strcpy

Good point, Thanks.

Rob Thorpe · Feb 19, 2004

Martin Johansen said:
The concept of the C language is to give the programmer the power of
assembly language, but with increased visual comprehension.

This is the point you are missing.

I think people who make this claim about C either haven't written
enough C or they haven't written enough assembly language. Have you
ever tried to do the things that you've done in C in assembly
language?

Anyway, I normally find people who hold this belief are dogmatic about
it. Maybe you aren't, but I won't argue further, it's not my main
point.

Two things C certainly has that assembly language lacks are a type
system and language elements for accessing arrays. It is only a small
change to extend the type system to cover arrays properly.

Joona I Palaste · Feb 19, 2004

I think people who make this claim about C either haven't written
enough C or they haven't written enough assembly language. Have you
ever tried to do the things that you've done in C in assembly
language?

Anyway, I normally find people who hold this belief are dogmatic about
it. Maybe you aren't, but I won't argue further, it's not my main
point.

Two things C certainly has that assembly language lacks are a type
system and language elements for accessing arrays. It is only a small
change to extend the type system to cover arrays properly.

One thing I note C has but most assembly languages lack is the concept
of expressions. In assembly language, every mathematical operation must
usually be performed as a separate statement.
Consider this C expression:
(1+2) * (3+4)
Assuming a stack-based one-register assembly language, this makes:
LOAD 1
LOAD 2
ADD
LOAD 3
LOAD 4
ADD
MUL

CBFalconer · Feb 19, 2004

Joona said:
.... snip ...

One thing I note C has but most assembly languages lack is the
concept of expressions. In assembly language, every mathematical
operation must usually be performed as a separate statement.

Consider this C expression:
(1+2) * (3+4)
Assuming a stack-based one-register assembly language, this makes:

LOAD 1
LOAD 2
ADD
LOAD 3
LOAD 4
ADD
MUL

The concept is there, you just don't recognize it. Each of the
LOADs was a constant expression. The whole sequence was an
expression, and left the result stacked, just as did the loads.
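
To make the point concrete, the instruction sequence can be run on a tiny stack machine written in C. This is only a sketch; the instruction set (OP_LOAD, OP_ADD, OP_MUL), the fixed stack depth and the function name run are assumptions made for illustration:

```c
#include <stddef.h>

enum op { OP_LOAD, OP_ADD, OP_MUL };

struct instr {
    enum op op;
    int operand;   /* used by OP_LOAD only */
};

#define STACK_DEPTH 16

/* Executes n instructions. On success stores the single value left on
   the stack in *result and returns 0; returns -1 on stack overflow,
   underflow, or if the program does not leave exactly one value. */
int run(const struct instr *prog, size_t n, int *result)
{
    int stack[STACK_DEPTH];
    size_t sp = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (prog[i].op == OP_LOAD) {
            if (sp == STACK_DEPTH)
                return -1;
            stack[sp++] = prog[i].operand;
        } else {
            int lhs, rhs;
            if (sp < 2)
                return -1;
            rhs = stack[--sp];
            lhs = stack[--sp];
            stack[sp++] = (prog[i].op == OP_ADD) ? lhs + rhs : lhs * rhs;
        }
    }
    if (sp != 1)
        return -1;
    *result = stack[0];
    return 0;
}
```

Feeding it LOAD 1, LOAD 2, ADD, LOAD 3, LOAD 4, ADD, MUL leaves 21 on the stack, the value of (1+2) * (3+4).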
