trim whitespace

ImpalerCore · Aug 20, 2010

Here's my thought: The difference is whether the return makes
additional representations, such as "that was a valid string". If
the behavior is
    if (s == NULL)
        return NULL;

then I'm not converting an invalid string into valid data; I'm leaving
it invalid.

That's a good way of putting it. I think it's a good justification
for doing it that way.
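
A minimal sketch of that pass-through behavior, for illustration only. The helper name skip_leading_space is hypothetical and is not the trim() under discussion; it only shows a null pointer being handed back as a null pointer rather than being turned into something that looks like valid data.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not from the thread: returns a pointer to the
 * first non-whitespace character of s. A null pointer is passed
 * through unchanged, so invalid input stays visibly invalid. */
const char *skip_leading_space(const char *s)
{
    if (s == NULL)
        return NULL;
    while (isspace((unsigned char) *s))
        s++;
    return s;
}

/* Usage sketch: call this from main() to exercise the helper. */
void skip_leading_space_demo(void)
{
    assert(skip_leading_space(NULL) == NULL);
    assert(*skip_leading_space("  hi") == 'h');
    assert(*skip_leading_space("   ") == '\0');
}
```

The caller still has to test the result for NULL, but no new claim about the input has been made.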

For strlen(), there's no value (maybe a -1, if you didn't use
size_t) that communicates "that wasn't even a string, your
question is invalid".

Thanks for your input (and patience ;-). I guess I'll head back to
the think tank for a while.

Keith Thompson · Aug 20, 2010

pete said:
The string functions are described as working on objects.
If ((size_t)-1) is the maximum size of an object,
then ((size_t)-1) is too long for a string length
and still works for that purpose.

And code that doesn't specifically check for a (size_t)-1 result
from strlen will likely do very bad things. Obviously it's too
late to change strlen() now. Even if it had been defined that way
from the beginning, it would be error-prone (unless you just avoid
calling strlen(NULL) in the first place).

Error handling in C is tricky.
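
One common alternative to an in-band sentinel such as (size_t)-1 is to report validity separately from the length. The sketch below assumes a hypothetical wrapper named checked_strlen; it is not part of the standard library and nobody in the thread proposed this exact interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: the return value says whether the question was
 * valid, and the length is delivered through an out parameter, so a
 * caller cannot mistake an error code for a length. */
bool checked_strlen(const char *s, size_t *len)
{
    if (s == NULL || len == NULL)
        return false;
    *len = strlen(s);
    return true;
}

/* Usage sketch: call this from main() to exercise the wrapper. */
void checked_strlen_demo(void)
{
    size_t n = 99;

    assert(!checked_strlen(NULL, &n));
    assert(n == 99);                 /* untouched on failure */
    assert(checked_strlen("abc", &n) && n == 3);
}
```

This only catches a null pointer. A non-null pointer that does not point to a terminated string is still the caller's problem.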

John Kelly · Aug 20, 2010

The memmove length argument is a size_t. Can I assume (keep - hast)
will never overflow size_t? Can I assume anything about the result of
the pointer subtraction?

Given a large region of memory with no \0 byte, it seems the keep
pointer could overflow and wrap, or the ptrdiff of (keep - hast) could
be unrepresentable. Either possibility is bad. How to guard against
the badness?

In the code below, can anyone produce

printf ("what is the meaning of this\n");

I can only get

printf ("my pointer wrapped around\n");

# include <limits.h>
# include <stdio.h>
# include <stdlib.h>

void
diff (char *alpha, char *omega)
{
    printf ("ptrdiff is %lld\n", (long long) (omega - alpha));
}

int
main (void)
{
    char *alpha;
    char *omega;
    char data[1];

    alpha = data;

    omega = data;
    omega += SSIZE_MAX - 2;

    diff (alpha, ++omega);
    diff (alpha, ++omega);
    diff (alpha, ++omega);

    omega = data;
    omega += SSIZE_MAX - 2;
    omega += SSIZE_MAX;

    diff (alpha, ++omega);
    diff (alpha, ++omega);
    diff (alpha, ++omega);

    diff (alpha, ++omega);

    if (omega == alpha) {
        printf ("my pointer wrapped around\n");
    } else {
        printf ("what is the meaning of this\n");
    }

    return 0;
}

Seebs · Aug 20, 2010

Given a large region of memory with no \0 byte, it seems the keep
pointer could overflow and wrap,

If you go outside the boundaries of an object, you have invoked undefined
behavior.

If you don't, there is no way to get outside the range of ptrdiff_t.

or the ptrdiff of (keep - hast) could
be unrepresentable. Either possibility is bad. How to guard against
the badness?

Don't go outside the boundaries of an object. Then there is no
badness.

-s

Ben Bacarisse · Aug 20, 2010

Seebs said:
If you know the pointer subtracted-from is later in an object, then you can
assume that the result of the pointer subtraction is a valid size_t. By
definition, size_t can represent the size of any object. Thus, in the most
extreme case (a maximally-large object which is an array of characters),
size_t can hold the difference between the last and first addresses into
that object, therefore, you're fine.

Except that the result is of type ptrdiff_t and the result is undefined
if the difference is not representable as a value of this type. The
standard permits the range of size_t to exceed that of either the signed
or unsigned range of ptrdiff_t.

It is interesting to note that the minimum limits required for these
types are:

PTRDIFF_MIN -65535
PTRDIFF_MAX +65535

SIZE_MAX 65535

I.e. ptrdiff_t requires at least 17 bits so if your implementation's
size_t is minimal (0 to 65535) the ptrdiff_t is bound to be able to hold
any valid pointer difference. While it might have been useful to insist
that ptrdiff_t values range from -SIZE_MAX to +SIZE_MAX, that would not
be reasonable in general.
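
A small sketch of how a caller could check, before relying on pointer subtraction, whether a known byte count is representable as a ptrdiff_t. The function name fits_in_ptrdiff is hypothetical. The comparison is done in uintmax_t because the standard allows either of SIZE_MAX and PTRDIFF_MAX to be the larger one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: true if a byte count n could be the result of a
 * pointer subtraction without exceeding PTRDIFF_MAX. */
bool fits_in_ptrdiff(size_t n)
{
    return (uintmax_t) n <= (uintmax_t) PTRDIFF_MAX;
}

/* Usage sketch: call this from main() to exercise the check. */
void fits_in_ptrdiff_demo(void)
{
    assert(fits_in_ptrdiff(0));
    /* The standard requires PTRDIFF_MAX to be at least 65535. */
    assert(fits_in_ptrdiff(65535));
}
```

On implementations where size_t and ptrdiff_t have the same width, fits_in_ptrdiff(SIZE_MAX) is false, which is the case described above.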

John Kelly · Aug 20, 2010

Except that the result is of type ptrdiff_t and the result is undefined
if the difference is not representable as a value of this type. The
standard permits the range of size_t to exceed that of either the signed
or unsigned range of ptrdiff_t.

It is interesting to note that the minimum limits required for these
types are:

PTRDIFF_MIN -65535
PTRDIFF_MAX +65535

SIZE_MAX 65535

I.e. ptrdiff_t requires at least 17 bits so if your implementation's
size_t is minimal (0 to 65535) the ptrdiff_t is bound to be able to hold
any valid pointer difference.

But as you said,

"the standard permits the range of size_t to exceed that of either the
signed or unsigned range of ptrdiff_t"

So I can't make an assumption about boundedness of my string size, in
relation to ptrdiff_t.

Ugh. And it gets worse.

Suppose my memory, beginning at the hast location, has no \0 bytes for a
size greater than size_t. The loop code testing for \0 (end of string)
would never find it, thus an infinite loop. Double ugh!

I'll be back.

Keith Thompson · Aug 20, 2010

Seebs said:
By definition, size_t can represent the size of any object.

[...]

We had a lengthy discussion about that not long ago. The standard
merely says that size_t is the type of the result of the sizeof
operator; there's no explicit statement that it can represent the
size of any object.

malloc-allocated objects can't have sizeof applied to them, at
least not directly.

In principle, an object larger than size_t bytes could be created
by some implementation specific method, or by calling calloc()
with arguments whose mathematical product exceeds SIZE_MAX.

In real life, I think that any *sane* implementation will choose a
size_t type big enough to represent the size of any object it can
support, and calloc() will fail and return a null pointer for any
request to create such a huge object.

Change "By definition" to "In practice", or even to "By what the
definition *should* be", and I agree.
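
For code that does not want to depend on the implementation being sane, the calloc() case can be screened by the caller. This is a sketch under the assumption that refusing the request is the desired outcome; product_fits and checked_calloc are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* True if count * size can be represented in a size_t. */
bool product_fits(size_t count, size_t size)
{
    return size == 0 || count <= SIZE_MAX / size;
}

/* Hypothetical wrapper: refuses any request whose mathematical
 * product exceeds SIZE_MAX instead of passing it on to calloc(). */
void *checked_calloc(size_t count, size_t size)
{
    if (!product_fits(count, size))
        return NULL;
    return calloc(count, size);
}

/* Usage sketch: call this from main() to exercise the wrapper. */
void checked_calloc_demo(void)
{
    void *p;

    assert(product_fits(10, 20));
    assert(!product_fits(SIZE_MAX, 2));
    assert(checked_calloc(SIZE_MAX, 2) == NULL);

    p = checked_calloc(4, 4);
    assert(p != NULL);
    free(p);
}
```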

John Kelly · Aug 20, 2010

Seebs said:
By definition, size_t can represent the size of any object.

[...]

We had a lengthy discussion about that not long ago. The standard
merely says that size_t is the type of the result of the sizeof
operator; there's no explicit statement that it can represent the
size of any object.

malloc-allocated objects can't have sizeof applied to them, at
least not directly.

In principle, an object larger than size_t bytes could be created
by some implementation specific method, or by calling calloc()
with arguments whose mathematical product exceeds SIZE_MAX.

In real life, I think that any *sane* implementation will choose a
size_t type big enough to represent the size of any object it can
support, and calloc() will fail and return a null pointer for any
request to create such a huge object.

Problem is, trim() can be called with an arbitrary pointer. Who knows
what it points to? It's not limited to "objects." Maybe it's a region
of memory larger than size_t, which contains no \0 bytes. An algorithm
must protect itself at all times.

I'll see what I can do.

Keith Thompson · Aug 20, 2010

John Kelly said:
Problem is, trim() can be called with an arbitrary pointer. Who knows
what it points to? It's not limited to "objects." Maybe it's a region
of memory larger than size_t, which contains no \0 bytes. An algorithm
must protect itself at all times.

I'll see what I can do.

Probably not much.

There is no portable way, and typically no way at all, for a function
that operates on strings to protect against a passed pointer that
doesn't point to a '\0'-terminated string.

There are "safer" versions of C string functions that take an
additional argument specifying the maximum size of the array
containing the string; if there's no '\0' within the specified number
of bytes, there's an error (handling it is another matter). Even so,
it's still the caller's responsibility to pass valid arguments.

(Note that strncpy() is not a "safer" version of strcpy; for
historical reasons, its behavior is quite different from what you
might expect given the name. Search for "strncpy" in the archives
of this newsgroup for more discussion.)
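
A sketch of what such a bounded interface could look like for the trim() under discussion. The name trim_bounded, its return convention, and the choice to trim both ends in place are assumptions for illustration; the original trim() source is not shown in this part of the thread. The search for '\0' uses memchr(), which never reads more than cap bytes.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded trim: cap is the size of the array s points
 * into, as promised by the caller. Returns 0 on success, or -1 if s
 * is null or no '\0' is found within cap bytes. */
int trim_bounded(char *s, size_t cap)
{
    char *end;
    size_t len;
    size_t lead = 0;

    if (s == NULL)
        return -1;
    end = memchr(s, '\0', cap);
    if (end == NULL)
        return -1;                  /* not a string within cap bytes */
    len = (size_t) (end - s);

    while (len > 0 && isspace((unsigned char) s[len - 1]))
        len--;                      /* drop trailing whitespace */
    while (lead < len && isspace((unsigned char) s[lead]))
        lead++;                     /* count leading whitespace */

    len -= lead;
    memmove(s, s + lead, len);
    s[len] = '\0';
    return 0;
}

/* Usage sketch: call this from main() to exercise the function. */
void trim_bounded_demo(void)
{
    char ok[] = "  hi  ";
    char bad[4] = { 'a', 'b', 'c', 'd' };   /* no terminator */

    assert(trim_bounded(ok, sizeof ok) == 0);
    assert(strcmp(ok, "hi") == 0);
    assert(trim_bounded(bad, sizeof bad) == -1);
}
```

The function still trusts cap. If the caller passes a cap larger than the real array, the behavior is as undefined as before.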

John Kelly · Aug 20, 2010

Probably not much.

There is no portable way, and typically no way at all, for a function
that operates on strings to protect against a passed pointer that
doesn't point to a '\0'-terminated string.

Bail out of your loop and report an error if it reaches SIZE_MAX or
SSIZE_MAX iterations. I'm pondering what constant is appropriate for
trim().

And the ptrdiff_t subtraction of

(keep - hast)

is another thing to cope with. In trim(), hast is never greater than
keep, unless the pointer itself overflows and wraps around. Before each
iteration looking for \0, I should check the pointer value to see if
it's going to wrap.

Not sure what to compare it against though. PTR_MAX would be nice but I
don't see any such thing.

James Waldby · Aug 20, 2010

Given a large region of memory with no \0 byte, it seems the keep
pointer could overflow and wrap, or the ptrdiff of (keep - hast) could
be unrepresentable. Either possibility is bad. How to guard against
the badness?

In the code below, can anyone produce
printf ("what is the meaning of this\n");

I can only get
printf ("my pointer wrapped around\n");

[snip most code lines, except a representative few:]

omega = data;
omega += SSIZE_MAX - 2;
diff (alpha, ++omega);

.... (above line appears 7 times or so)

if (omega == alpha) {... }

As Seebach notes in his reply, "If you go outside the boundaries of an
object, you have invoked undefined behavior. [...] Don't go outside
the boundaries of an object. Then there is no badness." [1]

The question relevant in c.l.c isn't so much "can anyone produce"
certain behavior, as "can anyone produce, portably and in accord
with C standards", to which the answer is no, due to the irretrievably
undefined behavior of the program. That said, when I run the program
on my x86_64 Linux system with gcc, it gives the same result you got.

As a practical matter in a program like this, include a couple more
lines -- e.g.
printf ("SSIZE_MAX is %lx\n", (unsigned long) SSIZE_MAX);
in main, and
printf ("a %8p o %8p ", (void *) alpha, (void *) omega);
before the printf in diff, to make it more obvious what happens.

[1] Re "Then there is no badness", I presume Seebach references
only the little microcosm in which such a program compiles and
runs, rather than the world as a whole. But I could be wrong,
perhaps he sees defined behavior as a fix for all the problems
of the world.

John Kelly · Aug 20, 2010

Given a large region of memory with no \0 byte, it seems the keep
pointer could overflow and wrap, or the ptrdiff of (keep - hast) could
be unrepresentable. Either possibility is bad. How to guard against
the badness?

In the code below, can anyone produce
printf ("what is the meaning of this\n");

I can only get
printf ("my pointer wrapped around\n");

[snip most code lines, except a representative few:]

omega = data;
omega += SSIZE_MAX - 2;
diff (alpha, ++omega);

... (above line appears 7 times or so)

if (omega == alpha) {... }

As Seebach notes in his reply, "If you go outside the boundaries of an
object, you have invoked undefined behavior. [...] Don't go outside
the boundaries of an object. Then there is no badness." [1]

The question relevant in c.l.c isn't so much "can anyone produce"
certain behavior, as "can anyone produce, portably and in accord
with C standards", to which the answer is no, due to the irretrievably
undefined behavior of the program.

I know it's messed up. Intentionally so.

I threw it together to try and understand what happens when pointers
and/or pointer arithmetic overflow. I don't have a good understanding
of that yet, but once I do, I think I can make trim() bullet-proof.

Keith Thompson · Aug 20, 2010

John Kelly said:
I threw it together to try and understand what happens when pointers
and/or pointer arithmetic overflow. I don't have a good understanding
of that yet, but once I do, I think I can make trim() bullet-proof.

When pointer arithmetic overflows, the behavior is undefined. The only
solution is to avoid the overflow in the first place.

Your trim() function cannot, even in principle, be made bullet-proof.
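
For code that does know the bounds of the array it is working in, "avoid the overflow in the first place" usually means comparing counts instead of forming a pointer that might not exist. A minimal sketch, assuming the caller guarantees that p and end point into the same array with p <= end; the name can_advance is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Can p advance by n bytes without passing end?
 * Precondition (the caller's promise, which cannot be verified here):
 * p and end point into, or one past the end of, the same array. */
bool can_advance(const char *p, const char *end, size_t n)
{
    assert(p <= end);
    /* Deliberately not "p + n <= end": forming p + n is already
     * undefined if it lands beyond one past the end of the array. */
    return n <= (size_t) (end - p);
}

/* Usage sketch: call this from main() to exercise the check. */
void can_advance_demo(void)
{
    char buf[8];

    assert(can_advance(buf, buf + 8, 8));
    assert(!can_advance(buf, buf + 8, 9));
    assert(!can_advance(buf, buf + 8, (size_t) -1));
}
```

This is no help to a function that receives only a bare pointer, which is the situation trim() is in.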

John Kelly · Aug 20, 2010

[1] Re "Then there is no badness", I presume Seebach references
only the little microcosm in which such a program compiles and
runs, rather than the world as a whole. But I could be wrong,
perhaps he sees defined behavior as a fix for all the problems
of the world.

Yeah some of these guys crack me up.

Black hats don't respect the standard. They're looking for any hole
they can find.

Standards conformance has its place, but how can you protect yourself if
you don't explore UB and learn how things really work?

John Kelly · Aug 20, 2010

When pointer arithmetic overflows, the behavior is undefined. The only
solution is to avoid the overflow in the first place.

Your trim() function cannot, even in principle, be made bullet-proof.

If it's possible to "avoid the overflow in the first place" then your
remarks are self contradictory.

Let's not confuse code with the standard.

Geoff · Aug 20, 2010

The memmove length argument is a size_t. Can I assume (keep - hast)
will never overflow size_t? Can I assume anything about the result of
the pointer subtraction?

Given a large region of memory with no \0 byte, it seems the keep
pointer could overflow and wrap, or the ptrdiff of (keep - hast) could
be unrepresentable. Either possibility is bad. How to guard against
the badness?

In the code below, can anyone produce

printf ("what is the meaning of this\n");

I can only get

printf ("my pointer wrapped around\n");

[snip code, identical to the program John Kelly posted earlier in the thread]

In Visual Studio 2010, after defining _POSIX_ to get SSIZE_MAX and
adding James' suggested outputs, I get:

SSIZE_MAX is 7fff
a 0047F8AB o 004878A9 ptrdiff is 32766
a 0047F8AB o 004878AA ptrdiff is 32767
a 0047F8AB o 004878AB ptrdiff is 32768
a 0047F8AB o 0048F8A8 ptrdiff is 65533
a 0047F8AB o 0048F8A9 ptrdiff is 65534
a 0047F8AB o 0048F8AA ptrdiff is 65535
a 0047F8AB o 0048F8AB ptrdiff is 65536
what is the meaning of this
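
The arithmetic behind this output can be checked by hand. The sketch below copies the printed addresses and treats them as plain unsigned integers, which is an assumption about this one run and not something the standard promises for %p output. With SSIZE_MAX at 0x7fff, the second sequence ends 2 * 32767 + 2 = 65536 bytes past alpha, which is nowhere near wrapping the 8-hex-digit addresses shown, so omega != alpha on that run.

```c
#include <assert.h>

/* Address printed for alpha in the output above. */
static const unsigned long alpha_addr = 0x0047F8ABul;

/* Distance from alpha to a printed omega address. */
unsigned long offset_from_alpha(unsigned long omega_addr)
{
    return omega_addr - alpha_addr;
}

/* Usage sketch: call this from main() to re-check the printed values. */
void offset_demo(void)
{
    assert(offset_from_alpha(0x004878A9ul) == 32766ul);  /* first line */
    assert(offset_from_alpha(0x0048F8ABul) == 65536ul);  /* last line  */
    assert(2ul * 0x7ffful + 2ul == 65536ul);
}
```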

Ben Bacarisse · Aug 20, 2010

Seebs said:
If you go outside the boundaries of an object, you have invoked undefined
behavior.

If you don't, there is no way to get outside the range of ptrdiff_t.

Can you say how you reach this conclusion? I can see no reason why two
pointers into one object can't be more than PTRDIFF_MAX elements apart.
The standard (rightly) shies away from requiring a signed type that has
twice the range of size_t (except when size_t is minimal, in which case
it does require it).

<snip>

Keith Thompson · Aug 20, 2010

John Kelly said:
If it's possible to "avoid the overflow in the first place" then your
remarks are self contradictory.

Let's not confuse code with the standard.

There is no way for your trim() function, as specified, to
protect itself against all bad inputs. trim() can't tell the
difference between, say, a pointer to a 1000-byte object with no
'\0' characters and a pointer to a 2000-byte object with a '\0'
somewhere near the end. As it scans for the terminating '\0',
it has no way to know when to give up.

Specifically, there is no portable way to do this in C. Some
implementations might provide ways to do it, but as far as I know
most do not (I could easily be mistaken on that last point).

If you have sufficient control over *all* potential callers of
your trim() function, then you can avoid undefined behavior.
If you don't, you can't.

I might not have expressed this sufficiently clearly before.
Do you still think my remarks are self contradictory?

John Kelly · Aug 21, 2010

You're likely using a twos complement machine. On twos complement, overflow is
wrapping. And as long as you're only using adds and subtracts and you know the
final result must be in-range, then the overflow is irrelevant.

Suppose my machine has 8-bit pointers. What will the maximum object
size be?

128, or 256?

John Kelly · Aug 21, 2010

On a twos complement machine, it doesn't matter.

Can I address objects of infinite size?

If not, then what is the maximum addressable object size on this twos
complement 8-bit machine?
