trim whitespace

ImpalerCore

Here's my thought:  The difference is whether the return makes
additional representations, such as "that was a valid string".  If
the behavior is
        if (s == NULL)
                return NULL;

then I'm not converting an invalid string into valid data; I'm leaving
it invalid.

That's a good way of putting it. I think it's a good justification
for doing it that way.
For strlen(), there's no value (maybe a -1, if you didn't use
size_t) that communicates "that wasn't even a string, your
question is invalid".

Thanks for your input (and patience ;-). I guess I'll head back to
the think tank for a while.
 
Keith Thompson

pete said:
The string functions are described as working on objects.
If ((size_t)-1) is the maximum size of an object,
then ((size_t)-1) is too long for a string length
and still works for that purpose.

And code that doesn't specifically check for a (size_t)-1 result
from strlen will likely do very bad things. Obviously it's too
late to change strlen() now. Even if it had been defined that way
from the beginning, it would be error-prone (unless you just avoid
calling strlen(NULL) in the first place).

Error handling in C is tricky.
 
John Kelly

The memmove length argument is a size_t. Can I assume (keep - hast)
will never overflow size_t? Can I assume anything about the result of
the pointer subtraction?

Given a large region of memory with no \0 byte, it seems the keep
pointer could overflow and wrap, or the ptrdiff of (keep - hast) could
be unrepresentable. Either possibility is bad. How to guard against
the badness?

In the code below, can anyone produce

printf ("what is the meaning of this\n");

I can only get

printf ("my pointer wrapped around\n");



# include <limits.h>
# include <stdio.h>
# include <stdlib.h>

void
diff (char *alpha, char *omega)
{
        printf ("ptrdiff is %lld\n", (long long) (omega - alpha));
}

int
main (void)
{
        char *alpha;
        char *omega;
        char data[1];

        alpha = data;

        omega = data;
        omega += SSIZE_MAX - 2;

        diff (alpha, ++omega);
        diff (alpha, ++omega);
        diff (alpha, ++omega);

        omega = data;
        omega += SSIZE_MAX - 2;
        omega += SSIZE_MAX;

        diff (alpha, ++omega);
        diff (alpha, ++omega);
        diff (alpha, ++omega);

        diff (alpha, ++omega);

        if (omega == alpha) {
                printf ("my pointer wrapped around\n");
        } else {
                printf ("what is the meaning of this\n");
        }

        return 0;
}
 
Seebs

Given a large region of memory with no \0 byte, it seems the keep
pointer could overflow and wrap,

If you go outside the boundaries of an object, you have invoked undefined
behavior.

If you don't, there is no way to get outside the range of ptrdiff_t.

or the ptrdiff of (keep - hast) could
be unrepresentable. Either possibility is bad. How to guard against
the badness?

Don't go outside the boundaries of an object. Then there is no
badness.

-s
 
Ben Bacarisse

Seebs said:
If you know the pointer subtracted-from is later in an object, then you can
assume that the result of the pointer subtraction is a valid size_t. By
definition, size_t can represent the size of any object. Thus, in the most
extreme case (a maximally-large object which is an array of characters),
size_t can hold the difference between the last and first addresses into
that object, therefore, you're fine.

Except that the result is of type ptrdiff_t and the result is undefined
if the difference is not representable as a value of this type. The
standard permits the range of size_t to exceed that of either the signed
or unsigned range of ptrdiff_t.

It is interesting to note that the minimum limits required for these
types are:

PTRDIFF_MIN -65535
PTRDIFF_MAX +65535

SIZE_MAX 65535

I.e. ptrdiff_t requires at least 17 bits so if your implementation's
size_t is minimal (0 to 65535) the ptrdiff_t is bound to be able to hold
any valid pointer difference. While it might have been useful to insist
that ptrdiff_t values range from -SIZE_MAX to +SIZE_MAX, that would not
be reasonable in general.
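
To make the limits concrete, here is a minimal C99 sketch rather than anything the standard provides. The names ptrdiff_covers_size and span_fits_ptrdiff are hypothetical; only PTRDIFF_MAX and SIZE_MAX (from <stdint.h>) are real. Both sides are widened to uintmax_t so the comparison itself cannot truncate.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if ptrdiff_t can represent every
 * value up to SIZE_MAX on this implementation, 0 otherwise. */
int ptrdiff_covers_size(void)
{
    return (uintmax_t)PTRDIFF_MAX >= (uintmax_t)SIZE_MAX;
}

/* Hypothetical helper: returns 1 if a span of n bytes could be
 * expressed as a ptrdiff_t, i.e. if subtracting two pointers that
 * far apart would have a representable result. */
int span_fits_ptrdiff(size_t n)
{
    return (uintmax_t)n <= (uintmax_t)PTRDIFF_MAX;
}
```

On an implementation where size_t and ptrdiff_t have the same width, I would expect ptrdiff_covers_size() to return 0, which is exactly the case where a sufficiently large object could have two pointers too far apart to subtract.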
 
John Kelly

Except that the result is of type ptrdiff_t and the result is undefined
if the difference is not representable as a value of this type. The
standard permits the range of size_t to exceed that of either the signed
or unsigned range of ptrdiff_t.

It is interesting to note that the minimum limits required for these
types are:

PTRDIFF_MIN -65535
PTRDIFF_MAX +65535

SIZE_MAX 65535

I.e. ptrdiff_t requires at least 17 bits so if your implementation's
size_t is minimal (0 to 65535) the ptrdiff_t is bound to be able to hold
any valid pointer difference.

But as you said,

"the standard permits the range of size_t to exceed that of either the
signed or unsigned range of ptrdiff_t"

So I can't make an assumption about boundedness of my string size, in
relation to ptrdiff_t.

Ugh. And it gets worse.

Suppose my memory, beginning at the hast location, has no \0 bytes for a
size greater than size_t. The loop code testing for \0 (end of string)
would never find it, thus an infinite loop. Double ugh!

I'll be back.
 
Keith Thompson

Seebs said:
By definition, size_t can represent the size of any object.
[...]

We had a lengthy discussion about that not long ago. The standard
merely says that size_t is the type of the result of the sizeof
operator; there's no explicit statement that it can represent the
size of any object.

malloc-allocated objects can't have sizeof applied to them, at
least not directly.

In principle, an object larger than size_t bytes could be created
by some implementation specific method, or by calling calloc()
with arguments whose mathematical product exceeds SIZE_MAX.

In real life, I think that any *sane* implementation will choose a
size_t type big enough to represent the size of any object it can
support, and calloc() will fail and return a null pointer for any
request to create such a huge object.

Change "By definition" to "In practice", or even to "By what the
definition *should* be", and I agree.
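
The calloc() case comes down to noticing that the product of the two arguments would wrap before doing the multiplication. A minimal sketch of that check follows; size_mul_ok is a hypothetical helper, not a standard function, and it says nothing about what any particular calloc() actually does internally.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores nmemb * size in *product and returns 1
 * if the mathematical product fits in size_t; returns 0 (leaving
 * *product untouched) if it would exceed SIZE_MAX. */
int size_mul_ok(size_t nmemb, size_t size, size_t *product)
{
    /* Dividing instead of multiplying avoids the wraparound that
     * the check is trying to detect. */
    if (nmemb != 0 && size > SIZE_MAX / nmemb)
        return 0;
    *product = nmemb * size;
    return 1;
}
```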
 
John Kelly

Seebs said:
By definition, size_t can represent the size of any object.
[...]

We had a lengthy discussion about that not long ago. The standard
merely says that size_t is the type of the result of the sizeof
operator; there's no explicit statement that it can represent the
size of any object.

malloc-allocated objects can't have sizeof applied to them, at
least not directly.

In principle, an object larger than size_t bytes could be created
by some implementation specific method, or by calling calloc()
with arguments whose mathematical product exceeds SIZE_MAX.

In real life, I think that any *sane* implementation will choose a
size_t type big enough to represent the size of any object it can
support, and calloc() will fail and return a null pointer for any
request to create such a huge object.

Problem is, trim() can be called with an arbitrary pointer. Who knows
what it points to? It's not limited to "objects." Maybe it's a region
of memory larger than size_t, which contains no \0 bytes. An algorithm
must protect itself at all times.

I'll see what I can do.
 
Keith Thompson

John Kelly said:
Problem is, trim() can be called with an arbitrary pointer. Who knows
what it points to? It's not limited to "objects." Maybe it's a region
of memory larger than size_t, which contains no \0 bytes. An algorithm
must protect itself at all times.

I'll see what I can do.

Probably not much.

There is no portable way, and typically no way at all, for a function
that operates on strings to protect against a passed pointer that
doesn't point to a '\0'-terminated string.

There are "safer" versions of C string functions that take an
additional argument specifying the maximum size of the array
containing the string; if there's no '\0' within the specified number
of bytes, there's an error (handling it is another matter). Even so,
it's still the caller's responsibility to pass valid arguments.

(Note that strncpy() is not a "safer" version of strcpy; for
historical reasons, its behavior is quite different from what you
might expect given the name. Search for "strncpy" in the archives
of this newsgroup for more discussion.)
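
For illustration, a bounded trim might look like the sketch below. This is not the trim() under discussion: trim_bounded, its return convention, and the bufsize parameter are all assumptions made up for this example. The caller still has to pass a bufsize that is honest about the array; the function can only report a missing '\0' within that bound.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bounded trim: strips leading and trailing whitespace
 * from the string in buf, which the caller promises is an array of
 * at least bufsize bytes. Returns 0 on success, -1 if buf is NULL,
 * bufsize is too large for pointer subtraction, or no '\0' appears
 * within the first bufsize bytes. */
int trim_bounded(char *buf, size_t bufsize)
{
    if (buf == NULL)
        return -1;

    /* Refuse sizes where (end - start) might not fit in ptrdiff_t. */
    if ((uintmax_t)bufsize > (uintmax_t)PTRDIFF_MAX)
        return -1;

    /* memchr reads at most bufsize bytes, so an unterminated buffer
     * is reported instead of being scanned past its end. */
    char *end = memchr(buf, '\0', bufsize);
    if (end == NULL)
        return -1;

    char *start = buf;
    while (start < end && isspace((unsigned char)*start))
        start++;
    while (end > start && isspace((unsigned char)end[-1]))
        end--;

    size_t len = (size_t)(end - start);  /* end >= start, both in buf */
    memmove(buf, start, len);
    buf[len] = '\0';
    return 0;
}
```

Called as trim_bounded(line, sizeof line) on a real array, the size argument comes from the compiler rather than from a guess, which is about as much protection as the language offers.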
 
John Kelly

Probably not much.

There is no portable way, and typically no way at all, for a function
that operates on strings to protect against a passed pointer that
doesn't point to a '\0'-terminated string.

Bail out of your loop and report an error if it reaches SIZE_MAX or
SSIZE_MAX iterations. I'm pondering what constant is appropriate for
trim().

And the ptrdiff_t subtraction of

(keep - hast)

is another thing to cope with. In trim(), hast is never greater than
keep, unless the pointer itself overflows and wraps around. Before each
iteration looking for \0, I should check the pointer value to see if
it's going to wrap.

Not sure what to compare it against though. PTR_MAX would be nice but I
don't see any such thing.
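
A minimal sketch of the bail-out idea, assuming the caller supplies the limit: counting with a size_t index sidesteps both concerns at once, since no pointer is ever incremented past the limit and no ptrdiff_t subtraction takes place. scan_len is a hypothetical name (POSIX strnlen does something similar). Note that this only helps if the limit really is no larger than the object; reading past the end of the object is still undefined, whatever the loop counter says.

```c
#include <stddef.h>

/* Hypothetical helper: counts the bytes before the first '\0' in s,
 * giving up after examining limit bytes.
 * Returns 1 and stores the length in *len if a '\0' was found,
 * returns 0 otherwise (including for NULL arguments). */
int scan_len(const char *s, size_t limit, size_t *len)
{
    if (s == NULL || len == NULL)
        return 0;

    for (size_t i = 0; i < limit; i++) {
        if (s[i] == '\0') {
            *len = i;       /* the index is the length; no subtraction */
            return 1;
        }
    }
    return 0;               /* no terminator within limit bytes */
}
```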
 
James Waldby

Given a large region of memory with no \0 byte, it seems the keep
pointer could overflow and wrap, or the ptrdiff of (keep - hast) could
be unrepresentable. Either possibility is bad. How to guard against
the badness?

In the code below, can anyone produce
printf ("what is the meaning of this\n");

I can only get
printf ("my pointer wrapped around\n");

[snip most code lines, except a representative few:]
omega = data;
omega += SSIZE_MAX - 2;
diff (alpha, ++omega);
.... (above line appears 7 times or so)
if (omega == alpha) {... }

As Seebach notes in his reply, "If you go outside the boundaries of an
object, you have invoked undefined behavior. [...] Don't go outside
the boundaries of an object. Then there is no badness." [1]

The question relevant in c.l.c isn't so much "can anyone produce"
certain behavior, as "can anyone produce, portably and in accord
with C standards", to which the answer is no, due to the irretrievably
undefined behavior of the program. That said, when I run the program
on my x86_64 Linux system with gcc, it gives the same result you got.

As a practical matter in a program like this, include a couple more
lines -- eg
printf ("SSIZE_MAX is %lx\n", SSIZE_MAX);
in main, and
printf ("a %8p o %8p ", alpha, omega);
before the printf in diff, to make it more obvious what happens.

[1] Re "Then there is no badness", I presume Seebach references
only the little microcosm in which such a program compiles and
runs, rather than the world as a whole. But I could be wrong,
perhaps he sees defined behavior as a fix for all the problems
of the world.
 
John Kelly

Given a large region of memory with no \0 byte, it seems the keep
pointer could overflow and wrap, or the ptrdiff of (keep - hast) could
be unrepresentable. Either possibility is bad. How to guard against
the badness?

In the code below, can anyone produce
printf ("what is the meaning of this\n");

I can only get
printf ("my pointer wrapped around\n");

[snip most code lines, except a representative few:]
omega = data;
omega += SSIZE_MAX - 2;
diff (alpha, ++omega);
... (above line appears 7 times or so)
if (omega == alpha) {... }

As Seebach notes in his reply, "If you go outside the boundaries of an
object, you have invoked undefined behavior. [...] Don't go outside
the boundaries of an object. Then there is no badness." [1]

The question relevant in c.l.c isn't so much "can anyone produce"
certain behavior, as "can anyone produce, portably and in accord
with C standards", to which the answer is no, due to the irretrievably
undefined behavior of the program.

I know it's messed up. Intentionally so.

I threw it together to try and understand what happens when pointers
and/or pointer arithmetic overflow. I don't have a good understanding
of that yet, but once I do, I think I can make trim() bullet-proof.
 
Keith Thompson

John Kelly said:
I threw it together to try and understand what happens when pointers
and/or pointer arithmetic overflow. I don't have a good understanding
of that yet, but once I do, I think I can make trim() bullet-proof.

When pointer arithmetic overflows, the behavior is undefined. The only
solution is to avoid the overflow in the first place.

Your trim() function cannot, even in principle, be made bullet-proof.
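
"Avoid the overflow in the first place" means doing the range check on sizes before forming the pointer, not inspecting the pointer afterwards. A minimal sketch, assuming the caller knows the object's length; advance_checked and its parameters are hypothetical names used only for this example.

```c
#include <stddef.h>

/* Hypothetical helper: returns a pointer step bytes past base[pos],
 * or NULL if that would land beyond one-past-the-end of an object of
 * len bytes. All checks are done on size_t values, so no
 * out-of-range pointer is ever computed. */
char *advance_checked(char *base, size_t len, size_t pos, size_t step)
{
    if (base == NULL || pos > len || step > len - pos)
        return NULL;
    return base + pos + step;   /* at most base + len, which is valid */
}
```

This works only because len is supplied by someone who knows the object; given just a bare pointer, there is nothing to compare against.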
 
John Kelly

[1] Re "Then there is no badness", I presume Seebach references
only the little microcosm in which such a program compiles and
runs, rather than the world as a whole. But I could be wrong,
perhaps he sees defined behavior as a fix for all the problems
of the world.

Yeah some of these guys crack me up.

Black hats don't respect the standard. They're looking for any hole
they can find.

Standards conformance has its place, but how can you protect yourself if
you don't explore UB and learn how things really work?
 
John Kelly

When pointer arithmetic overflows, the behavior is undefined. The only
solution is to avoid the overflow in the first place.

Your trim() function cannot, even in principle, be made bullet-proof.

If it's possible to "avoid the overflow in the first place" then your
remarks are self contradictory.

Let's not confuse code with the standard.
 
Geoff

The memmove length argument is a size_t. Can I assume (keep - hast)
will never overflow size_t? Can I assume anything about the result of
the pointer subtraction?

Given a large region of memory with no \0 byte, it seems the keep
pointer could overflow and wrap, or the ptrdiff of (keep - hast) could
be unrepresentable. Either possibility is bad. How to guard against
the badness?

In the code below, can anyone produce

printf ("what is the meaning of this\n");

I can only get

printf ("my pointer wrapped around\n");



# include <limits.h>
# include <stdio.h>
# include <stdlib.h>

void
diff (char *alpha, char *omega)
{
        printf ("ptrdiff is %lld\n", (long long) (omega - alpha));
}

int
main (void)
{
        char *alpha;
        char *omega;
        char data[1];

        alpha = data;

        omega = data;
        omega += SSIZE_MAX - 2;

        diff (alpha, ++omega);
        diff (alpha, ++omega);
        diff (alpha, ++omega);

        omega = data;
        omega += SSIZE_MAX - 2;
        omega += SSIZE_MAX;

        diff (alpha, ++omega);
        diff (alpha, ++omega);
        diff (alpha, ++omega);

        diff (alpha, ++omega);

        if (omega == alpha) {
                printf ("my pointer wrapped around\n");
        } else {
                printf ("what is the meaning of this\n");
        }

        return 0;
}

In Visual Studio 2010, after defining _POSIX_ to get SSIZE_MAX and
adding James' suggested outputs I get:

SSIZE_MAX is 7fff
a 0047F8AB o 004878A9 ptrdiff is 32766
a 0047F8AB o 004878AA ptrdiff is 32767
a 0047F8AB o 004878AB ptrdiff is 32768
a 0047F8AB o 0048F8A8 ptrdiff is 65533
a 0047F8AB o 0048F8A9 ptrdiff is 65534
a 0047F8AB o 0048F8AA ptrdiff is 65535
a 0047F8AB o 0048F8AB ptrdiff is 65536
what is the meaning of this
 
Ben Bacarisse

Seebs said:
If you go outside the boundaries of an object, you have invoked undefined
behavior.

If you don't, there is no way to get outside the range of ptrdiff_t.

Can you say how you reach this conclusion? I can see no reason why two
pointers into one object can't be more than PTRDIFF_MAX elements apart.
The standard (rightly) shies away from requiring a signed type that has
twice the range of size_t (except when size_t is minimal, in which case
it does require it).

<snip>
 
Keith Thompson

John Kelly said:
If it's possible to "avoid the overflow in the first place" then your
remarks are self contradictory.

Let's not confuse code with the standard.

There is no way for your trim() function, as specified, to
protect itself against all bad inputs. trim() can't tell the
difference between, say, a pointer to a 1000-byte object with no
'\0' characters and a pointer to a 2000-byte object with a '\0'
somewhere near the end. As it scans for the terminating '\0',
it has no way to know when to give up.

Specifically, there is no portable way to do this in C. Some
implementations might provide ways to do it, but as far as I know
most do not (I could easily be mistaken on that last point).

If you have sufficient control over *all* potential callers of
your trim() function, then you can avoid undefined behavior.
If you don't, you can't.

I might not have expressed this sufficiently clearly before.
Do you still think my remarks are self contradictory?
 
John Kelly

You're likely using a twos complement machine. On twos complement, overflow is
wrapping. And as long as you're only using adds and subtracts and you know the
final result must be in-range, then the overflow is irrelevant.

Suppose my machine has 8-bit pointers. What will the maximum object
size be?

128, or 256?
 
John Kelly

On a twos complement machine, it doesn't matter.

Can I address objects of infinite size?

If not, then what is the maximum addressable object size on this twos
complement 8-bit machine?
 
