trim whitespace

J

John Kelly

static void
trim (char **ts)
{
unsigned char *exam;
unsigned char *keep;

if (!*ts) {
errno = EINVAL;
printf ("trim: %s\n", strerror (errno));
return;
}
exam = (unsigned char *) *ts;
while (isspace (*exam)) {
++exam;
}
*ts = (char *) exam;
if (!*exam) {
return;
}
keep = exam;
while (*++exam) {
if (!isspace (*exam)) {
keep = exam;
}
}
if (*++keep) {
*keep = '\0';
}
}


Anyone see bugs? It's not a trick question, I use this code. Just
wondering if I overlooked anything.
 
N

Nick

John Kelly said:
static void
trim (char **ts)
{
unsigned char *exam;
unsigned char *keep;

if (!*ts) {
errno = EINVAL;
printf ("trim: %s\n", strerror (errno));
return;
}
exam = (unsigned char *) *ts;
while (isspace (*exam)) {
++exam;
}
*ts = (char *) exam;
if (!*exam) {
return;
}
keep = exam;
while (*++exam) {
if (!isspace (*exam)) {
keep = exam;
}
}
if (*++keep) {
*keep = '\0';
}
}


Anyone see bugs? It's not a trick question, I use this code. Just
wondering if I overlooked anything.

Nothing in the code, but you'll need to be aware (the double pointer
makes it clear) that it changes the start of the string.

So if you were to allocate a buffer, read something into it, trim it,
print it and free it, you'd be in trouble. I tend to take the overhead
of shuffling it all down for the convenience of the pointer not moving.

I'm not convinced that last if does anything apart from the side effect
- if *++keep isn't true then it's the end of the string so you could
overwrite it harmlessly.
 
S

Seebs

static void
trim (char **ts)
{
unsigned char *exam;
unsigned char *keep;
if (!*ts) {

I would check ts before checking *ts. (It's also not obvious why
you're passing ts in as a "char **".)
errno = EINVAL;
printf ("trim: %s\n", strerror (errno));

It is rare but conceivable that printf will itself alter errno
even if it succeeds; check the FAQ for the famous question of
why errno has changed to ENOTTY after a successful call to printf.

I would usually send errors to stderr, not stdout, but perhaps
more importantly, I would find it pretty upsetting for a "trim
whitespace" function to produce a diagnostic error if I handed
it an empty string!
return;
}
exam = (unsigned char *) *ts;
while (isspace (*exam)) {
++exam;
}
*ts = (char *) exam;

Ah-hah, the explanation becomes clear; you're changing the pointer.

This is almost certainly not a good API. Consider the case where
someone has allocated a string:
char *s = malloc(20);
strcpy(s, " hello ");
trim(&s);

There is no longer any way for them to determine what the address they
need to pass to free() is, because you've tossed the string away.
if (!*exam) {
return;
}

This produces a possibly-surprising behavior, which is that a series of
spaces with no other content can produce an empty string, and you don't
complain, but an empty string as input produces a complaint.
keep = exam;
while (*++exam) {
if (!isspace (*exam)) {
keep = exam;
}
}
if (*++keep) {
*keep = '\0';
}
}

Here you modify the contents of the string. Which is not necessarily bad,
but if you're willing to do that, I'd say go ahead and do the entire thing
by modification.
Anyone see bugs? It's not a trick question, I use this code. Just
wondering if I overlooked anything.

See above.

Here's how I'd probably do it:
#include <ctype.h>
#include <string.h>
#include <stdio.h>

char *trim(char *s) {
char *save, *next, *last = 0;
if (!s)
return NULL;
if (!*s)
return s;
/* s is a non-empty string */
save = s;
next = s;
last = s;
/* if there are leading spaces, copy everything after
* them to the beginning of the string
*/
if (isspace((unsigned char) *next)) {
while (isspace((unsigned char) *next))
++next;
while (*next) {
*s++ = *next++;
/* save location of last non-space */
if (!isspace((unsigned char) s[-1])) {
last = s;
}
}
} else {
/* just find last non-space */
while (*next) {
if (!isspace((unsigned char) *next)) {
last = next + 1;
}
++next;
}
}
/* if we found a non-space character, we saved a pointer
* to one past it; this is either the first of the
* terminal series of spaces, or a null byte.
*/
if (last)
*last = '\0';
return save;
}

char *tests[] = {
"",
" foo ",
" bar",
"baz ",
" ",
NULL
};

int
main(void) {
char buf[256];
int i;
for (i = 0; tests; ++i) {
strcpy(buf, tests);
printf("<%s> => <%s>\n", tests, trim(buf));
}
return 0;
}

(For the curious: I originally failed the " " test, because I didn't
set "last" unless I'd seen a non-space character. Arguably, this would
also go away if I simply copied the trailing null byte, and that might
be better.)

-s
 
J

John Kelly

So if you were to allocate a buffer, read something into it, trim it,
print it and free it, you'd be in trouble. I tend to take the overhead
of shuffling it all down for the convenience of the pointer not moving.

Yes it does have that design limitation. Maybe I should save a copy of
the input pointer and strcpy the trimmed string to it, before returning.
An in-place overlapping strcpy should be safe in that case, no?

I'm not convinced that last if does anything apart from the side effect
- if *++keep isn't true then it's the end of the string so you could
overwrite it harmlessly.

True.

But to me it's a question of style. I don't like doing a no-op.

That's adds an extra instruction in the case where it's not a no-op, but
sacrificing the small efficiency for the sake of style seems harmless in
this instance.
 
D

Dann Corbit

Why bother checking if 's' refers to NULL? The other C string
functions in the standard library fail hard when passed a NULL
pointer, so why should trim be any different?

Because it will crash if you do pass a NULL pointer. It should be
different because this behavior is better. Here is my implementation:

#include <ctype.h>
#include <string.h>

char *
alltrim (char *s)
{
char *start = s;
char *trimend;
size_t end;
if (!s)
return s;
end = strlen (s);
if (!end)
return s;
while (isspace (*start))
start++;
trimend = &s[end-1];
while (isspace (*trimend))
{
*trimend-- = 0;
}
return start;
}
#ifdef UNIT_TEST
#include <stdio.h>
static char s0[] = " I need trimming on both ends. ";
static char s1[] = "I need trimming on far end. ";
static char s2[] = " I need trimming on near end.";
static char s3[] = " I need more trimming on both ends. ";
static char s4[] = "\n\t\rI need special trimming on both ends.\n\t\r ";
static char s5[] = " \n\t\r I need special trimming on near end.";
static char s6[] = "I need special trimming on far end. \n\t\r ";
static char s7[] = "I need no trimming";
static char s8[] = "";

static char *strings[12] =
{
s0, s1, s2, s3, s4, s5, s6, s7, s8, NULL,
};

int
main (void)
{
int i;
for (i = 0; i < 10; i++)
{
printf ("Original string:[%s]\n", strings);
printf ("Trimmed string:[%s]\n", alltrim (strings));
puts("---------------------------------------");
}
return 0;
}
#endif
 
K

Keith Thompson

Seebs said:
I would check ts before checking *ts. (It's also not obvious why
you're passing ts in as a "char **".)


It is rare but conceivable that printf will itself alter errno
even if it succeeds; check the FAQ for the famous question of
why errno has changed to ENOTTY after a successful call to printf.

Right. Note that this is guaranteed to print the error string for
EINVAL, since strerror (errno) is evaluated before printf is called.
But the value of errno after trim() returns could be a non-zero
value set by printf.

If you really wanted to do this:

printf ("trim: %s\n", strerror (EINVAL));
errno = EINVAL;
return;

But ...
I would usually send errors to stderr, not stdout, but perhaps
more importantly, I would find it pretty upsetting for a "trim
whitespace" function to produce a diagnostic error if I handed
it an empty string!

Indeed.

[...]
 
I

Ian Collins

Right. Note that this is guaranteed to print the error string for
EINVAL, since strerror (errno) is evaluated before printf is called.
But the value of errno after trim() returns could be a non-zero
value set by printf.

If you really wanted to do this:

printf ("trim: %s\n", strerror (EINVAL));
errno = EINVAL;
return;

But ...

Considering all he is doing is outputting strerror(errno), it would make
way more sense to set errno and return a failure indication. This would
allow the caller to decide whether they wanted to output a diagnostic.

The only diagnostic a library function should emit is the result of an
assert!
 
S

Seebs

Why bother checking if 's' refers to NULL? The other C string
functions in the standard library fail hard when passed a NULL
pointer, so why should trim be any different?

Because I'd rather have fewer crashes than more, from my code.

-s
 
S

Seebs

Setting errno without any other indication of failure is counter
intuitive. Why not return a success or failure?

I think the data point that may be missing is:

John, in general, it is permitted and expected that functions may set
errno as a harmless side-effect of ordinary operations. As such, people
tend not to rely on checking errno to determine whether an error occurred.
Instead, errno is used to tell you what the error was when you know there
was an error.

-s
 
I

ImpalerCore

Because I'd rather have fewer crashes than more, from my code.

That's a fair enough answer. The question is, would you still accept/
use a version of trim without the NULL argument check?

If not, would you prefer to use your own version of strcpy written in
a defensive style to avoid crashes from NULL pointer arguments if the
standard library's version of strcpy crashes? Should 'strlen(NULL)'
be valid semantically and return 0? Should I file a bug-report for
compilers whose standard library functions crash on NULL arguments?

I don't have a problem in general with functions that pass NULL
arguments through and do nothing to avoid a crash, but does a function
that crashes on a NULL pointer imply a design flaw? (From your
response, it seems to imply that it does.)

The thing is, I keep going back and forth between whether or not to
verify NULL pointer arguments and I can't seem to settle on a rule of
thumb to decide if/when it is better to use one or the other.
 
K

Keith Thompson

Seebs said:
Because I'd rather have fewer crashes than more, from my code.

I'd rather have my code crash earlier rather than later, and I'd
rather have it crash than give me garbage output. (Well, usually.)

Suppose you call trim(s) where s happens to be a null pointer.
That's almost certainly a bug in your program. When would you like
to find out about it?

On the other hand, that probably argues for having trim() checking
whether s==NULL and aborting if it is. On the other hand, on most
implementations the program is going to crash as soon as trim()
tries to dereference s.

But whatever the behavior of trim(NULL) is, whether it returns NULL,
returns some error indication, or aborts your program, it should be
documented.
 
S

Seebs

That's a fair enough answer. The question is, would you still accept/
use a version of trim without the NULL argument check?
Maybe.

If not, would you prefer to use your own version of strcpy written in
a defensive style to avoid crashes from NULL pointer arguments if the
standard library's version of strcpy crashes? Should 'strlen(NULL)'
be valid semantically and return 0? Should I file a bug-report for
compilers whose standard library functions crash on NULL arguments?

No. Here's the thing:

If I were DESIGNING the library, I'd insist on it being robust in that
way, probably. (There's a case to be made, in some environments, for
not doing that -- if things get passed around enough, you're testing
the same pointer for null/not-null dozens of times.)

But since it ISN'T that way, and it's a standard part of the language,
I expect users to know that.
I don't have a problem in general with functions that pass NULL
arguments through and do nothing to avoid a crash, but does a function
that crashes on a NULL pointer imply a design flaw? (From your
response, it seems to imply that it does.)

I think it sometimes does.
The thing is, I keep going back and forth between whether or not to
verify NULL pointer arguments and I can't seem to settle on a rule of
thumb to decide if/when it is better to use one or the other.

Mine is that I basically always do it in my code, and never assume anyone
else does it. Except free() because I'm emotionally attached to the
sanity of free(malloc(1));

-s
 
I

ImpalerCore


Unfortunately, errno gets clobbered by any function that sees fit to
put its error there. It's more of a portability issue, where
different functions on different systems like to put their error codes
there. Does a malloc failure set errno to ENOMEM? On some it does
and others it does not. On some platforms, some functions that
succeed can modify errno, and there's a small chance that other
libraries may touch it, so it's somewhat risky to rely on it. That's
why in general it's better to rely on return value to indicate error
status when possible.

When using the return value as the error state is not feasible, you
can still use a global errno like variable that you have control
over. It has the same problems that errno does, except now it's
portable since it doesn't compete for errno.

For example, my allocator wrapper uses a global state to indicate an
allocation failure in addition to its return value, because some of my
library functions internally do memory allocation but the return value
is not used to indicate an internal allocation error. Rather than
adding an additional parameter to return the allocation fault error
state, I use my own global state. The example snippet below shows a
global identifier c_enomem that is normally false and set to true when
my allocator wrapper fails, and functions like you would expect errno
would.

\code snippet
#define c_enomem (*gc_error_not_enough_memory())

/* A flag that signals that an out-of-memory error has occurred. */
static bool gc_private_allocator_enomem = false;

bool* gc_error_not_enough_memory( void )
{
return &gc_private_allocator_enomem;
}

void* c_malloc( size_t size )
{
void* mem = NULL;

if ( size )
{
mem = (*gc_private_allocator_ftable.malloc)( size );
if ( !mem ) {
gc_private_allocator_enomem = true;
}
}

return mem;
}
\endcode

I would go with other people's recommendation and use a return value
for error if possible, and errno to indicate the kind of error.

Best regards,
John D.
 
J

John Kelly

Setting errno without any other indication of failure is counter
intuitive.

I agree.

Why not return a success or failure?

I'm not sure that's necessary for a "trim" operation. I need to think
about that, and decide on the best interface design.
 
I

Ian Collins

I agree.


I'm not sure that's necessary for a "trim" operation. I need to think
about that, and decide on the best interface design.

Why? Returning an error result is better than printing a idagnostic.
The caller is given the choice, if they care about failure they can do
something, if not they can ignore it. Just like printf().
 
J

John Kelly

Why? Returning an error result is better than printing a idagnostic.
The caller is given the choice, if they care about failure they can do
something, if not they can ignore it. Just like printf().

Good thinking. You saved me some brain strain. :)
 
L

lawrence.jones

John Kelly said:
Yes it does have that design limitation. Maybe I should save a copy of
the input pointer and strcpy the trimmed string to it, before returning.
An in-place overlapping strcpy should be safe in that case, no?

No. Overlapping strcpy is *never* safe. But you have pointers to the
beginning and end of the string so you know its length and thus can use
memmove instead, which *is* safe.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,954
Messages
2,570,114
Members
46,702
Latest member
VernitaGow

Latest Threads

Top