Clunky C cleanup code

fatted · Nov 30, 2004

I've written a function (clean_string) to remove characters from a string,
but it looks clunky to me, and I'm sure there's a more 'C' like way of
doing it (still learning), comments and advice welcome...

--
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
char * clean_string(const char *, const char *);
char * string = "\t this is a string = sausages\n";
char * chop = "\t\n= ";

char *result = clean_string(string, chop);
printf("[after] %s\n",result);

return EXIT_SUCCESS;
}

char * clean_string(const char * string, const char * chop)
{
char * inter;
char * final = malloc(strlen(string)+1);
strcpy(final, string);

while((inter = strpbrk(final, chop)) != NULL)
{
int size_end = strlen(inter);
int size_start = strlen(final) - size_end;
char mod[6];

final = (char *)realloc(final,size_start + size_end + 1);
sprintf(mod, "%%.%ds%%s",size_start);
sprintf(final,mod,string,inter+1);
string = final;
}
return(final);
}

Rouben Rostamian · Nov 30, 2004

I've written a function (clean_string) to remove characters from a string,
but it looks clunky to me, and I'm sure there's a more 'C' like way of
doing it (still learning), comments and advice welcome...

char * clean_string(const char * string, const char * chop)
{
char * inter;
char * final = malloc(strlen(string)+1);
strcpy(final, string);

while((inter = strpbrk(final, chop)) != NULL)
{
int size_end = strlen(inter);
int size_start = strlen(final) - size_end;
char mod[6];

final = (char *)realloc(final,size_start + size_end + 1);
sprintf(mod, "%%.%ds%%s",size_start);
sprintf(final,mod,string,inter+1);
string = final;
}
return(final);
}

That looks good. I think it can be streamlined somewhat.

It is a good idea to get into the habit of writing a brief comment
before each function, to document its purpose and the return value.

That would have made it much easier for me to understand your program.

Here is an alternative version.

/* Allocates memory and returns a string that is obtained by removing
from string "string" all characters found in string "chop".

Returns NUKK if unable to allocate memory.
*/

char * clean_string(const char * string, const char * chop)
{
char * final = malloc(strlen(string)+1);
const char * p = string;
char * q = final;

if (q==NULL)
return NULL;

while (*p!='\0') {
if (strchr(chop, *p))
p++;
else
*q++ = *p++;
}

*q = '\0';

return final;
}

Merrill & Michele · Nov 30, 2004

"fatted"
I've written a function (clean_string) to remove characters from a string,
but it looks clunky to me, and I'm sure there's a more 'C' like way of
doing it (still learning), comments and advice welcome...

--
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
char * clean_string(const char *, const

[single point to make, rest of post snipped]

I get routinely chastised for defining functions inside of main(). I doubt
that anyone thinks it matters in this case, but what these guys are trying
to do is hammer into us habits that pay off over the long haul. MPJ

Mike Wahler · Nov 30, 2004

Merrill & Michele said:
"fatted"
I've written a function (clean_string) to remove characters from a string,
but it looks clunky to me, and I'm sure there's a more 'C' like way of
doing it (still learning), comments and advice welcome...

--
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
char * clean_string(const char *, const

Click to expand...

[single point to make, rest of post snipped]

I get routinely chastised for defining functions inside of main().

OP's code did not contain any function definitions inside of main
or any other function.

-Mike

Merrill & Michele · Nov 30, 2004

Mike Wahler said:
Merrill & Michele said:

"fatted"
I've written a function (clean_string) to remove characters from a string,
but it looks clunky to me, and I'm sure there's a more 'C' like way of
doing it (still learning), comments and advice welcome...

--
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
char * clean_string(const char *, const

Click to expand...

[single point to make, rest of post snipped]

I get routinely chastised for defining functions inside of main().

Click to expand...

OP's code did not contain any function definitions inside of main
or any other function.

-Mike

Richard Bos · Nov 30, 2004

Mike Wahler said:
Merrill & Michele said:

"fatted"

int main(void)
{
char * clean_string(const char *, const

Click to expand...

[single point to make, rest of post snipped]

I get routinely chastised for defining functions inside of main().

Click to expand...

OP's code did not contain any function definitions inside of main
or any other function.

True, but it did contain a function declaration inside main(). This is
legal, but rarely advisable.

Richard

Merrill & Michele · Nov 30, 2004

"Mike Wahler"

"Merrill & Michele"

"fatted"
I've written a function (clean_string) to remove characters from a string,
but it looks clunky to me, and I'm sure there's a more 'C' like way of
doing it (still learning), comments and advice welcome...

--
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
char * clean_string(const char *, const

Click to expand...

[single point to make, rest of post snipped]

I get routinely chastised for defining functions inside of main().

Click to expand...

OP's code did not contain any function definitions inside of main
or any other function.

Let me go through my premises. There are two return statements. This
implies .... What if I #defined 'definition' 'prototype declaration' in my
last post? MPJ

CBFalconer · Nov 30, 2004

fatted said:
I've written a function (clean_string) to remove characters from a
string, but it looks clunky to me, and I'm sure there's a more 'C'
like way of doing it (still learning), comments and advice welcome...

--
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
char * clean_string(const char *, const char *);
char * string = "\t this is a string = sausages\n";
char * chop = "\t\n= ";

char *result = clean_string(string, chop);
printf("[after] %s\n",result);

return EXIT_SUCCESS;
}

char * clean_string(const char * string, const char * chop)
{
char * inter;
char * final = malloc(strlen(string)+1);
strcpy(final, string);

while((inter = strpbrk(final, chop)) != NULL)
{
int size_end = strlen(inter);
int size_start = strlen(final) - size_end;
char mod[6];

final = (char *)realloc(final,size_start + size_end + 1);
sprintf(mod, "%%.%ds%%s",size_start);
sprintf(final,mod,string,inter+1);
string = final;
}
return(final);
}

The first thing you should consider is the interface. It is always
desireable to malloc and free storage in the same function. So I
regard the malloc/realloc in your code with horror. You can always
perform the function on a string without expanding it, so I suggest
you design the function to do the stripping in place. This will
avoid many future errors, and result in a prototype like:

void clean_string(char *cleanee, char *delete);

and the function should include a comment about exactly what it
does, such as:

/* delete all chars in delete from cleanee */

If you really must return something it should be something useful,
such as the resultant string length.

Mike Wahler · Nov 30, 2004

Merrill & Michele said:
"Mike Wahler"

"Merrill & Michele"
"fatted"
I've written a function (clean_string) to remove characters from a string,
but it looks clunky to me, and I'm sure there's a more 'C' like way of
doing it (still learning), comments and advice welcome...

--
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
char * clean_string(const char *, const
[single point to make, rest of post snipped]

I get routinely chastised for defining functions inside of main().

Click to expand...

OP's code did not contain any function definitions inside of main
or any other function.

Click to expand...

Let me go through my premises. There are two return statements. This
implies .... What if I #defined 'definition' 'prototype declaration' in my
last post?

A declaration isn't always a definition (but a definition is
always a declaration).

-Mike

Keith Thompson · Dec 1, 2004

Merrill & Michele said:
"Mike Wahler"

"Merrill & Michele"
"fatted"
I've written a function (clean_string) to remove characters from a string,
but it looks clunky to me, and I'm sure there's a more 'C' like way of
doing it (still learning), comments and advice welcome...

--
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
char * clean_string(const char *, const
[single point to make, rest of post snipped]

I get routinely chastised for defining functions inside of main().

Click to expand...

OP's code did not contain any function definitions inside of main
or any other function.

Click to expand...

Let me go through my premises. There are two return statements. This
implies .... What if I #defined 'definition' 'prototype declaration' in my
last post? MPJ

If you "#defined 'definition' 'prototype declaration'", your last post
would have been unreadable. But yes, I know what you meant.

Yes, there are two return statements, one inside the definition of
main(), and one inside the definition of clean_string(). No problem
there.

A function definition includes the stuff in curly braces; that can't
be nested inside another function definition.

The original post has a function *declaration* inside the definition
of main(). This is legal, but not generally a good idea.

You need to have (or, in C90, should have) a visible declaration in
order to call the function from within main(). One way to do this
would be to put the entire definition of clean_string() before the
definition of main(), but there are good reasons to want to put main()
at the top -- and this method doesn't work if you have mutually
recursive functions.

The usual method is to put declarations (prototypes) for functions
other than main near the top of the source file, outside any function
definition. (You don't usually need a separate prototype for main
because main isn't usually called explicitly.) If the functions
aren't to be called from any other source file, it's best to declare
them static. If they are, the declarations should be in a separate
header file.

Merrill & Michele · Dec 1, 2004

Keith Thompson said:
Merrill & Michele said:

"Mike Wahler"
"Merrill & Michele"
"fatted"
I've written a function (clean_string) to remove characters from a
string,
but it looks clunky to me, and I'm sure there's a more 'C' like way of
doing it (still learning), comments and advice welcome...

--
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
char * clean_string(const char *, const
[single point to make, rest of post snipped]

I get routinely chastised for defining functions inside of main().

OP's code did not contain any function definitions inside of main
or any other function.

Click to expand...

Let me go through my premises. There are two return statements. This
implies .... What if I #defined 'definition' 'prototype declaration' in my
last post? MPJ

Click to expand...

If you "#defined 'definition' 'prototype declaration'", your last post
would have been unreadable. But yes, I know what you meant.

Yes, there are two return statements, one inside the definition of
main(), and one inside the definition of clean_string(). No problem
there.

A function definition includes the stuff in curly braces; that can't
be nested inside another function definition.

The original post has a function *declaration* inside the definition
of main(). This is legal, but not generally a good idea.

You need to have (or, in C90, should have) a visible declaration in
order to call the function from within main(). One way to do this
would be to put the entire definition of clean_string() before the
definition of main(), but there are good reasons to want to put main()
at the top -- and this method doesn't work if you have mutually
recursive functions.

The usual method is to put declarations (prototypes) for functions
other than main near the top of the source file, outside any function
definition. (You don't usually need a separate prototype for main
because main isn't usually called explicitly.) If the functions
aren't to be called from any other source file, it's best to declare
them static. If they are, the declarations should be in a separate
header file.

Thanks all for correcting my correction, in particular Mr. Thompson with
generous detail. OP does one other thing that caught my interest. He
returns EXIT_SUCCESS from main(). My stdlib has the following:

/* Definition of the argument values for the exit() function */

#define EXIT_SUCCESS 0
#define EXIT_FAILURE 1

I know that one possible answer to my following question is
READ_THE_STANDARD, but that's like telling me to climb Everest with only a
windbreaker. Q) Is there any virtue to using EXIT_SUCCESS as opposed to 0
? MPJ

Old Wolf · Dec 1, 2004

fatted said:
I've written a function (clean_string) to remove characters from a string,
but it looks clunky to me, and I'm sure there's a more 'C' like way of
doing it (still learning), comments and advice welcome...

char * string = "\t this is a string = sausages\n";
char * chop = "\t\n= ";

You should use "char const *" when pointing to string literals,
it helps to avoid undefined behaviour if you accidentally
modify the string.

[snip]

{
int size_end = strlen(inter);
int size_start = strlen(final) - size_end;

strlen() returns a size_t, so one could argue that you should
use that instead of int. size_t can hold the value of the length
of any string, whereas int might only go up to 32767.
I think your function will do horrible things if someone
passes in a string whose length is longer than INT_MAX.

char mod[6];

final = (char *)realloc(final,size_start + size_end + 1);

This cast is worse than useless (you correctly didn't cast
'malloc' earlier on in your code, however).
Also if this realloc fails then you have leaked the memory
that it was originally pointing to.
You should check all your return values of *alloc() functions.

sprintf(mod, "%%.%ds%%s",size_start);

This is a buffer overflow: even if size_start is 0 then
it takes 7 characters: % . 0 s % s \0 .
To be safe you would need to make mod (6 + longest_possible_
size_of_an_int_in_decimal_notation), bearing in mind that
int could be 64-bit or more.

sprintf(final,mod,string,inter+1);

Instead of the two-stage sprintf, I would prefer:
sprintf(final, "%.*s%s", size_start, string, inter+1);

Note that this only works if size_start is an int, so if
you had it as a size_t instead, you would need to do an:
assert(size_start < INT_MAX)
or similar, so that you don't get buffer overflows if someone
tries to bust your code by sending it a huge string.

Note, I haven't verified whether this overflows 'final' or not.
I would use snprintf() if it were available, just in case.

As other posters have noted, it is a lot simpler to do an
in-place operation without any dynamic memory allocation.
If a caller wants to preserve the original string they can
make a copy themself. This fits in with C's philosophy of not
doing unnecessary crap. Here is what I would write (untested code):

void clean_string(char *string, const char *chars)
{
char *src;

for (src = string; *src; ++src)
if (!strchr(chars, *src))
*string++ = *src;

*string = '\0';
}

dandelion · Dec 1, 2004

Thanks all for correcting my correction, in particular Mr. Thompson with
generous detail. OP does one other thing that caught my interest. He
returns EXIT_SUCCESS from main(). My stdlib has the following:

/* Definition of the argument values for the exit() function */

#define EXIT_SUCCESS 0
#define EXIT_FAILURE 1

I know that one possible answer to my following question is
READ_THE_STANDARD, but that's like telling me to climb Everest with only a
windbreaker. Q) Is there any virtue to using EXIT_SUCCESS as opposed to 0
? MPJ

It makes the main() return value portable. On Un*x 0 is used to indicate
successful program termination and any non-zero value is (commonly)
interpreted as success (disclaimer: there are a few programs that do not
obey this rule, IIRC).

On other platforms it may be different and a successfull program termination
may be required to return 1, 0xDEADBEEF or anything else. Using
EXIT_SUCCESS, this problem is taken care of (portably) by the header files.

Lawrence Kirby · Dec 1, 2004

On Tue, 30 Nov 2004 21:01:55 -0600, Merrill & Michele wrote:

....

Thanks all for correcting my correction, in particular Mr. Thompson with
generous detail. OP does one other thing that caught my interest. He
returns EXIT_SUCCESS from main(). My stdlib has the following:

/* Definition of the argument values for the exit() function */

#define EXIT_SUCCESS 0
#define EXIT_FAILURE 1

I know that one possible answer to my following question is
READ_THE_STANDARD, but that's like telling me to climb Everest with only a
windbreaker. Q) Is there any virtue to using EXIT_SUCCESS as opposed to 0
? MPJ

Another implementation could define EXIT_SUCCESS as a different value. In
this case that doesn't matter much because the standard explicitly defines
both 0 and EXIT_SUCCESS as indicating success with no distinction being
made between them. While that means that an implementation is likely to
define EXIT_SUCCESS as 0 there is no reason why more them one value
couldn't indicate success.

However the standard does NOT define that the value 1 indicates failure.
So exiting with 1 rather than EXIT_FAILURE is an error, or at least
non-portable.

IIRC VMS uses odd values to indicate success and even value to indicate
failure. A correct C implementation in such an environment would have to
map an exit value of 0 onto an odd value but could leave other values
unchanged. EXIT_FAILURE would be defined as some even value, and passing 1
to exit() would indicate success. EXIT_SUCCESS could still be defined as 0
because the value would be mapped to an odd value, or it could be defined
directly as an appropriate odd value.

I like seeing EXIT_SUCCESS rather than 0 because it gives some indication
that the programmer thought about what the value should indicate rather
than (perhaps) just throwing a number at it.

Lawrence

Merrill & Michele · Dec 1, 2004

"Lawrence Kirby"

Another implementation could define EXIT_SUCCESS as a different value. In
this case that doesn't matter much because the standard explicitly defines
both 0 and EXIT_SUCCESS as indicating success with no distinction being
made between them. While that means that an implementation is likely to
define EXIT_SUCCESS as 0 there is no reason why more them one value
couldn't indicate success.

However the standard does NOT define that the value 1 indicates failure.
So exiting with 1 rather than EXIT_FAILURE is an error, or at least
non-portable.

IIRC VMS uses odd values to indicate success and even value to indicate
failure. A correct C implementation in such an environment would have to
map an exit value of 0 onto an odd value but could leave other values
unchanged. EXIT_FAILURE would be defined as some even value, and passing 1
to exit() would indicate success. EXIT_SUCCESS could still be defined as 0
because the value would be mapped to an odd value, or it could be defined
directly as an appropriate odd value.

I like seeing EXIT_SUCCESS rather than 0 because it gives some indication
that the programmer thought about what the value should indicate rather
than (perhaps) just throwing a number at it.

Yeah, I was really surprised to see a one--the number identified with
truth--indicating failure. I suppose a macro is a macro though. So return
0 implies return EXIT_SUCCESS and conversely. Return [any other integer]
and return EXIT_FAILURE are implementation-defined. MPJ

Michael Mair · Dec 1, 2004

Merrill & Michele wrote:
[snip: Discussion about exit(EXIT_SUCCESS) vs. exit(0) and exit(FAILURE)

Yeah, I was really surprised to see a one--the number identified with
truth--indicating failure. I suppose a macro is a macro though. So return
0 implies return EXIT_SUCCESS and conversely. Return [any other integer]
and return EXIT_FAILURE are implementation-defined. MPJ

Talking about integer return values:
You will find that in many cases, there is one kind of success
(complete success) but many possible kinds of failures. So, it is
often easier to return 0 on success and do the error handling if
something else came back.
If nothing else, you may want to return __LINE__ to know at least
roughly where it went wrong.

For Boolean tests, of course, you go with the usual false/true
notion.

Cheers
Michael

Al Bowers · Dec 1, 2004

fatted said:
I've written a function (clean_string) to remove characters from a string,
but it looks clunky to me, and I'm sure there's a more 'C' like way of
doing it (still learning), comments and advice welcome...

char * clean_string(const char * string, const char * chop)
{
char * inter;
char * final = malloc(strlen(string)+1);

Function malloc can return NULL which would case the next
statement, using strcpy, to fail. You should chedk the return
value of malloc.

strcpy(final, string);

while((inter = strpbrk(final, chop)) != NULL)
{
int size_end = strlen(inter);
int size_start = strlen(final) - size_end;
char mod[6];

final = (char *)realloc(final,size_start + size_end + 1);

Function realloc can return NULL also. IF it does final would be
assigned the NULL value. A pointer to any previous allocations, if
any, will be lost.

sprintf(mod, "%%.%ds%%s",size_start);
sprintf(final,mod,string,inter+1);

Why don't you just leave the original malloc unchanged. Another
altermative would be to realloc in chucks or blocks. Example below
allocates in blocks of 16.

string = final;
}
return(final);
}

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *clean_string(const char *, const char *);

int main(void)
{
char *string = "\t this is a string = sausages\n";
char *chop = "\t\n= ";

char *result = clean_string(string, chop);
if(result)
printf("[after] \"%s\"\n",result);
free(result);
return EXIT_SUCCESS;
}

char *clean_string(const char * string, const char * chop)
{
char *tmp, *final;
size_t count;
const size_t CHUNK = 16;

for(final = NULL, count = 0;*string != '\0' ; string++)
{
if(count%CHUNK == 0)
{
tmp = realloc(final,count+CHUNK+1);
if(tmp == NULL)
{
free(final);
return NULL;
}
final = tmp;
}
if(strchr(chop,*string)) continue;
final[count++] = *string;
}
final[count] = '\0';
return final;
}

Randy Howard · Dec 1, 2004

/* Allocates memory and returns a string that is obtained by removing
from string "string" all characters found in string "chop".

Returns NUKK if unable to allocate memory.
*/

Which points out the importance of being sure that your comments
are correct when present.

Flash Gordon · Dec 1, 2004

It makes the main() return value portable.

Incorrect. The C standard defines that returning 0 from main or using
exit(0) causes the implementation to return an implementation defined
value indicating success to the environment.

On Un*x 0 is used to
indicate successful program termination and any non-zero value is
(commonly) interpreted as success (disclaimer: there are a few
programs that do not obey this rule, IIRC).

Such programs are IMHO brocken.

On other platforms it may be different and a successfull program
termination may be required to return 1, 0xDEADBEEF or anything else.

In which case the implementation has to handle that. Just like it has to
handle 0 correctly when it is used in a pointer context even if the null
pointer is not all bits zero.

Using EXIT_SUCCESS, this problem is taken care of (portably) by the
header files.

See above. 0 indicates success whatever the OS requires. Failure, on the
other hand, can only poitably be indicated using EXIT_FAILURE.

Keith Thompson · Dec 2, 2004

Michael Mair said:
Talking about integer return values:
You will find that in many cases, there is one kind of success
(complete success) but many possible kinds of failures. So, it is
often easier to return 0 on success and do the error handling if
something else came back.
If nothing else, you may want to return __LINE__ to know at least
roughly where it went wrong.

Returning __LINE__ from the main program is non-portable. On some
systems, only the low-order 8 bits are used, so a return from line 256
would indicate success. More generally, returning any value other
than 0, EXIT_SUCCESS, or EXIT_FAILURE from main() is non-portable.

As for using __LINE__ to show where the error occurred, that's not a
bad idea, but it's useful only if the person seeing the error message
has access to (the current version of) the source. It's a debugging
tool for developers, not a way to produce error messages that are
going to be meaningful to an end user. (Note that the assert() macro
does this.)

Fibonacci	0	May 13, 2023
C pipe	1	Dec 9, 2021
If(strcmp(str, "") == 0) - What does this line of code mean?	0	Aug 8, 2022
C language. work with text	3	Dec 10, 2021
Clunky C cleanup code	0	Dec 2, 2004
Please help with C programming to save GPS reception data in Raspberry Pi.	0	Dec 8, 2022
STRING - Remove small letters from string	1	Jan 20, 2023
HELP:function at c returning (null)	4	Mar 21, 2024

Clunky C cleanup code

fatted

Rouben Rostamian

Merrill & Michele

Mike Wahler

Merrill & Michele

Richard Bos

Merrill & Michele

CBFalconer

Mike Wahler

Keith Thompson

Merrill & Michele

Old Wolf

dandelion

Lawrence Kirby

Merrill & Michele

Michael Mair

Al Bowers

Randy Howard

Flash Gordon

Keith Thompson

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads