Search and replace c-style strings in c++

Mike Semko · Apr 25, 2012

I am trying to write a function that takes three c-style strings, and
returns a c-style string. This function searches a c-string for all
occurrences of the sub-string and replaces them with a different
string.
This program works but seems very inelegant. I can't help the feeling
like it could have been done in a less bulky way.

#include <iostream>
#include <cstring>

using namespace std;

int main()
{

char* replaceSubstring(char*,char*,char*);

char *string1, *string2, *string3;
string1 = "the dog jumped over the fence";
string2 = "the";
string3 = "that";
cout << replaceSubstring(string1,string2,string3);

}

char* replaceSubstring(char *original, char *from, char *to)
{
int origlen = strlen(original);
int i = 0;
int count = 0;
char *ptr;
while (i<origlen)
{
ptr = strstr(original+i, from);
if (!ptr)
break;
else
{
i = ptr - original + 1;
count++;
}
}
int newsize = origlen + (strlen(to) - strlen(from)) * count;
char *newstring = new char[newsize];
newstring[0] = '\0';
i = 0;
while (i < origlen)
{
ptr = strstr(original+i, from);
if (!ptr)
{
strcat(newstring,original+i);
break;
}
else
{
strncat(newstring, original+i,ptr-(original+i));
strcat(newstring, to);
i = i + ptr - (original + i) + strlen(from);
}
}
strcat(newstring,"\0");
return newstring;
}

Would anyone have any suggestions on how to make this code clearer and/
or more efficient ? Any comments are welcome. Please do not suggest to
use class string instead. That is not an option. The function must
work with c-strings.

Ian Collins · Apr 25, 2012

I am trying to write a function that takes three c-style strings, and
returns a c-style string. This function searches a c-string for all
occurrences of the sub-string and replaces them with a different
string.
This program works but seems very inelegant. I can't help the feeling
like it could have been done in a less bulky way.

#include<iostream>
#include<cstring>

using namespace std;

int main()
{

char* replaceSubstring(char*,char*,char*);

char *string1, *string2, *string3;
string1 = "the dog jumped over the fence";
string2 = "the";
string3 = "that";

These should be const char*, not char*.

cout<< replaceSubstring(string1,string2,string3);

}

char* replaceSubstring(char *original, char *from, char *to)
{

to and from should be const.

Would anyone have any suggestions on how to make this code clearer and/
or more efficient ? Any comments are welcome. Please do not suggest to
use class string instead. That is not an option. The function must
work with c-strings.

Use std::string internally and copy the result to a C style string.

Mike Semko · Apr 25, 2012

Use std::string internally and copy the result to a C style string.

Yes I thought about that option, but i am trying to practice using c-
strings specifically. Assuming I can't use std::string is there any
way to make this existing code more concise. Maybe I am overlooking
something. The way I have to figure out the new length and dynamically
create a string seems ugly at best.

Ian Collins · Apr 25, 2012

On 04/25/12 11:38 AM, Mike Semko wrote:

Please don't snip attributions, it's rude.

I wrote:

Yes I thought about that option, but i am trying to practice using c-
strings specifically.

Well unless you are learning C, you should take advantage of the
improved tools C++ offers you. Your code demonstrates how unpleasant
string manipulation is in C!

Assuming I can't use std::string is there any
way to make this existing code more concise. Maybe I am overlooking
something. The way I have to figure out the new length and dynamically
create a string seems ugly at best.

You have to dynamically create the new string because your result has to
be consistent, the caller has to know whether or not to free the result.
To get the size of the new string, you have to do what you have done.

You could save some time by stopping your replace loop once count
instances have been replaced (especially if count==0!). Extracting the
two while loops into their own functions would make the code's logic
clearer.

Ian Collins · Apr 25, 2012

int newsize = origlen + (strlen(to) - strlen(from)) * count;
char *newstring = new char[newsize];

Click to expand...

Memory leak.

No, it's not. It is the value returned by the function.

Mike Semko · Apr 25, 2012

On 04/25/12 11:38 AM, Mike Semko wrote:

Please don't snip attributions, it's rude.

I apologize, I wasn't aware that was a problem, as I have not posted
to a newsgroup before.

You have to dynamically create the new string because your result has to
be consistent, the caller has to know whether or not to free the result.
To get the size of the new string, you have to do what you have done.

You could save some time by stopping your replace loop once count
instances have been replaced (especially if count==0!). Extractingthe
two while loops into their own functions would make the code's logic
clearer.

Some people have suggested to me to just create a string of a large
size (like 2048) to use as a new string.
Do you think that is something I should have done ?

Ian Collins · Apr 25, 2012

Some people have suggested to me to just create a string of a large
size (like 2048) to use as a new string.
Do you think that is something I should have done ?

Consider what happens if an equally large or larger string is passed in!

Mike Semko · Apr 25, 2012

Consider what happens if an equally large or larger string is passed in!

Yes of course that would cause a buffer overflow.
Another thing I have considered is to calculate the maximum required
size of the string like this:

if ((strlen(to) > strlen(from))
newlength = strlen(original) + strlen(original)/strlen(from) *
(strlen(from) - strlen(to))

But now this looks even uglier and more unreadable than before

What do you think about this ?

Miles Bader · Apr 25, 2012

if the input strings may be utf-8 instead of ISO-8859-1 or ASCII, then
you've more work.

Hmm, why?

Valid UTF-8 sequences are context independent, and can't be
subsequences of other valid UTF-8 sequences, so it seems like a simple
string-replace should work fine with them, without any special
handling.

[In some cases, the result may not make sense at a higher level, e.g.,
if context-dependent unicode characters are used, but it should still
be a valid UTF-8 string.]

-miles

Juha Nieminen · Apr 26, 2012

Scott Lurndal said:
Ian Collins said:

int newsize = origlen + (strlen(to) - strlen(from)) * count;
char *newstring = new char[newsize];

Memory leak.

Click to expand...

No, it's not. It is the value returned by the function.

Click to expand...

Which never got free'd in the caller (granted which exited directly, but
that was just sample code).

Returning unmanaged allocated memory from a function is a BAD idea and
should always be avoided. (The only exception to this is if it's done by
a private method of a class, which gets called only by other methods of
said class, which take care of managing the returned allocation. Even then
it requires care to avoid mistakes, and in practice it's rare to need to
do this.)

none · Apr 26, 2012

Scott Lurndal said:
Scott Lurndal said:

Ian Collins said:

On 04/25/12 12:00 PM, Scott Lurndal wrote:

int newsize = origlen + (strlen(to) - strlen(from)) * count;
char *newstring = new char[newsize];

Memory leak.

No, it's not. It is the value returned by the function.

Click to expand...

Which never got free'd in the caller (granted which exited directly, but
that was just sample code).

Click to expand...

Returning unmanaged allocated memory from a function is a BAD idea and
should always be avoided.

Wow, that's a bit strong. Given that the OP code is essentially C,
this is a long accepted idiom in pure C. I totally agree that this is
error prone and that there are better ways but always be avoided is a
bit OTT (even if the OP code is a bit OT and should really be in
comp.language.c is he really doesn't want to use C++).

Juha Nieminen · Apr 26, 2012

Yannick Tremblay said:
Wow, that's a bit strong. Given that the OP code is essentially C,
this is a long accepted idiom in pure C.

That's because C offers no tools to manage such allocations. C++ does
offer such tools and thus isn't bound to such limitations.

Making a function return unmanaged allocated memory is only asking for
trouble. It's way too easy to forget to delete the memory in the calling
code. Also, it's usually not exception-safe.

(In order for this scheme to be exception-safe, the function that's
allocating the memory would have to either make sure no exceptions are
thrown, or have automatic lifetime management of the allocated memory.
Moreover, the calling code would have to immediately assign the returned
pointer to a smart pointer or otherwise manage its lifetime immediately.
Otherwise any exception thrown between allocation and deletion will cause
it to leak. If the code is managing the memory anyways, then why not have
the function return managed memory outright? Errors will be minimized
and the code becomes simpler and safer.)

Tobias MÃ¼ller · Apr 26, 2012

none said:
Juha Nieminen <[email protected]> wrote:

Wow, that's a bit strong. Given that the OP code is essentially C,
this is a long accepted idiom in pure C. I totally agree that this is
error prone and that there are better ways but always be avoided is a
bit OTT (even if the OP code is a bit OT and should really be in
comp.language.c is he really doesn't want to use C++).

I think in C it is much more common to pass a target buffer as parameter
and returning the number of chars written to that buffer, like:

size_t replaceSubstring(char* targetBuffer, size_t targetBufferSize, const
char* original, const char* find, const char* replace);

Optionally returning the requested buffer size if targetBuffer==NULL or
targetBufferSize==0.

Tobi

Miles Bader · Apr 26, 2012

Tobias MÃ¼ller said:
I think in C it is much more common to pass a target buffer as parameter
and returning the number of chars written to that buffer, like:

That's the usual "ye olde" C style, but malloc'd returned buffers seem
increasingly common in modern code...

-miles

Pavel · Apr 27, 2012

Mike said:
I am trying to write a function that takes three c-style strings, and
returns a c-style string. This function searches a c-string for all
occurrences of the sub-string and replaces them with a different
string.
This program works but seems very inelegant. I can't help the feeling
like it could have been done in a less bulky way.

#include<iostream>
#include<cstring>

using namespace std;

int main()
{

char* replaceSubstring(char*,char*,char*);

char *string1, *string2, *string3;
string1 = "the dog jumped over the fence";
string2 = "the";
string3 = "that";
cout<< replaceSubstring(string1,string2,string3);

}

char* replaceSubstring(char *original, char *from, char *to)
{
int origlen = strlen(original);
int i = 0;
int count = 0;
char *ptr;
while (i<origlen)
{
ptr = strstr(original+i, from);
if (!ptr)
break;
else
{
i = ptr - original + 1;
count++;
}
}
int newsize = origlen + (strlen(to) - strlen(from)) * count;
char *newstring = new char[newsize];
newstring[0] = '\0';
i = 0;
while (i< origlen)
{
ptr = strstr(original+i, from);
if (!ptr)
{
strcat(newstring,original+i);
break;
}
else
{
strncat(newstring, original+i,ptr-(original+i));
strcat(newstring, to);
i = i + ptr - (original + i) + strlen(from);
}
}
strcat(newstring,"\0");
return newstring;
}

Would anyone have any suggestions on how to make this code clearer and/
or more efficient ? Any comments are welcome. Please do not suggest to
use class string instead. That is not an option. The function must
work with c-strings.

I think adding 1 is a bug: consider original="aaaa", from="aa", to="bbb"; you
need to add strlen(from). Below is my attempt to be "elegant" (although, not as
fast as possible; these two rarely go together): Disclaimer: I did not even try
to compile this:

char *
replaceSubstring(const char *original, const char *from, const char *to)
{
assert(original);
assert(from);
assert(to);
assert(*from);
int count = 0;
const int fromLen = strlen(from);
for (char *cur = *original; cur = strstr(cur, from); cur += fromLen)
++count;
const int toLen = srtrlen(to);
char *origRes = new char[strlen(original) + (toLen - fromLen) * count];
for (char *res = origRes; cur = strstr(original, from);
original = cur + fromLen) {
memcpy(res, original, cur - original);
res += (cur - original);
memcpy(res, cur, fromLen);
res += fromLen;
}
strcpy(res, original);
return origRes;
}

HTH,
Pavel

Pavel · Apr 28, 2012

Pavel said:
Mike said:

I am trying to write a function that takes three c-style strings, and
returns a c-style string. This function searches a c-string for all
occurrences of the sub-string and replaces them with a different
string.
This program works but seems very inelegant. I can't help the feeling
like it could have been done in a less bulky way.

#include<iostream>
#include<cstring>

using namespace std;

int main()
{

char* replaceSubstring(char*,char*,char*);

char *string1, *string2, *string3;
string1 = "the dog jumped over the fence";
string2 = "the";
string3 = "that";
cout<< replaceSubstring(string1,string2,string3);

}

char* replaceSubstring(char *original, char *from, char *to)
{
int origlen = strlen(original);
int i = 0;
int count = 0;
char *ptr;
while (i<origlen)
{
ptr = strstr(original+i, from);
if (!ptr)
break;
else
{
i = ptr - original + 1;
count++;
}
}
int newsize = origlen + (strlen(to) - strlen(from)) * count;
char *newstring = new char[newsize];
newstring[0] = '\0';
i = 0;
while (i< origlen)
{
ptr = strstr(original+i, from);
if (!ptr)
{
strcat(newstring,original+i);
break;
}
else
{
strncat(newstring, original+i,ptr-(original+i));
strcat(newstring, to);
i = i + ptr - (original + i) + strlen(from);
}
}
strcat(newstring,"\0");
return newstring;
}

Would anyone have any suggestions on how to make this code clearer and/
or more efficient ? Any comments are welcome. Please do not suggest to
use class string instead. That is not an option. The function must
work with c-strings.

Click to expand...

I think adding 1 is a bug: consider original="aaaa", from="aa", to="bbb"; you
need to add strlen(from). Below is my attempt to be "elegant" (although, not as
fast as possible; these two rarely go together): Disclaimer: I did not even try
to compile this:

char *
replaceSubstring(const char *original, const char *from, const char *to)
{
assert(original);
assert(from);
assert(to);
assert(*from);
int count = 0;
const int fromLen = strlen(from);
for (char *cur = *original; cur = strstr(cur, from); cur += fromLen)
++count;
const int toLen = srtrlen(to);
char *origRes = new char[strlen(original) + (toLen - fromLen) * count];
for (char *res = origRes; cur = strstr(original, from);
original = cur + fromLen) {
memcpy(res, original, cur - original);
res += (cur - original);
memcpy(res, cur, fromLen);
res += fromLen;
}
strcpy(res, original);
return origRes;
}

HTH,
Pavel

Looking back at this code, I missed +1 at operator new call and a couple of
`const' qualifiers on char pointers. Otherwise, looks valid.

-Pavel

Miquel van Smoorenburg · May 6, 2012

I am trying to write a function that takes three c-style strings, and
returns a c-style string. This function searches a c-string for all
occurrences of the sub-string and replaces them with a different
string.
This program works but seems very inelegant. I can't help the feeling
like it could have been done in a less bulky way.

Less bulky .. not very much, but you could do it more efficiently.
Instead of doing a strstr() on every character, you could do the
strstr() only just as many times as the "from" string occurs.
Also you could make the replace() function have 2 modes, one in
which it only counts the space you need, and one that actually
does the work.

Something like this:

#include <cstdio>
#include <cstring>
#include <iostream>

using namespace std;

int replace(const char *orig, const char *from, const char *to, char *result)
{
int resindex = 0;
int tolen = strlen(to);
int fromlen = strlen(from);
const char *last = orig;
const char *p;
while ((p = strstr(last, from)) != NULL) {
if (result) {
memcpy(result + resindex, last, p - last);
memcpy(result + resindex + (p - last), to, tolen);
}
resindex += p - last + tolen;
p += fromlen;
last = p;
}
if (result)
strcpy(result + resindex, last);
resindex += strlen(last);

return resindex;
}

int main(int argc, char *argv[])
{
const char *string1 = "the dog jumped over the fence";
const char *string2 = "the";
const char *string3 = "that";

int sz = replace(string1, string2, string3, NULL);
char *res = new char[sz + 1];
replace(string1, string2, string3, res);
cout << res << endl;

return 0;
}

Mike.

Old Wolf · May 7, 2012

But now this looks even uglier and more unreadable than before
What do you think about this ?

Manipulating C-style strings is indeed ugly and
hard to read, and error-prone; you are stuck with
that if you insist on working with them. There's
no elegant solution.

Also, as others have pointed out but you haven't
acknowledged, your program leaks memory when
you call 'new' but do not later delete the memory.
You could fix this by having the calling function
store the returned value, and then deleting it after use.

C language. work with text	3	Dec 10, 2021
Lexical Analysis on C++	1	Oct 31, 2023
Boomer trying to learn coding in C and C++	6	Dec 16, 2022
Homework in C - Help Needed	1	Oct 16, 2024
Weird Behavior with Rays in C and OpenGL	4	Feb 13, 2024
C-style Strings to LPCWSTR	9	May 26, 2012
Organization Assignment in C programming	0	Sep 7, 2022
Write your own isascii in c programming	0	Nov 7, 2020

Search and replace c-style strings in c++

Mike Semko

Ian Collins

Mike Semko

Ian Collins

Ian Collins

Mike Semko

Ian Collins

Mike Semko

Miles Bader

Juha Nieminen

none

Juha Nieminen

Tobias MÃ¼ller

Miles Bader

Pavel

Pavel

Miquel van Smoorenburg

Old Wolf

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads