Search and replace c-style strings in c++

M

Mike Semko

I am trying to write a function that takes three c-style strings, and
returns a c-style string. This function searches a c-string for all
occurrences of the sub-string and replaces them with a different
string.
This program works but seems very inelegant. I can't help the feeling
like it could have been done in a less bulky way.


#include <iostream>
#include <cstring>

using namespace std;

int main()
{

char* replaceSubstring(char*,char*,char*);

char *string1, *string2, *string3;
string1 = "the dog jumped over the fence";
string2 = "the";
string3 = "that";
cout << replaceSubstring(string1,string2,string3);

}


char* replaceSubstring(char *original, char *from, char *to)
{
int origlen = strlen(original);
int i = 0;
int count = 0;
char *ptr;
while (i<origlen)
{
ptr = strstr(original+i, from);
if (!ptr)
break;
else
{
i = ptr - original + 1;
count++;
}
}
int newsize = origlen + (strlen(to) - strlen(from)) * count;
char *newstring = new char[newsize];
newstring[0] = '\0';
i = 0;
while (i < origlen)
{
ptr = strstr(original+i, from);
if (!ptr)
{
strcat(newstring,original+i);
break;
}
else
{
strncat(newstring, original+i,ptr-(original+i));
strcat(newstring, to);
i = i + ptr - (original + i) + strlen(from);
}
}
strcat(newstring,"\0");
return newstring;
}

Would anyone have any suggestions on how to make this code clearer and/
or more efficient ? Any comments are welcome. Please do not suggest to
use class string instead. That is not an option. The function must
work with c-strings.
 
I

Ian Collins

I am trying to write a function that takes three c-style strings, and
returns a c-style string. This function searches a c-string for all
occurrences of the sub-string and replaces them with a different
string.
This program works but seems very inelegant. I can't help the feeling
like it could have been done in a less bulky way.


#include<iostream>
#include<cstring>

using namespace std;

int main()
{

char* replaceSubstring(char*,char*,char*);

char *string1, *string2, *string3;
string1 = "the dog jumped over the fence";
string2 = "the";
string3 = "that";

These should be const char*, not char*.
cout<< replaceSubstring(string1,string2,string3);

}


char* replaceSubstring(char *original, char *from, char *to)
{

to and from should be const.

Would anyone have any suggestions on how to make this code clearer and/
or more efficient ? Any comments are welcome. Please do not suggest to
use class string instead. That is not an option. The function must
work with c-strings.

Use std::string internally and copy the result to a C style string.
 
M

Mike Semko

Use std::string internally and copy the result to a C style string.

Yes I thought about that option, but i am trying to practice using c-
strings specifically. Assuming I can't use std::string is there any
way to make this existing code more concise. Maybe I am overlooking
something. The way I have to figure out the new length and dynamically
create a string seems ugly at best.
 
I

Ian Collins

On 04/25/12 11:38 AM, Mike Semko wrote:

Please don't snip attributions, it's rude.
I wrote:

Yes I thought about that option, but i am trying to practice using c-
strings specifically.

Well unless you are learning C, you should take advantage of the
improved tools C++ offers you. Your code demonstrates how unpleasant
string manipulation is in C!
Assuming I can't use std::string is there any
way to make this existing code more concise. Maybe I am overlooking
something. The way I have to figure out the new length and dynamically
create a string seems ugly at best.

You have to dynamically create the new string because your result has to
be consistent, the caller has to know whether or not to free the result.
To get the size of the new string, you have to do what you have done.

You could save some time by stopping your replace loop once count
instances have been replaced (especially if count==0!). Extracting the
two while loops into their own functions would make the code's logic
clearer.
 
M

Mike Semko

On 04/25/12 11:38 AM, Mike Semko wrote:

Please don't snip attributions, it's rude.

I apologize, I wasn't aware that was a problem, as I have not posted
to a newsgroup before.

You have to dynamically create the new string because your result has to
be consistent, the caller has to know whether or not to free the result.
  To get the size of the new string, you have to do what you have done.

You could save some time by stopping your replace loop once count
instances have been replaced (especially if count==0!).  Extractingthe
two while loops into their own functions would make the code's logic
clearer.

Some people have suggested to me to just create a string of a large
size (like 2048) to use as a new string.
Do you think that is something I should have done ?
 
I

Ian Collins

Some people have suggested to me to just create a string of a large
size (like 2048) to use as a new string.
Do you think that is something I should have done ?

Consider what happens if an equally large or larger string is passed in!
 
M

Mike Semko

Consider what happens if an equally large or larger string is passed in!

Yes of course that would cause a buffer overflow.
Another thing I have considered is to calculate the maximum required
size of the string like this:

if ((strlen(to) > strlen(from))
newlength = strlen(original) + strlen(original)/strlen(from) *
(strlen(from) - strlen(to))

But now this looks even uglier and more unreadable than before :(
What do you think about this ?
 
M

Miles Bader

if the input strings may be utf-8 instead of ISO-8859-1 or ASCII, then
you've more work.

Hmm, why?

Valid UTF-8 sequences are context independent, and can't be
subsequences of other valid UTF-8 sequences, so it seems like a simple
string-replace should work fine with them, without any special
handling.

[In some cases, the result may not make sense at a higher level, e.g.,
if context-dependent unicode characters are used, but it should still
be a valid UTF-8 string.]

-miles
 
J

Juha Nieminen

Scott Lurndal said:
Ian Collins said:
int newsize = origlen + (strlen(to) - strlen(from)) * count;
char *newstring = new char[newsize];

Memory leak.

No, it's not. It is the value returned by the function.

Which never got free'd in the caller (granted which exited directly, but
that was just sample code).

Returning unmanaged allocated memory from a function is a BAD idea and
should always be avoided. (The only exception to this is if it's done by
a private method of a class, which gets called only by other methods of
said class, which take care of managing the returned allocation. Even then
it requires care to avoid mistakes, and in practice it's rare to need to
do this.)
 
N

none

Scott Lurndal said:
Ian Collins said:
On 04/25/12 12:00 PM, Scott Lurndal wrote:


int newsize = origlen + (strlen(to) - strlen(from)) * count;
char *newstring = new char[newsize];

Memory leak.

No, it's not. It is the value returned by the function.

Which never got free'd in the caller (granted which exited directly, but
that was just sample code).

Returning unmanaged allocated memory from a function is a BAD idea and
should always be avoided.

Wow, that's a bit strong. Given that the OP code is essentially C,
this is a long accepted idiom in pure C. I totally agree that this is
error prone and that there are better ways but always be avoided is a
bit OTT (even if the OP code is a bit OT and should really be in
comp.language.c is he really doesn't want to use C++).
 
J

Juha Nieminen

Yannick Tremblay said:
Wow, that's a bit strong. Given that the OP code is essentially C,
this is a long accepted idiom in pure C.

That's because C offers no tools to manage such allocations. C++ does
offer such tools and thus isn't bound to such limitations.

Making a function return unmanaged allocated memory is only asking for
trouble. It's way too easy to forget to delete the memory in the calling
code. Also, it's usually not exception-safe.

(In order for this scheme to be exception-safe, the function that's
allocating the memory would have to either make sure no exceptions are
thrown, or have automatic lifetime management of the allocated memory.
Moreover, the calling code would have to immediately assign the returned
pointer to a smart pointer or otherwise manage its lifetime immediately.
Otherwise any exception thrown between allocation and deletion will cause
it to leak. If the code is managing the memory anyways, then why not have
the function return managed memory outright? Errors will be minimized
and the code becomes simpler and safer.)
 
T

Tobias Müller

none said:
Juha Nieminen <[email protected]> wrote:

Wow, that's a bit strong. Given that the OP code is essentially C,
this is a long accepted idiom in pure C. I totally agree that this is
error prone and that there are better ways but always be avoided is a
bit OTT (even if the OP code is a bit OT and should really be in
comp.language.c is he really doesn't want to use C++).

I think in C it is much more common to pass a target buffer as parameter
and returning the number of chars written to that buffer, like:

size_t replaceSubstring(char* targetBuffer, size_t targetBufferSize, const
char* original, const char* find, const char* replace);

Optionally returning the requested buffer size if targetBuffer==NULL or
targetBufferSize==0.

Tobi
 
M

Miles Bader

Tobias Müller said:
I think in C it is much more common to pass a target buffer as parameter
and returning the number of chars written to that buffer, like:

That's the usual "ye olde" C style, but malloc'd returned buffers seem
increasingly common in modern code...

-miles
 
P

Pavel

Mike said:
I am trying to write a function that takes three c-style strings, and
returns a c-style string. This function searches a c-string for all
occurrences of the sub-string and replaces them with a different
string.
This program works but seems very inelegant. I can't help the feeling
like it could have been done in a less bulky way.


#include<iostream>
#include<cstring>

using namespace std;

int main()
{

char* replaceSubstring(char*,char*,char*);

char *string1, *string2, *string3;
string1 = "the dog jumped over the fence";
string2 = "the";
string3 = "that";
cout<< replaceSubstring(string1,string2,string3);

}


char* replaceSubstring(char *original, char *from, char *to)
{
int origlen = strlen(original);
int i = 0;
int count = 0;
char *ptr;
while (i<origlen)
{
ptr = strstr(original+i, from);
if (!ptr)
break;
else
{
i = ptr - original + 1;
count++;
}
}
int newsize = origlen + (strlen(to) - strlen(from)) * count;
char *newstring = new char[newsize];
newstring[0] = '\0';
i = 0;
while (i< origlen)
{
ptr = strstr(original+i, from);
if (!ptr)
{
strcat(newstring,original+i);
break;
}
else
{
strncat(newstring, original+i,ptr-(original+i));
strcat(newstring, to);
i = i + ptr - (original + i) + strlen(from);
}
}
strcat(newstring,"\0");
return newstring;
}

Would anyone have any suggestions on how to make this code clearer and/
or more efficient ? Any comments are welcome. Please do not suggest to
use class string instead. That is not an option. The function must
work with c-strings.

I think adding 1 is a bug: consider original="aaaa", from="aa", to="bbb"; you
need to add strlen(from). Below is my attempt to be "elegant" (although, not as
fast as possible; these two rarely go together): Disclaimer: I did not even try
to compile this:

char *
replaceSubstring(const char *original, const char *from, const char *to)
{
assert(original);
assert(from);
assert(to);
assert(*from);
int count = 0;
const int fromLen = strlen(from);
for (char *cur = *original; cur = strstr(cur, from); cur += fromLen)
++count;
const int toLen = srtrlen(to);
char *origRes = new char[strlen(original) + (toLen - fromLen) * count];
for (char *res = origRes; cur = strstr(original, from);
original = cur + fromLen) {
memcpy(res, original, cur - original);
res += (cur - original);
memcpy(res, cur, fromLen);
res += fromLen;
}
strcpy(res, original);
return origRes;
}

HTH,
Pavel
 
P

Pavel

Pavel said:
Mike said:
I am trying to write a function that takes three c-style strings, and
returns a c-style string. This function searches a c-string for all
occurrences of the sub-string and replaces them with a different
string.
This program works but seems very inelegant. I can't help the feeling
like it could have been done in a less bulky way.


#include<iostream>
#include<cstring>

using namespace std;

int main()
{

char* replaceSubstring(char*,char*,char*);

char *string1, *string2, *string3;
string1 = "the dog jumped over the fence";
string2 = "the";
string3 = "that";
cout<< replaceSubstring(string1,string2,string3);

}


char* replaceSubstring(char *original, char *from, char *to)
{
int origlen = strlen(original);
int i = 0;
int count = 0;
char *ptr;
while (i<origlen)
{
ptr = strstr(original+i, from);
if (!ptr)
break;
else
{
i = ptr - original + 1;
count++;
}
}
int newsize = origlen + (strlen(to) - strlen(from)) * count;
char *newstring = new char[newsize];
newstring[0] = '\0';
i = 0;
while (i< origlen)
{
ptr = strstr(original+i, from);
if (!ptr)
{
strcat(newstring,original+i);
break;
}
else
{
strncat(newstring, original+i,ptr-(original+i));
strcat(newstring, to);
i = i + ptr - (original + i) + strlen(from);
}
}
strcat(newstring,"\0");
return newstring;
}

Would anyone have any suggestions on how to make this code clearer and/
or more efficient ? Any comments are welcome. Please do not suggest to
use class string instead. That is not an option. The function must
work with c-strings.

I think adding 1 is a bug: consider original="aaaa", from="aa", to="bbb"; you
need to add strlen(from). Below is my attempt to be "elegant" (although, not as
fast as possible; these two rarely go together): Disclaimer: I did not even try
to compile this:

char *
replaceSubstring(const char *original, const char *from, const char *to)
{
assert(original);
assert(from);
assert(to);
assert(*from);
int count = 0;
const int fromLen = strlen(from);
for (char *cur = *original; cur = strstr(cur, from); cur += fromLen)
++count;
const int toLen = srtrlen(to);
char *origRes = new char[strlen(original) + (toLen - fromLen) * count];
for (char *res = origRes; cur = strstr(original, from);
original = cur + fromLen) {
memcpy(res, original, cur - original);
res += (cur - original);
memcpy(res, cur, fromLen);
res += fromLen;
}
strcpy(res, original);
return origRes;
}

HTH,
Pavel
Looking back at this code, I missed +1 at operator new call and a couple of
`const' qualifiers on char pointers. Otherwise, looks valid.

-Pavel
 
M

Miquel van Smoorenburg

I am trying to write a function that takes three c-style strings, and
returns a c-style string. This function searches a c-string for all
occurrences of the sub-string and replaces them with a different
string.
This program works but seems very inelegant. I can't help the feeling
like it could have been done in a less bulky way.

Less bulky .. not very much, but you could do it more efficiently.
Instead of doing a strstr() on every character, you could do the
strstr() only just as many times as the "from" string occurs.
Also you could make the replace() function have 2 modes, one in
which it only counts the space you need, and one that actually
does the work.

Something like this:

#include <cstdio>
#include <cstring>
#include <iostream>

using namespace std;

int replace(const char *orig, const char *from, const char *to, char *result)
{
int resindex = 0;
int tolen = strlen(to);
int fromlen = strlen(from);
const char *last = orig;
const char *p;
while ((p = strstr(last, from)) != NULL) {
if (result) {
memcpy(result + resindex, last, p - last);
memcpy(result + resindex + (p - last), to, tolen);
}
resindex += p - last + tolen;
p += fromlen;
last = p;
}
if (result)
strcpy(result + resindex, last);
resindex += strlen(last);

return resindex;
}

int main(int argc, char *argv[])
{
const char *string1 = "the dog jumped over the fence";
const char *string2 = "the";
const char *string3 = "that";

int sz = replace(string1, string2, string3, NULL);
char *res = new char[sz + 1];
replace(string1, string2, string3, res);
cout << res << endl;

return 0;
}

Mike.
 
O

Old Wolf

But now this looks even uglier and more unreadable than before :(
What do you think about this ?

Manipulating C-style strings is indeed ugly and
hard to read, and error-prone; you are stuck with
that if you insist on working with them. There's
no elegant solution.

Also, as others have pointed out but you haven't
acknowledged, your program leaks memory when
you call 'new' but do not later delete the memory.
You could fix this by having the calling function
store the returned value, and then deleting it after use.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,955
Messages
2,570,117
Members
46,705
Latest member
v_darius

Latest Threads

Top