Seebs said:
This may have gotten buried given that it started in a Nilges thread,
but it's actually pretty interesting.
There was some discussion of algorithms to perform the task:
Inputs: target string, replacement string, original string
Output: a newly-allocated string containing the original
string with each copy of the target string replaced with the
replacement string.
for (count = 0, t = strstr(target, in); t && *t; t = strstr(t, in)) {
++count;
t += inlen;
}
Here's a hunk of another one:
These hunks are essentially performing the same part of the core loop;
find the next occurrence of the target string in the original string.
The second one was written by Edward Nilges ("spinoza1111"). He offers
in its defense the assertion that it's the "best" algorithm. (Nevermind
that it's got a bug; the bug could be fixed easily.)
The first is better, not because it's less buggy (that could be fixed), but
because it's simpler to understand. It may, or may not, be less efficient.
However, efficiency will vary widely with input characteristics; for some
cases, the arguable inefficiencies may be completely irrelevant, while in
others, they'd be significant.
I don't think you can dismiss the bugs so easily. I would argue that
it is not a coincidence that the more complex code has more bugs. I
can't write code that looks likespinoza1111'scode and get it right.
I *have* to write simpler code, broken into simple functions, or I have
no chance of getting it to be correct.
For example, the latest improvement introduced another bug[1] and I
don't think that is simple carelessness. Without a re-write I suspect
almost any change is as likely to introduce a bug as fix one (I tried
to see where the error was, but I gave up). If I was paid to fix it,
I'd simply start again and it would end a up as I have already posted,
though to mirror the behaviour I'd now have to add some argument
checking.
<snip>
[1] Apparently when the target string is not present in the source
string: e.g. replace("a", "x", "b").
It took me about an hour to fix this because of some blind alleys. See
the Change Record for 13 Feb for the explanation of the bug, a failure
to anticipate that there could be a situation when the main scan
pointer (ptrIndex1) and the main target pointer (ptrIndex2) both point
to Nul but the strings do not match.
In addition, I discovered that there are cases when I initialize the
secondary scan pointer ptrIndex3 to one past the Nul that terminates
the master string, and this has also been fixed.
The purpose of my style and approach is to reduce the bug rate
exponentially going forward. I don't post code while saying "it has a
bug, but it's easy to fix". This was Peter Seebach's initial mistake:
he posted a crappy solution to replacing %s that used a character scan
that fails when a percent occurs in the input, admitting the bug
without bothering to fix it.
This is why I have retained so many tests, and why I wrote a TESTER
macro, and why I formatted the code more carefully than most little
code monkeys care to format their code.
This is the arrogant programmer who expects others to clean up his
messes for him (and lies about people to get his way, I might add).
Having said all this, I am not completely happy with my solution. This
is because I have no assurance that there might be other cases where
it fails, and this in turn is because it's just code, not theory. In
terms of CS theory, it's doing two things at once: finding strings,
and replacing them. As I have said, it arbitrarily goes left to right
and doesn't handle overlapping hits in the way some applications might
require replace("banana", "ana", "xxx") to yield bxxxxx.
[In rereading Cormen's Algorithms the other night on string matching,
which treats the problem separate from match/replace for solid
theoretical reasons, I noticed that one of his students brought this
to his attention.]
Programmers who backstab and lie, who mock others who fix their bugs
for having the bugs, while grinning about their own, and who present
code with bugs that they won't fix, create buggy solutions out of
dishonesty. They helped to cause the credit crisis, because in many
cases, securitized mortgages and their derivative contracts were never
examined for loops such that one derivative referred to itself through
a chain. It increasingly appears that bugs in software lie at the
bottom of the Toyota recall.
Dijkstra is right. Programming IS too hard for Republicans and other
liars (Peter Seebach is on record as a Bush supporter, at least in
2000).
Here is the latest version, followed by its console output.
// ***************************************************************
// * *
// * replace() demo *
// * *
// * Demonstrates how to replace non-NUL-defined strings in C *
// * using a simple function, and bypassing string.h. *
// * *
// * C H A N G E R E C O R D --------------------------------- *
// * DATE PROGRAMMER DESCRIPTION OF CHANGE *
// * -------- ---------- --------------------------------- *
// * 02 07 10 Nilges Version 1.0 *
// * *
// * 02 07 10 Nilges Bug: partial matches not handled *
// * correctly: need to iterate search *
// * for match. *
// * *
// * 02 07 10 Nilges 1. Santosh suggested tests *
// * 2. Heathfield put the boot in re *
// * including malloc *
// * *
// * 02 07 10 Nilges 1. Remove string.h use and code *
// * strlen by hand *
// * 2. Add comment block and comments*
// * inline. *
// * 3. free() storage *
// * 4. Use macro for testing *
// * 5. Bug: calculation of *
// * lngNewLength used incorrect *
// * index (intIndex3 instead of 2)*
// * which caused a memory leak. *
// * *
// * 02 07 10 Nilges 1. Bug: Ike Naar test failed. *
// * At end of scan loop, main *
// * string pointer was not *
// * correctly updated from index3 *
// * *
// * 02 07 10 Nilges Added new Santosh test *
// * *
// * 02 07 10 Nilges Added new Ike Naar test *
// * *
// * 02 08 10 Nilges 1. Added some new comments *
// * 2. Make "check for a complete *
// * match "structured" by means *
// * of a tested assignment *
// * 3. Get rid of "replace at end" *
// * evilness: the only time this *
// * flag is meaningful is in the *
// * LAST segment. *
// * 4. Return replace count *
// * 5. TESTER macro assertion added *
// * *
// * 02 10 10 Nilges 1. Bug fix: in a partial match, *
// * the main scan index is set to *
// * one past the end, which misses*
// * full matches that start *
// * between the first character of*
// * the partial match and its end.*
// * *
// * No longer updating the main *
// * scan index (ptrIndex1) to the *
// * point of partial match *
// * failure: setting it one past *
// * the first character of the *
// * partial match. *
// * *
// * 2. Line up expected & actual *
// * results per excellent *
// * suggestion (who made this?) *
// * *
// * 3. After a partial match, update *
// * the main handle search index *
// * (ptrIndex1) to the first *
// * occurrence of the handle *
// * character after the start of *
// * the partial match, or to the *
// * index of the unmatched char. *
// * *
// * 021310 Nilges Bug: failed to handle a one-char *
// * replace (of a by x in b) since *
// * we set ptrIndex2 to point to NUL *
// * which made it appear that there *
// * was a match. When the main index *
// * to the master string (ptrIndex1) *
// * goes off the end of a cliff, we *
// * needed to break out of the search *
// * loop. *
// * *
// * This also entailed setting the *
// * target index ptrIndex2 to 0 at the*
// * beginning of each search so that *
// * it is not erroneously used to *
// * indicate that there's an insert *
// * needed at the end. *
// * *
// * I have also added additional tests*
// * to verify that the code works *
// * when the string and target end *
// * at the same time. *
// * ----------------------------------------------------------- *
// * *
// * "In the near future we shall have to live with the *
// * superstition that programming is 'so easy that even a *
// * Republican can do it!'" *
// * *
// * - E. W. Dijkstra *
// * *
// * *
// ***************************************************************
#include <stdio.h>
#include <stdlib.h>
// ***** Segmentation *****
struct TYPsegmentstruct
{ char * strSegment;
long lngSegmentLength;
struct TYPsegmentstruct * ptrNext; };
// ---------------------------------------------------------------
// Calculate string length
//
//
long strLength(char *strInstring)
{
char *ptrInstring;
for (ptrInstring = strInstring; *ptrInstring; ptrInstring++);
return ptrInstring - strInstring;
}
// ---------------------------------------------------------------
// Replace target by replacement string in master string
//
//
// Caution: the string returned by this function should be freed.
//
//
char * replace(char * strMaster,
char * strTarget,
char * strReplacement,
long * ptrReplacements)
{
char * ptrIndex0;
char * ptrIndex1;
char * ptrIndex2;
char * ptrIndex3;
char * ptrIndex4;
char * strNew;
char * strNewStart;
long lngNewLength;
long lngCount;
long lngReplacementLength;
struct TYPsegmentstruct * ptrSegmentStructStarts;
struct TYPsegmentstruct * ptrSegmentStruct;
struct TYPsegmentstruct * ptrSegmentStructPrev;
lngReplacementLength = strLength(strReplacement);
if (!*strTarget)
{
printf("Error in calling replace(): target can't be null");
abort();
}
ptrIndex1 = strMaster;
ptrSegmentStructPrev = 0;
lngNewLength = 0;
*ptrReplacements = 0;
while(*ptrIndex1)
{
ptrIndex0 = ptrIndex1;
ptrIndex2 = 0;
while (-1)
{
// --- Check for (one character) handle
for(;
*ptrIndex1 && *ptrIndex1 != *strTarget;
ptrIndex1++);
if (!*ptrIndex1) break;
// --- Check for complete match while remembering the
// --- last position of the handle
ptrIndex4 = 0;
for(ptrIndex2 = strTarget + 1,
ptrIndex3 = ptrIndex1 + 1;
*ptrIndex3
&&
*ptrIndex2
&&
*ptrIndex3 == *ptrIndex2;
ptrIndex3++, ptrIndex2++)
{
if (*ptrIndex3 == *strTarget
&&
ptrIndex4 == 0) ptrIndex4 = ptrIndex3;
}
// End test: check complete match, update main ptr past
// partial match while checking for end of loop
if ((!*ptrIndex2 ? ((*ptrReplacements)++, -1) : 0)
||
(!*ptrIndex3 ? (ptrIndex1 = ptrIndex3, -1) : 0))
break;
// Update the main search pointer
ptrIndex1 = (ptrIndex4 == 0 ? ptrIndex3 : ptrIndex4);
}
// --- Create new segment
if (!(ptrSegmentStruct =
malloc(sizeof(struct TYPsegmentstruct))))
abort();
ptrSegmentStruct->strSegment = ptrIndex0;
ptrSegmentStruct->lngSegmentLength =
ptrIndex1 - ptrIndex0;
ptrSegmentStruct->ptrNext = 0;
if (ptrSegmentStructPrev != 0)
ptrSegmentStructPrev->ptrNext = ptrSegmentStruct;
else
ptrSegmentStructStarts = ptrSegmentStruct;
ptrSegmentStructPrev = ptrSegmentStruct;
// --- Update mallocation length
lngNewLength += ptrSegmentStruct->lngSegmentLength +
(ptrIndex2 && !*ptrIndex2
?
lngReplacementLength
:
0);
// --- Get past end of target string & iterate
if (*ptrIndex1) ptrIndex1 = ptrIndex3;
}
// --- Allocate just enough storage for the new string
if (!(strNewStart = malloc(lngNewLength + 1))) abort();
// --- Build the new string whilst freeing the list
strNew = strNewStart;
ptrSegmentStruct = ptrSegmentStructStarts;
while (ptrSegmentStruct)
{
for (ptrIndex1 = ptrSegmentStruct->strSegment,
lngCount = 0;
lngCount < ptrSegmentStruct->lngSegmentLength;
ptrIndex1++, lngCount++, strNew++)
*strNew = *ptrIndex1;
if (ptrSegmentStruct->ptrNext
||
ptrIndex2 != 0 && !*ptrIndex2)
for (ptrIndex1 = strReplacement;
*ptrIndex1;
ptrIndex1++, ++strNew)
*strNew = *ptrIndex1;
ptrSegmentStructPrev = ptrSegmentStruct;
ptrSegmentStruct = ptrSegmentStruct->ptrNext;
free(ptrSegmentStructPrev);
}
*strNew = '\0';
return strNewStart;
}
// ---------------------------------------------------------------
// Statement-format test macro
//
//
#define TESTER(resultPtr, master, target, replacement, expected,
expectedReplacements, replacements) \
{ \
printf("Replace \"%s\" by \"%s\" in \"%s\"\n", \
(target), (replacement), (master)); \
printf("Expect \"%s\":\n \"%s\"\n", \
(expected), \
resultPtr = replace((master), \
(target), \
(replacement), \
&(replacements))); \
printf("Replacements expected: %d: replacements: %d\n", \
(expectedReplacements), \
(replacements)); \
if (!(strLength(resultPtr) \
== \
strLength(master) \
+ \
(strLength(replacement)-strLength(target)) \
* \
replacements)) \
printf("Assertion failed\n"); \
printf("\n\n"); \
free(resultPtr); \
}
// ---------------------------------------------------------------
// Main procedure
//
//
int main()
{
char *ptrResult;
long lngReplacements;
printf("\nReplace\n\n\n");
TESTER(ptrResult,
"1111123bbb1111123bbb11123bb11111231111112111111123",
"111123",
"ono",
"1onobbb1onobbb11123bb1ono1111112111ono",
4,
lngReplacements)
TESTER(ptrResult,
"bbb1111123bbbbb",
"111123",
"ono",
"bbb1onobbbbb",
1,
lngReplacements)
TESTER(ptrResult,
"a stupid error",
"stupid error",
"miracle",
"a miracle",
1,
lngReplacements)
TESTER(ptrResult,
"a stupid error",
"stupid",
"miracle",
"a miracle error",
1,
lngReplacements)
TESTER(ptrResult,
"the stupid error",
"the stupid error",
"a miracle",
"a miracle",
1,
lngReplacements)
TESTER(ptrResult,
"the miracle",
"the",
"a",
"a miracle",
1,
lngReplacements)
TESTER(ptrResult,
"a miraclsnirpKamunkle",
"snirpKamunkle",
"e",
"a miracle",
1,
lngReplacements)
TESTER(ptrResult,
"a miraclesnirpKamunkle",
"a miracle",
"",
"snirpKamunkle",
1,
lngReplacements)
TESTER(ptrResult,
" a miraclesnirpKamunkle",
"a miracle",
"",
" snirpKamunkle",
1,
lngReplacements)
TESTER(ptrResult,
" a miraclesnirpKamunklea miraclea miracle",
"a miracle",
"",
" snirpKamunkle",
3,
lngReplacements)
TESTER(ptrResult,
"a miracle a miraclesnirpKamunkle a Miraclea miraclea
miracle",
"a miracle",
"",
" snirpKamunkle a Miracle",
4,
lngReplacements)
TESTER(ptrResult,
"a stupid errord",
"stupid error",
"miracle",
"a miracled",
1,
lngReplacements)
TESTER(ptrResult,
"a stupid errod",
"stupid error",
"miracle",
"a stupid errod",
0,
lngReplacements)
TESTER(ptrResult,
"a sstupid error",
"stupid error",
"miracle",
"a smiracle",
1,
lngReplacements)
TESTER(ptrResult,
"a stupid errorstupid error",
"stupid error",
"miracle",
"a miraclemiracle",
2,
lngReplacements)
TESTER(ptrResult,
"a stupid error stupiderror",
"stupid error",
"miracle",
"a miracle stupiderror",
1,
lngReplacements)
TESTER(ptrResult,
"bbbbbbbbbb",
"b",
"a",
"aaaaaaaaaa",
10,
lngReplacements)
TESTER(ptrResult,
"In the halls of R'yleh great %s lies dreaming",
"%s",
"Cthulu",
"In the halls of R'yleh great Cthulu lies dreaming",
1,
lngReplacements)
TESTER(ptrResult,
"%s%s%s%s%s%s",
"%s",
"Cthulu",
"CthuluCthuluCthuluCthuluCthuluCthulu",
6,
lngReplacements)
TESTER(ptrResult,
"banana",
"ana",
"oat",
"boatna",
1,
lngReplacements)
TESTER(ptrResult,
" a stupid errorstupid errorHeystupid errors",
"stupid error",
"+",
" a ++Hey+s",
3,
lngReplacements)
TESTER(ptrResult,
"foo barfoo barf",
"foo bar",
"bas",
"basbasf",
2,
lngReplacements)
TESTER(ptrResult,
"abab",
"ba",
"ba",
"abab",
1,
lngReplacements)
TESTER(ptrResult,
"abab",
"bab",
"boop",
"aboop",
1,
lngReplacements)
TESTER(ptrResult,
"banana",
"ana",
"ono",
"bonona",
1,
lngReplacements)
TESTER(ptrResult,
"a",
"x",
"b",
"a",
0,
lngReplacements)
TESTER(ptrResult,
"x",
"x",
"b",
"b",
1,
lngReplacements)
TESTER(ptrResult,
"egregious",
"egregious",
"egregious",
"egregious",
1,
lngReplacements)
TESTER(ptrResult,
"egregious",
"egregious",
"x",
"x",
1,
lngReplacements)
printf("\n\nTesting complete: check output carefully: \"Assertion
failed\" should not occur!\n\n");
return 0;
}
Replace
Replace "111123" by "ono" in
"1111123bbb1111123bbb11123bb11111231111112111111123"
Expect "1onobbb1onobbb11123bb1ono1111112111ono":
"1onobbb1onobbb11123bb1ono1111112111ono"
Replacements expected: 4: replacements: 4
Replace "111123" by "ono" in "bbb1111123bbbbb"
Expect "bbb1onobbbbb":
"bbb1onobbbbb"
Replacements expected: 1: replacements: 1
Replace "stupid error" by "miracle" in "a stupid error"
Expect "a miracle":
"a miracle"
Replacements expected: 1: replacements: 1
Replace "stupid" by "miracle" in "a stupid error"
Expect "a miracle error":
"a miracle error"
Replacements expected: 1: replacements: 1
Replace "the stupid error" by "a miracle" in "the stupid error"
Expect "a miracle":
"a miracle"
Replacements expected: 1: replacements: 1
Replace "the" by "a" in "the miracle"
Expect "a miracle":
"a miracle"
Replacements expected: 1: replacements: 1
Replace "snirpKamunkle" by "e" in "a miraclsnirpKamunkle"
Expect "a miracle":
"a miracle"
Replacements expected: 1: replacements: 1
Replace "a miracle" by "" in "a miraclesnirpKamunkle"
Expect "snirpKamunkle":
"snirpKamunkle"
Replacements expected: 1: replacements: 1
Replace "a miracle" by "" in " a miraclesnirpKamunkle"
Expect " snirpKamunkle":
" snirpKamunkle"
Replacements expected: 1: replacements: 1
Replace "a miracle" by "" in " a miraclesnirpKamunklea miraclea
miracle"
Expect " snirpKamunkle":
" snirpKamunkle"
Replacements expected: 3: replacements: 3
Replace "a miracle" by "" in "a miracle a miraclesnirpKamunkle a
Miraclea miraclea miracle"
Expect " snirpKamunkle a Miracle":
" snirpKamunkle a Miracle"
Replacements expected: 4: replacements: 4
Replace "stupid error" by "miracle" in "a stupid errord"
Expect "a miracled":
"a miracled"
Replacements expected: 1: replacements: 1
Replace "stupid error" by "miracle" in "a stupid errod"
Expect "a stupid errod":
"a stupid errod"
Replacements expected: 0: replacements: 0
Replace "stupid error" by "miracle" in "a sstupid error"
Expect "a smiracle":
"a smiracle"
Replacements expected: 1: replacements: 1
Replace "stupid error" by "miracle" in "a stupid errorstupid error"
Expect "a miraclemiracle":
"a miraclemiracle"
Replacements expected: 2: replacements: 2
Replace "stupid error" by "miracle" in "a stupid error stupiderror"
Expect "a miracle stupiderror":
"a miracle stupiderror"
Replacements expected: 1: replacements: 1
Replace "b" by "a" in "bbbbbbbbbb"
Expect "aaaaaaaaaa":
"aaaaaaaaaa"
Replacements expected: 10: replacements: 10
Replace "%s" by "Cthulu" in "In the halls of R'yleh great %s lies
dreaming"
Expect "In the halls of R'yleh great Cthulu lies dreaming":
"In the halls of R'yleh great Cthulu lies dreaming"
Replacements expected: 1: replacements: 1
Replace "%s" by "Cthulu" in "%s%s%s%s%s%s"
Expect "CthuluCthuluCthuluCthuluCthuluCthulu":
"CthuluCthuluCthuluCthuluCthuluCthulu"
Replacements expected: 6: replacements: 6
Replace "ana" by "oat" in "banana"
Expect "boatna":
"boatna"
Replacements expected: 1: replacements: 1
Replace "stupid error" by "+" in " a stupid errorstupid errorHeystupid
errors"
Expect " a ++Hey+s":
" a ++Hey+s"
Replacements expected: 3: replacements: 3
Replace "foo bar" by "bas" in "foo barfoo barf"
Expect "basbasf":
"basbasf"
Replacements expected: 2: replacements: 2
Replace "ba" by "ba" in "abab"
Expect "abab":
"abab"
Replacements expected: 1: replacements: 1
Replace "bab" by "boop" in "abab"
Expect "aboop":
"aboop"
Replacements expected: 1: replacements: 1
Replace "ana" by "ono" in "banana"
Expect "bonona":
"bonona"
Replacements expected: 1: replacements: 1
Replace "x" by "b" in "a"
Expect "a":
"a"
Replacements expected: 0: replacements: 0
Replace "x" by "b" in "x"
Expect "b":
"b"
Replacements expected: 1: replacements: 1
Replace "egregious" by "egregious" in "egregious"
Expect "egregious":
"egregious"
Replacements expected: 1: replacements: 1
Replace "egregious" by "x" in "egregious"
Expect "x":
"x"
Replacements expected: 1: replacements: 1
Testing complete: check output carefully: "Assertion failed" should
not occur!