Challenge: tightest code to find-replace a string


DFS

* reads an existing file
* writes changes to new file
* counts replacements made by line
* counts total replacements made
* no fancy usage of sed!

I KNOW someone can better my piddly effort below (actually one I found
online and made mods to):
=================================================================================
#include <stdio.h>
#include <string.h>

int findreplace(void)
{
    int bufferSize = 0x1000;
    int i = 0, k = 0, j = 0;
    char buffer[bufferSize];
    FILE *inFile = fopen("random_in.txt", "rt");
    FILE *outFile = fopen("random_out.txt", "w+");
    char *find = "46";
    char *replace = "----";

    if(inFile == NULL || outFile == NULL)
    {
        printf("Error opening file(s)");
        return 1;
    }

    printf("Replace '%s' with '%s':\n", find, replace);

    while(fgets(buffer, bufferSize, inFile) != NULL)
    {
        char *stop = NULL;
        char *start = buffer;
        k = 0;

        while(1)
        {
            stop = strstr(start, find);

            if(stop == NULL)
            {
                fwrite(start, 1, strlen(start), outFile);
                break;
            } else {
                fwrite(start, 1, stop - start, outFile);
                fwrite(replace, 1, strlen(replace), outFile);
                start = stop + strlen(find);
                k++;
            }
        }

        i++;
        j += k;
        printf("Line %d: %d replacements made\n", i, k);
    }
    printf("%d replacements made.\n", j);

    fclose(inFile);
    fclose(outFile);

    return 0;
}


int main(void) {
    findreplace();
    return 0;
}

=================================================================================

input (random_in.txt)

14513111664214260256543011122553234523520226455552
41602561064325541006060354620223361346535061545034
63164621623130051346620535103421535300201464252314
30013144611120401561305220534605456101542562311260
30501506124251042546364005110661421500320026101445
35355334213621124600100142264440253516210400362562
65140560414014522562466550406113020500531011441421
60543325410345553336424511333322104440166124450061
44310321435636412163052026304311532342515351020026
10536502643531635353214012163164121056142415600245

output (random_out.txt)

14513111664214260256543011122553234523520226455552
4160256106432554100606035----202233613----535061545034
6316----216231300513----620535103421535300201----4252314
3001314----1112040156130522053----05456101542562311260
305015061242510425----364005110661421500320026101445
3535533421362112----00100142264440253516210400362562
65140560414014522562----6550406113020500531011441421
60543325410345553336424511333322104440166124450061
44310321435636412163052026304311532342515351020026
10536502643531635353214012163164121056142415600245

=================================================================================


[dfs@home files]$ ./find_replace
Replace '46' with '----':
Line 1: 0 replacements made
Line 2: 2 replacements made
Line 3: 3 replacements made
Line 4: 2 replacements made
Line 5: 1 replacements made
Line 6: 1 replacements made
Line 7: 1 replacements made
Line 8: 0 replacements made
Line 9: 0 replacements made
Line 10: 0 replacements made
10 replacements made.

=================================================================================
 

Stefan Ram

DFS said:
I KNOW someone can better my piddly effort below (actually one I found
online and made mods to):

What you wrote does not replace strings that contain
line breaks or occur at 0x1000 boundaries.
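
To make the boundary problem concrete, here is a minimal sketch that mimics the fgets()-plus-strstr() loop on an in-memory string, with a deliberately tiny chunk size standing in for 0x1000. The helper name count_chunked and its chunk parameter are invented for this illustration and are not part of the posted program.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts occurrences of `find` in `text` the way a
   fgets()/strstr() loop would, taking at most `chunk` characters per
   read and ending each read after a newline. Returns -1 on bad input. */
int count_chunked(const char *text, const char *find, size_t chunk)
{
    char buf[64];               /* sketch only: chunk must be < 64 */
    size_t flen = strlen(find);
    int count = 0;

    if (flen == 0 || chunk == 0 || chunk >= sizeof buf)
        return -1;

    while (*text != '\0') {
        size_t n = 0;

        /* Copy one "fgets worth" of input into buf. */
        while (n < chunk && text[n] != '\0') {
            buf[n] = text[n];
            n++;
            if (buf[n - 1] == '\n')
                break;
        }
        buf[n] = '\0';
        text += n;

        /* Same search as the posted loop: non-overlapping matches. */
        for (const char *p = buf; (p = strstr(p, find)) != NULL; p += flen)
            count++;
    }
    return count;
}
```

With the text "1246" and the search string "46", a chunk size of 63 finds one match, while a chunk size of 3 splits the input into "124" and "6" and finds none. A search string containing '\n' is missed for the same reason, because every read ends at the newline.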
 

Ike Naar

DFS said:
    while(1)
    {
        stop = strstr(start, find);

        if(stop == NULL)
        {
            fwrite(start, 1, strlen(start), outFile);
            break;
        } else {
            fwrite(start, 1, stop - start, outFile);
            fwrite(replace, 1, strlen(replace), outFile);
            start = stop + strlen(find);
            k++;
        }
    }

This could be simplified to

    while (stop = strstr(start, find), stop != NULL)
    {
        fwrite(start, 1, stop - start, outFile);
        fputs(replace, outFile);
        start = stop + strlen(find);
        k++;
    }
    fputs(start, outFile);
 

Noob

Ike said:
while (stop = strstr(start, find), stop != NULL)

This doesn't "feel" very idiomatic.

Perhaps

while ((stop = strstr(start, find)) != NULL)

or even

while (stop = strstr(start, find))

The second one raises warnings with most compilers.

while ((stop = strstr(start, find)))

may shut them up.
 

DFS

Stefan Ram said:
What you wrote does not replace strings that contain
line breaks or occur at 0x1000 boundaries.

OK.

But the challenge isn't to say what it can't do. It's to show a tighter
piece of code that does it as well or better.

Looking forward to your entry!
 

Jorgen Grahn

DFS said:
OK.

But the challenge isn't to say what it can't do. It's to show a tighter
piece of code that does it as well or better.

But to do that you need to understand what the program is supposed to
accomplish.

And by the way, I don't understand what "tight" means. I'd personally
optimize for memory and I/O use.

/Jorgen
 

Mark Storkamp

DFS said:
OK.

But the challenge isn't to say what it can't do. It's to show a tighter
piece of code that does it as well or better.

Looking forward to your entry!

But it does disqualify your entry as it doesn't accomplish the stated
goal. Looking forward to your fix!
 

Ben Bacarisse

DFS said:
* reads an existing file
* writes changes to new file
* counts replacements made by line
* counts total replacements made
* no fancy usage of sed!

It reports rather than counts these matches. I would never write a
function with this spec. because it destroys its usefulness in other
contexts. A function should do one thing well.

I'd write a string match/replace function that returns the number of
matches. If I needed the counts reported by line, I'd write a wrapper
that adds those.

int replace_string(const char *match, const char *repl, int stopper,
                   FILE *fi, FILE *fo)
{
    int nmatches = 0, c;
    const char *mp = match;
    while ((c = fgetc(fi)) != EOF && c != stopper)
        if (c == *mp) {
            if (!*++mp) {
                ++nmatches;
                fputs(repl, fo);
            }
        }
        else {
            mp = match;
            fputc(c, fo);
        }
    return nmatches;
}

Called with stopper == EOF it processes a whole file. Note how removing
the line buffer actually simplifies the code, whilst also removing an
unnecessary restriction. It's not uncommon for this to happen (there
was a recent thread about this).

Called with stopper == '\n' it processes a line and so this wrapper
prints the report:

void replace_string_report(const char *match, const char *repl,
                           FILE *fi, FILE *fo)
{
    int total_matches = 0, lineno = 0;
    while (!feof(fi)) {
        int nm = replace_string(match, repl, '\n', fi, fo);
        printf("\nLine %d: %d replacements\n", ++lineno, nm);
        total_matches += nm;
    }
    printf("%d replacements\n", total_matches);
}

Here's the driver for testing.

int main(int argc, char **argv)
{
    if (argc > 2) {
        FILE *fin = argc > 3 ? fopen(argv[3], "r") : stdin;
        FILE *fout = argc > 4 ? fopen(argv[4], "w") : stdout;
        if (fin && fout)
            replace_string_report(argv[1], argv[2], fin, fout);
    }
}

Functions that mix tasks that can be logically separated are best
avoided. Functions with hard-wired file names and strings are, well,
let's just say, sub-optimal. Students used to say "but it's because I'm
just testing" but a simple driver like the one above makes testing
much easier than having the files and strings hard wired.

<snip>
 

BartC

Ben Bacarisse said:
int main(int argc, char **argv)
{
    if (argc > 2) {
        FILE *fin = argc > 3 ? fopen(argv[3], "r") : stdin;
        FILE *fout = argc > 4 ? fopen(argv[4], "w") : stdout;
        if (fin && fout)
            replace_string_report(argv[1], argv[2], fin, fout);
    }
}

Functions that mix tasks that can be logically separated are best
avoided. Functions with hard-wired file names and strings are, well,
let's just say, sub-optimal. Students used to say "but it's because I'm
just testing" but a simple driver like the one above makes testing
much easier than having the files and strings hard wired.

The OP's findreplace() function where everything was hard-coded inside it,
rather than being passed as arguments did grate a little (that would also be
the first thing I'd change).

But I wouldn't bother with command line parameters for testing until it's
finished. Far easier to just write:

int main(void) {
    replace_string_report("46","----", "random_in.txt", "random_out.txt");
}

(Although you'd have to decide whether file names or handles are going to be
passed. If this is the only find&replace operation on the file, then file
names are probably more appropriate, although it will need more
error-checking inside the function.)
 

Malcolm McLean

Ben Bacarisse said:
int replace_string(const char *match, const char *repl, int stopper,
                   FILE *fi, FILE *fo)
{
    int nmatches = 0, c;
    const char *mp = match;

    while ((c = fgetc(fi)) != EOF && c != stopper)
        if (c == *mp) {
            if (!*++mp) {
                ++nmatches;
                fputs(repl, fo);
            }
        }
        else {
            /* bug here? fwrite(match, mp-match, 1, fo); */
            mp = match;
            fputc(c, fo);
        }

    return nmatches;
}

I think there's a bug in this. Fix untested.
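
For reference, here is a minimal sketch of one way to handle partial matches in a character-at-a-time loop. It swaps in a Knuth-Morris-Pratt failure table, which is not what either poster wrote. The name replace_stream and the MAX_PATTERN limit are assumptions made for this sketch. It keeps the same signature and, like the quoted function, consumes the stopper without writing it.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PATTERN 256   /* arbitrary limit chosen for this sketch */

/* Hypothetical variant of replace_string(): copies fi to fo, replacing
   non-overlapping occurrences of match with repl, until EOF or stopper.
   Returns the number of replacements, or -1 for an unusable pattern. */
int replace_stream(const char *match, const char *repl, int stopper,
                   FILE *fi, FILE *fo)
{
    size_t len = strlen(match);
    size_t fail[MAX_PATTERN + 1];
    size_t m = 0;          /* match[0..m-1] is held back, not yet written */
    int nmatches = 0, c;

    if (len == 0 || len > MAX_PATTERN)
        return -1;

    /* KMP failure table: fail[i] is the length of the longest proper
       prefix of match[0..i-1] that is also a suffix of it. */
    fail[0] = fail[1] = 0;
    for (size_t i = 1, k = 0; i < len; i++) {
        while (k > 0 && match[i] != match[k])
            k = fail[k];
        if (match[i] == match[k])
            k++;
        fail[i + 1] = k;
    }

    while ((c = fgetc(fi)) != EOF && c != stopper) {
        /* On a mismatch, release the held characters that can no longer
           start a match and fall back to the longest usable prefix. */
        while (m > 0 && c != (unsigned char)match[m]) {
            fwrite(match, 1, m - fail[m], fo);
            m = fail[m];
        }
        if (c == (unsigned char)match[m]) {
            if (++m == len) {
                fputs(repl, fo);
                ++nmatches;
                m = 0;
            }
        } else {
            fputc(c, fo);
        }
    }
    fwrite(match, 1, m, fo);   /* flush a partial match left at the end */
    return nmatches;
}
```

Two inputs show the difference. Searching "4646" for "46" should report two matches, and searching "aaab" for "aab" should report one, with the leading "a" copied through unchanged. A partial match pending at EOF or at the stopper, such as the final "4" in "124", is written out rather than dropped.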
 

Jorgen Grahn

BartC said:
The OP's findreplace() function where everything was hard-coded inside it,
rather than being passed as arguments did grate a little (that would also be
the first thing I'd change).

But I wouldn't bother with command line parameters for testing until it's
finished. Far easier to just write:

int main(void) {
replace_string_report("46","----", "random_in.txt", "random_out.txt");
}

(Although you'd have to decide whether file names or handles are going to be
passed. If this is the only find&replace operation on the file, then file
names are probably more appropriate, although it will need more
error-checking inside the function.)

The easiest and most useful is to default to stdin and stdout, just
like sed(1) does. The second most useful is to emulate Perl's <>
operator (stdin, or a sequence of named files, including "-" which
means stdin).

/Jorgen
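
A rough sketch of the input handling described above, assuming a filter function that takes an input and an output stream: no names means stdin, and "-" among the names also means stdin. The names open_input, for_each_input and copy_stream are made up for this illustration, and copy_stream is only a stand-in for a real find-and-replace filter.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: "-" means standard input, anything else is a path. */
FILE *open_input(const char *name)
{
    return strcmp(name, "-") == 0 ? stdin : fopen(name, "r");
}

/* Stand-in filter for the sketch: copies in to out unchanged. */
void copy_stream(FILE *in, FILE *out)
{
    int c;
    while ((c = fgetc(in)) != EOF)
        fputc(c, out);
}

/* Hypothetical driver loop: runs process() on stdin when no names are
   given, otherwise on each named input in turn. Returns the number of
   inputs that could not be opened. */
int for_each_input(int count, char **names,
                   void (*process)(FILE *in, FILE *out), FILE *out)
{
    int failures = 0;

    if (count == 0) {
        process(stdin, out);
        return 0;
    }
    for (int i = 0; i < count; i++) {
        FILE *in = open_input(names[i]);
        if (in == NULL) {
            perror(names[i]);       /* report and move on to the next one */
            failures++;
            continue;
        }
        process(in, out);
        if (in != stdin)
            fclose(in);
    }
    return failures;
}
```

From main() this would be called as for_each_input(argc - 1, argv + 1, copy_stream, stdout), once any search and replacement arguments have been taken off the front of argv.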
 

Stefan Ram

DFS said:
* reads an existing file
* writes changes to new file
* counts replacements made by line
* counts total replacements made
* no fancy usage of sed!

I think this is not a sufficient specification of requirements.

For example, it mentions »changes« in line two, but before line
two, it was not said that anything should be changed at all. So
it is not clear what »changes« refers to.

And »to count« something is not behavior that is visible from the
outside.

BTW: When given the task

»replace all the occurrences of "abcabc" by "defdef" in
"012abcabcabc789" would
"012defdefdef789" be a correct result? the only correct result?«
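
As one data point for that question, here is a minimal sketch of the left-to-right, non-overlapping reading that a strstr() loop implements. The function name replace_all and its buffer-based interface are assumptions for illustration only. Other readings of "all the occurrences" are possible, which is the point of the question.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes src to dst (capacity cap) with every
   non-overlapping occurrence of find replaced by repl, scanning left
   to right. Returns the number of replacements, or -1 if find is
   empty or the result does not fit in dst. */
int replace_all(char *dst, size_t cap, const char *src,
                const char *find, const char *repl)
{
    size_t flen = strlen(find), rlen = strlen(repl), used = 0;
    int count = 0;
    const char *hit;

    if (flen == 0)
        return -1;

    while ((hit = strstr(src, find)) != NULL) {
        size_t keep = (size_t)(hit - src);
        if (used + keep + rlen >= cap)
            return -1;
        memcpy(dst + used, src, keep);          /* text before the match */
        memcpy(dst + used + keep, repl, rlen);  /* the replacement */
        used += keep + rlen;
        src = hit + flen;                       /* resume after the match */
        count++;
    }

    size_t tail = strlen(src);
    if (used + tail >= cap)
        return -1;
    memcpy(dst + used, src, tail + 1);          /* remainder plus '\0' */
    return count;
}
```

Under this reading, "012abcabcabc789" becomes "012defdefabc789" with one replacement. The second candidate at offset 6 overlaps the first match and is never seen, so a strstr()-style scan does not produce "012defdefdef789".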
 

Johannes Bauer

Well, since it apparently doesn't actually have to work, here's my
entry:

int findreplace(void)
{
    return 0;
}

Actually, this fulfills the requirement pretty much in all cases. All
cases in which the search string has no occurrences, that is.
I suspect it will be difficult to beat.

Here's my take at it:

int findreplace(int searchstring) {
    if (searchstring < 2) {
        return 0;
    } else if (searchstring == 2) {
        return 1;
    } else {
        for (int i = 2; i < searchstring; i++) {
            if ((searchstring % i) == 0) {
                return 0;
            }
        }
        return 1;
    }
}

Cheers,
Johannes

--
At least not publicly!
Ah, the latest and to this day most brilliant stunt by our great
cosmologists: the secret prediction.
- Karl Kaos über Rüdiger Thomas in dsa <[email protected]>
 

Ben Bacarisse

BartC said:
But I wouldn't bother with command line parameters for testing until
it's finished. Far easier to just write:

int main(void) {
    replace_string_report("46","----", "random_in.txt", "random_out.txt");
}

For a few lines of code you get much greater flexibility in testing.
Maybe your environment does not make command-line programs easy to run?

BartC said:
(Although you'd have to decide whether file names or handles are going
to be passed. If this is the only find&replace operation on the file,
then file names are probably more appropriate, although it will need
more error-checking inside the function.)

Why would you ever use file names? It's inherently a stream operation,
so limiting it to named files just makes it clunky, in my view.
 

BartC

Ben Bacarisse said:
Why would you ever use file names? It's inherently a stream operation,
so limiting it to named files just makes it clunky, in my view.

I don't understand streams. I like things to have a beginning and an end,
and a whole file is a well-understood chunk of data to work on, if it's not
possible to just work on strings (which would be my approach; then it would
be independent from files *and* streams).

Imagine if you were creating some string functions where strings didn't have
a well-defined end and could conceivably have an unlimited length...
 

Ben Bacarisse

BartC said:
I don't understand streams. I like things to have a beginning and an end,
and a whole file is a well-understood chunk of data to work on, if it's not
possible to just work on strings (which would be my approach; then it would
be independent from files *and* streams).

A stream can have an end. And not all named files do. I don't think
this is useful distinction.

Anyway, if you don't like streams, I see no reason to make you like
them. I like my way and I imagine you are happy with yours.

BartC said:
Imagine if you were creating some string functions where strings didn't have
a well-defined end and could conceivably have an unlimited length...

That does not sound like what I meant by "this is a stream operation".
It certainly does not apply in the case being discussed.
 

Chad

Ack, my browser refuses to include the quoted text in my reply. Anyhow, having strings hardwired into a function in some cases could possibly change and/or break a function. The one example that comes to mind are functions that add text to some kind of graphic. If the string name was hardwired in, the computer could possibly interpret that string as a single point on the plane. That could be bad since a piece of text can sometimes span across a line.
 

Stefan Ram

Chad said:
my browser refuses to include the quoted text in my reply.

Typically, a newsreader is used for Usenet access.

Chad said:
Anyhow, having strings hardwired into a function in some
cases could possibly change and/or break a function.

How is the »hello, world« program written, then,
without a »string hardwired« into the main function?

Chad said:
If the string name was hardwired in, the computer could
possibly interpret that string as a single point on the
plane.

drawText( canvas, "hello, world" )

risks that the computer can interpret »hello, world«
as »a single point on the plane«?
 

Chad

From my limited experience, the problem arises if you view the function as performing some kind of action on a string. From this vantage point, the function would move the string along some line as it executes. Once the function is done, the string would stop at some point. Now if you let s represent some string, the same thing would happen. However, s would be the entire length of the traversal.
 
