substring finding problem!


Seebs

Well exactly. My first level implements the basic allocate and destroy,
plus "string.h like" functions to assign, extend, get length, etc. You
have to stop somewhere.

Yeah. Mine stopped a little past the functionality of <string.h> but not
much past it. In practice, it seems to have been enough.

-s
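As an aside, here is a minimal sketch of what such a "first level" might look like: allocation and destruction plus assign, extend and get-length. Every name and detail below is my own guess for illustration, not Seebs's actual package.

#include <stdlib.h>
#include <string.h>

/* A counted string with the basic "first level" operations. Illustrative only. */
typedef struct {
    char  *data;   /* NUL-terminated buffer, or NULL when empty */
    size_t len;    /* length excluding the terminator */
} mystr;

mystr *mystr_new(void)
{
    mystr *s = malloc(sizeof *s);
    if (s) {
        s->data = NULL;
        s->len = 0;
    }
    return s;
}

int mystr_assign(mystr *s, const char *text)
{
    size_t n = strlen(text);
    char *p = realloc(s->data, n + 1);
    if (!p)
        return -1;
    memcpy(p, text, n + 1);
    s->data = p;
    s->len = n;
    return 0;
}

int mystr_extend(mystr *s, const char *text)
{
    size_t n = strlen(text);
    char *p = realloc(s->data, s->len + n + 1);
    if (!p)
        return -1;
    memcpy(p + s->len, text, n + 1);
    s->data = p;
    s->len += n;
    return 0;
}

size_t mystr_length(const mystr *s)
{
    return s->len;
}

void mystr_destroy(mystr *s)
{
    if (s) {
        free(s->data);
        free(s);
    }
}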
 

spinoza1111

It's also a silly challenge.  If I were going to do that, my first pass
would just be to put "my" in front of the str* functions I use, and
define them.

That sounds like (1) unethical behavior and (2) to be expected from
you.

Look, why not just admit it. You had no clue how to solve the
challenged problem, which could easily arise in any number of
circumstances. You're really not contributing much to the discussion.
But why would I do that?  How about we set a new challenge, which is to
do matrix multiplication without using the '*' operator.  Or, better yet,
how about we iterate through an array without using a for loop, and write
our own routines to interpolate numbers into textual strings, so we don't
have to rely on printf()?

Those of us who can (who like Schildt have written compilers) do this
for fun, for the same reason classical pianists play Czerny finger
exercises, and Bach wrote the Art of Fugue. There are some of us who
don't advance our careers by destroying reputations and buying our way
onto standards boards without any academic qualifications.

Furthermore, if I were doing matrix multiplication times a power of
two, I wouldn't use the * operator. Do you even know what operator I
would use?
Without a *reason* to not use the string library, this is nothing more
than a challenge to do something stupid.

Fox and grapes. You couldn't do it.
 

blmblm

No, I don't think you have.


You've memorized the mere names of thought crimes ("not invented
here") without learning your trade; as you have told us, you haven't
taken any university computer science classes at all.

This is as we know a mistake.

[ snip several paragraphs of what I believe to be tangential ]
OK, he specified a circular singly-linked list,

Oh? The way I'm reading Seebs's example, the last node in the
list pointed not to the first node -- which would as you say be a
circular linked list -- but *to the data structure representing
the list*. That seems decidedly strange to me, and I'm even a
bit skeptical that one could implement it in a way that a compiler
would accept, but whatever.
and (because by your
own admission you have never taken a single computer science class)
you had never seen this simple data structure, and you wasted an
entire day trying to figure this code out in consequence. For the same
reason you preferred to try to destroy the career of Herb Schildt and you
hound me here, this enraged you and you maliciously went after the
original designer in order to ruin his position, because you didn't
know how to code a simple loop (we've seen how incompetent you are at
this task in your strlen that was off by one).

Speculation, in my opinion.

[ snip ]
 

blmblm

[ snip ]

An interesting factoid about this discussion is that no American or
British posters, save one, have solved the problem I set (write
replace without using string.H).

Oh, it's string.H we're to avoid? is it okay to use string.h?
Yeah, yeah, nitpick ....

My understanding is that the requirement is there to make life
simpler for people not familiar with the C standard library,
particularly the functions prototyped in string.h, and for people
who have trouble distinguishing between upper and lower case.
Well, for the record, I'm an American mostly-lurker, and I've
been tinkering with various solutions over the past few days, not
looking carefully at others' solutions because that would seem
to spoil the fun. I've just posted some of them, in a reply to
Seebs's post with subject line "Efficency [sic] and the standard
library".

For what it's worth, all my solutions use recursion -- that seemed
to me to be a natural way to solve this problem. <shrug>

I also happen to be American and I also posted one; however I put
it in a separate thread. See the thread entitled "A substring
replacement implementation". You might want to include it in
your test suite.

What test suite is that .... The one I could be accumulating,
I guess, from other people's posts here.
Yes, I saw your code, though I really only looked at the main
program and the comments -- both of which I rather like better
than mine.

I think I was rather hoping that the others who had posted
solutions might use my code to benchmark, but really, how hard
would it be for *me* to do that .... Maybe I will. "Stay tuned"?

For what it's worth, I pulled the replace() function and supporting
code out of your solution and connected it with my benchmarking
driver. Output below, followed by output of the best-performing
of my solutions. Yours is faster, though not spectacularly so.
I'll probably do similar experiments with others' solutions as
time and interest permit ....

======== OUTPUT with your replace() ========

Richard Harter's version

performing timing tests (4 repeats)

timing (length 4004, changing 2 occurrences of 2 chars to 2 chars) 20000 times
times (seconds): 0.36 0.35 0.35 0.35

timing (length 4020, changing 10 occurrences of 2 chars to 2 chars) 20000 times
times (seconds): 0.36 0.36 0.36 0.36

timing (length 4100, changing 50 occurrences of 2 chars to 2 chars) 20000 times
times (seconds): 0.44 0.45 0.44 0.44

timing (length 4500, changing 50 occurrences of 10 chars to 10 chars) 20000 times
times (seconds): 0.49 0.49 0.48 0.49

16 tests (counting repeats)
0 errors (counting repeats)
total time = 6.59 seconds

======== OUTPUT with my replace() ========

further improved version of replace():
scans input only once, building linked list of matches
avoids recomputing string lengths
uses user-written string functions

performing timing tests (4 repeats)

timing (length 4004, changing 2 occurrences of 2 chars to 2 chars) 20000 times
times (seconds): 0.49 0.47 0.46 0.50

timing (length 4020, changing 10 occurrences of 2 chars to 2 chars) 20000 times
times (seconds): 0.51 0.53 0.52 0.52

timing (length 4100, changing 50 occurrences of 2 chars to 2 chars) 20000 times
times (seconds): 0.72 0.77 0.75 0.74

timing (length 4500, changing 50 occurrences of 10 chars to 10 chars) 20000 times
times (seconds): 0.75 0.73 0.75 0.74

16 tests (counting repeats)
0 errors (counting repeats)
total time = 9.96 seconds
 

Malcolm McLean

That sounds like (1) unethical behavior and (2) to be expected from
you.
That's how I would solve it.
I have written "replace" functions in the past, using the standard
string library. The challenge is to replicate the function without
that library (which is reasonable; I've worked in the past on consoles
without standard libraries), so you just write the string functions.
With the exception of sprintf, they are all very easy to write if
perfect efficiency isn't a major consideration.
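In support of that claim, naive stand-ins for two of the library routines take only a few lines each. This is a sketch of my own, with names of my own choosing, not code posted in the thread:

#include <stddef.h>

/* Naive replacement for strlen(). */
size_t my_strlen(const char *s)
{
    const char *p = s;
    while (*p)
        ++p;
    return (size_t)(p - s);
}

/* Naive replacement for strstr(): first occurrence of needle in
   haystack, or NULL if there is none. */
const char *my_strstr(const char *haystack, const char *needle)
{
    if (!*needle)
        return haystack;
    for (; *haystack; ++haystack) {
        const char *h = haystack, *n = needle;
        while (*h && *n && *h == *n) {
            ++h;
            ++n;
        }
        if (!*n)
            return haystack;
    }
    return NULL;
}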

To write replace on top of another string library is more difficult. It
may be that that particular library is a lot better than the standard
one. The problem is that, as it isn't standard, I'm not familiar with
it. I'm often tearing my hair out over Perl scripts because I want to
delete the middle three characters of a string, or something. Perl has
far more powerful string handling functions than C, and there will be
an easy way to achieve this. However, since I'm not very familiar with
them, I'm leafing through the Perl book, looking for clues.
 

blmblm

That sounds like (1) unethical behavior and (2) to be expected from
you.

How is this unethical? In essence what he's doing is implementing
a standard interface, but in a way that can't collide with an
existing implementation.
Look, why not just admit it. You had no clue how to solve the
challenged problem, which could easily arise in any number of
circumstances. You're really not contributing much to the discussion.


Those of us who can (who like Schildt have written compilers) do this
for fun, for the same reason classical pianists play Czerny finger
exercises, and Bach wrote the Art of Fugue. There are some of us who
don't advance our careers by destroying reputations and buying our way
onto standards boards without any academic qualifications.

Furthermore, if I were doing matrix multiplication times a power of
two, I wouldn't use the * operator. Do you even know what operator I
would use?

"Matrix multiplcation times a power of two?" Could you explain what
you mean by that? In my usage "matrix multiplication" is a binary
operation on matrices, so I don't understand how one of the operands
can be a power of two. ?

[ snip ]
 

blmblm

Furthermore, if I were doing matrix multiplication times a power of
two, I wouldn't use the * operator. Do you even know what operator I
would use?

"Matrix multiplcation times a power of two?" Could you explain what
you mean by that? In my usage "matrix multiplication" is a binary
operation on matrices, so I don't understand how one of the operands
can be a power of two. ?

[ snip ]

He probably meant multiplication of a scalar and a matrix, where
the scalar is a power of two?

Or does he mean powers of square matrices?

What operator?
The first thing that came to my mind is i<<1 instead of i*2 ;)
But then again, even in assembler, depending on the CPU, shifting
can be slower than multiplication ;)
So whether you write i*2 or i<<1 doesn't matter, because the compiler
will probably use what is most appropriate for the current hardware...
but then again I'm just guessing what he meant by this ;)

In a simple test, gcc compiled both expressions (i<<1 and i*2) to
the same x86 assembler code, using shift. (Interestingly enough, it
did this even at the lowest level of optimization. Well, it was
interesting to me anyway.)

Whether the same thing would be true with other compilers and
for other architectures, who knows.
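For anyone who wants to repeat that experiment, a pair of functions like the following, compiled with something like `gcc -S -O2`, makes the comparison easy. This is a sketch of my own; the file and function names are arbitrary:

/* shift_vs_mul.c -- compile with "gcc -S -O2 shift_vs_mul.c" and
   compare the two function bodies in the generated .s file. */
int double_by_multiply(int i)
{
    return i * 2;
}

int double_by_shift(int i)
{
    return i << 1;
}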

Then again, worrying about this sort of thing .... Well, there's
the widely-repeated remark about premature optimization, no?
Its exact source seems to be subject to some debate,
but both of the possible originators mentioned here

http://en.wikipedia.org/wiki/Program_optimization

are people I would hesitate to disagree with.
 

blmblm

[ snip ]

[ snip ]
Interesting. I suspect that mine is a bit faster because it
doesn't need to construct linked lists - everything is on the
stack. Also, the recursive call replaces calling malloc. That
said, my version could be time optimized by eliminating
recursion. The plan would be to create a default table holding a
number of entries as an automatic variable, have a pointer to the
table, and switch to a malloc based expansible table when the
original one overflows. Instead of having m recursions the cost
would be O(log m) malloc calls. Equally important, the actual loop
can be compressed into a tight loop. If you want a hot version
for reference I'll code it.

Well, now I'm rather looking forward to actually reading your
code (and that of others) ....

Don't spend more time on this just for the sake of adding to
my collection, but if you do, well, yeah, I can generate times
so you/we can compare .... (Of course you almost surely can do
your own timings to compare the "hot" version to the original,
but it might be interesting to compare it to timings for others'
code too, and I might be the right person to do that. I just have
to track down the latest versions of things others have posted,
in the multiple threads, etc.)
 

Chris M. Thomasson

[...]
Interesting. I suspect that mine is a bit faster because it
doesn't need to construct linked lists - everything is on the
stack. Also, the recursive call replaces calling malloc. That
said, my version could be time optimized by eliminating
recursion. The plan would be to create a default table holding a
number of entries as an automatic variable, have a pointer to the
table, and switch to a malloc based expansible table when the
original one overflows.

That works fairly well:


http://groups.google.com/group/comp.lang.c/msg/2ebbf412549f3300


The algorithm I settled on will only resort to dynamic memory when the
number of matches is greater than 256. Then it will avoid `malloc()' again
until the number of matches exceeds 512. I suppose an optimization
might be to malloc larger and larger blocks to hold matches.
 

Ben Bacarisse

Richard Harter <[email protected]> wrote:
[snip]
Yes, I saw your code, though I really only looked at the main
program and the comments -- both of which I rather like better
than mine.

I think I was rather hoping that the others who had posted
solutions might use my code to benchmark, but really, how hard
would it be for *me* to do that .... Maybe I will. "Stay tuned"?

For what it's worth, I pulled the replace() function and supporting
code out of your solution and connected it with my benchmarking
driver. Output below, followed by output of the best-performing
of my solutions. Yours is faster, though not spectacularly so.
I'll probably do similar experiments with others' solutions as
time and interest permit ....

======== OUTPUT with your replace() ========

Richard Harter's version [snip]
16 tests (counting repeats)
0 errors (counting repeats)
total time = 6.59 seconds

======== OUTPUT with my replace() ========

further improved version of replace():
scans input only once, building linked list of matches
avoids recomputing string lengths
uses user-written string functions
[snip]

16 tests (counting repeats)
0 errors (counting repeats)
total time = 9.96 seconds

Interesting. I suspect that mine is a bit faster because it
doesn't need to construct linked lists - everything is on the
stack. Also, the recursive call replaces calling malloc. That
said, my version could be time optimized by eliminating
recursion. The plan would be to create a default table holding a
number of entries as an automatic variable, have a pointer to the
table, and switch to a malloc based expansible table when the
original one overflows. Instead of having m recursions the cost
would be O(log m) malloc calls. Equally important, the actual loop
can be compressed into a tight loop. If you want a hot version
for reference I'll code it.

That's one of my versions, too. Up to now I have not posted any
timings because the relative speeds depend so much on the input used.
For example, with the 4K long test strings used by B L Massingill it
is hard to beat using strstr:

$ ./timer d4004 '[]' 'xx' bb_replace rh_replace fast_replace array_replace
bb_replace(`cat d4004`, "[]", "xx"):
2483200 calls in 3.998s is 1.61µs/call (1.61e-06s/call)
rh_replace(`cat d4004`, "[]", "xx"):
424600 calls in 4.001s is 9.422µs/call (9.422e-06s/call)
fast_replace(`cat d4004`, "[]", "xx"):
1123600 calls in 3.999s is 3.559µs/call (3.559e-06s/call)
array_replace(`cat d4004`, "[]", "xx"):
3519700 calls in 4.000s is 1.136µs/call (1.136e-06s/call)

(d4004 is a file containing 4004 bytes with two '[]' sub-strings.)

bb_replace is the very simple code I posted a while ago. rh_replace
is yours. fast_replace uses a local array to store the match
locations and some hand-written sub-string code to avoid the
re-scanning implied by using strstr (I'll post it below) and
array_replace is the same, but simply uses strstr to find the
sub-strings.

If we use short strings, the position is reversed in that the two that
use strstr (bb_replace and array_replace) pay a price:

$ ./timer "abcdefg[]hijklmnop[]qrstuvwxyz" '[]' 'xx' bb_replace rh_replace fast_replace array_replace
bb_replace("abcdefg[]hijklmnop[]qrstuvwxyz", "[]", "xx"):
10141800 calls in 3.999s is 394.3ns/call (3.943e-07s/call)
rh_replace("abcdefg[]hijklmnop[]qrstuvwxyz", "[]", "xx"):
29636400 calls in 4.000s is 135ns/call (1.35e-07s/call)
fast_replace("abcdefg[]hijklmnop[]qrstuvwxyz", "[]", "xx"):
26954200 calls in 4.000s is 148.4ns/call (1.484e-07s/call)
array_replace("abcdefg[]hijklmnop[]qrstuvwxyz", "[]", "xx"):
16888500 calls in 4.000s is 236.8ns/call (2.368e-07s/call)

The order is changed again if there is no matching sub-string:

$ ./timer "abcdefghijklmnopqrstuvwxyz" '[]' 'xx' bb_replace rh_replace fast_replace array_replace
bb_replace("abcdefghijklmnopqrstuvwxyz", "[]", "xx"):
38832000 calls in 3.998s is 103ns/call (1.03e-07s/call)
rh_replace("abcdefghijklmnopqrstuvwxyz", "[]", "xx"):
33301000 calls in 4.000s is 120.1ns/call (1.201e-07s/call)
fast_replace("abcdefghijklmnopqrstuvwxyz", "[]", "xx"):
43300700 calls in 4.000s is 92.38ns/call (9.238e-08s/call)
array_replace("abcdefghijklmnopqrstuvwxyz", "[]", "xx"):
46132400 calls in 4.000s is 86.71ns/call (8.671e-08s/call)

Pretty much any implementation has fast and slow cases. I am not sure
if there is any useful information in all of the data I can generate
(which is why I have not posted any results so far).

Anyway, here is my (probably misnamed) "fast_replace":

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define STRLEN strlen
#define STRSTR strstr
#define STRCPY strcpy
#define STRNCMP strncmp
#define MEMCPY memcpy
#define FN(x) x

#define AUTO_SIZE 20

char *FN(fast_replace)(char *src, char *match, char *replacement)
{
     size_t mlen = STRLEN(match), matches = 0, asize = AUTO_SIZE;
     const char *small_list[AUTO_SIZE], **mp = small_list, **op = 0;

     /* First pass: record the start of every match, using the automatic
        array until it overflows, then a doubling malloc'd table. */
     const char *anchor = src;
     do {
          while (*anchor && *anchor != *match)
               ++anchor;
          if (*anchor) {
               if (STRNCMP(anchor + 1, match + 1, mlen - 1) == 0) {
                    if (matches >= asize) {
                         const char **np = realloc(op, 2 * asize * sizeof *np);
                         if (!np) {
                              fprintf(stderr, "Out of memory\n");
                              free(op);
                              return 0;
                         }
                         if (!op)
                              MEMCPY(np, mp, sizeof small_list);
                         mp = op = np;
                         asize *= 2;
                    }
                    mp[matches++] = anchor;
                    anchor += mlen;
               }
               else ++anchor;
          }
          else break;
     } while (*anchor);

     /* Second pass: assemble the result from the recorded match positions. */
     size_t rlen = matches ? STRLEN(replacement) : 0;
     char *result = malloc(anchor - src - matches*mlen + matches*rlen + 1);
     if (result) {
          char *dst = result;
          const char *s = src;
          for (int i = 0; i < matches; ++i) {
               MEMCPY(dst, s, mp[i] - s);
               MEMCPY(dst += mp[i] - s, replacement, rlen);
               dst += rlen;
               s = mp[i] + mlen;
          }
          STRCPY(dst, s);
     }
     free(op);
     return result;
}

The macros are to enable me to build an alternate version that
uses naive home-grown versions of strlen, strstr etc to see
how that impacts on the performance.
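To make that concrete, the home-grown build would presumably just redefine the macros along these lines. The my_* names are hypothetical; Ben's actual alternate definitions are not shown in the post:

/* Hypothetical alternate definitions for the home-grown build. */
#define STRLEN  my_strlen
#define STRSTR  my_strstr
#define STRCPY  my_strcpy
#define STRNCMP my_strncmp
#define MEMCPY  my_memcpy
#define FN(x)   my_##x   /* gives the function a distinct name, my_fast_replace */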

Maybe you had something faster in mind. I'll happily add it to my
test set.
 

spinoza1111

How is this unethical?  In essence what he's doing is implementing
a standard interface, but in a way that can't collide with an
existing implementation.

I said last week that my replace function integrates two different
things.
"Matrix multiplcation times a power of two?"  Could you explain what
you mean by that?  In my usage "matrix multiplication" is a binary
operation on matrices, so I don't understand how one of the operands
can be a power of two.  ?

It contains a power of two in each element; in the scenario where you
wouldn't use *, you would use shift. Look, I was just having a little
fun with Seebie.
 

spinoza1111

[email protected] (Richard Harter) said:
On 20 Feb 2010 16:36:47 GMT, (e-mail address removed)
[snip]
Yes, I saw your code, though I really only looked at the main
program and the comments -- both of which I rather like better
than mine.
I think I was rather hoping that the others who had posted
solutions might use my code to benchmark, but really, how hard
would it be for *me* to do that ....  Maybe I will.  "Stay tuned"?
For what it's worth, I pulled the replace() function and supporting
code out of your solution and connected it with my benchmarking
driver.  Output below, followed by output of the best-performing
of my solutions.  Yours is faster, though not spectacularly so.
I'll probably do similar experiments with others' solutions as
time and interest permit ....
======== OUTPUT with your replace() ========
Richard Harter's version [snip]
16 tests (counting repeats)
0 errors (counting repeats)
total time = 6.59 seconds
======== OUTPUT with my replace() ========
further improved version of replace():
scans input only once, building linked list of matches
avoids recomputing string lengths
uses user-written string functions
16 tests (counting repeats)
0 errors (counting repeats)
total time = 9.96 seconds
Interesting.  I suspect that mine is a bit faster because it
doesn't need to construct linked lists - everything is on the
stack.  Also, the recursive call replaces calling malloc.  That
said, my version could be time optimized by eliminating
recursion.  The plan would be to create a default table holding a
number of entries as an automatic variable, have a pointer to the
table, and switch to a malloc based expansible table when the
original one overflows.  Instead of having m recursions the cost
would be O(log m) malloc calls.  Equally important, the actual loop
can be compressed into a tight loop.  If you want a hot version
for reference I'll code it.

That's one of my versions, too.  Up to now I have not posted any
timings because the relative speeds depend so much on the input used.
For example, with the 4K long test strings used by B L Massingill it
is hard to beat using strstr:

  $ ./timer d4004 '[]' 'xx' bb_replace rh_replace fast_replace array_replace
  bb_replace(`cat d4004`, "[]", "xx"):
         2483200 calls in 3.998s is 1.61µs/call (1.61e-06s/call)
  rh_replace(`cat d4004`, "[]", "xx"):
          424600 calls in 4.001s is 9.422µs/call (9.422e-06s/call)
  fast_replace(`cat d4004`, "[]", "xx"):
         1123600 calls in 3.999s is 3.559µs/call (3.559e-06s/call)
  array_replace(`cat d4004`, "[]", "xx"):
         3519700 calls in 4.000s is 1.136µs/call (1.136e-06s/call)

(d4004 is a file containing 4004 bytes with two '[]' sub-strings.)

bb_replace is the very simple code I posted a while ago.  rh_replace
is yours.  fast_replace uses a local array to store the match
locations and some hand-written sub-string code to avoid the
re-scanning implied by using strstr (I'll post it below) and
array_replace is the same, but simply uses strstr to find the
sub-strings.

If we use short strings, the position is reversed in that the two that
use strstr (bb_replace and array_replace) pay a price:

  $ ./timer "abcdefg[]hijklmnop[]qrstuvwxyz" '[]' 'xx' bb_replace rh_replace fast_replace array_replace
  bb_replace("abcdefg[]hijklmnop[]qrstuvwxyz", "[]", "xx"):
        10141800 calls in 3.999s is 394.3ns/call (3.943e-07s/call)
  rh_replace("abcdefg[]hijklmnop[]qrstuvwxyz", "[]", "xx"):
        29636400 calls in 4.000s is 135ns/call (1.35e-07s/call)
  fast_replace("abcdefg[]hijklmnop[]qrstuvwxyz", "[]", "xx"):
        26954200 calls in 4.000s is 148.4ns/call (1.484e-07s/call)
  array_replace("abcdefg[]hijklmnop[]qrstuvwxyz", "[]", "xx"):
        16888500 calls in 4.000s is 236.8ns/call (2.368e-07s/call)

The order is changed again if there is no matching sub-string:

  $ ./timer "abcdefghijklmnopqrstuvwxyz" '[]' 'xx' bb_replace rh_replace fast_replace array_replace
  bb_replace("abcdefghijklmnopqrstuvwxyz", "[]", "xx"):
        38832000 calls in 3.998s is 103ns/call (1.03e-07s/call)
  rh_replace("abcdefghijklmnopqrstuvwxyz", "[]", "xx"):
        33301000 calls in 4.000s is 120.1ns/call (1.201e-07s/call)
  fast_replace("abcdefghijklmnopqrstuvwxyz", "[]", "xx"):
        43300700 calls in 4.000s is 92.38ns/call (9.238e-08s/call)
  array_replace("abcdefghijklmnopqrstuvwxyz", "[]", "xx"):
        46132400 calls in 4.000s is 86.71ns/call (8.671e-08s/call)

Pretty much any implementation has fast and slow cases.  I am not sure
if there is any useful information in all of the data I can generate
(which is why I have not posted any results so far).

Anyway, here is my (probably misnamed) "fast_replace":

#include <string.h>
#define STRLEN  strlen
#define STRSTR  strstr
#define STRCPY  strcpy
#define STRNCMP strncmp
#define MEMCPY  memcpy
#define FN(x)   x

#define AUTO_SIZE 20

char *FN(fast_replace)(char *src, char *match, char *replacement)
{
     size_t mlen = STRLEN(match), matches = 0, asize = AUTO_SIZE;
     const char *small_list[AUTO_SIZE], **mp = small_list, **op = 0;

     const char *anchor = src;
     do {
          while (*anchor && *anchor != *match)
               ++anchor;
          if (*anchor) {
               if (STRNCMP(anchor + 1, match + 1, mlen - 1) == 0) {
                    if (matches >= asize) {
                         const char **np = realloc(op, 2 * asize * sizeof *np);
                         if (!np) {
                              fprintf(stderr, "Out of memory\n");
                              free(op);
                              return 0;
                         }
                         if (!op)
                              MEMCPY(np, mp, sizeof small_list);
                         mp = op = np;
                         asize *= 2;
                     }
                    mp[matches++] = anchor;
                    anchor += mlen;
               }
               else ++anchor;

Here, you forgot to steal my idea, Ben. It was to manually code your
strncmp so as to "go back" not to the character after the first
character of the partial match, but to the first character that
matches the first character of the string we're looking for, or to the
character at which the partial match fails.

By using the library function, you do too much backup, in my opinion.
          }
          else break;
     } while (*anchor);

     size_t rlen = matches ? STRLEN(replacement) : 0;
     char *result = malloc(anchor - src - matches*mlen + matches*rlen + 1);
     if (result) {
          char *dst = result;
          const char *s = src;
          for (int i = 0; i < matches; ++i) {
               MEMCPY(dst, s, mp[i] - s);
               MEMCPY(dst += mp[i] - s, replacement, rlen);
               dst += rlen;
               s = mp[i] + mlen;
          }
          STRCPY(dst, s);
     }
     free(op);
     return result;

}

The macros are to enable me to build an alternate version that
uses naive home-grown versions of strlen, strstr etc to see
how that impacts on the performance.


But you're still thinking inside the black box. Thinking in terms of
strncmp causes you to restart the search too far to the left of the place
where it should optimally restart (the character "handle" of the
target inside the partial match or after the end of the partial
match). For long targets with a lot of overlapping material on the
left such as commonly occur in text data, this could be a serious
performance bottleneck.

The problem is that C library functions for strings are limited Turing
machines with no "magic" ability to see beyond the current "tape
square" or remember useful things. This reflects the reality of modern
computer architectures which in the default case can't be assumed
anymore to have "magic" instructions like those available on the VAX
or IBM 370, which were microprogrammed back in the day when memory was
cheap, CPU power expensive, men were men, women were women, and sheep
were nervous.

You need, I think, to benchmark your code in situations where targets
are long and overlap heavily in the input string. This might be a
library or book search application where you are looking for people's
full names where they have a small set of common first names such as
Bob, or Joe, or Billy Bob, but divergent last names. In this scenario,
your constant returns to one past the first character match will waste
several cycles on names such as Billy Bob Bogus when you are looking
for Billy Bob Badass.

In that specific scenario, you will go through the loop for each
character of the string "Billy Bob Badass" up to a. What you want is
to get past the end of that string ASAP while being sure that the
start of Billy Bob Bogus doesn't occur, given the character by
character constraint of the C machine.

My way, which is not ideal, is to go back to the B of Bob when I hit
the a of Badass and that a fails to match the o in Bogus. I then go to
the second b (B in Bob), find the o, and go past the o to the second B
in Bob, find the space, and then try B in Bogus, only to see
immediately that it is not followed by i. I then return to the zero
state of looking for the lead B. Whereas you seem to go all the way
back to i needlessly.

More sophisticated search algorithms have state in the form of tables,
therefore it's extremely unlikely they are used in actual C libraries.
Instead, those libraries probably use your memoryless procedure but
with both code and an underlying architecture "tuned" to relatively
mindless zipping left to right through strings.
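For reference, the table-driven approach alluded to here is typified by Knuth-Morris-Pratt, which precomputes a failure table so the search never re-reads a text character after a partial match fails. A minimal sketch of my own, not code anyone posted in this thread:

#include <stdlib.h>
#include <string.h>

/* Build the KMP failure table: fail[i] is the length of the longest
   proper prefix of pat that is also a suffix of pat[0..i]. */
static void kmp_build(const char *pat, size_t m, size_t *fail)
{
    size_t k = 0;
    fail[0] = 0;
    for (size_t i = 1; i < m; ++i) {
        while (k > 0 && pat[i] != pat[k])
            k = fail[k - 1];
        if (pat[i] == pat[k])
            ++k;
        fail[i] = k;
    }
}

/* Return a pointer to the first occurrence of pat in text, or NULL. */
const char *kmp_search(const char *text, const char *pat)
{
    size_t m = strlen(pat);
    if (m == 0)
        return text;
    size_t *fail = malloc(m * sizeof *fail);
    if (!fail)
        return NULL;              /* allocation failure treated as "not found" */
    kmp_build(pat, m, fail);

    size_t k = 0;                 /* length of the current partial match */
    const char *found = NULL;
    for (const char *p = text; *p; ++p) {
        while (k > 0 && *p != pat[k])
            k = fail[k - 1];
        if (*p == pat[k])
            ++k;
        if (k == m) {
            found = p - (m - 1);
            break;
        }
    }
    free(fail);
    return found;
}

The failure table is exactly the kind of initialization cost mentioned elsewhere in the thread; for short patterns and short inputs the naive scan usually wins.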

In the scenario I was thinking of, the programmer, or his manager, or
his team decided to NOT use string.H in this let us say critical
operation to see what performance they could get for long targets with
a tendency to start with the same strings. So far, nobody to my
knowledge except myself, Willem and io_x has met the challenge of
thinking outside the box.
 

blmblm

[ snip ]
What test suite is that .... The one I could be accumulating,
I guess, from other people's posts here.


For what it's worth, I pulled the replace() function and supporting
code out of your solution and connected it with my benchmarking
driver. Output below, followed by output of the best-performing
of my solutions. Yours is faster, though not spectacularly so.
I'll probably do similar experiments with others' solutions as
time and interest permit ....

I've benchmarked all the code I could find that's been posted here.
I believe another poster (Chris M. Thomasson) has posted links to
code in pastebin, but I didn't track that down and include it.
One thing that occurred to me is that previously I had been
compiling with "gcc -O" (optimization), and "gcc -O3" (maximum
optimization) might be better. For reasons I won't summarize
right now, I also tried "-O2". Results below (total time only).

Presented without analysis for now ....


harter-1 is the recursive solution, from
Message-ID: <[email protected]>

harter-2 is the non-recursive solution, from
Message-ID: <[email protected]>

willem is from
Message-ID: <[email protected]>

nilges is from
Message-ID: <[email protected]>

io_x is from
msg-id Message-ID: <[email protected]>

blmblm-$V-$L [1] is from
Message-ID: <[email protected]>

[1] V = version, L = string library (C or user-written)


harter-1 (O2) 5.67 seconds
harter-1 (O3) 8.48 seconds
harter-2 (O2) 5.94 seconds
harter-2 (O3) 6.51 seconds
io_x (O2) 18.05 seconds [1]
io_x (O3) **** seconds (segfault) [2]
nilges (O2) 7.72 seconds
nilges (O3) 7.69 seconds
willem (O2) 7.18 seconds
willem (O3) 7.23 seconds

[2] These results may be misleading -- I wasn't sure how to
generate an executable from the mix of C and assembly and more or
less tried things until I got a clean compile/link, with:

nasm -felf -o replacer.o replacer.s
gcc -o tester -Wall -pedantic -std=c99 -On tester.c replacer.o
(n=2 or n=3)

blmblm-1-C (O2) 10.78 seconds
blmblm-1-user (O2) 35.97 seconds
blmblm-2-C (O2) 9.60 seconds
blmblm-2-user (O2) 35.10 seconds
blmblm-3-C (O2) 7.86 seconds
blmblm-3-user (O2) 8.58 seconds

blmblm-1-C (O3) 10.36 seconds
blmblm-1-user (O3) 37.58 seconds
blmblm-2-C (O3) 9.45 seconds
blmblm-2-user (O3) 32.96 seconds
blmblm-3-C (O3) 7.49 seconds
blmblm-3-user (O3) 7.72 seconds
 

blmblm

On 20 Feb 2010 16:36:47 GMT, (e-mail address removed)



I've posted a second version that replaces recursion by
iteration. It should run slightly faster; like most such
conversions the original is simpler and cleaner.

I hadn't set out to create a performance string replace routine.
To get significant improvement the guts of the "find_next"
function must be replaced with a better algorithm. There are
four paths I can see:

(1) Hack and stumble about
(2) Use a simplified version of a standard algorithm
(3) Bite the bullet and implement a standard algorithm
(4) Invent something new and wonderful

There are tradeoffs. High performance algorithms tend to have
initialization costs, hence the simpler algorithms win when the
input is simple.

In any event if you want to pick up replace2 have at it.

Already done :) -- I noticed your second version in the process of
trying to collect other people's code, and I just posted a summary
of my attempt to benchmark the various versions.
 

Chris M. Thomasson

[ snip ]
What test suite is that .... The one I could be accumulating,
I guess, from other people's posts here.


For what it's worth, I pulled the replace() function and supporting
code out of your solution and connected it with my benchmarking
driver. Output below, followed by output of the best-performing
of my solutions. Yours is faster, though not spectacularly so.
I'll probably do similar experiments with others' solutions as
time and interest permit ....

I've benchmarked all the code I could find that's been posted here.
I believe another poster (Chris M. Thomasson) has posted links to
code in pastebin, but I didn't track that down and include it.

It's here:

http://clc.pastebin.com/f62504e4c
 

blmblm

[ snip ]
I've benchmarked all the code I could find that's been posted here.
I believe another poster (Chris M. Thomasson) has posted links to
code in pastebin, but I didn't track that down and include it.
One thing that occurred to me is that previously I had been
compiling with "gcc -O" (optimization), and "gcc -O3" (maximum
optimization) might be better. For reasons I won't summarize
right now, I also tried "-O2". Results below (total time only).

Presented without analysis for now ....

I missed (at least) one. Ben Bacarisse posted one version of his
code, in

Message-ID: <0.ffa3b609be9d5a7e3382.20100221014206GMT.877hq7e55d.fsf@bsb.me.uk>

which gives results as shown below

He says in that message that which implementation is fastest
depends on input. So my benchmarking here probably doesn't
prove much. It does demonstrate, though, that compiler
optimization level may also affect which is fastest (results
for Harter's code) ....

harter-1 is the recursive solution, from
Message-ID: <[email protected]>

harter-2 is the non-recursive solution, from
Message-ID: <[email protected]>

willem is from
Message-ID: <[email protected]>

nilges is from
Message-ID: <[email protected]>

io_x is from
msg-id Message-ID: <[email protected]>

blmblm-$V-$L [1] is from
Message-ID: <[email protected]>

[1] V = version, L = string library (C or user-written)

bacarisse (O2) 4.47 seconds
bacarisse (O3) 4.48 seconds

harter-1 (O2) 5.67 seconds
harter-1 (O3) 8.48 seconds
harter-2 (O2) 5.94 seconds
harter-2 (O3) 6.51 seconds
io_x (O2) 18.05 seconds [1]
io_x (O3) **** seconds (segfault) [2]
nilges (O2) 7.72 seconds
nilges (O3) 7.69 seconds
willem (O2) 7.18 seconds
willem (O3) 7.23 seconds

[2] These results may be misleading -- I wasn't sure how to
generate an executable from the mix of C and assembly and more or
less tried things until I got a clean compile/link, with:

nasm -felf -o replacer.o replacer.s
gcc -o tester -Wall -pedantic -std=c99 -On tester.c replacer.o
(n=2 or n=3)

blmblm-1-C (O2) 10.78 seconds
blmblm-1-user (O2) 35.97 seconds
blmblm-2-C (O2) 9.60 seconds
blmblm-2-user (O2) 35.10 seconds
blmblm-3-C (O2) 7.86 seconds
blmblm-3-user (O2) 8.58 seconds

blmblm-1-C (O3) 10.36 seconds
blmblm-1-user (O3) 37.58 seconds
blmblm-2-C (O3) 9.45 seconds
blmblm-2-user (O3) 32.96 seconds
blmblm-3-C (O3) 7.49 seconds
blmblm-3-user (O3) 7.72 seconds
 

blmblm

[ snip ]
I've benchmarked all the code I could find that's been posted here.
I believe another poster (Chris M. Thomasson) has posted links to
code in pastebin, but I didn't track that down and include it.
One thing that occurred to me is that previously I had been
compiling with "gcc -O" (optimization), and "gcc -O3" (maximum
optimization) might be better. For reasons I won't summarize
right now, I also tried "-O2". Results below (total time only).

Presented without analysis for now ....

I missed (at least) one. Ben Bacarisse posted one version of his
code, in

Message-ID: <0.ffa3b609be9d5a7e3382.20100221014206GMT.877hq7e55d.fsf@bsb.me.uk>

which gives results as shown below

Adding results for Chris Thomasson's code, from pastebin as referenced
in

Message-ID: <[email protected]>
He says in that message that which implementation is fastest
depends on input. So my benchmarking here probably doesn't
prove much. It does demonstrate, though, that compiler
optimization level may also affect which is fastest (results
for Harter's code) ....

harter-1 is the recursive solution, from
Message-ID: <[email protected]>

harter-2 is the non-recursive solution, from
Message-ID: <[email protected]>

willem is from
Message-ID: <[email protected]>

nilges is from
Message-ID: <[email protected]>

io_x is from
msg-id Message-ID: <[email protected]>

blmblm-$V-$L [1] is from
Message-ID: <[email protected]>

[1] V = version, L = string library (C or user-written)

thomasson (O2) 4.08 seconds
thomasson (O3) 4.08 seconds
bacarisse (O2) 4.47 seconds
bacarisse (O3) 4.48 seconds

harter-1 (O2) 5.67 seconds
harter-1 (O3) 8.48 seconds
harter-2 (O2) 5.94 seconds
harter-2 (O3) 6.51 seconds
io_x (O2) 18.05 seconds [1]
io_x (O3) **** seconds (segfault) [2]
nilges (O2) 7.72 seconds
nilges (O3) 7.69 seconds
willem (O2) 7.18 seconds
willem (O3) 7.23 seconds

[2] These results may be misleading -- I wasn't sure how to
generate an executable from the mix of C and assembly and more or
less tried things until I got a clean compile/link, with:

nasm -felf -o replacer.o replacer.s
gcc -o tester -Wall -pedantic -std=c99 -On tester.c replacer.o
(n=2 or n=3)

blmblm-1-C (O2) 10.78 seconds
blmblm-1-user (O2) 35.97 seconds
blmblm-2-C (O2) 9.60 seconds
blmblm-2-user (O2) 35.10 seconds
blmblm-3-C (O2) 7.86 seconds
blmblm-3-user (O2) 8.58 seconds

blmblm-1-C (O3) 10.36 seconds
blmblm-1-user (O3) 37.58 seconds
blmblm-2-C (O3) 9.45 seconds
blmblm-2-user (O3) 32.96 seconds
blmblm-3-C (O3) 7.49 seconds
blmblm-3-user (O3) 7.72 seconds
 

Ben Bacarisse

I've benchmarked all the code I could find that's been posted here.

For reference, I pulled out your V3 and ran the same sets of tests I
just posted against Richard Harter's second version. Here are the times
I got:

A:
blm_replace(`cat d4004`, "[]", "xx"):
3190500 calls in 3.998s is 1.253µs/call (1.253e-06s/call)
rh_replace2(`cat d4004`, "[]", "xx"):
513700 calls in 4.000s is 7.787µs/call (7.787e-06s/call)
fast_replace(`cat d4004`, "[]", "xx"):
1076300 calls in 4.000s is 3.717µs/call (3.717e-06s/call)

B:
blm_replace(`cat d4004`, "{}", "xx"):
3752900 calls in 3.999s is 1.066µs/call (1.066e-06s/call)
rh_replace2(`cat d4004`, "{}", "xx"):
519000 calls in 4.000s is 7.708µs/call (7.708e-06s/call)
fast_replace(`cat d4004`, "{}", "xx"):
1044200 calls in 4.000s is 3.83µs/call (3.83e-06s/call)

C:
blm_replace(`cat wap.txt`, "and", "xx"):
200 calls in 5.044s is 2.522e+04µs/call (0.02522s/call)
rh_replace2(`cat wap.txt`, "and", "xx"):
400 calls in 4.125s is 1.031e+04µs/call (0.01031s/call)
fast_replace(`cat wap.txt`, "and", "xx"):
500 calls in 4.254s is 8508µs/call (0.008508s/call)

D:
blm_replace("abzzefzzijlmzzpqrzzuvzzyz", "zz", "xx"):
6555300 calls in 3.999s is 610.1ns/call (6.101e-07s/call)
rh_replace2("abzzefzzijlmzzpqrzzuvzzyz", "zz", "xx"):
25659800 calls in 4.000s is 155.9ns/call (1.559e-07s/call)
fast_replace("abzzefzzijlmzzpqrzzuvzzyz", "zz", "xx"):
18203800 calls in 4.000s is 219.7ns/call (2.197e-07s/call)

E:
blm_replace("abcdefghijlmnopqrstuvwxyz", "zz", "xx"):
34562700 calls in 3.998s is 115.7ns/call (1.157e-07s/call)
rh_replace2("abcdefghijlmnopqrstuvwxyz", "zz", "xx"):
35579200 calls in 4.000s is 112.4ns/call (1.124e-07s/call)
fast_replace("abcdefghijlmnopqrstuvwxyz", "zz", "xx"):
44387600 calls in 4.000s is 90.11ns/call (9.011e-08s/call)

A: 4K text with two short replacements.
B: 4K text with no replacements.
C: Large text with lots of replacements.
D: Short text with several short replacements.
E: Short text with no replacements.

If you want to include my current favourite in your tests, it is:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* Bigger is better here: */
#define AUTO_SIZE 20

char *fast_replace(char *src, char *match, char *replacement)
{
     size_t mlen = strlen(match), matches = 0, asize = AUTO_SIZE;
     const char *small_list[AUTO_SIZE], **mp = small_list, **op = 0;

     const char *anchor = src;
     do {
          while (*anchor && *anchor != *match)
               ++anchor;
          if (*anchor) {
               if (strncmp(anchor + 1, match + 1, mlen - 1) == 0) {
                    if (matches >= asize) {
                         const char **np = realloc(op, 2 * asize * sizeof *np);
                         if (!np) {
                              fprintf(stderr, "Out of memory\n");
                              free(op);
                              return 0;
                         }
                         if (!op)
                              memcpy(np, mp, sizeof small_list);
                         mp = op = np;
                         asize *= 2;
                    }
                    mp[matches++] = anchor;
                    anchor += mlen;
               }
               else ++anchor;
          }
          else break;
     } while (*anchor);

     size_t rlen = matches ? strlen(replacement) : 0;
     char *result = malloc(anchor - src - matches*mlen + matches*rlen + 1);
     if (result) {
          char *dst = result;
          const char *s = src;
          for (int i = 0; i < matches; ++i) {
               memcpy(dst, s, mp[i] - s);
               memcpy(dst += mp[i] - s, replacement, rlen);
               dst += rlen;
               s = mp[i] + mlen;
          }
          strcpy(dst, s);
     }
     free(op);
     return result;
}

<snip>
 

Ben Bacarisse

(e-mail address removed) <[email protected]> wrote:

Thanks. Useful to compare our results.
Adding results for Chris Thomasson's code, from pastebin as referenced
in

Message-ID: <[email protected]>

I've done the same and I get very different results for Chris's code
and for yours. Yours is the fastest on long strings, presumably
because you use strstr. Chris's is next fastest on long strings.

Yes, though it is reasonably predictable. For example, Chris's
code is good for long strings and slow for short ones.

I used -O3 throughout. I'll post the results with -O2 as well, but in my
case I get faster times in all cases using -O3 (gcc 4.4.1).

<snip>
I assume these results are for 4004 byte long strings with 2 short
replacements. If so, here are my comparative timings:
thomasson (O2) 4.08 seconds
cmt_replace(`cat d4004`, "[]", "xx"):
4162900 calls in 4.000s is 960.8ns/call (9.608e-07s/call)
thomasson (O3) 4.08 seconds
cmt_replace(`cat d4004`, "[]", "xx"):
4202100 calls in 4.000s is 951.9ns/call (9.519e-07s/call)
fast_replace(`cat d4004`, "[]", "xx"):
1133100 calls in 4.000s is 3.53µs/call (3.53e-06s/call)
fast_replace(`cat d4004`, "[]", "xx"):
1143900 calls in 4.000s is 3.497µs/call (3.497e-06s/call)

You get little difference between Chris's code and mine, but I get a
huge factor. Could just be a difference in hardware or cache/memory
configuration.
rh_replace2(`cat d4004`, "[]", "xx"):
515300 calls in 4.000s is 7.763µs/call (7.763e-06s/call)
rh_replace2(`cat d4004`, "[]", "xx"):
517900 calls in 4.000s is 7.723µs/call (7.723e-06s/call)
io_x (O2) 18.05 seconds [1]
io_x (O3) **** seconds (segfault) [2]
nilges (O2) 7.72 seconds
en_replace(`cat d4004`, "[]", "xx"):
499900 calls in 4.000s is 8.002µs/call (8.002e-06s/call)
en_replace(`cat d4004`, "[]", "xx"):
486300 calls in 4.000s is 8.226µs/call (8.226e-06s/call)
w_replace(`cat d4004`, "[]", "xx"):
492100 calls in 4.001s is 8.13µs/call (8.13e-06s/call)
w_replace(`cat d4004`, "[]", "xx"):
493100 calls in 4.000s is 8.112µs/call (8.112e-06s/call)
[2] These results may be misleading -- I wasn't sure how to
generate an executable from the mix of C and assembly and more or
less tried things until I got a clean compile/link, with:

nasm -felf -o replacer.o replacer.s
gcc -o tester -Wall -pedantic -std=c99 -On tester.c replacer.o
(n=2 or n=3)

blmblm-1-C (O2) 10.78 seconds
blmblm-1-user (O2) 35.97 seconds
blmblm-2-C (O2) 9.60 seconds
blmblm-2-user (O2) 35.10 seconds
blmblm-3-C (O2) 7.86 seconds
blm_replace(`cat d4004`, "[]", "xx"):
3162500 calls in 3.998s is 1.264µs/call (1.264e-06s/call)
blm_replace(`cat d4004`, "[]", "xx"):
3212200 calls in 3.998s is 1.245µs/call (1.245e-06s/call)

Again a difference. Your code is super fast, at least on my hardware.

Full results using -O2 this time and the same set of tests to see
variation in times:

blm_replace(`cat d4004`, "[]", "xx"):
3162500 calls in 3.998s is 1.264µs/call (1.264e-06s/call)
rh_replace2(`cat d4004`, "[]", "xx"):
516300 calls in 4.000s is 7.748µs/call (7.748e-06s/call)
cmt_replace(`cat d4004`, "[]", "xx"):
4181100 calls in 4.000s is 956.6ns/call (9.566e-07s/call)
w_replace(`cat d4004`, "[]", "xx"):
492100 calls in 4.001s is 8.13µs/call (8.13e-06s/call)
en_replace(`cat d4004`, "[]", "xx"):
499900 calls in 4.000s is 8.002µs/call (8.002e-06s/call)
fast_replace(`cat d4004`, "[]", "xx"):
1140500 calls in 4.000s is 3.507µs/call (3.507e-06s/call)

blm_replace(`cat d4004`, "{}", "xx"):
3774700 calls in 3.999s is 1.06µs/call (1.06e-06s/call)
rh_replace2(`cat d4004`, "{}", "xx"):
518400 calls in 4.000s is 7.716µs/call (7.716e-06s/call)
cmt_replace(`cat d4004`, "{}", "xx"):
4253000 calls in 4.000s is 940.5ns/call (9.405e-07s/call)
w_replace(`cat d4004`, "{}", "xx"):
495700 calls in 4.000s is 8.07µs/call (8.07e-06s/call)
en_replace(`cat d4004`, "{}", "xx"):
511000 calls in 4.000s is 7.829µs/call (7.829e-06s/call)
fast_replace(`cat d4004`, "{}", "xx"):
1040400 calls in 4.000s is 3.844µs/call (3.844e-06s/call)

blm_replace(`cat wap.txt`, "and", "xx"):
200 calls in 5.107s is 2.553e+04µs/call (0.02553s/call)
rh_replace2(`cat wap.txt`, "and", "xx"):
400 calls in 4.125s is 1.031e+04µs/call (0.01031s/call)
cmt_replace(`cat wap.txt`, "and", "xx"):
200 calls in 4.542s is 2.271e+04µs/call (0.02271s/call)
w_replace(`cat wap.txt`, "and", "xx"):
400 calls in 4.273s is 1.068e+04µs/call (0.01068s/call)
en_replace(`cat wap.txt`, "and", "xx"):
400 calls in 4.915s is 1.229e+04µs/call (0.01229s/call)
fast_replace(`cat wap.txt`, "and", "xx"):
400 calls in 3.420s is 8550µs/call (0.00855s/call)

blm_replace(`cat wap.txt`, "ZZZ", "xx"):
200 calls in 4.479s is 2.24e+04µs/call (0.0224s/call)
rh_replace2(`cat wap.txt`, "ZZZ", "xx"):
600 calls in 3.821s is 6369µs/call (0.006369s/call)
cmt_replace(`cat wap.txt`, "ZZZ", "xx"):
200 calls in 4.383s is 2.192e+04µs/call (0.02192s/call)
w_replace(`cat wap.txt`, "ZZZ", "xx"):
500 calls in 3.379s is 6758µs/call (0.006758s/call)
en_replace(`cat wap.txt`, "ZZZ", "xx"):
600 calls in 3.945s is 6576µs/call (0.006576s/call)
fast_replace(`cat wap.txt`, "ZZZ", "xx"):
1000 calls in 4.105s is 4105µs/call (0.004105s/call)

blm_replace("abzzefzzijlmzzpqrzzuvzzyz", "zz", "xx"):
6237100 calls in 3.999s is 641.2ns/call (6.412e-07s/call)
rh_replace2("abzzefzzijlmzzpqrzzuvzzyz", "zz", "xx"):
25080600 calls in 4.000s is 159.5ns/call (1.595e-07s/call)
cmt_replace("abzzefzzijlmzzpqrzzuvzzyz", "zz", "xx"):
7199800 calls in 4.000s is 555.6ns/call (5.556e-07s/call)
w_replace("abzzefzzijlmzzpqrzzuvzzyz", "zz", "xx"):
21167000 calls in 4.000s is 189ns/call (1.89e-07s/call)
en_replace("abzzefzzijlmzzpqrzzuvzzyz", "zz", "xx"):
10014700 calls in 4.000s is 399.4ns/call (3.994e-07s/call)
fast_replace("abzzefzzijlmzzpqrzzuvzzyz", "zz", "xx"):
18143800 calls in 4.000s is 220.5ns/call (2.205e-07s/call)

blm_replace("abcdefghijlmnopqrstuvwxyz", "zz", "xx"):
33012700 calls in 3.999s is 121.1ns/call (1.211e-07s/call)
rh_replace2("abcdefghijlmnopqrstuvwxyz", "zz", "xx"):
35563900 calls in 4.000s is 112.5ns/call (1.125e-07s/call)
cmt_replace("abcdefghijlmnopqrstuvwxyz", "zz", "xx"):
18117800 calls in 4.000s is 220.8ns/call (2.208e-07s/call)
w_replace("abcdefghijlmnopqrstuvwxyz", "zz", "xx"):
40621400 calls in 4.000s is 98.47ns/call (9.847e-08s/call)
en_replace("abcdefghijlmnopqrstuvwxyz", "zz", "xx"):
26174300 calls in 4.000s is 152.8ns/call (1.528e-07s/call)
fast_replace("abcdefghijlmnopqrstuvwxyz", "zz", "xx"):
41656400 calls in 4.000s is 96.02ns/call (9.602e-08s/call)
 
