automated test for sub-string search algorithms...

C

Chris M. Thomasson

This code includes my sub-string search algorithm, the automated test, and a
stress test for my algorithm:


http://clc.pastebin.com/f5cb3bbd2


This testing framework helped me find several bugs in my algorithm. I have
fixed them and so far, so good... I will continue to blast the shi% out of
my algorithm with this test and report any errors here.


The automated test randomly generates a comparand string, then it builds a
source string and inserts the comparand at random positions within it. It
records all the offsets, and the count of matches. Now, you can use this
information to verify that your search algorithm is getting everything
correct. Please refer to the `strstr_test_create()' function which generates
the random data. Then refer to the `strstr_test_xstrstr()' function which
actually verifies that my algorithm is getting things right.


AFAICT, it's a fairly decent way to blast your sub-string search algorithm
with tremendous forms of random data.


Enjoy!




BTW, if you have any improvements in mind, please share all of them.

;^)
 
C

Chris M. Thomasson

Chris M. Thomasson said:
This code includes my sub-string search algorithm, the automated test, and
a stress test for my algorithm:


http://clc.pastebin.com/f5cb3bbd2
[...]

There is a little bug on `line 68':


printf("source(%u): %s\n",
(unsigned long)src_size,
src);


The first formatter is not compatible with the damn argument! Here, let me
fix that:


http://clc.pastebin.com/f32655568


Sorry about that crap. Anyway, I changed a couple of things. One, I funnel
all debug messages through a function macro `DBG_PRINTF()' which
automatically disables output if `NDEBUG' is defined. Two, I made a little
tweak to the search algorithm itself. It now searches from left-to-right
after it determines that a rightmost character matches the rightmost
character from the comparand.


IMHO, this simple little test program comes in handy when you are trying to
debug a sub-string search algorithm. I can play around with all sorts of
improvements and see exactly where each one of them fail. One thing I am
exploring is reading 32/64-bits worth of characters at a time. This allows
me reduce the total number of comparisons...
 
C

Chris M. Thomasson

Chris M. Thomasson said:
This code includes my sub-string search algorithm, the automated test, and
a stress test for my algorithm:


http://clc.pastebin.com/f5cb3bbd2


This testing framework helped me find several bugs in my algorithm. I have
fixed them and so far, so good... I will continue to blast the shi% out of
my algorithm with this test and report any errors here.

Here is latest version of my sub-string search algorithm:


http://clc.pastebin.com/f5c1d3e5a


This one allows search to skip many more characters than before. I don't
think its like Boyer-Moore because I am not using two tables. Humm...


Any thoughts?
 
C

Chris M. Thomasson

Chris M. Thomasson said:
Here is latest version of my sub-string search algorithm:


http://clc.pastebin.com/f5c1d3e5a


This one allows search to skip many more characters than before. I don't
think its like Boyer-Moore because I am not using two tables. Humm...


Any thoughts?

I would say it's close to the following algorithm:


http://en.wikipedia.org/wiki/Boyer-Moore-Horspool_algorithm


I think I am initializing the hash table a little bit differently. For
instance, my bad character hash hit is zero, while the Horspool algorithm
would have the length of the comparand stored in the table.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,968
Messages
2,570,153
Members
46,701
Latest member
XavierQ83

Latest Threads

Top