B
blmblm
[ snip ]
I just wanted to distinguish between the precision on the counter
(which is 1 microsecond) and the accuracy of the reported time (which
can be as course as 10 milliseconds (though that might be a memory
from a very old system).
Aha. Okay. I also vaguely remember hearing something along those
lines (that the precision and the accuracy might not be the same).
The above times work out at about 26.5 microseconds per call, so your
method is probably OK, but it won't work on faster hardware. On my
2.53GHz Core2 Duo P8700-based laptop a similar function takes 1.6
microseconds per call and summing times individual times is not
reliable. [Aside: you can get it to be more reliable (under certain
assumptions) by taking the granularity into account and doing a little
statistics.]
I think you're exactly right that I should be timing things in
bigger chunks rather than at the level of individual calls.
What was I thinking .... probably that timing individual
calls would fit nicely with the structure of my code. Well.
Another round of refactoring may be in order.
Anyway, I just wanted to check: is more than 16 times slower than mine
times a reasonable result on your machine?
Note: I am not timing your function but mine although I doubt that
much of the 16 times is due to either my code or my system's libc.
Good question. I did my experiments on a machine that's several
years old and was not high-end to start with. /proc/cpuinfo says
the processor is a 2.4GHz Intel Celeron, and I'm compiling with
gcc 4.0.1
I just re-ran the timing tests on something that according to
/proc/cpuinfo is a 3.33GHz Intel Duo, compiling with gcc 4.4.1,
and times were reduced by factors of roughly 3 for the versions
using my string functions and 2.5 for the versions using the
library string functions. If you're using faster hardware or
a smarter compiler .... I don't know; a factor of 16 does seem
like a lot, but maybe?
It's possible, I suppose. I'd time in larger chunks. What I do is
loop in sets of 100 calls until some reasonable time as elapsed (say
5 seconds) and then use the exact elapsed time and the number of sets
actually executed. That way I can time very fact calls (replace a
with x in y) and very longs ones (replace and with x in
war-and-peace.txt) with the same driver.
Yes .... Well, maybe my next rewrite will incorporate that idea!
Based on your previous comments I did a bit of rewriting (which
turned out to be less than I had thought it might be), and I'll
put the result below, for the record or something. Output is
sufficiently close to previous results not to post again, but
the code is a bit shorter, and that's a win.
(Thanks for taking the time to comment on my code, by the way!
In my thinking you're one of the group's respected regulars .... )
I also had a late-breaking thought on how to somewhat improve one
of the things I thought was ugly about my scans-only-once code
and will also include below my latest revision of (one version
of) that as well, again sort of for the record. (Briefly --
I realized that my function to build a list of positions in the
input string that match the text to change was slightly more
complicated than it needed to be. Details probably not of much
interest to anyone but me!)
======== file tester.c ========
/*
* Program to test/demonstrate function to replace multiple occurrences of
* old_text with new_text in string.
*
* Command-line arguments (all optional):
* --verbose for verbose output
* --correctness to do correctness tests
* --timing N for timing tests only (N is how many times to run each test)
*
* See comments for replace() below for function "specification".
*/
#include <stdio.h>
#include <stdlib.h>
/* ==== "user"-supplied functions ==== */
/*
* behaves like strcmp
*/
int x_strcmp(const char* s1, const char* s2);
/* FIXME provide this file */
#include "xstring.c"
/*
* replace "all" occurrences of old_text with new_text
*
* conceptually, scans from left to right, replacing as it goes (so if there
* are overlapping occurrences, the leftmost one "wins" and the other is
* ignored)
*
* parameters:
* s (input string)
* old_text, new_text (text to replace and the replacement)
* all inputs are assumed to be non-NULL, and old_text is assumed to be
* non-empty
*
* returns a newly-allocated string that is a copy of s with "all"
* occurrences of old_text replaced with new_text, or NULL if malloc was
* unable to get space for the output string
*/
char* replace(const char* s, const char* old_text, const char* new_text);
/*
* describe this version
*
* writes to stdout anything you want printed at the start of the test
*/
void describe_replace(void);
/* FIXME provide this file */
#include "replace.c"
/* ==== function for timing ==== */
/*
* get time (preferably milliseconds-or-better resolution)
*
* returns time in seconds since some fixed starting point
*/
/* FIXME adjust as needed for your platform -- this works for Fedora Linux */
#include <sys/time.h>
double get_time(void) {
struct timeval tv;
if (gettimeofday(&tv, NULL) == 0)
return (double) tv.tv_sec + ((double) tv.tv_usec / (double) 1000000);
else
return 0.0;
}
/* ==== macros for tests ==== */
/*
* macro to perform one correctness test
*
* conforms to the spinoza1111 interface to allow easy copy/paste of his tests;
* first parameter is unneeded for my code
*/
#define TESTER(ignored, in, old_text, new_text, expected) \
one_test(in, old_text, new_text, expected, &test_count, &error_count, 1);
/*
* macro to perform one timing test
*
* see timing_test comments for meanings of parameters
*/
#define TIMING_TESTER(fill_lng, old_lng, new_lng, in_rpts, rep_rpts, t_rpts) \
total_time += \
timing_test(fill_lng, old_lng, new_lng, in_rpts, rep_rpts, t_rpts, \
&test_count, &error_count);
/* ==== macros, variables, function definitions, etc. ==== */
/* (comments for individual functions are with code at end) */
#define STRING_OR_NULL(s) (((s) == NULL) ? "NULL" : s)
void do_correctness_tests(void);
void do_timing_tests(const int repeats);
double one_test(const char* s, const char* old_text, const char* new_text,
const char* expected_result, int* test_count_ptr, int* error_count_ptr,
const int repeats);
double repeat_test(const char* s, const char* old_text, const char* new_text,
const char* expected_result, int* test_count_ptr, int* error_count_ptr,
int repeats);
double timing_test(int fill_length,
const int old_text_length, const int new_text_length,
const int input_repeats, const int replace_repeats,
const int test_repeats,
int* test_count_ptr, int* error_count_ptr);
char* generate_text(const int fill_length, const int target_length,
const char fill_char, const char target_char, const int num_repeats);
/* ==== global variable(s) ==== */
int verbose;
/* ==== main program ==== */
int main(int argc, char* argv[]) {
char* usage_msg =
"usage: %s [--verbose] [--correctness] [--timing repeats]\n";
int correctness;
int timing;
int repeats;
char* strtol_endptr;
int next_arg;
verbose = 0;
correctness = 0;
timing = 0;
repeats = 0;
/* process arguments */
next_arg = 1;
while (next_arg < argc) {
if (x_strcmp(argv[next_arg], "--verbose") == 0) {
verbose = 1;
++next_arg;
}
else if (x_strcmp(argv[next_arg], "--correctness") == 0) {
correctness = 1;
++next_arg;
}
else if (x_strcmp(argv[next_arg], "--timing") == 0) {
timing = 1;
++next_arg;
if (!(next_arg < argc)) {
fprintf(stderr, usage_msg, argv[0]);
return EXIT_FAILURE;
}
repeats = strtol(argv[next_arg], &strtol_endptr, 10);
if (*strtol_endptr != '\0') {
fprintf(stderr, "invalid repeats\n");
fprintf(stderr, usage_msg, argv[0]);
return EXIT_FAILURE;
}
++next_arg;
}
else {
fprintf(stderr, usage_msg, argv[0]);
return EXIT_FAILURE;
}
}
if ((correctness == 0) && (timing == 0)) {
fprintf(stderr, usage_msg, argv[0]);
return EXIT_FAILURE;
}
/* do tests */
describe_replace();
fprintf(stdout, "\n");
if (correctness) {
do_correctness_tests();
}
if (timing) {
do_timing_tests(repeats);
}
return EXIT_SUCCESS;
}
/* ==== function definitions ==== */
/*
* perform correctness tests
*/
void do_correctness_tests(void) {
int test_count = 0;
int error_count = 0;
fprintf(stdout, "performing correctness tests\n\n");
fprintf(stdout, "simple tests\n\n");
/* FIXME: adjust as desired */
TESTER(0, "abcd", "ab", "xy", "xycd")
TESTER(0, "abcd", "cd", "xy", "abxy")
TESTER(0, "abcd", "bc", "xy", "axyd")
TESTER(0, "abcdabcd", "bc", "xy", "axydaxyd")
/* uncomment to check code to deal with failures
TESTER(0, "abcd", "abcd", "abcd", "xyzw")
*/
fprintf(stdout, "spinoza1111 tests\n\n");
/* FIXME: provide this file */
#include "spinoza-tests.c"
fprintf(stdout, "%d tests\n", test_count);
fprintf(stdout, "%d errors\n", error_count);
fprintf(stdout, "\n");
}
/*
* perform timing tests
*/
void do_timing_tests(const int repeats) {
int test_count = 0;
int error_count = 0;
double total_time = 0.0;
fprintf(stdout, "performing timing tests (%d repeats) \n\n", repeats);
/* FIXME: adjust as desired */
/* longish string with few changes */
TIMING_TESTER(2000, 2, 2, 2, 20000, repeats)
/* longish string with moderate number of changes */
TIMING_TESTER(400, 2, 2, 10, 20000, repeats)
/* longish string with many changes */
TIMING_TESTER(80, 2, 2, 50, 20000, repeats)
/* longish string with many changes and longer old/new text */
TIMING_TESTER(80, 10, 10, 50, 20000, repeats)
fprintf(stdout, "%d tests (counting repeats)\n", test_count);
fprintf(stdout, "%d errors (counting repeats)\n", error_count);
fprintf(stdout, "total time = %.2f seconds\n", total_time);
fprintf(stdout, "\n");
}
/*
* perform one test
*
* s, old_text, new_text, expected_result are self-explanatory(?)
*
* *test_count_ptr is incremented if test_count_ptr not NULL
* *error_count_ptr is incremented if error_count_ptr not NULL
*
* repeats is how many times to repeat test (for timing purposes)
*
* if (global variable) verbose is nonzero, prints verbose output and
* "passed" / "failed"
* otherwise prints only information about failures
*
* returns time for test (call to replace() only) in seconds
*/
double one_test(const char* s, const char* old_text, const char* new_text,
const char* expected_result, int* test_count_ptr, int* error_count_ptr,
const int repeats)
{
char* result;
int error;
double start_time, elapsed_time;
int n;
if (verbose && (repeats > 1)) {
fprintf(stdout, "repeating test %d time(s)\n\n", repeats);
}
result = NULL;
start_time = get_time();
for (n = 0; n < repeats; ++n) {
free(result);
result = replace(s, old_text, new_text);
}
elapsed_time = get_time() - start_time;
if ((result == NULL) && (expected_result == NULL)) {
error = 0;
}
else if ((result != NULL) && (expected_result != NULL) &&
(x_strcmp(result, expected_result) == 0)) {
error = 0;
}
else {
error = 1;
}
if (verbose) {
fprintf(stdout, "replacing '%s' with '%s' in '%s'\n",
old_text, new_text, s);
fprintf(stdout, "expected result '%s'\n",
expected_result);
if (error == 0) {
fprintf(stdout, "passed\n");
}
else {
fprintf(stdout, "failed: actual result '%s'\n",
STRING_OR_NULL(result));
}
fprintf(stdout, "\n");
}
else {
if (error != 0) {
fprintf(stdout, "test failed\n");
fprintf(stdout, "replacing '%s' with '%s' in '%s'\n",
old_text, new_text, s);
fprintf(stdout, "expected result '%s'\n", expected_result);
fprintf(stdout, "actual result '%s'\n",
STRING_OR_NULL(result));
fprintf(stdout, "\n");
}
}
if (test_count_ptr != NULL) {
++(*test_count_ptr);
}
if ((error !=0 ) && (error_count_ptr != NULL)) {
++(*error_count_ptr);
}
free(result);
return elapsed_time;
}
/*
* perform timing test -- call replace() repeatedly with generated
* strings and print timing information
*
* input to replace():
*
* input string consists of input_repeats repetitions of a string
* consisting of fill_length x's followed by old_text_length y's
* old_text is old_text_length y's
* new_text is new_text_length z's
* expected result consists of input_repeats repetitions of a string
* consisting of fill_length x's followed by new_text_length z's
*
* a "test" consists of calling replace() replace_repeats times
*
* test_repeats "tests" are performed
*
* prints same output as one_test (once per repetition) and also
* timing information
*
* *test_count_ptr is incremented if test_count_ptr not NULL
* *error_count_ptr is incremented if error_count_ptr not NULL
*
* returns total time for test (calls to replace() only) in seconds
*/
double timing_test(int fill_length,
const int old_text_length, const int new_text_length,
const int input_repeats, const int replace_repeats,
const int test_repeats,
int* test_count_ptr, int* error_count_ptr)
{
double total_time = 0.0;
double time;
double* times = malloc(test_repeats * sizeof(*times));
char* input;
char* old_text;
char* new_text;
char fill_char = 'x';
char old_char = 'y';
char new_char = 'y';
char* expected_result;
int i;
fprintf(stdout,
"timing "
"(length %d, changing %d occurrences of %d chars to %d chars) "
"%d times\n",
(fill_length + old_text_length) * input_repeats,
input_repeats,
old_text_length, new_text_length,
replace_repeats);
if (verbose) {
fprintf(stdout, "\n");
}
old_text = generate_text(0, old_text_length, ' ', old_char, 1);
new_text = generate_text(0, new_text_length, ' ', new_char, 1);
input = generate_text(fill_length, old_text_length,
fill_char, old_char, input_repeats);
expected_result = generate_text(fill_length, new_text_length,
fill_char, new_char, input_repeats);
if ((old_text == NULL) || (new_text == NULL) ||
(input == NULL) || (expected_result == NULL))
{
fprintf(stdout, "test skipped -- cannot create test data\n\n");
return 0.0;
}
for (i = 0; i < test_repeats; ++i) {
time = one_test(input, old_text, new_text, expected_result,
test_count_ptr, error_count_ptr, replace_repeats);
total_time += time;
if (times != NULL) {
times = time;
}
}
if (times == NULL) {
fprintf(stdout, "unable to print times for test (%d repeats)\n\n",
test_repeats);
}
else {
fprintf(stdout, "times (seconds): ");
for (i = 0; i < test_repeats; ++i) {
fprintf(stdout, "%.2f ", times);
}
fprintf(stdout, "\n\n");
}
free(old_text);
free(new_text);
free(input);
free(expected_result);
return total_time;
}
/*
* generate string for timing test
*
* generated string will be num_repeats repetitions of a string
* consisting of fill_length fill_char's followed by
* target_length target_char's
*
* returns pointer to generated string, or NULL if malloc fails
*/
char* generate_text(const int fill_length, const int target_length,
const char fill_char, const char target_char, const int num_repeats)
{
char* result;
char* p;
int i, j;
result = malloc(sizeof(*result) *
(1 + num_repeats * (fill_length + target_length)));
if (result != NULL) {
p = result;
for (i = 0; i < num_repeats; ++i) {
for (j = 0; j < fill_length; ++j) {
*(p++) = fill_char;
}
for (j = 0; j < target_length; ++j) {
*(p++) = target_char;
}
}
*p = '\0';
}
return result;
}
======== file replace.c ========
/*
* version that scans input only once and computes string lengths only once
*
* ugly-hack aspects:
*
* "list of matches" is not properly abstracted, but exposing its
* structure, as this code does, seemed to simplify other things
* (such as recursing over the rest of the list)
*
* "string and length" data structure is used only to avoid recomputing
* length, rather than consistently to represent strings -- but using it
* more consistently would make other things (such as recursing over the
* rest of the list) uglier
*/
#include <assert.h>
/* entry in list of matches */
struct match_list_entry {
char* match_start;
struct match_list_entry* next;
};
typedef struct match_list_entry match_list_entry_t;
/* data structure packaging string with length -- to avoid recomputing */
struct string_and_length {
char* start; /* must be null-terminated */
size_t length;
};
typedef struct string_and_length string_and_length_t;
/* see definitions below for comments */
int count_matches_and_build_match_list(
const char* s, const string_and_length_t* old_text,
match_list_entry_t** first_match);
void replace_sub(const char* s, char* new_s,
const string_and_length_t* old_text,
const string_and_length_t* new_text,
const match_list_entry_t* first_match);
void free_match_list(match_list_entry_t* first_match);
void initialize_string_and_length(string_and_length_t* sl, const char* s);
/* see main file for comments */
char* replace(const char* s, const char* old_text, const char* new_text) {
int matches;
char* new_s;
match_list_entry_t* first_match = NULL;
string_and_length_t old_text_sl;
string_and_length_t new_text_sl;
assert(s != NULL);
assert(old_text != NULL);
assert(new_text != NULL);
assert(*old_text != '\0');
initialize_string_and_length(&old_text_sl, old_text);
initialize_string_and_length(&new_text_sl, new_text);
matches = count_matches_and_build_match_list(s, &old_text_sl, &first_match);
if (matches == -1) { /* error building list */
free_match_list(first_match);
return NULL;
}
else {
new_s = malloc(sizeof(*new_s) *
(1 + x_strlen(s) +
matches * (new_text_sl.length - old_text_sl.length)));
if (new_s != NULL) {
replace_sub(s, new_s, &old_text_sl, &new_text_sl, first_match);
}
free_match_list(first_match);
return new_s;
}
}
char* lib_describe(void); /* in xstring.c */
void describe_replace(void) {
fprintf(stdout, "further improved version of replace():\n");
fprintf(stdout, "scans input only once, building linked list of matches\n");
fprintf(stdout, "avoids recomputing string lengths\n");
fprintf(stdout, "uses %s\n", lib_describe());
}
/*
* build list of matches of old_text in s, scanning left to right and
* ignoring overlapping occurrences
*
* on return, parameter first_match point to the first match, or NULL if
* none
*
* returns count of matches, or -1 if error (because space for match list
* could not be allocated)
*/
int count_matches_and_build_match_list(
const char* s, const string_and_length_t* old_text,
match_list_entry_t** first_match)
{
char* match_found;
match_list_entry_t* entry;
int count_rest;
/* look for old_text in s (returning if not found) */
if (*s == '\0') return 0;
match_found = x_strstr(s, old_text->start);
if (match_found == NULL) return 0;
/* add entry for match to list */
entry = malloc(sizeof *entry);
if (entry == NULL) return -1;
entry->match_start = match_found;
entry->next = NULL;
*first_match = entry;
/* repeat for remaining input */
count_rest = count_matches_and_build_match_list(
match_found + old_text->length, old_text,
&((*first_match)->next));
if (count_rest == -1) {
return -1;
}
else {
return 1 + count_rest;
}
}
/*
* copy s to new_s, replacing old_text with new_text
*/
void replace_sub(const char* s, char* new_s,
const string_and_length_t* old_text,
const string_and_length_t* new_text,
const match_list_entry_t* first_match)
{
size_t chars_before;
if (first_match == NULL) {
chars_before = x_strlen(s);
}
else {
chars_before = first_match->match_start - s;
}
x_memcpy(new_s, s, chars_before);
if (first_match == NULL) {
*(new_s + chars_before) = '\0';
}
else {
x_memcpy(new_s + chars_before, new_text->start, new_text->length);
replace_sub(first_match->match_start + old_text->length,
new_s + chars_before + new_text->length,
old_text, new_text, first_match->next);
}
}
/*
* free list of matches
*/
void free_match_list(match_list_entry_t* first_match) {
match_list_entry_t* match;
if (first_match != NULL) {
match = first_match;
free_match_list(match->next);
free(match);
}
}
/*
* initialize string-and-length data structure (from null-terminated string)
*/
void initialize_string_and_length(string_and_length_t* sl, const char* s) {
sl->start = (char*) s;
sl->length = x_strlen(s);
}