Efficency and the standard library

blmblm · Feb 20, 2010

[ snip ]

I just wanted to distinguish between the precision on the counter
(which is 1 microsecond) and the accuracy of the reported time (which
can be as course as 10 milliseconds (though that might be a memory
from a very old system).

Click to expand...

Aha. Okay. I also vaguely remember hearing something along those
lines (that the precision and the accuracy might not be the same).

The above times work out at about 26.5 microseconds per call, so your
method is probably OK, but it won't work on faster hardware. On my
2.53GHz Core2 Duo P8700-based laptop a similar function takes 1.6
microseconds per call and summing times individual times is not
reliable. [Aside: you can get it to be more reliable (under certain
assumptions) by taking the granularity into account and doing a little
statistics.]

Click to expand...

I think you're exactly right that I should be timing things in
bigger chunks rather than at the level of individual calls.
What was I thinking .... probably that timing individual
calls would fit nicely with the structure of my code. Well.
Another round of refactoring may be in order.

Anyway, I just wanted to check: is more than 16 times slower than mine
times a reasonable result on your machine?

Note: I am not timing your function but mine although I doubt that
much of the 16 times is due to either my code or my system's libc.

Click to expand...

Good question. I did my experiments on a machine that's several
years old and was not high-end to start with. /proc/cpuinfo says
the processor is a 2.4GHz Intel Celeron, and I'm compiling with
gcc 4.0.1

I just re-ran the timing tests on something that according to
/proc/cpuinfo is a 3.33GHz Intel Duo, compiling with gcc 4.4.1,
and times were reduced by factors of roughly 3 for the versions
using my string functions and 2.5 for the versions using the
library string functions. If you're using faster hardware or
a smarter compiler .... I don't know; a factor of 16 does seem
like a lot, but maybe?

Click to expand...

It's possible, I suppose. I'd time in larger chunks. What I do is
loop in sets of 100 calls until some reasonable time as elapsed (say
5 seconds) and then use the exact elapsed time and the number of sets
actually executed. That way I can time very fact calls (replace a
with x in y) and very longs ones (replace and with x in
war-and-peace.txt) with the same driver.

Click to expand...

Yes .... Well, maybe my next rewrite will incorporate that idea!
Based on your previous comments I did a bit of rewriting (which
turned out to be less than I had thought it might be), and I'll
put the result below, for the record or something. Output is
sufficiently close to previous results not to post again, but
the code is a bit shorter, and that's a win.

(Thanks for taking the time to comment on my code, by the way!
In my thinking you're one of the group's respected regulars .... )

I also had a late-breaking thought on how to somewhat improve one
of the things I thought was ugly about my scans-only-once code
and will also include below my latest revision of (one version
of) that as well, again sort of for the record. (Briefly --
I realized that my function to build a list of positions in the
input string that match the text to change was slightly more
complicated than it needed to be. Details probably not of much
interest to anyone but me!)

======== file tester.c ========

/*
* Program to test/demonstrate function to replace multiple occurrences of
* old_text with new_text in string.
*
* Command-line arguments (all optional):
* --verbose for verbose output
* --correctness to do correctness tests
* --timing N for timing tests only (N is how many times to run each test)
*
* See comments for replace() below for function "specification".
*/
#include <stdio.h>
#include <stdlib.h>

/* ==== "user"-supplied functions ==== */

/*
* behaves like strcmp
*/
int x_strcmp(const char* s1, const char* s2);
/* FIXME provide this file */
#include "xstring.c"

/*
* replace "all" occurrences of old_text with new_text
*
* conceptually, scans from left to right, replacing as it goes (so if there
* are overlapping occurrences, the leftmost one "wins" and the other is
* ignored)
*
* parameters:
* s (input string)
* old_text, new_text (text to replace and the replacement)
* all inputs are assumed to be non-NULL, and old_text is assumed to be
* non-empty
*
* returns a newly-allocated string that is a copy of s with "all"
* occurrences of old_text replaced with new_text, or NULL if malloc was
* unable to get space for the output string
*/
char* replace(const char* s, const char* old_text, const char* new_text);
/*
* describe this version
*
* writes to stdout anything you want printed at the start of the test
*/
void describe_replace(void);
/* FIXME provide this file */
#include "replace.c"

/* ==== function for timing ==== */
/*
* get time (preferably milliseconds-or-better resolution)
*
* returns time in seconds since some fixed starting point
*/
/* FIXME adjust as needed for your platform -- this works for Fedora Linux */
#include <sys/time.h>
double get_time(void) {
struct timeval tv;
if (gettimeofday(&tv, NULL) == 0)
return (double) tv.tv_sec + ((double) tv.tv_usec / (double) 1000000);
else
return 0.0;
}

/* ==== macros for tests ==== */

/*
* macro to perform one correctness test
*
* conforms to the spinoza1111 interface to allow easy copy/paste of his tests;
* first parameter is unneeded for my code
*/
#define TESTER(ignored, in, old_text, new_text, expected) \
one_test(in, old_text, new_text, expected, &test_count, &error_count, 1);

/*
* macro to perform one timing test
*
* see timing_test comments for meanings of parameters
*/
#define TIMING_TESTER(fill_lng, old_lng, new_lng, in_rpts, rep_rpts, t_rpts) \
total_time += \
timing_test(fill_lng, old_lng, new_lng, in_rpts, rep_rpts, t_rpts, \
&test_count, &error_count);

/* ==== macros, variables, function definitions, etc. ==== */
/* (comments for individual functions are with code at end) */

#define STRING_OR_NULL(s) (((s) == NULL) ? "NULL" : s)

void do_correctness_tests(void);
void do_timing_tests(const int repeats);
double one_test(const char* s, const char* old_text, const char* new_text,
const char* expected_result, int* test_count_ptr, int* error_count_ptr,
const int repeats);
double repeat_test(const char* s, const char* old_text, const char* new_text,
const char* expected_result, int* test_count_ptr, int* error_count_ptr,
int repeats);
double timing_test(int fill_length,
const int old_text_length, const int new_text_length,
const int input_repeats, const int replace_repeats,
const int test_repeats,
int* test_count_ptr, int* error_count_ptr);
char* generate_text(const int fill_length, const int target_length,
const char fill_char, const char target_char, const int num_repeats);

/* ==== global variable(s) ==== */

int verbose;

/* ==== main program ==== */

int main(int argc, char* argv[]) {

char* usage_msg =
"usage: %s [--verbose] [--correctness] [--timing repeats]\n";

int correctness;
int timing;
int repeats;

char* strtol_endptr;
int next_arg;

verbose = 0;
correctness = 0;
timing = 0;
repeats = 0;

/* process arguments */

next_arg = 1;
while (next_arg < argc) {
if (x_strcmp(argv[next_arg], "--verbose") == 0) {
verbose = 1;
++next_arg;
}
else if (x_strcmp(argv[next_arg], "--correctness") == 0) {
correctness = 1;
++next_arg;
}
else if (x_strcmp(argv[next_arg], "--timing") == 0) {
timing = 1;
++next_arg;
if (!(next_arg < argc)) {
fprintf(stderr, usage_msg, argv[0]);
return EXIT_FAILURE;
}
repeats = strtol(argv[next_arg], &strtol_endptr, 10);
if (*strtol_endptr != '\0') {
fprintf(stderr, "invalid repeats\n");
fprintf(stderr, usage_msg, argv[0]);
return EXIT_FAILURE;
}
++next_arg;
}
else {
fprintf(stderr, usage_msg, argv[0]);
return EXIT_FAILURE;
}
}

if ((correctness == 0) && (timing == 0)) {
fprintf(stderr, usage_msg, argv[0]);
return EXIT_FAILURE;
}

/* do tests */

describe_replace();
fprintf(stdout, "\n");

if (correctness) {
do_correctness_tests();
}
if (timing) {
do_timing_tests(repeats);
}
return EXIT_SUCCESS;
}

/* ==== function definitions ==== */

/*
* perform correctness tests
*/
void do_correctness_tests(void) {
int test_count = 0;
int error_count = 0;

fprintf(stdout, "performing correctness tests\n\n");

fprintf(stdout, "simple tests\n\n");

/* FIXME: adjust as desired */

TESTER(0, "abcd", "ab", "xy", "xycd")
TESTER(0, "abcd", "cd", "xy", "abxy")
TESTER(0, "abcd", "bc", "xy", "axyd")
TESTER(0, "abcdabcd", "bc", "xy", "axydaxyd")
/* uncomment to check code to deal with failures
TESTER(0, "abcd", "abcd", "abcd", "xyzw")
*/

fprintf(stdout, "spinoza1111 tests\n\n");

/* FIXME: provide this file */

#include "spinoza-tests.c"

fprintf(stdout, "%d tests\n", test_count);
fprintf(stdout, "%d errors\n", error_count);
fprintf(stdout, "\n");
}

/*
* perform timing tests
*/
void do_timing_tests(const int repeats) {
int test_count = 0;
int error_count = 0;
double total_time = 0.0;

fprintf(stdout, "performing timing tests (%d repeats) \n\n", repeats);

/* FIXME: adjust as desired */

/* longish string with few changes */
TIMING_TESTER(2000, 2, 2, 2, 20000, repeats)
/* longish string with moderate number of changes */
TIMING_TESTER(400, 2, 2, 10, 20000, repeats)
/* longish string with many changes */
TIMING_TESTER(80, 2, 2, 50, 20000, repeats)
/* longish string with many changes and longer old/new text */
TIMING_TESTER(80, 10, 10, 50, 20000, repeats)

fprintf(stdout, "%d tests (counting repeats)\n", test_count);
fprintf(stdout, "%d errors (counting repeats)\n", error_count);
fprintf(stdout, "total time = %.2f seconds\n", total_time);
fprintf(stdout, "\n");
}

/*
* perform one test
*
* s, old_text, new_text, expected_result are self-explanatory(?)
*
* *test_count_ptr is incremented if test_count_ptr not NULL
* *error_count_ptr is incremented if error_count_ptr not NULL
*
* repeats is how many times to repeat test (for timing purposes)
*
* if (global variable) verbose is nonzero, prints verbose output and
* "passed" / "failed"
* otherwise prints only information about failures
*
* returns time for test (call to replace() only) in seconds
*/
double one_test(const char* s, const char* old_text, const char* new_text,
const char* expected_result, int* test_count_ptr, int* error_count_ptr,
const int repeats)
{
char* result;
int error;
double start_time, elapsed_time;
int n;

if (verbose && (repeats > 1)) {
fprintf(stdout, "repeating test %d time(s)\n\n", repeats);
}

result = NULL;
start_time = get_time();
for (n = 0; n < repeats; ++n) {
free(result);
result = replace(s, old_text, new_text);
}
elapsed_time = get_time() - start_time;

if ((result == NULL) && (expected_result == NULL)) {
error = 0;
}
else if ((result != NULL) && (expected_result != NULL) &&
(x_strcmp(result, expected_result) == 0)) {
error = 0;
}
else {
error = 1;
}

if (verbose) {
fprintf(stdout, "replacing '%s' with '%s' in '%s'\n",
old_text, new_text, s);
fprintf(stdout, "expected result '%s'\n",
expected_result);
if (error == 0) {
fprintf(stdout, "passed\n");
}
else {
fprintf(stdout, "failed: actual result '%s'\n",
STRING_OR_NULL(result));
}
fprintf(stdout, "\n");
}
else {
if (error != 0) {
fprintf(stdout, "test failed\n");
fprintf(stdout, "replacing '%s' with '%s' in '%s'\n",
old_text, new_text, s);
fprintf(stdout, "expected result '%s'\n", expected_result);
fprintf(stdout, "actual result '%s'\n",
STRING_OR_NULL(result));
fprintf(stdout, "\n");
}
}

if (test_count_ptr != NULL) {
++(*test_count_ptr);
}
if ((error !=0 ) && (error_count_ptr != NULL)) {
++(*error_count_ptr);
}

free(result);
return elapsed_time;
}

/*
* perform timing test -- call replace() repeatedly with generated
* strings and print timing information
*
* input to replace():
*
* input string consists of input_repeats repetitions of a string
* consisting of fill_length x's followed by old_text_length y's
* old_text is old_text_length y's
* new_text is new_text_length z's
* expected result consists of input_repeats repetitions of a string
* consisting of fill_length x's followed by new_text_length z's
*
* a "test" consists of calling replace() replace_repeats times
*
* test_repeats "tests" are performed
*
* prints same output as one_test (once per repetition) and also
* timing information
*
* *test_count_ptr is incremented if test_count_ptr not NULL
* *error_count_ptr is incremented if error_count_ptr not NULL
*
* returns total time for test (calls to replace() only) in seconds
*/
double timing_test(int fill_length,
const int old_text_length, const int new_text_length,
const int input_repeats, const int replace_repeats,
const int test_repeats,
int* test_count_ptr, int* error_count_ptr)
{
double total_time = 0.0;
double time;
double* times = malloc(test_repeats * sizeof(*times));

char* input;
char* old_text;
char* new_text;
char fill_char = 'x';
char old_char = 'y';
char new_char = 'y';
char* expected_result;

int i;

fprintf(stdout,
"timing "
"(length %d, changing %d occurrences of %d chars to %d chars) "
"%d times\n",
(fill_length + old_text_length) * input_repeats,
input_repeats,
old_text_length, new_text_length,
replace_repeats);
if (verbose) {
fprintf(stdout, "\n");
}

old_text = generate_text(0, old_text_length, ' ', old_char, 1);
new_text = generate_text(0, new_text_length, ' ', new_char, 1);
input = generate_text(fill_length, old_text_length,
fill_char, old_char, input_repeats);
expected_result = generate_text(fill_length, new_text_length,
fill_char, new_char, input_repeats);
if ((old_text == NULL) || (new_text == NULL) ||
(input == NULL) || (expected_result == NULL))
{
fprintf(stdout, "test skipped -- cannot create test data\n\n");
return 0.0;
}

for (i = 0; i < test_repeats; ++i) {
time = one_test(input, old_text, new_text, expected_result,
test_count_ptr, error_count_ptr, replace_repeats);
total_time += time;
if (times != NULL) {
times = time;
}
}

if (times == NULL) {
fprintf(stdout, "unable to print times for test (%d repeats)\n\n",
test_repeats);
}
else {
fprintf(stdout, "times (seconds): ");
for (i = 0; i < test_repeats; ++i) {
fprintf(stdout, "%.2f ", times);
}
fprintf(stdout, "\n\n");
}
free(old_text);
free(new_text);
free(input);
free(expected_result);

return total_time;
}

/*
* generate string for timing test
*
* generated string will be num_repeats repetitions of a string
* consisting of fill_length fill_char's followed by
* target_length target_char's
*
* returns pointer to generated string, or NULL if malloc fails
*/
char* generate_text(const int fill_length, const int target_length,
const char fill_char, const char target_char, const int num_repeats)
{
char* result;
char* p;
int i, j;

result = malloc(sizeof(*result) *
(1 + num_repeats * (fill_length + target_length)));
if (result != NULL) {
p = result;
for (i = 0; i < num_repeats; ++i) {
for (j = 0; j < fill_length; ++j) {
*(p++) = fill_char;
}
for (j = 0; j < target_length; ++j) {
*(p++) = target_char;
}
}
*p = '\0';
}

return result;
}

======== file replace.c ========

/*
* version that scans input only once and computes string lengths only once
*
* ugly-hack aspects:
*
* "list of matches" is not properly abstracted, but exposing its
* structure, as this code does, seemed to simplify other things
* (such as recursing over the rest of the list)
*
* "string and length" data structure is used only to avoid recomputing
* length, rather than consistently to represent strings -- but using it
* more consistently would make other things (such as recursing over the
* rest of the list) uglier
*/
#include <assert.h>

/* entry in list of matches */
struct match_list_entry {
char* match_start;
struct match_list_entry* next;
};
typedef struct match_list_entry match_list_entry_t;

/* data structure packaging string with length -- to avoid recomputing */
struct string_and_length {
char* start; /* must be null-terminated */
size_t length;
};
typedef struct string_and_length string_and_length_t;

/* see definitions below for comments */
int count_matches_and_build_match_list(
const char* s, const string_and_length_t* old_text,
match_list_entry_t** first_match);
void replace_sub(const char* s, char* new_s,
const string_and_length_t* old_text,
const string_and_length_t* new_text,
const match_list_entry_t* first_match);
void free_match_list(match_list_entry_t* first_match);
void initialize_string_and_length(string_and_length_t* sl, const char* s);

/* see main file for comments */
char* replace(const char* s, const char* old_text, const char* new_text) {

int matches;
char* new_s;

match_list_entry_t* first_match = NULL;

string_and_length_t old_text_sl;
string_and_length_t new_text_sl;

assert(s != NULL);
assert(old_text != NULL);
assert(new_text != NULL);
assert(*old_text != '\0');

initialize_string_and_length(&old_text_sl, old_text);
initialize_string_and_length(&new_text_sl, new_text);

matches = count_matches_and_build_match_list(s, &old_text_sl, &first_match);
if (matches == -1) { /* error building list */
free_match_list(first_match);
return NULL;
}
else {
new_s = malloc(sizeof(*new_s) *
(1 + x_strlen(s) +
matches * (new_text_sl.length - old_text_sl.length)));
if (new_s != NULL) {
replace_sub(s, new_s, &old_text_sl, &new_text_sl, first_match);
}
free_match_list(first_match);
return new_s;
}
}
char* lib_describe(void); /* in xstring.c */
void describe_replace(void) {
fprintf(stdout, "further improved version of replace():\n");
fprintf(stdout, "scans input only once, building linked list of matches\n");
fprintf(stdout, "avoids recomputing string lengths\n");
fprintf(stdout, "uses %s\n", lib_describe());
}

/*
* build list of matches of old_text in s, scanning left to right and
* ignoring overlapping occurrences
*
* on return, parameter first_match point to the first match, or NULL if
* none
*
* returns count of matches, or -1 if error (because space for match list
* could not be allocated)
*/
int count_matches_and_build_match_list(
const char* s, const string_and_length_t* old_text,
match_list_entry_t** first_match)
{
char* match_found;
match_list_entry_t* entry;
int count_rest;

/* look for old_text in s (returning if not found) */
if (*s == '\0') return 0;
match_found = x_strstr(s, old_text->start);
if (match_found == NULL) return 0;

/* add entry for match to list */
entry = malloc(sizeof *entry);
if (entry == NULL) return -1;
entry->match_start = match_found;
entry->next = NULL;
*first_match = entry;

/* repeat for remaining input */
count_rest = count_matches_and_build_match_list(
match_found + old_text->length, old_text,
&((*first_match)->next));
if (count_rest == -1) {
return -1;
}
else {
return 1 + count_rest;
}
}

/*
* copy s to new_s, replacing old_text with new_text
*/
void replace_sub(const char* s, char* new_s,
const string_and_length_t* old_text,
const string_and_length_t* new_text,
const match_list_entry_t* first_match)
{
size_t chars_before;

if (first_match == NULL) {
chars_before = x_strlen(s);
}
else {
chars_before = first_match->match_start - s;
}
x_memcpy(new_s, s, chars_before);
if (first_match == NULL) {
*(new_s + chars_before) = '\0';
}
else {
x_memcpy(new_s + chars_before, new_text->start, new_text->length);
replace_sub(first_match->match_start + old_text->length,
new_s + chars_before + new_text->length,
old_text, new_text, first_match->next);
}
}

/*
* free list of matches
*/
void free_match_list(match_list_entry_t* first_match) {
match_list_entry_t* match;

if (first_match != NULL) {
match = first_match;
free_match_list(match->next);
free(match);
}
}

/*
* initialize string-and-length data structure (from null-terminated string)
*/
void initialize_string_and_length(string_and_length_t* sl, const char* s) {
sl->start = (char*) s;
sl->length = x_strlen(s);
}

spinoza1111 · Feb 20, 2010

spinoza1111wrote:

Right. Nevertheless, it may not /stay/ addressible and available. For

In rare cases.

example, it might be an auto object, in which case it dies when its
containing block dies. To avoid this, one can store it either in static
storage (which means its size is fixed at compile time) or in
dynamically allocated storage (which means one needs some way to track

Most often, of course, it will be mallocated.

and manage that storage). The former alternative doesn't scale, and the
latter is a container. Since you eschew containers, I must conclude that

I don't know what you're talking about (and I suspect you don't
either). I don't "eschew containers", if the original data is in a
container. I said, don't copy it unnecessarily, since its O(n) per
node and creates the risk that everything might seem proper (because
you're using your copied data in the nodes) while the original is
invalid (freed, out of scope, obsolete, or all three).

you either content yourself with non-scalable solutions, or rely on auto
objects *not* being destroyed when their block exits, or perhaps confine
your data lifetime requirements to a single block.

Actually, dear boy, it's your solution which being O(n) per node which
doesn't scale, and only a silly bastard would rely on autos not being
destroyed when their block exits (do you?). Your third alternative is
poorly expressed and incoherent ranting: what actually is the case is
that the links are only valid as long as the data linked to is valid,
and this is just common sense. The data mapped by the link comes from
the same source (malloc) and goes to the same knacker's yard (free) as
the linked list.

You have no conception of storage elegance if you can so blindly
publish a tool which will so blindly double storage use on each use.

spinoza1111 · Feb 20, 2010

Ben Bacarisse said:
Ben Bacarisse said:

[ snip ]

I just wanted to distinguish between the precision on the counter
(which is 1 microsecond) and the accuracy of the reported time (which
can be as course as 10 milliseconds (though that might be a memory
from a very old system).

Click to expand...

Aha. Okay. I also vaguely remember hearing something along those
lines (that the precision and the accuracy might not be the same).

The above times work out at about 26.5 microseconds per call, so your
method is probably OK, but it won't work on faster hardware. On my
2.53GHz Core2 Duo P8700-based laptop a similar function takes 1.6
microseconds per call and summing times individual times is not
reliable. [Aside: you can get it to be more reliable (under certain
assumptions) by taking the granularity into account and doing a little
statistics.]
I think you're exactly right that I should be timing things in
bigger chunks rather than at the level of individual calls.
What was I thinking .... probably that timing individual
calls would fit nicely with the structure of my code. Well.
Another round of refactoring may be in order.
Anyway, I just wanted to check: is more than 16 times slower than mine
times a reasonable result on your machine?
Note: I am not timing your function but mine although I doubt that
much of the 16 times is due to either my code or my system's libc.
Good question. I did my experiments on a machine that's several
years old and was not high-end to start with. /proc/cpuinfo says
the processor is a 2.4GHz Intel Celeron, and I'm compiling with
gcc 4.0.1
I just re-ran the timing tests on something that according to
/proc/cpuinfo is a 3.33GHz Intel Duo, compiling with gcc 4.4.1,
and times were reduced by factors of roughly 3 for the versions
using my string functions and 2.5 for the versions using the
library string functions. If you're using faster hardware or
a smarter compiler .... I don't know; a factor of 16 does seem
like a lot, but maybe?

Click to expand...

Click to expand...

It's possible, I suppose. I'd time in larger chunks. What I do is
loop in sets of 100 calls until some reasonable time as elapsed (say
5 seconds) and then use the exact elapsed time and the number of sets
actually executed. That way I can time very fact calls (replace a
with x in y) and very longs ones (replace and with x in
war-and-peace.txt) with the same driver.

Click to expand...

Yes .... Well, maybe my next rewrite will incorporate that idea!
Based on your previous comments I did a bit of rewriting (which
turned out to be less than I had thought it might be), and I'll
put the result below, for the record or something. Output is
sufficiently close to previous results not to post again, but
the code is a bit shorter, and that's a win.

(Thanks for taking the time to comment on my code, by the way!
In my thinking you're one of the group's respected regulars .... )

I also had a late-breaking thought on how to somewhat improve one
of the things I thought was ugly about my scans-only-once code
and will also include below my latest revision of (one version
of) that as well, again sort of for the record. (Briefly --
I realized that my function to build a list of positions in the
input string that match the text to change was slightly more
complicated than it needed to be. Details probably not of much
interest to anyone but me!)

======== file tester.c ========

/*
* Program to test/demonstrate function to replace multiple occurrences of
* old_text with new_text in string.
*
* Command-line arguments (all optional):
* --verbose for verbose output
* --correctness to do correctness tests
* --timing N for timing tests only (N is how many times to run each test)
*
* See comments for replace() below for function "specification".
*/
#include <stdio.h>
#include <stdlib.h>

/* ==== "user"-supplied functions ==== */

/*
* behaves like strcmp
*/
int x_strcmp(const char* s1, const char* s2);
/* FIXME provide this file */
#include "xstring.c"

/*
* replace "all" occurrences of old_text with new_text
*
* conceptually, scans from left to right, replacing as it goes (so if there
* are overlapping occurrences, the leftmost one "wins" and the other is
* ignored)
*
* parameters:
* s (input string)
* old_text, new_text (text to replace and the replacement)
* all inputs are assumed to be non-NULL, and old_text is assumed to be
* non-empty
*
* returns a newly-allocated string that is a copy of s with "all"
* occurrences of old_text replaced with new_text, or NULL if malloc was
* unable to get space for the output string
*/
char* replace(const char* s, const char* old_text, const char* new_text);
/*
* describe this version
*
* writes to stdout anything you want printed at the start of the test
*/
void describe_replace(void);
/* FIXME provide this file */
#include "replace.c"

/* ==== function for timing ==== */
/*
* get time (preferably milliseconds-or-better resolution)
*
* returns time in seconds since some fixed starting point
*/
/* FIXME adjust as needed for your platform -- this works for Fedora Linux */
#include <sys/time.h>
double get_time(void) {
struct timeval tv;
if (gettimeofday(&tv, NULL) == 0)
return (double) tv.tv_sec + ((double) tv.tv_usec / (double) 1000000);
else
return 0.0;

}

/* ==== macros for tests ==== */

/*
* macro to perform one correctness test
*
* conforms to thespinoza1111interface to allow easy copy/paste of his tests;
* first parameter is unneeded for my code
*/
#define TESTER(ignored, in, old_text, new_text, expected) \
one_test(in, old_text, new_text, expected, &test_count, &error_count, 1);

/*
* macro to perform one timing test
*
* see timing_test comments for meanings of parameters
*/
#define TIMING_TESTER(fill_lng, old_lng, new_lng, in_rpts, rep_rpts, t_rpts) \
total_time += \
timing_test(fill_lng, old_lng, new_lng, in_rpts, rep_rpts, t_rpts, \
&test_count, &error_count);

/* ==== macros, variables, function definitions, etc. ==== */
/* (comments for individual functions are with code at end) */

#define STRING_OR_NULL(s) (((s) == NULL) ? "NULL" : s)

void do_correctness_tests(void);
void do_timing_tests(const int repeats);
double one_test(const char* s, const char* old_text, const char* new_text,
const char* expected_result, int* test_count_ptr, int* error_count_ptr,
const int repeats);
double repeat_test(const char* s, const char* old_text, const char* new_text,
const char* expected_result, int* test_count_ptr, int* error_count_ptr,
int repeats);
double timing_test(int fill_length,
const int old_text_length, const int new_text_length,
const int input_repeats, const int replace_repeats,
const int test_repeats,
int* test_count_ptr, int* error_count_ptr);
char* generate_text(const int fill_length, const int target_length,
const char fill_char, const char target_char, const int num_repeats);

/* ==== global variable(s) ==== */

int verbose;

/* ==== main program ==== */

int main(int argc, char* argv[]) {

char* usage_msg =
"usage: %s [--verbose] [--correctness] [--timing repeats]\n";

int correctness;
int timing;
int repeats;

char* strtol_endptr;
int next_arg;

verbose = 0;
correctness = 0;
timing = 0;
repeats = 0;

/* process arguments */

next_arg = 1;
while (next_arg < argc) {
if (x_strcmp(argv[next_arg], "--verbose") == 0) {
verbose = 1;
++next_arg;
}
else if (x_strcmp(argv[next_arg], "--correctness") == 0) {
correctness = 1;
++next_arg;
}
else if (x_strcmp(argv[next_arg], "--timing") == 0) {
timing = 1;
++next_arg;
if (!(next_arg < argc)) {
fprintf(stderr, usage_msg, argv[0]);
return EXIT_FAILURE;
}
repeats = strtol(argv[next_arg], &strtol_endptr, 10);
if (*strtol_endptr != '\0') {
fprintf(stderr, "invalid repeats\n");
fprintf(stderr, usage_msg, argv[0]);
return EXIT_FAILURE;
}
++next_arg;
}
else {
fprintf(stderr, usage_msg, argv[0]);
return EXIT_FAILURE;
}
}

if ((correctness == 0) && (timing == 0)) {
fprintf(stderr, usage_msg, argv[0]);
return EXIT_FAILURE;
}

/* do tests */

describe_replace();
fprintf(stdout, "\n");

if (correctness) {
do_correctness_tests();
}
if (timing) {
do_timing_tests(repeats);
}
return EXIT_SUCCESS;

}

/* ==== function definitions ==== */

/*
* perform correctness tests
*/
void do_correctness_tests(void) {
int test_count = 0;
int error_count = 0;

fprintf(stdout, "performing correctness tests\n\n");

fprintf(stdout, "simple tests\n\n");

/* FIXME: adjust as desired */

TESTER(0, "abcd", "ab", "xy", "xycd")
TESTER(0, "abcd", "cd", "xy", "abxy")
TESTER(0, "abcd", "bc", "xy", "axyd")
TESTER(0, "abcdabcd", "bc", "xy",...

read more »

Click to expand...

While thanking Ben, and rightfully so, for his help, you might have
had the common decency to also acknowledge theft of ideas (TESTER
macro, test suite, etc.) from me. I understand that you're probably
afraid of the regs here who use "the politics of personal destruction"
so readily, and part of their strategy is to make people afraid to
associate with their target do jour...here a person who's exposed
their incompetence. But you might have acted like a man.

Edward G. Nilges

blmblm · Feb 20, 2010

[ snip ]

In Herb's defense, I'd say that intelligent people find it hard to
treat mistakes (such as the design of feof) with respect, and enough
to remember the deviant implementation that feof is. Herb's failure to
recall to mind a massive boner means he's an intelligent person.

Be that as it may, I would say that someone writing a book about C
has an obligation to use its functions correctly. (You do rather
acknowledge this later.)

Whereas nasty little clerks (who as we saw this morning make their own
stupid mistakes, but expect to be forgiven) take a sour delight in
"learning" clap trap.

This isn't programming, Seebach. It's Trainspotting.

An intelligent programmer, when confronted with the incredibly poor
design of feof, turns right around and changes feof into a simple
macro:

#define READ(f) (attemptToRead(f), feof(f)) ? 0 : -1 // Change
attemptToRead to fgetc, fgets, etc.

He then can forget the design error, which is normalized deviance from
hell, since it forces the program to try to read SOMETHING to detect
IO. On any number of environments, this can change the state of the
underlying hardware in unpredictable ways. Even if the hardware can
signal to the runtime "don't you dare try to read from me, I have
nothing for you now", the C runtime HAS TO READ, which can easily
throw off auditing software or firmware, recording in all cases one
extra read.

I think there might be two separate issues here:

One is whether the input function blocks until either input or
end-of-file is detected. To me this seems like a sensible design;
input routines that don't block are useful in some situations,
but it seems to me that they potentially require the calling
program to repeatedly check for input, since "no input available
right now" does not necessarily imply end of input.

The other is whether one can implement blocking input in a way
that avoids whatever problems you have in mind (and more-specific
examples of those would be interesting to hear about). It's hard
for me to imagine circumstances in which it would be impossible to
correctly implement blocking input, and if it's possible but isn't
done correctly, well, in my thinking this indicates a problem with
the device drivers or the implementation of the C input functions
(inclusive "or", by the way). That it's possible to implement
an API badly -- well, it's not clear to me that this necessarily
indicates a flaw with the API.

?

Herb should have done a better job, but the error is so common among
good programmers and intelligent people so as to cast the aspersion on
C and standards committee members too cowardly and incompetent to fix
it.

(Quoted since I refer to it above.)

[ snip ]

blmblm · Feb 20, 2010

The main concern is, is this library easy to use? Only when the
program hits the treacle do ypu stary worrying about how efficient the
code is behind those nice interfaces.

Click to expand...

Yeah. My string library (like most C programmers, I wrote one at one point)
actually does have, under some circumstances, linked lists in it. It never
seems to have become an issue.

They're used to provide familiar semantics. Consider:
char s[256] = "hello, world!";
char *t;

t = strstr(s, "world");
strcpy(t, "sailor!");

You would expect this to leave s = "hello, sailor!" (note that it's a [], not
a *, and sized to provide enough room).

Click to expand...

At this point, it doesn't seem that "your" string library is what we
need, since you have to do its mallocs or fixed allocations. A true
solution would instead encapsulate all memory management.

Your library allows its users to make crappy and irresponsible secret
decisions as to "how big" things can get, which programmers are
rarely, if ever, qualified to make.

I'm not sure we can conclude, based on the post you're responding
to, that Seebs's string library has the flaws you describe -- to
me his example is meant only to indicate some of the potential
problems if there are multiple ways to access the same data
(in the example, via both s and t). How his actual string library
is defined and implemented -- well, he gives a bit more detail
below, but not enough for me to tell one way or another whether
your speculations are correct.

OK, your string library is something like a library of pointers
outside the library; a collection of maps. I did something like this
in a preliminary fashion in 1987, but found that it was too much
trouble. A string library needs ownership of the strings.

However, a full library MIGHT want to sponsor both "virtual" strings
(pointers to space the library has neither mallocated nor predefined)
and "actual" strings (where the library has done the memory
management).

[ snip signature ]

blmblm · Feb 20, 2010

spinoza1111wrote:

Click to expand...

Heathfield wrote a "linked list tool"
without pointing to data,

Click to expand...

[...] my point was that the DATA in the list is the DATA and
not a link, and nobody with a halfway decent education in data
structures would have proposed such a stupid design, save, perhaps,
for a linked list of data elements whose type is simple and fixed in
length...not a "reusable tool".

Click to expand...

Alex Stepanov, the designer of the C++ STL library is untutored in

Click to expand...

Again, these constant appeals to older men "untutored"

Did you read Nick's whole sentence before starting to type your
response? he said

which to me says nothing at all about whether he thinks Stepanov
is "untutored in computer science", but is instead an attempt to
suggest that if Heathfield's approach is indeed faulty, he's in
good company.

are sickeningly
ignorant excuses for ignorance, because many of them (such as
Dijkstra) went to universities before universities had computer
science programmes, and others (such as Stepanov) were self-educated
because of life events. Dijkstra most certainly, and probably
Stepanov, never engaged in anti-intellectual assaults and the politics
of personality which are in this newsgroup the tropes of the
malignantly uninformed.

[ snip ]

(And you even retained the text .... "Whatever", maybe.)

[ snip ]

Malcolm McLean · Feb 20, 2010

That's your job.

So we've got version 1 which uses malloc.h. The person who receives
the code say "Oh Oh, no good" and makes it stdlib.h.
Now the code isn't the same. So if we give a bug report to the writer
of version 1, we have to say that we were using a different version,
and he might say "well it's no longer my problem". So you need to
repeat the bug with version 1, which takes an extra day. Even with
perfect goodwill, you create layers of bureaucracy.

Naturally, when you make a change, you test everything. The problem is
that the person doing the testing "knows" that all he's done is
repalce a non-standard header. So nothing can go wrong, right? And 99%
of the time, he'll be right. So it creates a sort of sense of
unnecessary work, mandated by pointless regulations. We need the code
urgently, so tester skimps on the unit tests. Again 99% of the time
he's right. But discipline has slipped. We no longer have a bullet-
proof testing system.

That's the way with code. The trivial is important, because the whole
point of computers is that they can perform trivial tasks, without
human intervention.

blmblm · Feb 20, 2010

[ snip ]

[ snip ]

[ snip ]

(For the record, the part of my post you quoted includes only about
about half of my code, since apparently Google by default doesn't
present the entirety of long posts .... )

While thanking Ben, and rightfully so, for his help, you might have
had the common decency to also acknowledge theft of ideas (TESTER
macro, test suite, etc.) from me.

"Theft"? At worst I think I'm guilty of not being as explicit as
I might have been about sources. It was certainly not my intent to
"steal" any of your work, or to present it as my own, and I'm sorry
if that's how it comes across.

I call your attention to the following lines from my code, first
where I define my implementation of a TESTER macro:

/*
* macro to perform one correctness test
*
* conforms to the spinoza1111 interface to allow easy copy/paste of his tests;
* first parameter is unneeded for my code
*/
#define TESTER(ignored, in, old_text, new_text, expected) \
one_test(in, old_text, new_text, expected, &test_count, &error_count, 1);

I would have thought that the above comments at least make it clear
that my implementation of a TESTER macro conforms to an interface
defined by someone else -- identified somewhat cryptically as
"spinoza1111", but I thought that would be enough for anyone who
follows this group at all. It didn't really occur to me that the
idea of such a macro was startlingly original and something one would
need to give credit for, but if you think it is -- oh, I don't know.

If I were really intending to steal your ideas, would I have
mentioned you at all? I suppose that comment about making it easy
to copy/paste might come across that way, but really -- my intent
was to make it easy to keep up with your evolving test suite and
thereby demonstrate that my proposed solution(s) "work" according
to your criteria.

And then at the point where I actually invoke your tests:

fprintf(stdout, "spinoza1111 tests\n\n");

I suppose I thought this would make it clear enough to anyone
following the relevant discussion that I was proposing to run
tests proposed/invented/something by this spinoza1111 entity.

Thanking you .... Before I did that I would want to examine the
actual tests more closely; I have not looked at them carefully
enough to have an opinion about their quality. My idea was to
use your tests rather than inventing my own as a way of keeping
to a common frame of reference.

But if you want to claim that I should have been more explicit
that these were your tests and not mine, well, fair enough, and
I do apologize for that.

If I post further versions of my code I will do what I can to
make sure credit is given where due.

I understand that you're probably
afraid of the regs here who use "the politics of personal destruction"
so readily, and part of their strategy is to make people afraid to
associate with their target do jour...here a person who's exposed
their incompetence. But you might have acted like a man.

Why would I want to act like a man? (I'm not one.)

I think I won't address the question of whether I want to associate
with you, or be perceived as associated with you, and why.

I will note that some years ago in a thread in comp.programming
I'm pretty sure I backed you up in a tangential dispute about --
something about how things are done in assembler language on
IBM mainframees, if I remember right. I could try to look up
details if anyone's interested, but between the vagueness of my
recollection and the various shortcomings of Google's interface
to its archives, I don't know ....

blmblm · Feb 20, 2010

[ snip ]

If you don't want other people to use your ideas, why post them? If you
don't want other people to use your ideas without acknowledgement,
consider asking for acknowledgement when posting them. Most people will
honour this.

Well, *I* certainly would -- or at least if I didn't it would be an
inadvertent mistake rather than a deliberate attempt to plagiarize.
In all truth I think perhaps I am guilty here of not being as
thorough as I should have been about acknowledging sources,
and I feel rather bad about that -- I do try to be careful about
giving credit where credit is due, but obviously sometimes, um,
"mistakes are made".

(a) you don't;
(b) she isn't;

Well now, you don't really know that ....

But one might keep
silent on a subject for reasons other than fear.

(c) they don't.

[ snip ]

Why *should* she act like a man?

Wow, you remember .... possibly because I made a fuss in some
previous discussion. (Which might not be best behavior either.)

blmblm · Feb 20, 2010

[ snip ]

"Theft"? At worst I think I'm guilty of not being as explicit as
I might have been about sources. It was certainly not my intent to
"steal" any of your work, or to present it as my own, and I'm sorry
if that's how it comes across.

[ snip ]

But if you want to claim that I should have been more explicit
that these were your tests and not mine, well, fair enough, and
I do apologize for that.

If I post further versions of my code I will do what I can to
make sure credit is given where due.

Looking over what I posted .... Oh dear. I really *should* have
put something at the top of the file "spinoza-tests.c" indicating
that those lines were copied verbatim from your test suite -- but
the intent, again, was to keep to a common frame of reference,
and perhaps provide an easy way for others to run your tests,
rather than to plagiarize.

Proposed comments for the file containing the tests:

/*
* The "spinoza1111 tests", as developed by Edward Nilges.
* This version is copied from Usenet post Message-ID [whatever].
*/

(I'm not actually sure I correctly copied and pasted the latest
version of your tests. Something else to fix.)

Proposed comments for the file containing the driver code:

/*
* Correctness tests use the TESTER macro interface proposed
* by Edward Nilges in [another reference to a Usenet post].
*/

[ snip ]

blmblm · Feb 20, 2010

(e-mail address removed) wrote:

No, it's just that I have a very, very, very good memory. (Alas, I also
have a very, very, very bad memory.)

In other words -- a selective memory, like most of us?

Further off-topic ....

One of the things I find most entertaining, or something, about
my current day job (part of which is teaching undergraduate CS
courses) is how the students apparently retain *some* of the
flood of words I pour into their ears several times a week, but
there's no telling what parts they'll remember, or whether their
recollection of what I said will match mine. There have been
many interesting conversations beginning with a student saying
"But don't you remember, you said ...." followed by something I
either don't remember saying at all, or don't remember saying in
quite the form in which it's quoted back to me. Sort of a

.

Seebs · Feb 20, 2010

Actually, it is. I do not expect you to understand this.

Lemme jump to a totally unrelated field to explain/support this:

World of Warcraft. Popular game, many people play it. There's a broad
range of player skill. Some players are quite good. Some are quite bad.

Imagine that you want to know whether someone is good or bad. Certainly,
watching them play for a while can give you a pretty good idea. But you
know what else can give you a pretty good idea? Reading anything they write.
If they can't spell, they're probably bad. If they can spell, they're
probably good.

This may seem like it ought to be unrelated. Strictly speaking, it's
not a guarantee; I know some skilled players who can't spell, and I know
some good spellers who don't play very well.

But by and large:

Attention to detail is a character trait.

People who are careful and thoughtful in one area are usually careful and
thoughtful in others. People who don't care in one area usually don't
care in others.

Thus, someone who includes <malloc.h> is likely to make mistakes based on
a characteristic habit of picking things up without really understanding them,
or various superstitions unsupported by the nature of the language. For
instance, I'd expect such a person to try to duplicate the features of the
C library, but do it poorly, and introduce multiple bugs into a hundred
line version of something that should have been under twenty lines and not
needed noticeable debugging.

-s

Seebs · Feb 20, 2010

I'm not sure we can conclude, based on the post you're responding
to, that Seebs's string library has the flaws you describe -- to
me his example is meant only to indicate some of the potential
problems if there are multiple ways to access the same data
(in the example, via both s and t). How his actual string library
is defined and implemented -- well, he gives a bit more detail
below, but not enough for me to tell one way or another whether
your speculations are correct.

I think it *allows* you to do your own memory allocation in some cases,
but the default is that it's transparent.

If you do:
sz *foo = sznew("hello, world!");
sz *bar = szsz(foo, "o");

you end up with foo having the text "hello, world!", bar having the
text "o, world!", and actually they share a single storage region,
which is internal to them. If you szfree(foo), bar is automatically
deallocated as it is part of foo.

Interestingly, his post does give you enough information to make a qualified
guess as to whether his speculations are correct: If you check the "From"
line, you'll note that it's from a poster with a practically superhuman
capacity for finding something false to say given any slight ambiguity.

-s

Chris M. Thomasson · Feb 20, 2010

Richard Heathfield said:
Fine - and how are you going to manage all those objects?

I am not exactly sure what your getting at here. AFAICT, Edward was
referring to method 1:

http://groups.google.com/group/comp.lang.c/msg/1ac1c32aa51c11a7

e.g.:
___________________________________________________
struct node
{
struct node* next;
void* object;
};
___________________________________________________

So, I can create a bunch of objects and store them in a linked list based on
the node structure above:
___________________________________________________
static struct node* g_objects = NULL;

struct object
{
unsigned count;
/* [whatever] */
};

struct node*
insert(struct node** phead,
struct object* obj)
{
struct node* n = malloc(sizeof(*n));

if (n)
{
++obj->count = 0;
n->object = obj;
n->next = *phead;
*phead = n;
}

return n;
}

void create()
{
size_t i;

for (i = 0; i < 1024; ++i)
{
struct object* obj;

if (! (obj = malloc(sizeof(*obj))))
{
break;
}

obj->count = 0;

if (! insert(&g_objects, obj))
{
free(obj);
break;
}
}
}
___________________________________________________

And now, I guess you could say that the `g_objects' list and the reference
count are "managing" all of those objects. Is that what you were think about
when you wrote:

"Fine - and how are you going to manage all those objects?"

Richard?

blmblm · Feb 20, 2010

Lemme jump to a totally unrelated field to explain/support this:

World of Warcraft. Popular game, many people play it. There's a broad
range of player skill. Some players are quite good. Some are quite bad.

Imagine that you want to know whether someone is good or bad. Certainly,
watching them play for a while can give you a pretty good idea. But you
know what else can give you a pretty good idea? Reading anything they write.
If they can't spell, they're probably bad. If they can spell, they're
probably good.

This may seem like it ought to be unrelated. Strictly speaking, it's
not a guarantee; I know some skilled players who can't spell, and I know
some good spellers who don't play very well.

But by and large:

Attention to detail is a character trait.

People who are careful and thoughtful in one area are usually careful and
thoughtful in others. People who don't care in one area usually don't
care in others.

But apparently there are some exceptions -- I mean, whatever I may
think of the content and style of Mr. Nilges's posts, the spelling
and grammar and so forth are generally impeccable, aside from the
occasional typo. "Just sayin'."

Moi · Feb 20, 2010

"Richard Heathfield" <[email protected]> wrote in message

struct object
{
unsigned count;
/* [whatever] */
};

struct node*
insert(struct node** phead,
struct object* obj)
{
struct node* n = malloc(sizeof(*n));

if (n)
{
++obj->count = 0;

There is a sequence-point error here,
( probably caused by the precedence of ++ and -> being other than you expect)
You meant:
obj++; obj->count = 0;
?

n->object = obj;
n->next = *phead;
*phead = n;
}

return n;

HTH
AvK

Chris M. Thomasson · Feb 20, 2010

Moi said:
"Richard Heathfield" <[email protected]> wrote in message

Click to expand...

struct object
{
unsigned count;
/* [whatever] */
};

struct node*
insert(struct node** phead,
struct object* obj)
{
struct node* n = malloc(sizeof(*n));

if (n)
{
++obj->count = 0;

Click to expand...

There is a sequence-point error here,
( probably caused by the precedence of ++ and -> being other than you
expect)
You meant:
obj++; obj->count = 0;
?

I meant:

++obj->count;

Well, this kind of non-sense happens when you quickly type code directly
into the damn newsreader!

;^o

Seebs · Feb 20, 2010

But apparently there are some exceptions -- I mean, whatever I may
think of the content and style of Mr. Nilges's posts, the spelling
and grammar and so forth are generally impeccable, aside from the
occasional typo. "Just sayin'."

Yeah. But the general rule seems to be that an area of carelessness suggests
there will be others. So it's reasonable to infer from <malloc.h> that the
rest of the code is *likely* to be poor.

-s

Ike Naar · Feb 20, 2010

[snip]

Click to expand...

While thanking Ben, and rightfully so, for his help, you might have
had the common decency to also acknowledge theft of ideas (TESTER
macro, test suite, etc.) from me.

Your TESTER macro and test suite use a few ideas from others, too.

spinoza1111 · Feb 21, 2010

spinoza1111 said:
spinoza1111 said:

[snip]

Click to expand...

While thanking Ben, and rightfully so, for his help, you might have
had the common decency to also acknowledge theft of ideas (TESTER
macro, test suite, etc.) from me.

Click to expand...

Your TESTER macro and test suite use a few ideas from others, too.

That is not the case, for the most part. The style diverges
considerably from that used here.

Of course, the tester macro used GOOD ideas used before such as
parenthesizing formal parameters, and it is well known that reusable
test suites are good things. But: these ideas are seldom used here and
are falling into desuetude in America, because in fact they attract
mockery and complaints when consistently applied. The ideas I put
forth a solidly expressed enough to be stealable, and part of the
theft of intellectual property is the mockery and marginalization of
the original producer.

Implementing strstr	229	Mar 19, 2010
Future standard GUI library	51	May 18, 2013
Multithreading and compatibility library (libconfig)	1	Jan 23, 2013
Problem with a login script, SESSION user rights and put this together so it works with the other pages and MySQL. Code examples.	2	May 5, 2023
Hello all! Noob here with completely unrealistic ambitions. Happy to join the crew and get good enough to help others.	4	Aug 13, 2024
AW: Pure python standard library and License	0	Mar 4, 2011
What is the most astounding C++ syntax construct?	0	Dec 22, 2022
review of the "container library", part 1/?	18	Mar 1, 2011

Efficency and the standard library

blmblm

spinoza1111

spinoza1111

blmblm

blmblm

blmblm

Malcolm McLean

blmblm

blmblm

blmblm

blmblm

Seebs

Seebs

Chris M. Thomasson

blmblm

Moi

Chris M. Thomasson

Seebs

Ike Naar

spinoza1111

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads