What's so good about string handling in other scripting languages that
can't be done in C? C has regex, strcmp, strncat, strncpy, strstr. If
there isn't a function for it, just write one. That's the C way. Can you
tell me more about this?
Okay.
It's not a question of whether you CAN do them. It's how easy or expressive
it is.
While people have written regex tools (some of which perhaps support
substitution) in C, that doesn't mean that they're comparable.
Lemme drift off-topic a little. Say I've got Spencer's regex library
to hand, or something similar.
Let's compare two pieces of code. Both are going to take the string
"hello, world", and return the longest substring of it which consists of
a punctuation character (one of ,.! for my purposes) followed by some
string of letters, ending with a vowel.
That regular expression is:
[,.!].*[aeiou]
We'll ignore the fiddly bits of the spec; this is just a sort of example
of how things differ.
#include <sys/types.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void) {
    const char *pattern = "[,.!].*[aeiou]";
    const char *haystack = "hello, world";
    const char *s;
    size_t len;
    regex_t preg;
    regmatch_t pmatch[10];

    /* cflags of 0 means a POSIX basic regular expression */
    if (regcomp(&preg, pattern, 0)) {
        fprintf(stderr, "Error compiling regex.\n");
        exit(1);
    }
    if (regexec(&preg, haystack, 1, pmatch, 0) != 0) {
        fprintf(stderr, "No match.\n");
        exit(1);
    }
    /* pmatch[0] holds the start and end offsets of the whole match */
    s = haystack + pmatch[0].rm_so;
    len = pmatch[0].rm_eo - pmatch[0].rm_so;
    printf("%.*s\n", (int) len, s);
    regfree(&preg);
    return 0;
}
I'm a moderately experienced programmer who's used the regexec library
a few times before. I had to look several things up in the man page to
do this, and my first try had a bug, but this now does in fact print
", wo".
Now let's look at this in a few other languages.
Perl:
"hello, world" =~ /[,.!].*[aeiou]/;
print "$&\n";
Ruby:
puts "hello, world".match(/[,.!].*[aeiou]/)
Want a better example?
Let's do something fancier. Let's search for the first word after
punctuation, and replace it with "kitty".
Ruby:
x = "hello, world!"
x[/, ([a-z]*)/, 1] = "kitty"
puts x
C:
(Omitting context, just body-of-main here.)
const char *pattern = ", \\([a-z]*\\)";
const char *haystack = "hello, world!";
char *new;
size_t len;
regex_t preg;
regmatch_t pmatch[10];

if (regcomp(&preg, pattern, 0)) {
    fprintf(stderr, "Error compiling regex.\n");
    exit(1);
}
/* nmatch of 2: slot 0 is the whole match, slot 1 is the \(...\) group */
if (regexec(&preg, haystack, 2, pmatch, 0) != 0) {
    fprintf(stderr, "No match.\n");
    exit(1);
}
len = pmatch[1].rm_eo - pmatch[1].rm_so;
new = malloc(strlen(haystack) + strlen("kitty") - len + 1);
if (new == NULL) {
    fprintf(stderr, "Out of memory.\n");
    exit(1);
}
/* regoff_t need not be int, and %.*s wants an int precision */
sprintf(new, "%.*s%s%s", (int) pmatch[1].rm_so, haystack, "kitty",
    haystack + pmatch[1].rm_eo);
printf("%s\n", new);
regfree(&preg);
free(new);
Notice a pattern?
It is frankly STUPID to try to do stuff like this in C just because you like
C. Doing it in C because you are writing the implementation of a higher-level
language? Sure. Doing it in C because you need to do a small amount of
string manipulation in a large program which needs to be C for compelling
reasons (say, an operating system kernel)? Sure.
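If you are stuck in that last case, the usual move is to pay for the boilerplate once and hide it behind a helper. Here's a rough sketch of what that might look like. The name first_match is something I made up for illustration, it assumes a POSIX <regex.h> is available, and it treats the pattern as a basic regular expression like the examples above.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of any library: returns a malloc'd copy
 * of the first match of a POSIX basic regular expression in subject, or
 * NULL if the pattern does not compile, nothing matches, or the
 * allocation fails. The caller frees the result.
 */
char *
first_match(const char *pattern, const char *subject)
{
    regex_t preg;
    regmatch_t m[1];
    char *out = NULL;

    assert(pattern != NULL && subject != NULL);

    if (regcomp(&preg, pattern, 0) != 0)
        return NULL;
    if (regexec(&preg, subject, 1, m, 0) == 0) {
        /* m[0] holds the offsets of the whole match */
        size_t len = (size_t) (m[0].rm_eo - m[0].rm_so);

        out = malloc(len + 1);
        if (out != NULL) {
            memcpy(out, subject + m[0].rm_so, len);
            out[len] = '\0';
        }
    }
    regfree(&preg);
    return out;
}
```

With that in place, first_match("[,.!].*[aeiou]", "hello, world") gets you the same ", wo" as the first example, and the caller still has to remember to free() it. Which is sort of the point.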
For extra credit: Did I calculate the allocated space for "new" correctly?
Why or why not?
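If you'd rather check the arithmetic than trust it, here's a rough sketch. The function name splice is made up for illustration (it isn't from any library), and the offsets are assumed to be the sort of thing you'd get from pmatch[1]. The space needed is the original length, minus the bytes being replaced, plus the replacement, plus one for the trailing NUL; for "hello, world!" with "world" swapped for "kitty", that's 13 - 5 + 5 + 1 = 14. snprintf returns the length it wanted to write, which makes a cheap cross-check.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of any library: returns a malloc'd copy
 * of src with the bytes in [start, end) replaced by repl, or NULL if the
 * allocation fails. The caller frees the result.
 */
char *
splice(const char *src, size_t start, size_t end, const char *repl)
{
    size_t srclen = strlen(src);
    size_t need;
    char *out;
    int written;

    assert(start <= end && end <= srclen);

    /* bytes kept from src, plus the replacement, plus the trailing NUL */
    need = (srclen - (end - start)) + strlen(repl) + 1;
    out = malloc(need);
    if (out == NULL)
        return NULL;

    /* snprintf reports the length it wanted, not counting the NUL */
    written = snprintf(out, need, "%.*s%s%s",
        (int) start, src, repl, src + end);
    assert(written >= 0 && (size_t) written == need - 1);
    return out;
}
```

Calling splice("hello, world!", 7, 12, "kitty") hands back "hello, kitty!" in a 14-byte allocation, and the second assert fires if the size calculation and the format string ever disagree.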
And most importantly: Why the HELL would you ever choose to use a language
where that question is even coherent for string processing? I know reasons
to do this, but they're... rare.
-s