J
jacob navia
Task:
given a text, and a set of characters, find the first occurrence of any
of the characters in the set.
Interface
There are two possibilities:
1)
size_t strFindFirst(char *text,char *set);
The function would return the index of the first character in the string.
2)
char *strFindFirst(char *text,char *set);
The function would return a pointer to a character in the middle of the
string.
Discussion
----------
The first solution is OK if you are programing in C++ where counted
strings are the norm. Lcc-win also provides you with counted strings
using operator overloading and the first solution is choosen in that
context.
For a pure "standard C" solution however, the second solution could be
more interesting: you have immediately the wanted character with just
dereferencing a pointer.
Actually of course both solutions are completely equivalent since you
can add to the start of the string the index returned by the first
solution and obtain a pointer that is equivalent to the second solution.
A first approach:
1 #include <string.h>
2 char *strFindFirstIn(char *str,char *set)
3 {
4 char *pSet;
5 int c = *str;
6 while (*str) {
7 pSet = set;
8 while (*pSet) {
9 if (c == *pSet++) return str;
10 }
11 c = *++str;
12 }
13 return str;
14 }
For each character in the text we search if it matches any of the
characters in the string. If we have N characters in the text and M
characters in the set, we make N x M character comparisons.
A more sophisticated approach:
1 char *strFindFirstIn1(char *str,char *set)
2 {
3 int c = *str;
4 unsigned char tab[256];
5 unsigned char *p=str;
6
7 if (*p==0) return p;
8 memset(tab,0,sizeof(tab));
9 while (*set) {
10 tab[*(unsigned char *)set++]=1;
11 }
12 while (*p) {
13 if (tab[*p]) return p;
14 p++;
15 }
16 return p;
17 }
We make a table of 256 positions filled initially with zeroes. For each
character in the set we fill the table with a "1" value. Then, we go
through the text and see for each character if the position contans a
zero (no match) or a one (match).
Note that for repeated characters in the set, the first solution will
again test for the character... but not in the second solution.
Of course the second solution has a drawback if the text is very short
and the set very small. It needs a zeroing of 256 positions, then the
filling of the occupied positions.
But in the case of a very long text and a longuish set, the second
soution wins since it does only one comparison per character!
How about you?
Can you give a better solution?
Comments welcome.
Here is a test program.
----------------------
1 #include <stdio.h>
2 int main(void)
3 {
4 char *s=
"This sentence will be changed to replace vowels by asterisks";
5 char *p = strFindFirstIn1(s,"aeiou");
6 while (*p) {
7 *p++ = '*';
8 p = strFindFirstIn(p,"aeiou");
9 }
10 printf("%s\n",s);
11 }
jacob
given a text, and a set of characters, find the first occurrence of any
of the characters in the set.
Interface
There are two possibilities:
1)
size_t strFindFirst(char *text,char *set);
The function would return the index of the first character in the string.
2)
char *strFindFirst(char *text,char *set);
The function would return a pointer to a character in the middle of the
string.
Discussion
----------
The first solution is OK if you are programing in C++ where counted
strings are the norm. Lcc-win also provides you with counted strings
using operator overloading and the first solution is choosen in that
context.
For a pure "standard C" solution however, the second solution could be
more interesting: you have immediately the wanted character with just
dereferencing a pointer.
Actually of course both solutions are completely equivalent since you
can add to the start of the string the index returned by the first
solution and obtain a pointer that is equivalent to the second solution.
A first approach:
1 #include <string.h>
2 char *strFindFirstIn(char *str,char *set)
3 {
4 char *pSet;
5 int c = *str;
6 while (*str) {
7 pSet = set;
8 while (*pSet) {
9 if (c == *pSet++) return str;
10 }
11 c = *++str;
12 }
13 return str;
14 }
For each character in the text we search if it matches any of the
characters in the string. If we have N characters in the text and M
characters in the set, we make N x M character comparisons.
A more sophisticated approach:
1 char *strFindFirstIn1(char *str,char *set)
2 {
3 int c = *str;
4 unsigned char tab[256];
5 unsigned char *p=str;
6
7 if (*p==0) return p;
8 memset(tab,0,sizeof(tab));
9 while (*set) {
10 tab[*(unsigned char *)set++]=1;
11 }
12 while (*p) {
13 if (tab[*p]) return p;
14 p++;
15 }
16 return p;
17 }
We make a table of 256 positions filled initially with zeroes. For each
character in the set we fill the table with a "1" value. Then, we go
through the text and see for each character if the position contans a
zero (no match) or a one (match).
Note that for repeated characters in the set, the first solution will
again test for the character... but not in the second solution.
Of course the second solution has a drawback if the text is very short
and the set very small. It needs a zeroing of 256 positions, then the
filling of the occupied positions.
But in the case of a very long text and a longuish set, the second
soution wins since it does only one comparison per character!
How about you?
Can you give a better solution?
Comments welcome.
Here is a test program.
----------------------
1 #include <stdio.h>
2 int main(void)
3 {
4 char *s=
"This sentence will be changed to replace vowels by asterisks";
5 char *p = strFindFirstIn1(s,"aeiou");
6 while (*p) {
7 *p++ = '*';
8 p = strFindFirstIn(p,"aeiou");
9 }
10 printf("%s\n",s);
11 }
jacob