pozz said:
I have a string that is generated from a numerical keypad with
decimal numbers 0-9 and the two special characters * and #,
similar to a phone keypad.
I want to parse the string and make something or something else
depending on the current state and the string itself. Generally
the string can be:
#
*#
*<g>#
<c>*<g>#
<c>#
<c>*#
where
<c> is a 4- to 6-digits code
<g> is the exact digit '1', '2' or '3'.
Depending on the current state, some combinations are valid,
others not. In some cases the code should be valid (already saved
in the system), in other cases it could be new (when a new code
must be entered to change the current one).
I think I have solved the problem trying to split the string in to
three fields: <c>, '*' and <g>. If the first char is a digit, it
must be a 4- to 6- code, otherwise it must be a '*' or a '#', and
so on...
Even if my code works, I don't think this is an elegant solution.
What happens if in the feature I decide to change one combination?
For example, from "#" to "###"? Or from "<c>*#" to "*<c>#"? I
have to recode everything, because my assumption the string can be
split in three fields in that order is not true anymore.
What is the best approach to implement a simple parser like this?
One approach:
1. Devise a data structure that can represent all the
different patterns you have to recognize;
2. Write a function that can scan a string and see if
it matches a particular pattern;
3. Make an instance of the pattern structure for each
of your allowed patterns;
4. Call the scanning function repeatedly, for each pattern,
until a match is found.
For example, we might represent a pattern as a sequence (that is,
an array) of "fields". A field is a set of allowed characters
and a minimum and maximum number of characters allowed in the
field:
typedef struct {
const char *allowed_set;
unsigned char m, n;
} Field;
The final Field of a pattern has 'allowed_set' == 0.
Now we can make an array to hold pointers to the first
element of each legal pattern:
static Field *patterns[] = {
(Field[]){ {"*", 1,1 }, {"123", 1,1 }, {"#", 1,1 }, 0},
(Field[]){ {"*", 1,1 }, {"#", 1,1 }, 0},
(Field[]){ {"#", 1,1 }, 0},
(Field[]){ {"0123456789", 4,6}, {"*", 1,1}, {"123", 1,1}, {"#", 1,1}, 0},
(Field[]){ {"0123456789", 4,6}, {"*", 1,1}, {"#", 1,1}, 0},
(Field[]){ {"0123456789", 4,6}, {"#", 1,1}, 0},
};
A function to scan a string and see if it matches a given pattern
can be written very concisely by using tail recursion:
int
matches_string( Field f[], const char *s ){
if( ! f->allowed_set ) return *s == 0;
size_t k = strspn( s, f->allowed_set );
return k < f->m || k > f->n ? 0 : matches_string( f+1, s+k );
}
(Incidentally I am using the standard library function strspn()
here, but if it isn't available it should be easy enough to write
a replacement - certainly no more than 10 lines of code.)
Putting all this together, here is a program that looks at
each of its arguments and reports which pattern is matched:
#include <string.h>
#include <stdio.h>
typedef struct {
const char *allowed_set;
unsigned char m, n;
} Field;
static Field *patterns[] = {
(Field[]){ {"*", 1,1 }, {"123", 1,1 }, {"#", 1,1 }, 0},
(Field[]){ {"*", 1,1 }, {"#", 1,1 }, 0},
(Field[]){ {"#", 1,1 }, 0},
(Field[]){ {"0123456789", 4,6}, {"*", 1,1}, {"123", 1,1}, {"#", 1,1}, 0},
(Field[]){ {"0123456789", 4,6}, {"*", 1,1}, {"#", 1,1}, 0},
(Field[]){ {"0123456789", 4,6}, {"#", 1,1}, 0},
};
int which_pattern_matches( const char * );
int matches_string( Field f[], const char *s );
int
main( int argc, char *argv[] ){
int i;
for( i = 1; i < argc; i++ ){
int k = which_pattern_matches( argv
);
printf( "Argument %15s matches pattern %3d\n", argv, k );
}
return 0;
}
int
which_pattern_matches( const char *string ){
for( unsigned n = 0; n < sizeof patterns / sizeof patterns[0]; n++ ){
if( matches_string( patterns[n], string ) ) return n;
}
return -1;
}
int
matches_string( Field f[], const char *s ){
if( ! f->allowed_set ) return *s == 0;
size_t k = strspn( s, f->allowed_set );
return k < f->m || k > f->n ? 0 : matches_string( f+1, s+k );
}
A sample compile/run command:
gcc -std=c99 -pedantic example.c && a.out '#' '*#' '*1#' '01234*2#' 'foo'
By the way, it might be useful for the scanning routine to
remember the start of each field as it goes. That can be
done by revising matches_string to take an extra parameter,
the address of a buffer into which to record the field start
locations:
int
matches_string( Field f[], const char *s, const char *r[] ){
... similar to above, with one extra line ...
}