Steve
I have the following STL containers:
(using namespace std)
typedef vector<string> TokenVector; // e.g. [the] [cat] [sat] [on] [the] [mat]
typedef map<TokenVector, long> NGramMap; // unique n-grams (as above), each with a freq count
My goal is to search for n-grams that match a pattern with wildcards,
for instance:
[the] [cat] [sat] [?] [the] [mat]
and return a list of all matches with all the possibilities for the
wildcard [?], along with the freq count.
Assuming the map is sorted alphabetically, is there some way to search
it, like a regular expression, that returns pattern matches?
The only way I can think of at the moment is to search for "[the]
[cat] [sat]", then sift through those matches.
This gets messy, though, when the pattern starts with a wildcard, or
contains 2 or 3 wildcards.
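A minimal sketch of that prefix-then-sift idea, assuming the wildcard is stored as a literal "?" token. The names WILDCARD, matches and findMatches are made up for illustration. It relies on std::map ordering vector keys lexicographically, so every key sharing the literal prefix sits in one contiguous run starting at lower_bound(prefix). When the pattern starts with a wildcard the prefix is empty and this falls back to scanning the whole map, which is exactly the messy case.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

using namespace std;

typedef vector<string> TokenVector;
typedef map<TokenVector, long> NGramMap;

// Assumption: "?" never occurs as a real token, so it can mark a wildcard.
const string WILDCARD = "?";

// True if the n-gram has the pattern's length and agrees with it
// at every non-wildcard position.
bool matches(const TokenVector& pattern, const TokenVector& ngram) {
    if (pattern.size() != ngram.size()) return false;
    for (size_t i = 0; i < pattern.size(); ++i)
        if (pattern[i] != WILDCARD && pattern[i] != ngram[i]) return false;
    return true;
}

// Hypothetical helper: returns every (n-gram, count) entry matching the pattern.
vector<pair<TokenVector, long> > findMatches(const NGramMap& ngrams,
                                             const TokenVector& pattern) {
    // The literal tokens before the first wildcard form the search prefix.
    TokenVector prefix;
    for (size_t i = 0; i < pattern.size() && pattern[i] != WILDCARD; ++i)
        prefix.push_back(pattern[i]);

    vector<pair<TokenVector, long> > result;
    // Jump to the first key >= prefix, then walk forward while keys
    // still start with the prefix. An empty prefix walks the whole map.
    for (NGramMap::const_iterator it = ngrams.lower_bound(prefix);
         it != ngrams.end(); ++it) {
        const TokenVector& key = it->first;
        if (key.size() < prefix.size() ||
            !equal(prefix.begin(), prefix.end(), key.begin()))
            break;  // left the contiguous run of keys sharing the prefix
        if (matches(pattern, key)) result.push_back(*it);
    }
    return result;
}
```

Calling findMatches(myMap, pattern) with the pattern [the] [cat] [sat] [?] [the] [mat] would return each matching 6-gram with its count, and the token at index 3 of each result is the wildcard's value.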
This seems like it might be a common task, so I thought I'd ask here
before re-inventing the wheel.
Does the STL have ready-made solutions for this kind of search?
Does it make the problem any easier if it's constrained to _always_
being 6-grams, and that there are never more than 3 wildcards?
Thanks for any clues, even if it's just what topics I need to study in
order to crack it.
Steve