tokenizer class

aaragon

Hi, I spent some time trying to make boost::tokenizer work with
some code, without success. In the end I found out that the tokenizing
actually takes place when you iterate, not when you construct the
tokenizer object... go figure!
Anyway, I grabbed a tokenize function that I found at www.linuxselfhelp.com
and converted it into a class. I guess this gives
the same functionality as the boost::tokenizer class. It is very
simple and also gives me constant access to the strings in the
container (something that I needed in my code). It is portable
because it uses only C++ standard library classes. Plus, it actually
tokenizes the string on construction, so you don't have to dig through
the documentation to find that out!
Well, here is the code. I would appreciate any suggestions on how to
improve it. Thanks in advance.

#include <cstddef>
#include <string>
#include <vector>

using std::string;
using std::vector;

class Tokenizator {

    typedef string TokenType;
    typedef vector<TokenType> ContainerType;
    typedef ContainerType::const_iterator IteratorType;
    ContainerType tokens_;

public:

    Tokenizator(const string& str, const string& delimiters = " ") {
        // Skip delimiters at beginning.
        string::size_type lastPos = str.find_first_not_of(delimiters, 0);
        // Find the first delimiter after the token, i.e. the token's end.
        string::size_type pos = str.find_first_of(delimiters, lastPos);

        while (string::npos != pos || string::npos != lastPos) {
            // Found a token, add it to the vector.
            tokens_.push_back(str.substr(lastPos, pos - lastPos));
            // Skip delimiters. Note the "not_of".
            lastPos = str.find_first_not_of(delimiters, pos);
            // Find the end of the next token.
            pos = str.find_first_of(delimiters, lastPos);
        }
    }
    IteratorType begin() {
        return tokens_.begin();
    }
    IteratorType end() {
        return tokens_.end();
    }
    const TokenType& operator[](size_t i) {
        // Unchecked access to the i-th token.
        return tokens_[i];
    }
};
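
For comparison, the free-function form of the same loop might look like the sketch below. The name tokenize and the return-by-value signature are my own assumptions for illustration, not the exact code from the site mentioned above.

```cpp
#include <string>
#include <vector>

// Hypothetical free-function version of the same splitting loop.
// Returns the tokens by value instead of storing them in a class.
std::vector<std::string> tokenize(const std::string& str,
                                  const std::string& delimiters = " ") {
    std::vector<std::string> tokens;

    // Skip delimiters at the beginning.
    std::string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    // Find the first delimiter after the token, i.e. the token's end.
    std::string::size_type pos = str.find_first_of(delimiters, lastPos);

    while (std::string::npos != pos || std::string::npos != lastPos) {
        // substr clamps the count, so pos == npos takes the rest of the string.
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        // Skip the run of delimiters that follows the token.
        lastPos = str.find_first_not_of(delimiters, pos);
        // Find the end of the next token.
        pos = str.find_first_of(delimiters, lastPos);
    }
    return tokens;
}
```

With this sketch, tokenize("a,,b;c", ",;") yields the three tokens a, b and c: runs of delimiters collapse, so no empty tokens are produced, and a string made only of delimiters yields an empty vector.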
 
Barry

aaragon said:
class Tokenizator {
public:
    typedef string TokenType;
private:
    typedef vector<TokenType> ContainerType;
public:
    typedef ContainerType::const_iterator IteratorType;
private:
    ContainerType tokens_;

public:

    Tokenizator(const string& str, const string& delimiters = " ") {

Why not 'Tokenizer'?

        // Skip delimiters at beginning.
        string::size_type lastPos = str.find_first_not_of(delimiters, 0);

Why?

    Tokenizator tknr("\012--34-9533", "-");

    for (Tokenizator::IteratorType it = tknr.begin();
         it != tknr.end();
         ++it)
    {
        cout << *it << endl;
    }

What is this supposed to produce?

        // Find the first delimiter after the token, i.e. the token's end.
        string::size_type pos = str.find_first_of(delimiters, lastPos);

        while (string::npos != pos || string::npos != lastPos) {
            // Found a token, add it to the vector.
            tokens_.push_back(str.substr(lastPos, pos - lastPos));
            // Skip delimiters. Note the "not_of".
            lastPos = str.find_first_not_of(delimiters, pos);
            // Find the end of the next token.
            pos = str.find_first_of(delimiters, lastPos);
        }
    }
    IteratorType begin() {
        return tokens_.begin();
    }
    IteratorType end() {
        return tokens_.end();
    }
    const TokenType& operator[](size_t i) {
        return tokens_[i];
    }


I don't think there is any need to provide operator[] for random access.
If you keep it, you should at least provide a token count (size/length) operation,
or use 'tokens_.at(i)', which throws on an out-of-range index.
Either way, this operator[] is a bad idea.

Also, all the member functions should be *const*.
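
Putting those suggestions together, a minimal sketch might look like the following. The name Tokenizer and the members size() and at() are my own choices here, not something the original code has. The splitting loop is unchanged.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Sketch only: Tokenizer, size() and at() are assumed names.
class Tokenizer {
public:
    typedef std::string TokenType;
    typedef std::vector<TokenType>::const_iterator IteratorType;

    explicit Tokenizer(const std::string& str,
                       const std::string& delimiters = " ") {
        // Skip leading delimiters, then find the end of the first token.
        std::string::size_type lastPos = str.find_first_not_of(delimiters, 0);
        std::string::size_type pos = str.find_first_of(delimiters, lastPos);

        while (std::string::npos != pos || std::string::npos != lastPos) {
            tokens_.push_back(str.substr(lastPos, pos - lastPos));
            lastPos = str.find_first_not_of(delimiters, pos);
            pos = str.find_first_of(delimiters, lastPos);
        }
    }

    // Read-only accessors, all const so they work on a const Tokenizer.
    IteratorType begin() const { return tokens_.begin(); }
    IteratorType end() const { return tokens_.end(); }
    std::size_t size() const { return tokens_.size(); }

    // Bounds-checked access: std::vector::at throws std::out_of_range.
    const TokenType& at(std::size_t i) const { return tokens_.at(i); }

private:
    std::vector<TokenType> tokens_;
};
```

With the input from my example, Tokenizer("\012--34-9533", "-") holds three tokens: a lone newline character ("\012" is an octal escape), "34" and "9533". The empty field between the two dashes is silently dropped, which may or may not be what you want.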
 
