I want to know that size_type returns -1 (minus one)
size_type never contains -1. It can't, since it is an unsigned
type. (Also, variables and types don't "return" anything. Only
functions return things.)
is safe before I extract one string into two substrings.
First example is safe and second example is not sure.
const basic_string <char>::size_type npos = -1;
Which results in an implicit conversion, according to the rules
of conversion of signed to unsigned. Basically, npos will be
the largest possible value of size_type.
But why are you defining this? (And why are you using
basic_string< char > instead of the typedef std::string?) If,
for convenience, you want a local constant variable (to be able
to write npos, rather than std::string::npos), then:
std::string::size_type const npos = std::string::npos;
is the simplest solution.
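To make the conversion concrete, here is a minimal sketch (the function name npos_is_max_size_type is hypothetical, chosen only for illustration) that checks that -1 converted to size_type is the same value as std::string::npos, and that both are the largest value the type can hold:

```cpp
#include <cassert>
#include <limits>
#include <string>

// Hypothetical helper, for illustration only.
// Returns true if -1 converted to size_type equals std::string::npos
// and also equals the largest value size_type can represent.
bool npos_is_max_size_type()
{
    typedef std::string::size_type size_type;

    // Signed-to-unsigned conversion is defined modulo 2^N, so -1
    // becomes the all-ones bit pattern, i.e. the maximum value.
    size_type const converted = static_cast<size_type>(-1);

    return converted == std::string::npos
        && converted == std::numeric_limits<size_type>::max();
}
```

This should hold on any conforming implementation, which is why the value may print as a very large number but never as -1.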
basic_string< char >::size_type begin_index, end_index, length_index;
Just a general rule (good practice, not a language requirement):
don't define variables until you can initialize them.
string data = "Hello World!!", token1, token2;
begin_index = data.find_first_not_of( " ", end_index );
end_index = data.find_first_of( " ", begin_index );
token1 = data.substr( begin_index, end_index - begin_index );
length_index = token1.length();
begin_index returns 0 and end_index returns 5. substr is safe.
begin_index = data.find_first_not_of( " ", end_index );
end_index = data.find_first_of( " ", begin_index );
token2 = data.substr( begin_index, end_index - begin_index );
length_index = token2.length();
begin_index returns 6 and end_index returns -1.
Again, end_index doesn't return anything; data.find_first_of
returns std::string::npos. Which is the largest possible value
which can be held in an std::string::size_type.
Is substr safe for token2 because end_index returns -1
indicates space character is not found.
What does the documentation for substr say? What is the meaning
of the second argument? (I don't have my copy of the standard
handy to quote exactly, but what it says is something along the
lines of "the second argument specifies the maximum length of
the returned string", and that the return value is something
like "std::string( s.begin() + position, s.begin() + position +
std::min(length, s.size() - position))".)
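A small sketch of what that clamping means for the token code above, assuming a hypothetical helper named next_token (not part of the original code). The length argument may be enormous, as it is when find_first_of returns npos, but the start position must not exceed size(), otherwise substr throws std::out_of_range:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, for illustration only.
// Returns the space-delimited token that starts at the first non-space
// character at or after `start`, or an empty string if there is none.
std::string next_token(std::string const& data, std::string::size_type start)
{
    std::string::size_type const begin = data.find_first_not_of(" ", start);
    if (begin == std::string::npos) {
        // No token left. Calling substr(npos, ...) would throw
        // std::out_of_range, so this case is handled explicitly.
        return std::string();
    }

    std::string::size_type const end = data.find_first_of(" ", begin);

    // If no space follows, end is npos and end - begin is a huge count.
    // substr clamps the count to data.size() - begin, so this is safe.
    return data.substr(begin, end - begin);
}
```

With data set to "Hello World!!", next_token(data, 0) gives "Hello" and next_token(data, 5) gives "World!!". The second call is the case asked about: safe because of the clamping, not because of any special treatment of -1.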
Another question---is size_type the same as size_t?
For std::string and std::wstring, yes. If you instantiate
std::basic_string with a non-standard allocator, not
necessarily.
They are always unsigned maximum integer.
No. size_t is an unsigned integer large enough that the size of
the largest possible object can be represented in it. I've used
machines where size_t was 16 bits, for example.
Can I always copy variable from size_type to signed integer or
unsigned integer?
There are several possible answers to that question. If you
mean copy without loss of value, the answer is no; a lot of
modern machines have a 64 bit size_type, but a 32 bit integer
type, and there's no way you can convert a 64 bit type into a 32
bit type without loss of value.
Formally, of course, you can always convert to the unsigned
integer: the result is the value modulo 2^N, where N is the
number of bits in the target type. The results of converting
to the signed integer are implementation defined if the value
doesn't fit, but most implementations define that conversion
as well. Either way, if the value doesn't fit, you'll get some
other value.
Finally, in practice, your strings are unlikely ever to be
larger than what can be represented in an int. In which case,
there's no problem.
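If you do need an int and want to be sure nothing was lost, one option is to check the range before converting. A minimal sketch, with a hypothetical helper name (to_int_checked) that is not part of the standard library:

```cpp
#include <cassert>
#include <limits>
#include <string>

// Hypothetical helper, for illustration only.
// Stores n in `out` and returns true if n is representable as an int;
// otherwise leaves `out` untouched and returns false.
bool to_int_checked(std::string::size_type n, int& out)
{
    // INT_MAX is positive, so converting it to unsigned int keeps its
    // value and makes the comparison below unsigned against unsigned.
    unsigned int const int_max =
        static_cast<unsigned int>(std::numeric_limits<int>::max());

    if (n > int_max) {
        return false;   // would not fit: refuse rather than truncate
    }
    out = static_cast<int>(n);
    return true;
}
```

For a value such as token1.length() this succeeds; for std::string::npos it fails on typical platforms, which is exactly the case where a plain assignment would silently give some other value.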
const basic_string <char>::size_type npos = -1;
If you assign npos to a signed int (the sNpos in your code),
the results are implementation defined. It's very likely that
sNpos will end up -1, but it's not guaranteed by the standard.
(And if sNpos does end up -1, then the conversion back to
size_t is guaranteed, so comparison with a size_t will work.)
unsigned int uNpos = npos;
Perfectly legal, but uNpos will not compare equal to npos on
most 64 bit machines.
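The effect can be checked directly. The following sketch uses a hypothetical helper (fits_in_unsigned_int) that narrows a size_type to unsigned int and reports whether the value survived the round trip:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, for illustration only.
// Returns true if `n` is unchanged after being narrowed to unsigned int
// and widened back to size_type.
bool fits_in_unsigned_int(std::string::size_type n)
{
    // Well defined: the value is reduced modulo 2^N for unsigned int.
    unsigned int const narrowed = static_cast<unsigned int>(n);

    // In the comparison, narrowed is converted back to the wider type;
    // any high bits dropped above are gone for good.
    return static_cast<std::string::size_type>(narrowed) == n;
}
```

Where size_type is 64 bits and unsigned int is 32 bits, fits_in_unsigned_int(std::string::npos) returns false, so a test like uNpos == npos silently never matches. Where the two types have the same width, it returns true.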
I'm not too clear as to what your goal is. First, for better or
for worse, std::string uses an unsigned size_t for all of its
indexing and positioning. Mixing signed and unsigned in C++
often gives surprising results, and should be avoided. (Using
unsigned for numeric values should generally be avoided as well,
but the rule about not mixing is more critical, and trumps this
rule---if an external library uses unsigned, you should stick
with whatever type it uses.)
Also, and this is really just a question of personal preference,
but I prefer by far using the algorithms in <algorithm> to the
special member functions in std::string. Once you're used to
the standard library, it just seems more comfortable working
with iterators than with indexes. And it avoids all of the
issues related to unsigned types in C++. Given that any time
you're going to be processing text, you're going to be using
functions like isalpha, isspace, etc. a lot, the first thing to
do is to define predicate object types for each of the
functions and its complement. (Macros make this fairly easy.)
Then you use them with std::find_if. So your initial example
becomes:
typedef std::string::const_iterator text_iterator;
std::string const data( "Hello, world!" );
text_iterator begin_token = std::find_if(data.begin(), data.end(),
                                         is_not_space());
text_iterator end_token = std::find_if(begin_token, data.end(),
                                       is_space());
                                       // or is_not_alnum(), or whatever...
std::string const first_token( begin_token, end_token );
(As I say, this is a personal preference, not any established
rule. But IMHO, it fits in better with the philosophy of the
standard library.)
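The predicate types is_space and is_not_space are not standard. One plausible way to write them by hand, without the macros, is sketched below; first_token is likewise a hypothetical wrapper around the code above:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Assumed definitions of the predicates used above; not standard names.
// The cast to unsigned char is needed because passing a negative char
// value to std::isspace is undefined behavior.
struct is_space
{
    bool operator()(char ch) const
    {
        return std::isspace(static_cast<unsigned char>(ch)) != 0;
    }
};

struct is_not_space
{
    bool operator()(char ch) const
    {
        return std::isspace(static_cast<unsigned char>(ch)) == 0;
    }
};

// Hypothetical wrapper: returns the first whitespace-delimited token
// in `data`, or an empty string if there is none.
std::string first_token(std::string const& data)
{
    typedef std::string::const_iterator text_iterator;

    text_iterator begin_token =
        std::find_if(data.begin(), data.end(), is_not_space());
    text_iterator end_token =
        std::find_if(begin_token, data.end(), is_space());

    // If nothing is found, both iterators equal data.end() and the
    // range is empty; no npos or unsigned arithmetic is involved.
    return std::string(begin_token, end_token);
}
```

For "Hello, world!" this yields "Hello," (the comma is not whitespace); a predicate such as is_not_alnum for the end of the token would drop it.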