C++ strtok

abcd

Hello C++ users,
Greetings.

I have the following question regarding the strtok function used for string
tokenizing. As I understand it, strtok internally uses a static variable to
keep track of the string passed to it, so that tokens can be searched for
based on the delimiter.
When strtok returns NULL, it means that no more tokens are available.

What if strtok is now invoked with another string to search for
tokens? What happens to the internal static buffer that was
initialized to the previous string, and when is it released?

Best Regards
Sumit
 
gwowen

> What if now strtok is invoked with another string to search for
> tokens?? What happens to the internal static buffer which was
> initialized to the previous string, when is that released??

There is no static buffer. strtok() modifies the string passed to it
as an argument, overwriting the delimiter character that ends each token
with '\0', so that each return value points into the (modified) input
C-string. There is static state between calls (i.e. where the last
tokenization got to), but no dynamic buffer is needed.
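
To make the in-place modification visible, here is a minimal sketch. The helper name split_in_place is made up for this example; it only wraps the usual strtok loop and copies each token out.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes `buffer` in place with strtok and returns
// copies of the tokens. `buffer` must be a writable, NUL-terminated array,
// because strtok overwrites the delimiter that ends each token with '\0'.
std::vector<std::string> split_in_place(char* buffer, const char* delims)
{
    assert(buffer != nullptr && delims != nullptr);
    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer, delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        // tok points into `buffer` itself; strtok allocates nothing.
        tokens.push_back(tok);
    }
    return tokens;
}
```

Called on `char text[] = "a,b,c";` with `","` as the delimiter set, this returns three tokens, and afterwards both commas in `text` have been replaced by '\0', so `std::strlen(text)` is 1.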
 
Vlad from Moscow

As I understand it, strtok does not have any internal static buffer, so
there is nothing to release. It has a static variable of type pointer
to char. At the very beginning of the function there is a check
whether the supplied string argument is NULL. If it is not NULL,
the static pointer is set to this new string, replacing whatever it
pointed to before.
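
A rough sketch of that logic, assuming nothing about any particular library's source. The name my_strtok is hypothetical, and a real implementation may differ in details.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical re-implementation for illustration only; not library code.
char* my_strtok(char* str, const char* delims)
{
    assert(delims != nullptr);
    static char* saved = nullptr;   // the only state kept between calls

    if (str != nullptr)
        saved = str;                // new string: the old position is simply overwritten
    if (saved == nullptr)
        return nullptr;             // nothing to continue from

    saved += std::strspn(saved, delims);    // skip leading delimiters
    if (*saved == '\0')
        return nullptr;                     // no more tokens

    char* token = saved;
    saved += std::strcspn(saved, delims);   // advance to the end of the token
    if (*saved != '\0') {
        *saved = '\0';                      // terminate the token in place
        ++saved;                            // resume after the delimiter next time
    }
    return token;
}
```

Passing a second string just reassigns `saved`. The first string is neither freed nor touched again; it stays in whatever partially tokenized state it was left in.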
 
Marcel Müller

> I have a following question regarding strtok function used for string
> tokenizing. As I understand, strtok internally uses static variable to
> keep track of the string passed to it so that tokens can be searched
> based on delimiter.
> After the strtok returns NULL, it means that no tokens are available.
>
> What if now strtok is invoked with another string to search for
> tokens??

In this case the static state is discarded. Once you have passed another
string, you can no longer continue tokenizing the first one.

> What happens to the internal static buffer which was
> initialized to the previous string, when is that released??

There is nothing to release. The internal state has a fixed size and
refers to the string buffer you supplied in the first call. The state is
allocated statically, typically in the data segment of the C++ runtime.

More precisely, some thread-safe C++ runtimes (Microsoft's, for example)
allocate the storage for the internal state of strtok as thread-local
storage; without that, strtok would be almost useless in multithreaded
code. The standard does not require it, however, so do not rely on it
in portable code.


In practice I avoid using strtok altogether.

Firstly, because it is not re-entrant, i.e. you must not start parsing
another string while you still have to finish the first one. This divides
the functions that you are allowed to call from within the parser loop into
those that never call strtok and those that might.
While it is trivial to decide this for runtime library functions, it
becomes error-prone for your own code. E.g. an object method you call
might internally call methods that use strtok, and you might not be aware
of that.
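
A small sketch of how this goes wrong. Both function names are made up for the example; the point is only that the inner strtok loop overwrites the state the outer loop depends on.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: counts comma-separated fields, using strtok internally.
int count_fields(char* record)
{
    assert(record != nullptr);
    int n = 0;
    for (char* f = std::strtok(record, ","); f != nullptr;
         f = std::strtok(nullptr, ","))
        ++n;
    return n;
}

// Returns how many semicolon-separated records the outer loop actually visits.
int count_records_visited(char* text)
{
    assert(text != nullptr);
    int records = 0;
    for (char* r = std::strtok(text, ";"); r != nullptr;
         r = std::strtok(nullptr, ";")) {
        ++records;
        count_fields(r);   // clobbers the outer loop's strtok state
    }
    return records;
}
```

For the input "a,b;c,d" the outer loop visits only one record instead of two: once the inner loop has run to the end of "a,b", the next strtok(nullptr, ";") continues from the inner state and returns a null pointer.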

Secondly, strtok modifies the original string in a C-style way. C-like
string manipulation should not be used in C++ programs because it is
error-prone and often a backdoor for security vulnerabilities. As long
as you do not handle mutable char* in C++ and use only const char*, the
probability of security vulnerabilities is significantly reduced.

Use strspn and strcspn for C-style parsing in C++. They easily
achieve the same behavior as strtok without its disadvantages, i.e.
they do not modify the input buffer, and the parsing state is kept in
local variables on the stack.
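
A minimal sketch of that approach. The function name split_c and the choice to return a vector of std::string are assumptions made for this example.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on any character in `delims` without
// modifying the input. Runs of delimiters are skipped, as strtok does.
std::vector<std::string> split_c(const char* text, const char* delims)
{
    assert(text != nullptr && delims != nullptr);
    std::vector<std::string> tokens;
    const char* p = text;                   // the whole parsing state is this local
    for (;;) {
        p += std::strspn(p, delims);        // skip leading delimiters
        if (*p == '\0')
            break;                          // end of input
        std::size_t len = std::strcspn(p, delims);  // length of the next token
        tokens.emplace_back(p, len);        // copy the token out
        p += len;
    }
    return tokens;
}
```

Because the position lives in a local variable, a function called from inside the loop can run its own parser without disturbing this one, and the input can be a string literal.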

strtok is mainly supported for C compatibility by the C++ runtime.


Marcel
 
none

Totally agree with Marcel here. strtok is not a very good function to
use anywhere, neither in C nor in C++. Even the manual says so:

---------------------------------------
man strtok
<snip snip>
BUGS
Be cautious when using these functions. If you do use them,
note that:

* These functions modify their first argument.

* These functions cannot be used on constant strings.

* The identity of the delimiting character is lost.

* The strtok() function uses a static buffer while parsing, so
it's not thread safe. Use strtok_r() if this matters to you.
-----------------------------------------

std::string::find() and std::string::substr() can do pretty much
everything that strtok does, much more safely. I typically use a
template function that tokenizes a string and returns a vector of strings
containing the individual tokens. It is much more usable, at the cost of
copying a few strings, and that is very rarely a performance bottleneck.
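
Something along these lines, as a simplified non-template sketch. The name tokenize is made up, and it uses find_first_of and find_first_not_of (close relatives of find()) so that runs of delimiters are skipped the way strtok skips them.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns the tokens of `text`, split on any character
// in `delims`. The input string is left untouched; tokens are copies.
std::vector<std::string> tokenize(const std::string& text,
                                  const std::string& delims)
{
    assert(!delims.empty());
    std::vector<std::string> tokens;
    std::string::size_type start = text.find_first_not_of(delims);
    while (start != std::string::npos) {
        // `end` is npos when the last token runs to the end of the string;
        // substr then clamps the count to the remaining characters.
        std::string::size_type end = text.find_first_of(delims, start);
        tokens.push_back(text.substr(start, end - start));
        start = text.find_first_not_of(delims, end);
    }
    return tokens;
}
```

For example, tokenize("a,b,,c", ",") yields the three tokens "a", "b" and "c".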

Yannick
 
