C++ strtok

abcd · Apr 24, 2012

Hello C++ users,
Greetings.

I have the following question regarding the strtok function used for string
tokenizing. As I understand it, strtok internally uses a static variable to
keep track of the string passed to it, so that tokens can be searched for
based on the delimiter.
When strtok returns NULL, it means that no more tokens are available.

What if strtok is now invoked with another string to search for
tokens? What happens to the internal static buffer that was
initialized to the previous string, and when is it released?

Best Regards
Sumit

gwowen · Apr 24, 2012

What if strtok is now invoked with another string to search for
tokens? What happens to the internal static buffer that was
initialized to the previous string, and when is it released?

There is no static buffer. strtok() modifies the string passed to it
as an argument, by overwriting the delimiter character that ends each
token with '\0', so that each return value points into the (modified)
input C-string. There is static state between calls (i.e. where the last
tokenization got to), but no dynamic buffer is needed.
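
The in-place modification described above can be made visible with a small sketch. The helper name tokenize_in_place and its out-parameter raw_after are made up for this illustration; only std::strtok itself comes from the library.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Tokenizes a writable copy of `input` with std::strtok and returns the tokens.
// `raw_after` (a hypothetical out-parameter) receives the bytes of the working
// buffer after tokenizing, so the overwritten delimiters can be inspected.
std::vector<std::string> tokenize_in_place(const std::string& input,
                                           const char* delims,
                                           std::string& raw_after)
{
    // strtok needs a writable, null-terminated buffer.
    std::vector<char> buf(input.begin(), input.end());
    buf.push_back('\0');

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buf.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.emplace_back(tok);   // tok points into buf, not into a separate buffer
    }

    // Copy the raw bytes, including any embedded '\0' characters.
    raw_after.assign(buf.data(), input.size());
    return tokens;
}
```

Calling it with "a,b,c" and "," should yield the three tokens and leave a buffer in which both commas have been replaced by '\0'.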

Vlad from Moscow · Apr 24, 2012

As I understand it, strtok does not have any internal static buffer, so
there is nothing to release. It has a static variable of type pointer
to char. At the very beginning of the function there is a check whether
the supplied string is NULL. If it is not NULL, the static pointer is
set to this new string; otherwise scanning continues from where the
static pointer was left.
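
To illustrate the idea, here is a simplified sketch of how such a function could be written. It is not the code of any particular library; the name my_strtok is hypothetical, and real implementations differ in detail.

```cpp
#include <cassert>
#include <cstring>

// Simplified illustration only, not taken from any real C library.
// `my_strtok` is a hypothetical name.
char* my_strtok(char* str, const char* delims)
{
    static char* saved = nullptr;   // the only state kept between calls

    if (str != nullptr)
        saved = str;                // a new string simply replaces the old pointer
    if (saved == nullptr)
        return nullptr;             // nothing left to scan

    char* start = saved + std::strspn(saved, delims);  // skip leading delimiters
    if (*start == '\0') {
        saved = nullptr;
        return nullptr;             // only delimiters (or nothing) remained
    }

    char* end = start + std::strcspn(start, delims);   // find the end of the token
    if (*end == '\0') {
        saved = nullptr;            // last token: the next call returns nullptr
    } else {
        *end = '\0';                // terminate the token in the caller's buffer
        saved = end + 1;            // resume after the overwritten delimiter
    }
    return start;
}
```

Passing a second string just overwrites the saved pointer. Nothing is allocated, so there is nothing to free; the first string stays wherever the caller put it.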

Marcel Müller · Apr 24, 2012

I have the following question regarding the strtok function used for string
tokenizing. As I understand it, strtok internally uses a static variable to
keep track of the string passed to it, so that tokens can be searched for
based on the delimiter.
When strtok returns NULL, it means that no more tokens are available.

What if strtok is now invoked with another string to search for
tokens?

In this case the static state is discarded. Once you have passed another
string, you cannot continue to tokenize the first one.

What happens to the internal static buffer that was
initialized to the previous string, and when is it released?

There is nothing to release. The internal state has a fixed size and
refers to the string buffer you supplied at the first call. The state is
statically allocated in the data segment of the C++ runtime.

More exactly, some thread-safe C++ runtimes allocate the storage for
the internal state of strtok as thread-local storage, although the
standard does not require this. Otherwise strtok would be almost useless
in multithreaded programs.

In practice I avoid using strtok at all.

Firstly, because it is not re-entrant, i.e. you must not start parsing
another string while you still have to finish the first one. This divides
the functions that you are allowed to call from within the parser loop into
the ones that never call strtok and the ones that might call strtok.
While it is trivial to decide this for runtime library functions, it
becomes error-prone for your own code. E.g. an object method you call
might internally call methods that use strtok, and you might not be aware
of that.
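
A small sketch of the problem, with made-up helper names (count_fields and count_records_broken are hypothetical): the outer loop is meant to walk over ';'-separated records, but the helper it calls uses strtok as well and clobbers the shared state.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper that "innocently" uses strtok internally.
int count_fields(char* record)
{
    int n = 0;
    for (char* f = std::strtok(record, ","); f != nullptr;
         f = std::strtok(nullptr, ","))
        ++n;
    return n;   // strtok's hidden state now belongs to this exhausted scan
}

// Intends to visit every ';'-separated record, but the nested strtok calls
// in count_fields overwrite the state the outer loop depends on.
int count_records_broken(char* text, int& fields)
{
    int records = 0;
    fields = 0;
    for (char* r = std::strtok(text, ";"); r != nullptr;
         r = std::strtok(nullptr, ";")) {
        ++records;
        fields += count_fields(r);
    }
    return records;
}
```

For the input "a,b;c,d" the outer loop should stop after the first record, because the inner scan has already run to the end of its string; the second record is silently skipped.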

Secondly, strtok modifies the original string in a C-style way. C-like
string manipulation should be avoided in C++ programs because it is
error-prone and often a backdoor for security vulnerabilities. As long
as you do not deal with char* in C++ and only use const char*, the
probability of security vulnerabilities is significantly reduced.

Use strspn and strcspn for C-style parsing in C++. They easily achieve
the same behavior as strtok without its disadvantages, i.e. they do not
modify the input buffer and the parsing state is kept on the local
stack.
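
A minimal sketch of that approach, assuming the tokens are wanted as std::string copies (the function name split_cstr is made up for this example):

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Splits `text` at any character in `delims`, skipping empty tokens the way
// strtok does. The input is never written to, and the scan position lives in
// a local variable. `split_cstr` is a hypothetical name.
std::vector<std::string> split_cstr(const char* text, const char* delims)
{
    std::vector<std::string> tokens;
    const char* p = text;
    while (true) {
        p += std::strspn(p, delims);                      // skip a run of delimiters
        if (*p == '\0')
            break;                                        // end of input
        const std::size_t len = std::strcspn(p, delims);  // length of the token
        tokens.emplace_back(p, len);                      // copy the token out
        p += len;
    }
    return tokens;
}
```

Because all state is local, this version can be nested or called from several threads on different inputs without interference, and it also works on string literals.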

strtok is mainly supported for C compatibility by the C++ runtime.

Marcel

none · Apr 24, 2012

Totally agree with Marcel here. strtok is not a very good function to
use anywhere, neither in C nor in C++. Even the manual says so:

---------------------------------------
man strtok
<snip snip>
BUGS
Be cautious when using these functions. If you do use them,
note that:

* These functions modify their first argument.

* These functions cannot be used on constant strings.

* The identity of the delimiting character is lost.

* The strtok() function uses a static buffer while parsing, so
it's not thread safe. Use strtok_r() if this matters to you.
-----------------------------------------

std::string::find() and std::string::substr() can do pretty much
everything that strtok does, much more safely. I typically use a
template function that tokenizes a string and returns a vector of strings
containing the individual tokens. It is much more usable, at the cost of
copying a few strings, and that is very rarely a performance bottleneck.
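
As a rough illustration of the find()/substr() approach (this is not the template function mentioned above, just a plain sketch for std::string with a single delimiter character; the name split is made up):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits `text` at every occurrence of `delim`. Unlike strtok, empty fields
// between adjacent delimiters are kept. `split` is a hypothetical name.
std::vector<std::string> split(const std::string& text, char delim)
{
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    std::string::size_type pos;
    while ((pos = text.find(delim, start)) != std::string::npos) {
        tokens.push_back(text.substr(start, pos - start));  // field before delim
        start = pos + 1;                                    // continue after it
    }
    tokens.push_back(text.substr(start));   // remainder after the last delimiter
    return tokens;
}
```

Note the design choice: this version keeps empty fields, so "a,,b" gives three tokens, whereas strtok would give two. Whether that is wanted depends on the data being parsed.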

Yannick

Dan McLeran · Apr 24, 2012

Have a look at Boost's excellent libraries: http://www.boost.org/doc/libs/1_49_0/libs/tokenizer/
