strtok ( ) help

ern

I'm using strtok() to capture lines of input. After I call
splitCommand(), I call strtok() again to get the next line, but strtok()
returns NULL even though there is more in the file. That didn't happen
before splitCommand() entered the picture. The problem seems to be
splitCommand() somehow modifying the pointer, but I HAVE to call that
function. Is there a way to make a copy of it or something?

/* HERE IS MY CODE */

char * lineOfScript;
const char * delim = "\n";

lineOfScript = strstr(scriptFileBuffer,":preprocess:"); //find starting place in script
lineOfScript = strtok(lineOfScript,delim); //skip a line
lineOfScript = strtok(NULL,delim); //get next line... now I have a command
splitCommand(lineOfScript); //this is probably where my pointer gets messed up...
lineOfScript = strtok(NULL,delim); //get next line, but strtok returns NULL

//Split up command into separate words.
//Store words in global array
int splitCommand(char * command){
    const char * delimiters = " ";
    int i = 0;
    g_UserCommands[0] = strtok(command, delimiters);
    while(g_UserCommands[i] != NULL && i < 5){
        //printf("%s", g_UserCommands[i]);//for debugging...
        i+=1;
        g_UserCommands[i] = strtok(NULL, delimiters);
    }
    i=0;
    return 1;
}
 
Vladimir S. Oka

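In splitCommand() you start a new strtok() "session". strtok() keeps
only one saved position, so calling it with a non-NULL first argument
throws away the position it had saved in your script buffer. When you
return and call strtok(NULL, delim), it carries on from where the
inner loop left off in 'command', not from the start of the next line.

C passes /all/ parameters by value (even pointers), so only the
function's copy of the pointer goes away when splitCommand() returns;
the string it points to is still there. But the outer strtok() already
replaced the '\n' that ended that line with a null character. Once the
inner loop has consumed the whole line, the saved position sits on that
null character, so the next call finds no more tokens and returns NULL.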
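
For what it's worth, here is a minimal sketch of the effect, boiled down
to a function one can test. The name count_lines_nested is made up for
illustration and is not from ern's code. It mimics the pattern in the
question (an outer strtok() loop over lines, with an inner strtok() loop
over words) and counts how many lines the outer loop gets to see.

```c
#include <string.h>

/* Hypothetical helper: counts the lines the OUTER strtok() loop sees
   when each line is also split into words by a nested strtok() loop. */
int count_lines_nested(char *buf)
{
    int lines = 0;
    char *line = strtok(buf, "\n");

    while (line != NULL) {
        ++lines;

        /* Inner "session": replaces strtok()'s single saved position. */
        char *word = strtok(line, " ");
        while (word != NULL)
            word = strtok(NULL, " ");

        /* Resumes from the inner session's position, not from buf. */
        line = strtok(NULL, "\n");
    }
    return lines;
}
```

Called on a writable buffer holding "a b\nc d\ne f", this should report
1 line rather than 3, which matches the symptom in the question.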

Cheers

Vladimir
 
Vladimir S. Oka

Consult your system help/manuals for a detailed description of
how strtok() works. The C Standard describes it in 7.21.5.8.
I've last used it too long ago to dare paraphrase either.

Cheers

Vladimir
 
pemo

strtok(lineOfScript,delim);

After the initial call, strtok() has to 'remember' the data you've asked it
to parse. To do that here, it makes a copy of whatever lineOfScript pointed
to, and stores it in some internal buffer [that it maintains, and you can't
directly access].


strtok(NULL,delim);

When you call it again, passing NULL as the first param, it simply continues
parsing from wherever it previously left off - i.e., it continues to parse
its internal buffer as set by whatever lineOfScript originally pointed to.


strtok("BOO",delim);

Now, if you call it again with a non-NULL initial param, it forgets whatever
data it was previously storing/working on and resets its internal buffer to
whatever data you've just passed in - a copy of "BOO" in this case. So,
whatever you didn't yet parse - that was originally ref'ed by lineOfScript -
is now lost and forgotten.

Bottom line, you can't do what you're trying to do with ...

p = strtok(p1, p2);

while(strtok(NULL, p2))
{
p3 = strtok(p4, ...);

...
}
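
One way round it, sketched here under my own assumptions rather than
taken from anyone's post: don't use strtok() for the lines at all. Find
each line end with strcspn(), terminate the line by hand, and keep
strtok() for the words only. The function name count_words_by_line is
hypothetical, and it merely counts words where real code would store
them.

```c
#include <string.h>

/* Hypothetical helper: walks buf line by line WITHOUT strtok(), so
   strtok() stays free for splitting each line into words.
   Returns the total number of words found. Modifies buf. */
int count_words_by_line(char *buf)
{
    int words = 0;
    char *line = buf;

    while (*line != '\0') {
        size_t len = strcspn(line, "\n");   /* length of this line */
        char *next = line + len;

        if (*next == '\n')
            *next++ = '\0';                 /* end the line, step past it */

        /* Only one strtok() session is ever active at a time. */
        for (char *w = strtok(line, " "); w != NULL; w = strtok(NULL, " "))
            ++words;

        line = next;
    }
    return words;
}
```

Because the line position lives in the caller's own variable, the word
loop can reset strtok() as often as it likes.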


--
===============================================================
In an attempt to reduce 'unwanted noise' on the 'signal' ...

Disclaimer:

Any comment/code I contribute might =NOT= be 100% portable, nor
semantically correct [read - 'not 100% pedantically correct'].
I don't care too much about that though, and I reckon it's the
same with most 'visitors' here. However, rest assured that any
'essential' (?) corrections WILL almost certainly appear v.soon
[read - 'to add noise as they see fit, a pedant will be along
shortly'].

WARNINGS: Always read the label. No beside-the-point minutiae
filter supplied. Keep away from children. Do not ignite.
===============================================================
 
Default User

pemo wrote:

strtok(lineOfScript,delim);

After the initial call, strtok() has to 'remember' the data you've
asked it to parse. To do that here, it makes a copy of whatever
lineOfScript pointed to, and stores it in some internal buffer [that
it maintains, and you can't directly access].

That's not likely. What it will "remember" is a pointer into the
original string marking where the next search starts (probably just a
static char *). If it made a copy of the string, not only would that be
inefficient, but if an operation changed the original string between
calls to strtok() its copy would no longer match.



Brian
 
pemo

Yes, you're probably right, thanks for the correction ... hold on, brb
<time passes> ...
yup, it certainly looks like it *is* as you say, for the gcc version at
least. Still, it makes one wonder whether the behaviour in your comment
["but if an operation changed the original string between calls to
strtok() its copy would no longer match"] might be useful, or else goes
against how the docs say strtok() works.

#include <stdio.h>
#include <string.h>

int main(void)
{
    char ar[] = "now is the time for all good men to come to the aid of the party";
    char * p = NULL;
    int n = 0;

    p = strtok(ar, " ");

    while(p != NULL)
    {
        puts(p);

        ++n;

        if(n % 5 == 0)
        {
            /* overwrite the buffer while strtok() is part-way through it */
            strcpy(ar, "the quick brown fox jumps over the lazy dog");
        }

        p = strtok(NULL, " ");
    }

    return 0;
}


now
is
the
time
for
jumps
over
the
lazy
dog


Default User

Remember also that strtok() has to modify the original string, so it
has to have a pointer into that string in all cases. That is, it
doesn't help to work on a copy of the string, because it has to punch
null characters in place of the delimiters. Also, the return value is a
pointer into the original string (or NULL, of course).

The strtok() syntax and semantics are well into the "cheap, fast, and
dirty" style.
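
A small sketch that makes both points visible. The function name
second_token_offset is invented for this example. It returns how far
into the caller's buffer the second token starts, which only makes
sense because the returned pointers point into that same buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: tokenizes buf and returns the offset of the
   second token within buf, or -1 if there are fewer than two tokens.
   buf is modified: each delimiter that ends a token becomes '\0'. */
ptrdiff_t second_token_offset(char *buf, const char *delims)
{
    char *first = strtok(buf, delims);
    char *second = (first != NULL) ? strtok(NULL, delims) : NULL;

    return (second != NULL) ? second - buf : -1;
}
```

After calling it on a buffer holding "ab cd" with " " as the delimiter,
the result should be 3, and buf itself now reads as just "ab" because
the space at index 2 has been overwritten with a null character.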



Brian
 
pemo

Default said:
Remember also that strtok() has to modify the original string, so it
has to have a pointer into that string in all cases. That is, it
doesn't help to work on copy of the string because it has to punch
null characters in place of the delimiters. ...
The strtok() syntax and semantics are well into the "cheap, fast, and
dirty" style.

Yup, ok, and I guess 7.21.5.8, paragraph 2, "A sequence of calls to the
strtok function breaks *the string pointed to by s1* into a ...",
[pretty much] implies that a copy should *not* be made [hmmmm ????].
Whether it really *is certain though* ... if this [the std] were a law!

Ok, I reckon the meaning *is clear* [gulp] in this case.

Default said:
Also, the return value is a pointer into the original string (or NULL of
course).

Jeez, I really want to be the last one to want to play *the pedantic card on
c.l.c* (surely, c.std.c is where *certain types* should 'go play'), but the
std says ...

Now, it's late [here] and I've not bothered to parse *all* the previous
paras in the std to see if there's a case for categorically stating that
'token', in this context, *is* necessarily part of the input to strtok().
But, if there's not a case, then "returns a pointer to the first character
of a token" doesn't, I think, preclude strtok returning a pointer into some
local [or any other] buffer, rather than the one holding the original string
[the input] ... just that [perhaps], at the time, the token it is pointing
to tallies with what its input is?

As to the answer to this ... I actually don't give much of a damn [*a *****
actually]. I'm more of a computational linguist these days, and it's *the
language* that interests me mostly now [my X3J11 days are a distant
memory], and how something that often appears reasonably clear at first
sight can, in actual fact, be anything but! However, I'd rather ... than
be a pedant about it all now.


Default User

What's unclear about this?

A sequence of calls to the strtok function breaks the
string pointed to by s1 into a sequence of tokens, each of
which is delimited by a character from the string pointed to
by s2.

"Breaks the string". Not forms some copies. Read how tokens are found
and formed. It pretty well lays out the state machine for you.




Brian
 
pemo

Ho && Hum

rayw

w00t

:)

Gregory Pietsch

ern said:
The problem is in splitCommand() somehow modifying the pointer, but I
HAVE to call that function.

First of all, you don't have to call any function if you know how to
manipulate strings. ;-)

Here's an idea of how to use the strtok() function. Assuming that you
don't mind trashing the contents of a string s, and t is your token
pointer:

for (t = s; (t = strtok(t, delimiters)) != 0; t = 0)

will give you a loop that extracts the tokens one at a time from s.
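
Wrapped into a complete function, purely as a sketch, the idiom could
look like this. The name count_tokens is made up for the example; the
loop header is the one above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the tokens in s using the for-loop
   idiom. The first pass hands s to strtok(); "t = 0" then makes every
   later pass call strtok(NULL, ...). s is trashed in the process. */
size_t count_tokens(char *s, const char *delimiters)
{
    size_t n = 0;
    char *t;

    for (t = s; (t = strtok(t, delimiters)) != 0; t = 0)
        ++n;    /* t points at the current token here */

    return n;
}
```

Note that runs of delimiters count as a single separator, so leading,
trailing and doubled delimiters do not produce empty tokens.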

If you need code to capture lines of input, just look at FreeDOS Edlin,
available from either ibiblio or alt.sources. That code reads input a
character at a time, taking advantage of stdio's file buffering
mechanism, and starts a new line when it encounters '\n' in the text.
Another way of doing it (but you have to be careful about buffer sizes)
is to use fgets() to read a long string and then test whether the last
character read is the newline. WARNING: Do not use gets(), for it is
the tool of the Devil and can lead to buffer overruns, Satan's minions
streaming out of your nose at very high speeds, and other delightful
undefined behavior.
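
A rough sketch of the fgets() approach, to show the newline test. The
function name read_line and its return codes are my own invention for
this example, not something from Edlin.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from fp into buf (at most
   size - 1 characters). Returns
     1  if a whole line was read (the trailing '\n' is stripped),
     0  if the line was cut short (buffer full, or EOF with no '\n'),
    -1  if nothing could be read (EOF or error). */
int read_line(FILE *fp, char *buf, int size)
{
    size_t len;

    if (fgets(buf, size, fp) == NULL)
        return -1;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';    /* last character read was the newline */
        return 1;
    }
    return 0;                   /* no newline seen: partial line */
}
```

A return of 0 tells the caller that the rest of the line, if any, is
still waiting in the stream and needs another call to collect it.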

Here's what Wikipedia in its C book section says about strtok:

The strtok function

char *strtok(char *restrict s1, const char *restrict delimiters);

A sequence of calls to strtok() breaks the string pointed to by s1 into
a sequence of tokens, each of which is delimited by a byte from the
string pointed to by delimiters. The first call in the sequence has s1
as its first argument, and is followed by calls with a null pointer as
their first argument. The separator string pointed to by delimiters may
be different from call to call.

The first call in the sequence searches the string pointed to by s1 for
the first byte that is not contained in the current separator string
pointed to by delimiters. If no such byte is found, then there are no
tokens in the string pointed to by s1 and strtok() shall return a null
pointer. If such a byte is found, it is the start of the first token.

The strtok() function then searches from there for a byte that is
contained in the current separator string. If no such byte is found,
the current token extends to the end of the string pointed to by s1,
and subsequent searches for a token shall return a null pointer. If
such a byte is found, it is overwritten by a null byte, which
terminates the current token. The strtok() function saves a pointer to
the following byte, from which the next search for a token shall start.

Each subsequent call, with a null pointer as the value of the first
argument, starts searching from the saved pointer and behaves as
described above.

The strtok() function need not be reentrant. A function that is not
required to be reentrant is not required to be thread-safe.

Because the strtok() function must save state between calls, you
cannot have two tokenizers going at the same time. The Single UNIX
Specification therefore defines a similar function, strtok_r(), which
keeps no state of its own between calls; the caller supplies the
pointer that holds it. Its prototype is this:

char *strtok_r(char *s, const char *delimiters, char **lasts);

The strtok_r() function considers the null-terminated string s as a
sequence of zero or more text tokens separated by spans of one or more
characters from the separator string delimiters. The argument lasts
points to a user-provided pointer which points to stored information
necessary for strtok_r() to continue scanning the same string.

In the first call to strtok_r(), s points to a null-terminated string,
delimiters to a null-terminated string of separator characters, and the
value pointed to by lasts is ignored. The strtok_r() function shall
return a pointer to the first character of the first token, write a
null character into s immediately following the returned token, and
update the pointer to which lasts points.

In subsequent calls, s is a null pointer and lasts shall be unchanged
from the previous call so that subsequent calls shall move through the
string s, returning successive tokens until no tokens remain. The
separator string delimiters may be different from call to call. When no
token remains in s, a NULL pointer shall be returned.
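
Applied to the original question, a sketch might look like the code
below. It assumes a POSIX system that provides strtok_r() (it is not
part of standard C), and count_words_nested is a hypothetical name. Each
loop has its own state pointer, so the inner loop no longer disturbs
the outer one.

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: POSIX, for strtok_r() */
#include <string.h>

/* Hypothetical helper: splits buf into lines and each line into words,
   using a separate saved position for each level of tokenizing.
   Returns the total number of words. Modifies buf. */
int count_words_nested(char *buf)
{
    int words = 0;
    char *line_state;   /* saved position for the line tokenizer */
    char *word_state;   /* saved position for the word tokenizer */

    for (char *line = strtok_r(buf, "\n", &line_state);
         line != NULL;
         line = strtok_r(NULL, "\n", &line_state)) {

        for (char *w = strtok_r(line, " ", &word_state);
             w != NULL;
             w = strtok_r(NULL, " ", &word_state))
            ++words;
    }
    return words;
}
```

On a buffer holding "a b\nc d e\nf" this should visit all three lines
and count 6 words.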

The following public-domain code for strtok and strtok_r codes the
former as a special case of the latter:

#include <string.h>
/* strtok_r */
char *(strtok_r)(char *s, const char *delimiters, char **lasts)
{
    char *sbegin, *send;
    sbegin = s ? s : *lasts;
    sbegin += strspn(sbegin, delimiters);
    if (*sbegin == '\0') {
        *lasts = "";
        return NULL;
    }
    send = strpbrk(sbegin, delimiters);
    if (*send != '\0')
        *send++ = '\0';
    *lasts = send;
    return sbegin;
}
/* strtok */
char *(strtok)(char *restrict s1, const char *restrict delimiters)
{
    static char *ssave = "";
    return strtok_r(s1, delimiters, &ssave);
}


HTH, Gregory Pietsch
 
pete

Gregory said:
send = strpbrk(sbegin, delimiters);
if (*send != '\0')

That's wrong.
strpbrk returns NULL when no characters from delimiters
are found in sbegin.

It should be either:
send = strpbrk(sbegin, delimiters);
if (send != NULL)
or
send = sbegin + strcspn(sbegin, delimiters);
if (*send != '\0')
 
pete

pete said:
send = sbegin + strcspn(sbegin, delimiters);
if (*send != '\0')

I think that one's best.
I don't think that send should be able to acquire a
null pointer value.
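
For reference, here is a sketch of the function with the strcspn()
version swapped in. It is renamed my_strtok_r (a made-up name) so that
it cannot clash with a library strtok_r(); everything else follows the
code posted above.

```c
#include <string.h>

/* Sketch only: the posted strtok_r with strpbrk() replaced by strcspn(),
   under the hypothetical name my_strtok_r. */
char *my_strtok_r(char *s, const char *delimiters, char **lasts)
{
    char *sbegin, *send;

    sbegin = s ? s : *lasts;
    sbegin += strspn(sbegin, delimiters);   /* skip leading delimiters */
    if (*sbegin == '\0') {
        *lasts = "";                        /* no tokens left */
        return NULL;
    }

    /* strcspn() returns a length, so send always points into the
       string: at a delimiter, or at the terminating null character. */
    send = sbegin + strcspn(sbegin, delimiters);
    if (*send != '\0')
        *send++ = '\0';                     /* terminate the token */
    *lasts = send;
    return sbegin;
}
```

The last token of a string, which has no delimiter after it, is the
case where the strpbrk() version would dereference a null pointer.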
 
