using strtok to mark delimiters as tokens

gpaps87

Hi,
I wanted to know whether we can use the strtok command to mark delimiters
as tokens as well. In Java, we have a command:

StringTokenizer(String str, String delimiters, boolean
delimAsToken)

which considers the delimiters as tokens, too. Can strtok accomplish
this requirement? Or could you please let me know if there is any other
command in C that would carry out this task?
 
santosh

Hi,
I wanted to know whether we can use the strtok command to mark delimiters
as tokens as well. In Java, we have a command:

StringTokenizer(String str, String delimiters, boolean
delimAsToken)

which considers the delimiters as tokens, too. Can strtok accomplish
this requirement?

You could force strtok to consider delimiters as tokens by specifying a
different delimiter string at the next call to it, though this might not
be quite what you want.

Or could you please let me know if there is any other
command in C that would carry out this task?

You can use a string function like strstr or strcspn to locate the
necessary substring and tokenise the string yourself. In fact, it may
be more robust to use these functions and write your own tokeniser, as
strtok has a number of caveats you need to be aware of to use it
safely.
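
As a rough illustration of the strcspn approach, here is a minimal sketch of a tokeniser that reports each delimiter as a token of its own. The name next_token and its signature are made up for this example, and it assumes single-character delimiters. It never writes to the input and keeps no hidden state, so the caller owns the read position.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a standard function.
 * Finds the next token in s, starting at offset *pos. A token is either a
 * run of non-delimiter characters or a single delimiter character.
 * Stores the token length in *len, advances *pos past the token, and
 * returns a pointer to its first character, or NULL at the end of s.
 * The input is never modified, so tokens are NOT null-terminated. */
const char *next_token(const char *s, size_t *pos, const char *delims,
                       size_t *len)
{
    const char *start;

    assert(s != NULL && pos != NULL && delims != NULL && len != NULL);

    start = s + *pos;
    if (*start == '\0')
        return NULL;                /* nothing left to read */

    *len = strcspn(start, delims);  /* length of the non-delimiter run */
    if (*len == 0)
        *len = 1;                   /* sitting on a delimiter: it is the token */

    *pos += *len;
    return start;
}
```

Because the tokens are not terminated, print them with a precision, for example printf("%.*s\n", (int)len, tok), or copy them out with memcpy.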
 
Falcon Kirtaran

Hi,
I wanted to know whether we can use the strtok command to mark delimiters
as tokens as well. In Java, we have a command:

StringTokenizer(String str, String delimiters, boolean
delimAsToken)

which considers the delimiters as tokens, too. Can strtok accomplish
this requirement? Or could you please let me know if there is any other
command in C that would carry out this task?

I don't think there is. This is because the function is generally
implemented by replacing the delimiter with a null.

It'd be really easy to write your own, however, that preserves the
delimiting character. For instance, it might look something like:

char * strtok_d(char * str, const char * tokens, char * delim) {
    static char * istr; /*saved position for the next call*/
    char * start;
    int i1, i2;

    if (str) istr = str; /*if str is non-null, use str instead of the*/
                         /*stored value*/
    if (!istr) return 0; /*the previous call reached the end*/

    start = istr;
    for (i1 = 0; start[i1]; i1++) { /*iterate through the string*/
        for (i2 = 0; tokens[i2]; i2++) { /*try each delimiter*/
            if (start[i1] == tokens[i2]) {
                *delim = tokens[i2]; /*save the delimiter*/
                start[i1] = 0; /*tokenize: terminate the string here*/
                istr = start + i1 + 1; /*set a new istr for the next call*/
                return start;
            }
        }
    }

    *delim = 0; /*no delimiter found: this is the last token*/
    istr = 0;
    return start;
}

The only thing you'd need to take care of is that both parameters must
be null-terminated strings. Also, you'd have to check if the third
parameter when dereferenced is not 0. Obviously, the third parameter is
just a single char (by reference), not a string.

The function could return it as a char *, but that would massively
modify the behaviour of the function, and would probably require it to
allocate a char array that you'd later have to free(). Calls to this
can't be mixed interchangeably with calls to strtok().

I've not tested this function at all; it's purely theoretical :p
 
Mark Bluemel

Hi,
I wanted to know whether we can use the strtok command to mark delimiters
as tokens as well. In Java, we have a command:

StringTokenizer(String str, String delimiters, boolean
delimAsToken)

which considers the delimiters as tokens, too. Can strtok accomplish
this requirement?

No, it can't.

Or could you please let me know if there is any other
command in C that would carry out this task?

You could use one of the techniques others have already suggested, and
write your own.

Or you could do what's recommended with later versions of Java and move
to using a regular expression parser to pick apart the string.
 
Mark Bluemel

Falcon said:
hi,
I wanted to know whether we can use the strtok command to mark delimiters
as tokens as well. In Java, we have a command:
[snip]

It'd be really easy to write your own, however, that preserves the
delimiting character. For instance, it might look something like:

char * strtok_d(char * str, const char * tokens, char * delim) {
[snip]

};

The only thing you'd need to take care of is that both parameters must
be null-terminated strings. Also, you'd have to check if the third
parameter when dereferenced is not 0. Obviously, the third parameter is
just a single char (by reference), not a string.

[snip] Calls to this
can't be mixed interchangeably with calls to strtok().

And due to the (very dubious) design decision to hold state in a static
variable, it is not thread-safe or even capable of being used on more
than one string at a time, for example if you wanted to further parse
a field extracted from a string before moving on to the next.

There are better solutions out there, I'm sure.
 
Richard Bos

I don't think there is. This is because the function is generally
implemented by replacing the delimiter with a null.

s/generally/required to be/.

It'd be really easy to write your own, however, that preserves the
delimiting character. For instance, it might look something like:

char * strtok_d(char * str, const char * tokens, char * delim) {

Don't call it that, though. Names starting in str and a letter are
reserved for the implementation's use.

The only thing you'd need to take care of is that both parameters must
be null-terminated strings.

No shit. This is generally true of string-mangling functions: they take
strings, not randomly filled arrays.

Another useful modification for your home-grown strtok() replacement
could be (depending on your requirements) that it does not skip empty
tokens. strtok() does so; this is sometimes desired behaviour, but to
me, it's usually a hindrance.

Richard
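
To make the point about empty tokens concrete, here is a small sketch that compares the two behaviours. Both function names are invented for this example. The first counts what strtok() hands back; the second counts fields the way a CSV-style reader would, where two adjacent delimiters enclose an empty field.

```c
#include <assert.h>
#include <string.h>

/* Counts the tokens strtok() produces. buf is modified in the process. */
int count_with_strtok(char *buf, const char *delims)
{
    int n = 0;
    char *tok;

    assert(buf != NULL && delims != NULL);
    for (tok = strtok(buf, delims); tok != NULL; tok = strtok(NULL, delims))
        n++;
    return n;
}

/* Counts fields without skipping empty ones: every delimiter ends a
 * field, so adjacent delimiters yield an empty field between them. */
int count_keeping_empty(const char *s, const char *delims)
{
    int n = 1;  /* a string with no delimiters is still one field */

    assert(s != NULL && delims != NULL);
    for (s += strcspn(s, delims); *s != '\0'; s += strcspn(s, delims)) {
        n++;    /* found a delimiter: another field follows it */
        s++;    /* step over the delimiter */
    }
    return n;
}
```

For the input "a,,b" with "," as the delimiter set, the first function sees two tokens and the second sees three fields, the middle one empty.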
 
Default User

santosh said:
You could force strtok to consider delimiters as tokens by specifying
a different delimiter string at the next call to it. But this might
not be quite what you want though.

Not really. strtok() punches a hole in the string where that first
delimiter was. No way to save it. Besides, presumably the OP wants to
keep using the delimiter set.

I'm not sure what the exact semantics of the Java call are; they would
have to be specified better. What I would do is take an approach more
like PHP's explode() and produce an array or list of strings. Then it
would be relatively easy to save the delimiters too.

Brian
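
One way the explode()-style idea might look in C is sketched below. The names split_keep_delims and free_pieces are hypothetical, and the sketch assumes single-character delimiters and that each delimiter should appear in the result as a one-character string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Frees an array returned by split_keep_delims(). */
void free_pieces(char **pieces)
{
    size_t i;

    if (pieces == NULL)
        return;
    for (i = 0; pieces[i] != NULL; i++)
        free(pieces[i]);
    free(pieces);
}

/* Hypothetical explode()-style helper, not a standard function.
 * Splits s into a NULL-terminated array of newly allocated strings.
 * Each delimiter becomes a one-character string of its own, so joining
 * the pieces in order reproduces s. Returns NULL if allocation fails. */
char **split_keep_delims(const char *s, const char *delims)
{
    size_t n = 0, cap = 8;
    char **out;

    assert(s != NULL && delims != NULL);
    out = malloc(cap * sizeof *out);
    if (out == NULL)
        return NULL;

    while (*s != '\0') {
        size_t len = strcspn(s, delims);
        if (len == 0)
            len = 1;                /* a delimiter is a piece by itself */

        if (n + 1 == cap) {         /* keep one slot for the final NULL */
            char **tmp = realloc(out, 2 * cap * sizeof *out);
            if (tmp == NULL) {
                out[n] = NULL;      /* terminate so free_pieces() can walk it */
                free_pieces(out);
                return NULL;
            }
            out = tmp;
            cap *= 2;
        }

        out[n] = malloc(len + 1);
        if (out[n] == NULL) {       /* out[n] is NULL, so the array is terminated */
            free_pieces(out);
            return NULL;
        }
        memcpy(out[n], s, len);
        out[n][len] = '\0';
        n++;
        s += len;
    }
    out[n] = NULL;
    return out;
}
```

The caller walks the array until it reaches NULL and releases everything with free_pieces(). The input string is left untouched, which avoids the hole-punching problem described above.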
 
Falcon Kirtaran

Mark said:
Falcon said:
hi,
I wanted to know whether we can use the strtok command to mark delimiters
as tokens as well. In Java, we have a command:
[snip]

It'd be really easy to write your own, however, that preserves the
delimiting character. For instance, it might look something like:

char * strtok_d(char * str, const char * tokens, char * delim) {
[snip]

};

The only thing you'd need to take care of is that both parameters must
be null-terminated strings. Also, you'd have to check if the third
parameter when dereferenced is not 0. Obviously, the third parameter
is just a single char (by reference), not a string.

[snip] Calls to this can't be mixed interchangeably with calls to
strtok().

And due to the (very dubious) design decision to hold state in a static
variable, it is not thread-safe or even capable of being used on more
than one string at a time, for example if you wanted to further parse
a field extracted from a string before moving on to the next.

There are better solutions out there, I'm sure.

strtok() itself is not thread-safe, however, and also can't be used on
more than one string at a time.
 
Falcon Kirtaran

Richard said:
s/generally/required to be/.


Don't call it that, though. Names starting in str and a letter are
reserved for the implementation's use.


No shit. This is generally true of string-mangling functions: they take
strings, not randomly filled arrays.

Another useful modification for your home-grown strtok() replacement
could be (depending on your requirements) that it does not skip empty
tokens. strtok() does so; this is sometimes desired behaviour, but to
me, it's usually a hindrance.

Richard

I don't think mine actually does skip empty tokens. If it finds an
empty token, it'll just return the next delimiter and a pointer to a
string that starts with null.
 
Malcolm McLean

Can strtok accomplish
this requirement? Or could you please let me know if there is any other
command in C that would carry out this task?

sscanf() might be what you are looking for.

Alternatively, if you are implementing something like a BASIC interpreter,
you need a struct with the string, the read position, and the top token
read. Then you have a match() function to get rid of the current token and
get a new one, triggering an error if there is no match (for instance for
closing parentheses).
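
A minimal sketch of that struct-plus-match() arrangement might look like the following. All names here are hypothetical, and the sketch only recognises unsigned decimal numbers and single-character tokens, with no overflow checking. Instead of raising an error itself, match() returns 0 on a mismatch and leaves the decision to the caller.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Token codes: any other token is simply its character value. */
enum { TOK_END = 0, TOK_NUMBER = 256 };

struct lexer {
    const char *text;   /* the string being parsed */
    size_t pos;         /* the read position */
    int token;          /* the current ("top") token */
    long value;         /* numeric value when token == TOK_NUMBER */
};

/* Reads the next token from the string into lx->token. */
static void advance(struct lexer *lx)
{
    while (isspace((unsigned char)lx->text[lx->pos]))
        lx->pos++;                      /* skip whitespace between tokens */

    if (lx->text[lx->pos] == '\0') {
        lx->token = TOK_END;
    } else if (isdigit((unsigned char)lx->text[lx->pos])) {
        lx->value = 0;
        while (isdigit((unsigned char)lx->text[lx->pos]))
            lx->value = lx->value * 10 + (lx->text[lx->pos++] - '0');
        lx->token = TOK_NUMBER;
    } else {
        lx->token = (unsigned char)lx->text[lx->pos++];
    }
}

/* Prepares the lexer and loads the first token. */
void lexer_init(struct lexer *lx, const char *text)
{
    assert(lx != NULL && text != NULL);
    lx->text = text;
    lx->pos = 0;
    lx->value = 0;
    advance(lx);
}

/* If the current token is the expected one, discards it, reads the next
 * token and returns 1. Otherwise leaves the token in place and returns 0
 * so the caller can report the error. */
int match(struct lexer *lx, int expected)
{
    assert(lx != NULL);
    if (lx->token != expected)
        return 0;
    advance(lx);
    return 1;
}
```

Because all state lives in the struct, several strings can be parsed at once, and delimiters such as parentheses arrive as ordinary tokens rather than being thrown away.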
 
jaysome

s/generally/required to be/.


Don't call it that, though. Names starting in str and a letter are
reserved for the implementation's use.

Strictly speaking, this is not true.

The standard states that function names that begin with str and a
*lowercase* letter may be reserved. Thus, for example, a
function name such as stricmp is reserved, whereas a function name such
as strIcmp is not reserved--the user is free to define such a function.

In my opinion, if you want to both define a function name that begins
with "str" and at the same time avoid using a reserved identifier, then
use a name that begins with "str_", i.e., str_icmp.

If this seems overly pedantic, then there's a good chance I'd agree with
you.

For example, the function name strip_leading_white_space will never, ever
conflict with a function name defined by an implementation (the
implementation would in all probability, if it in fact defined such a
function, use something such as stripleadingwhitespace or striplws or
strlws or some other name).

The probability that an identifier you define conflicts with an
identifier defined by the implementation decreases exponentially,
multiplied by some constant, with the number of underscores used in your
identifier.

In my experience, a single underscore is all that is needed, even if your
identifier does begin with "str" and (followed by) a lowercase letter.
 
Mark Bluemel

Falcon said:
Mark said:
Falcon said:
hi,
I wanted to know whether we can use the strtok command to mark delimiters
as tokens as well. In Java, we have a command:
[snip]

It'd be really easy to write your own, however, that preserves the
delimiting character. For instance, it might look something like:

char * strtok_d(char * str, const char * tokens, char * delim) {
[snip]

};

The only thing you'd need to take care of is that both parameters
must be null-terminated strings. Also, you'd have to check if the
third parameter when dereferenced is not 0. Obviously, the third
parameter is just a single char (by reference), not a string.

[snip] Calls to this can't be mixed interchangeably with calls to
strtok().

And due to the (very dubious) design decision to hold state in a static
variable, it is not thread-safe or even capable of being used on more
than one string at a time, for example if you wanted to further parse
a field extracted from a string before moving on to the next.

There are better solutions out there, I'm sure.

strtok() itself is not thread-safe, however, and also can't be used on
more than one string at a time.

Yeah, so if you're writing a replacement, you might as well write a
better one...
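
As one possible shape for such a replacement, here is a sketch that moves the saved position out of a static variable and into a pointer the caller supplies. The name tok_next and its signature are invented for this example. Like strtok() it overwrites each delimiter with a null character, but it hands the delimiter back through delim and does not skip empty tokens.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reentrant tokeniser, not a standard function.
 * Pass the string on the first call and NULL afterwards. *save holds the
 * position between calls, so each string being parsed needs its own save
 * variable. *delim receives the delimiter that ended the token, or '\0'
 * when the token is the last one. Returns NULL when nothing is left. */
char *tok_next(char *str, const char *delims, char *delim, char **save)
{
    char *start;
    size_t len;

    assert(delims != NULL && delim != NULL && save != NULL);

    start = (str != NULL) ? str : *save;
    if (start == NULL)
        return NULL;                /* the previous call returned the last token */

    len = strcspn(start, delims);   /* distance to the next delimiter */
    *delim = start[len];            /* remember it before overwriting */
    if (start[len] != '\0') {
        start[len] = '\0';          /* terminate the token */
        *save = start + len + 1;    /* resume just past the delimiter */
    } else {
        *save = NULL;               /* end of string reached */
    }
    return start;
}
```

With separate save variables, a field taken from one string can itself be tokenised before the outer loop moves on, which is the nested case that a static variable rules out.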
 
