strtok segfaults in CLI but not in GDB

Pietro Cerutti

Hello,
here I have a strange problem with a real simple strtok example.

The program is as follows:

### BEGIN STRTOK ###

#include <string.h>
#include <stdio.h>

int main()
{
    char *input1 = "Hello, World!";

    char *tok;

    tok = strtok(input1, " ");
    if(tok) printf("%s\n", tok);

    tok = strtok(NULL, " ");
    if(tok) printf("%s\n", tok);

    return(0);

}

### END STRTOK ###


Now, when I run it from the command line, I get a bus error:

### BEGIN COMMAND LINE OUTPUT ###
gcc -ggdb -Wall -o strtok strtok.c
./strtok
Bus error (core dumped)
Exit 138

### END COMMAND LINE OUTPUT ###

When I run it step by step in GDB, the program terminates normally:

### BEGIN DEBUGGER OUTPUT ###
gdb ./strtok
GNU gdb 6.1.1 [FreeBSD]
[snip]GDB copyright and bla bla[/snip]
(gdb) break main
Breakpoint 1 at 0x8048570: file strtok.c, line 6.
(gdb) run
Starting program: /home/piter/strtok

Breakpoint 1, main () at strtok.c:6
6 char *input1 = "Hello, World!";
(gdb) next
10 tok = strtok(input1, " ");
(gdb)
11 if(tok) printf("%s\n", tok);
(gdb)
Hello,
13 tok = strtok(NULL, " ");
(gdb)
14 if(tok) printf("%s\n", tok);
(gdb)
World!
16 return(0);
(gdb)
18 }
(gdb)
0x08048485 in _start ()
(gdb)
Single stepping until exit from function _start,
which has no line number information.

Program exited normally.
(gdb)

### END DEBUGGER OUTPUT ###

Is there something I'm missing wrt C and/or strtok, or is it rather a
problem related to my environment (in which case I'll be happy to post
in the right newsgroup)?

Thanx in advance
 
Ian Collins

Pietro said:
Hello,
here I have a strange problem with a real simple strtok example.

The program is as follows:

### BEGIN STRTOK ###

#include <string.h>
#include <stdio.h>

int main()
{
char *input1 = "Hello, World!";

char *tok;

tok = strtok(input1, " ");

strtok alters its input. You are passing it a string literal, and modifying
a string literal invokes the demons of undefined behavior. Don't.
 
Richard Heathfield

Pietro Cerutti said:
Hello,
here I have a strange problem with a real simple strtok example.

The program is as follows:

### BEGIN STRTOK ###

#include <string.h>
#include <stdio.h>

int main()
{
char *input1 = "Hello, World!";

char *tok;

tok = strtok(input1, " ");

strtok modifies the string you pass it. You pass it a string literal.
You're not allowed to modify string literals.

Change

char *input1 = "Hello, World!";

to

char input1[] = "Hello, World!";
 
Pietro Cerutti

Pietro said:
char *input1 = "Hello, World!";

just in case, I know that the string to be tokenized shouldn't be a
constant, but rather an array of chars.
So, it should be declared as

char input1[14] = "Hello, World!";

The thing I don't understand is: why does it work in GDB?
 
Ian Collins

Pietro said:
Pietro said:
char *input1 = "Hello, World!";

just in case, I know that the string to be tokenized shouldn't be a
constant, but rather an array of chars.
So, it should be declared as

char input1[14] = "Hello, World!";

The thing I don't understand is: why does it work in GDB?
Luck?
 
Chris Dollin

Pietro said:
here I have a strange problem with a real simple strtok example.

Guess: you're trying to use it on a literal string.
The program is as follows:

### BEGIN STRTOK ###

#include <string.h>
#include <stdio.h>

int main()
{
char *input1 = "Hello, World!";

char *tok;

tok = strtok(input1, " ");
if(tok) printf("%s\n", tok);

tok = strtok(NULL, " ");
if(tok) printf("%s\n", tok);

return(0);

}

(fx:dancing) Yes!

`strtok` writes to its argument -- it sticks nuls in there to make
the strings it returns.

You're not allowed to write into a string literal: that gets you
undefined behaviour.

An implementation may just write into the string. Or it may abort in
some way. Or it may ignore the write. Or it may write somewhere else
entirely. Or it may mail a report to your co-coders, or start a game
of rogue, or book you a holiday in the Lake District, or set fire to
your keyboard, or arrange a date with your Most Preferred Person.

[That last one never seems to happen, though.]
 
Pietro Cerutti

Ian said:
Pietro said:
Pietro said:
char *input1 = "Hello, World!";
just in case, I know that the string to be tokenized shouldn't be a
constant, but rather an array of chars.
So, it should be declared as

char input1[14] = "Hello, World!";

The thing I don't understand is: why does it work in GDB?
Luck?

Ya, maybe.

The point is:
I understand what UB means, so WW3 could start now and I'd know why...

But if a string literal is - by definition - not modifiable, then how
can it happen that GDB actually modifies it using strtok?
 
Pietro Cerutti

Chris said:
You're not allowed to write into a string literal: that gets you
undefined behaviour.

An implementation may just write into the string.

Uh? So you mean that a string literal isn't unmodifiable by definition?
 
Keith Thompson

Pietro Cerutti said:
Pietro said:
char *input1 = "Hello, World!";

just in case, I know that the string to be tokenized shouldn't be a
constant, but rather an array of chars.
So, it should be declared as

char input1[14] = "Hello, World!";

The thing I don't understand is: why does it work in GDB?

Because it invokes undefined behavior. There are no rules about what
happens. It can crash, it can "work", it can make demons fly out of
your nose.

(I suppose string literals are stored in write-protected memory when
your program runs normally, but not when it runs under gdb -- which
seems odd.)
 
Pietro Cerutti

Keith said:
(I suppose string literals are stored in write-protected memory when
your program runs normally, but not when it runs under gdb -- which
seems odd.)

Yes it's weird, but it's a logical explanation.
I'll investigate with the freebsd people..
Thank you.
 
Richard Tobin

Pietro Cerutti said:
But if a string literal is - by definition - not modifiable, then how
can it happen that GDB actually modifies it using strtok?

It's not modifiable in that you're not allowed to modify it. It's not
required that the implementation signal an error when you do it. It's
a constraint on you, not on the system.

My guess as to why you don't see an error with GDB is that the
debugger needs the text segment to be writable, so that it can set
breakpoints.

-- Richard
 
Keith Thompson

Pietro Cerutti said:
Ian said:
Pietro said:
Pietro Cerutti wrote:

char *input1 = "Hello, World!";
just in case, I know that the string to be tokenized shouldn't be a
constant, but rather an array of chars.
So, it should be declared as

char input1[14] = "Hello, World!";

The thing I don't understand is: why does it work in GDB?
Luck?

Ya, maybe.

The point is:
I understand what UB means, so WW3 could start now and I'd know why...

But if a string literal is - by definition - not modifiable, then how
can it happen that GDB actually modifies it using strtok?

I think you don't *quite* understand what UB means.

The actual definition (C99 3.4.3) is:

behavior, upon use of a nonportable or erroneous program construct
or of erroneous data, for which this International Standard
imposes no requirements

and C99 6.4.5p6 says:

[...] If the program attempts to modify such an array, the
behavior is undefined.

For example, consider this program:

#include <stdio.h>
int main(void)
{
    char *s = "Hello, world";
    s[0] = 'J'; /* attempt to modify a string literal */
    puts(s);
    return 0;
}

One of the infinitely many possible results is that the string literal
is actually modified, and the program prints "Jello, world".

The standard doesn't say that string literals are not modifiable. It
says that attempting to modify a string literal invokes undefined
behavior.
 
Chris Dollin

Pietro said:
Uh? So you mean that a string literal isn't unmodifiable by definition?

Yes, that's what I (well, the C standard) says.

Specifically, it says that if you attempt to write into a string literal,
/the effect is undefined/. Anything can happen. C washes its hands of
your code. It cares not. Mind the gap. Do as you will.

An implementation may implement this freedom by changing the content of
the literal, if that's convenient.

Hence: don't go writing into string literals. Even though it /might/
get you a date, it probably won't, and I am assured that nasal demons
are not fun to have.
 
Pietro Cerutti

Keith said:
The standard doesn't say that string literals are not modifiable. It
says that attempting to modify a string literal invokes undefined
behavior.

Got it. Thanks!
 
Pietro Cerutti

Richard said:
My guess as to why you don't see an error with GDB is that the
debugger needs the text segment to be writable, so that it can set
breakpoints.

GDB on Debian/GNU Linux gives an error when I try to modify it.
On FreeBSD it doesn't, that's why I'm asking right now the FreeBSD
people whether the behavior is wanted or erroneous.

Thanx
 
Pietro Cerutti

Chris said:
Yes, that's what I (well, the C standard) says.

Specifically, it says that if you attempt to write into a string literal,
/the effect is undefined/. Anything can happen. C washes its hands of
your code. It cares not. Mind the gap. Do as you will.

An implementation may implement this freedom by changing the content of
the literal, if that's convenient.

Hence: don't go writing into string literals. Even though it /might/
get you a date, it probably won't, and I am assured that nasal demons
are not fun to have.

Clear. Thanks to you too.
 
Richard Heathfield

Pietro Cerutti said:
GDB on Debian/GNU Linux gives an error when I try to modify it.

That's an acceptable outcome of undefined behaviour.
On FreeBSD it doesn't,

So's that.

that's why I'm asking right now the FreeBSD
people whether the behavior is wanted or erroneous.

It is neither Debian nor FreeBSD, but rather your program, that is
erroneous.
 
Richard Tobin

GDB on Debian/GNU Linux gives an error when I try to modify it.

That's an acceptable outcome of undefined behaviour.
On FreeBSD it doesn't,

So's that.

that's why I'm asking right now the FreeBSD
people whether the behavior is wanted or erroneous.

It is neither Debian nor FreeBSD, but rather your program, that is
erroneous.

I think he meant "erroneous" in the sense of a mistake, rather than
a violation of the C standard.

It certainly seems desirable to have programs behave the same way
under the debugger as without it, so it would be good if the FreeBSD
version could be changed. Meanwhile, we at least have a clue that if
a segmentation fault goes away in the debugger then the cause may well
be attempted modification of literal strings.

-- Richard
 
Don Porges

Keith Thompson said:
Pietro Cerutti said:
Ian said:
Pietro Cerutti wrote:
Pietro Cerutti wrote:

char *input1 = "Hello, World!";
just in case, I know that the string to be tokenized shouldn't be a
constant, but rather an array of chars.
So, it should be declared as

char input1[14] = "Hello, World!";

The thing I don't understand is: why does it work in GDB?

Luck?

Ya, maybe.

The point is:
I understand what UB means, so WW3 could start now and I'd know why...

But if a string literal is - by definition - not modifiable, then how
can it happen that GDB actually modifies it using strtok?

I think you don't *quite* understand what UB means.

The actual definition (C99 3.4.3) is:

behavior, upon use of a nonportable or erroneous program construct
or of erroneous data, for which this International Standard
imposes no requirements

and C99 6.4.5p6 says:

[...] If the program attempts to modify such an array, the
behavior is undefined.

For example, consider this program:

#include <stdio.h>
int main(void)
{
    char *s = "Hello, world";
    s[0] = 'J'; /* attempt to modify a string literal */
    puts(s);
    return 0;
}

One of the infinitely many possible results is that the string literal
is actually modified, and the program prints "Jello, world".

The standard doesn't say that string literals are not modifiable. It
says that attempting to modify a string literal invokes undefined
behavior.

<OT>
Yes, _but_: from the point of view of gdb users and maintainers, they
may still consider it a gdb bug if, on a single platform, _any_ program
executes differently under gdb than it does when run normally. After all, the
underlying problem -- writing into r/o storage -- could be triggered from
an assembler program. And gdb doesn't have the same standards-contract
relationship with anything that a C implementation does.

It is, however, a separate issue from the fact that the program invokes UB.
</OT>
 
Keith Thompson

CBFalconer said:
To get an error with gcc, add "-Wwrite-strings" to the command. No
quote chars used.

That will cause gcc to emit a warning message if it can determine at
compilation time that you've attempted to modify a string literal.

Actually, it will generate warnings even in some cases where you
*don't* attempt to modify a string literal. It works by internally
applying a "const" qualifier to the array type. So, for example:

% cat c.c
char *s = "Hello, world";
% gcc -c c.c
% gcc -c -Wwrite-strings c.c
c.c:1: warning: initialization discards qualifiers from pointer target type

I haven't attempted to modify the string literal, but by assigning its
address to a (non-const) char*, I've created the potential to do so.
It would be nice if gcc were a bit smarter about this, at least
marking the array type as some kind of "pseudo-const" so it can give
more sensible warning messages. But since an implementation can warn
about anything it likes, I don't believe the "-Wwrite-strings" option
causes gcc to be non-conforming (unless you also add "-Werror").

Consider the following program:

#include <stdio.h>
int main(void)
{
const char *s = "Hello, world";
char *bogus = (char*)s;
bogus[0] = 'J';
puts(s);
return 0;
}

It attempts to modify a string literal, and gcc doesn't complain about
it (during compilation) even with "-Wwrite-strings", because I hid the
evil part behind a pointer cast that dropped the "const" qualifier.
On the system I'm using, it dies with a segmentation fault at run time
-- *unless* I specify "-fwritable-strings", in which case it happily
prints "Jello, world".

Most of this is gcc-specific, of course. The topical point is that,
apart from the fact that the "-Wwrite-strings -Werror" combination
causes some valid programs to be rejected, all this behavior conforms
to the standard.
 
