'strlen' : Why valgrind reports invalid read ?

M

m.labanowicz

Hello,

+ gcc --version
01: gcc (Ubuntu/Linaro 4.7.2-11precise2) 4.7.2
02: Copyright (C) 2012 Free Software Foundation, Inc.
03: This is free software; see the source for copying conditions. There is NO
04: warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
05:
+ cat main.c
01: #include <stdio.h>
02: #include <string.h>
03: #include <stdlib.h>
04: char * bar(char const * str);
05: char * bar(char const * str) {
06: char * buf = (char *)malloc(strlen(str) + 1);
07: if (buf) {
08: size_t len;
09: strcpy(buf, str);
10: printf("buf=%p\r\n", (void *)buf);
11: len = strlen(buf);
12: if (len) {
13: return buf;
14: }
15: free(buf);
16: }
17: return NULL;
18: }
19: int main(void) {
20: free((void *)bar(""));
21: return EXIT_SUCCESS;
22: }
+ gcc -O2 -ansi -pedantic -W -Wall -Werror -Wextra main.c
+ ./a.out
01: buf=0x7e0010
+ valgrind ./a.out
01: ==28564== Memcheck, a memory error detector
02: ==28564== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
03: ==28564== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
04: ==28564== Command: ./a.out
05: ==28564==
06: ==28564== Invalid read of size 4
07: ==28564== at 0x4006F4: bar (in /home/M.Labanowicz/work/valgissue/a.out)
08: ==28564== by 0x40057D: main (in /home/M.Labanowicz/work/valgissue/a.out)
09: ==28564== Address 0x51f1040 is 0 bytes inside a block of size 1 alloc'd
10: ==28564== at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
11: ==28564== by 0x4006C6: bar (in /home/M.Labanowicz/work/valgissue/a.out)
12: ==28564== by 0x40057D: main (in /home/M.Labanowicz/work/valgissue/a.out)
13: ==28564==
14: buf=0x51f1040
15: ==28564==
16: ==28564== HEAP SUMMARY:
17: ==28564== in use at exit: 0 bytes in 0 blocks
18: ==28564== total heap usage: 1 allocs, 1 frees, 1 bytes allocated
19: ==28564==
20: ==28564== All heap blocks were freed -- no leaks are possible
21: ==28564==
22: ==28564== For counts of detected and suppressed errors, rerun with: -v
23: ==28564== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2 from 2)

Best Regards
 
X

Xavier Roche

+ gcc -O2 -ansi -pedantic -W -Wall -Werror -Wextra main.c

You may add -g3 to get some debugging infos even with O2.
+ valgrind ./a.out

Humm, old version of valgrind ?
==8217== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 4)

This might be an optimized version of strlen() triggering a false
warning, BTW.
 
M

m.labanowicz

You may add -g3 to get some debugging infos even with O2.

gcc -g3 -O2 -ansi -pedantic -W -Wall -Werror -Wextra main.c
....
06: ==28834== Invalid read of size 4
07: ==28834== at 0x4006F4: bar (main.c:11)
08: ==28834== by 0x40057D: main (main.c:20)
09: ==28834== Address 0x51f1040 is 0 bytes inside a block of size 1 alloc'd
10: ==28834== at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
11: ==28834== by 0x4006C6: bar (main.c:6)
12: ==28834== by 0x40057D: main (main.c:20)
....

It seems that this is an 'strlen' bug in [gcc+valgrind].
 
S

Shao Miller

Hello,

+ gcc --version
01: gcc (Ubuntu/Linaro 4.7.2-11precise2) 4.7.2
02: Copyright (C) 2012 Free Software Foundation, Inc.
03: This is free software; see the source for copying conditions. There is NO
04: warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
05:
+ cat main.c
01: #include <stdio.h>
02: #include <string.h>
03: #include <stdlib.h>
04: char * bar(char const * str);
05: char * bar(char const * str) {
06: char * buf = (char *)malloc(strlen(str) + 1);
07: if (buf) {
08: size_t len;
09: strcpy(buf, str);
10: printf("buf=%p\r\n", (void *)buf);
11: len = strlen(buf);
12: if (len) {
13: return buf;
14: }
15: free(buf);
16: }
17: return NULL;
18: }
19: int main(void) {
20: free((void *)bar(""));
21: return EXIT_SUCCESS;
22: }
+ gcc -O2 -ansi -pedantic -W -Wall -Werror -Wextra main.c
+ ./a.out
01: buf=0x7e0010
+ valgrind ./a.out
01: ==28564== Memcheck, a memory error detector
02: ==28564== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
03: ==28564== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
04: ==28564== Command: ./a.out
05: ==28564==
06: ==28564== Invalid read of size 4
07: ==28564== at 0x4006F4: bar (in /home/M.Labanowicz/work/valgissue/a.out)
08: ==28564== by 0x40057D: main (in /home/M.Labanowicz/work/valgissue/a.out)
09: ==28564== Address 0x51f1040 is 0 bytes inside a block of size 1 alloc'd
10: ==28564== at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
11: ==28564== by 0x4006C6: bar (in /home/M.Labanowicz/work/valgissue/a.out)
12: ==28564== by 0x40057D: main (in /home/M.Labanowicz/work/valgissue/a.out)
13: ==28564==
14: buf=0x51f1040
15: ==28564==
16: ==28564== HEAP SUMMARY:
17: ==28564== in use at exit: 0 bytes in 0 blocks
18: ==28564== total heap usage: 1 allocs, 1 frees, 1 bytes allocated
19: ==28564==
20: ==28564== All heap blocks were freed -- no leaks are possible
21: ==28564==
22: ==28564== For counts of detected and suppressed errors, rerun with: -v
23: ==28564== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2 from 2)

I would take a guess that 'strcpy' is trying to copy four bytes at a
time. It's possible that it's got a bug for when the string length is <
4. Perhaps you could experiment with the strings "1", "12", "123", and
see if the behaviour is the same for all but the last one.
 
M

m.labanowicz

I would take a guess that 'strcpy' is trying to copy four bytes at a
time. It's possible that it's got a bug for when the string length is <
4. Perhaps you could experiment with the strings "1", "12", "123", and
see if the behaviour is the same for all but the last one.

Hmm, in such case violation should take place before 'printf'.

I have modified code:
....
strcpy(buf, str);
printf("buf=%p\r\n", (void *)buf);fflush(stdout);
len = strlen(buf);
printf("KOKO\r\n");fflush(stdout);
....
Result:
....
06: buf=0x51f1040
07: ==4252== Invalid read of size 4
08: ==4252== at 0x400784: bar (main.c:11)
09: ==4252== by 0x4005FD: main (main.c:21)
10: ==4252== Address 0x51f1040 is 0 bytes inside a block of size 1 alloc'd
11: ==4252== at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
12: ==4252== by 0x400746: bar (main.c:6)
13: ==4252== by 0x4005FD: main (main.c:21)
14: ==4252==
15: KOKO
....

I have also tried with the different length of string:
"" -> error
"1" -> error
"12" -> error
"123" -> OK:CLEAR
"1234" -> error
"12345" -> error
"123456" -> error
"1234567" -> OK:CLEAR
"12345678" -> failed

It seems that strlen reads 4 bytes-words.
I assume that it is illegal.
 
S

Shao Miller

Hmm, in such case violation should take place before 'printf'.

I have modified code:
...
strcpy(buf, str);
printf("buf=%p\r\n", (void *)buf);fflush(stdout);
len = strlen(buf);
printf("KOKO\r\n");fflush(stdout);
...
Result:
...
06: buf=0x51f1040
07: ==4252== Invalid read of size 4
08: ==4252== at 0x400784: bar (main.c:11)
09: ==4252== by 0x4005FD: main (main.c:21)
10: ==4252== Address 0x51f1040 is 0 bytes inside a block of size 1 alloc'd
11: ==4252== at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
12: ==4252== by 0x400746: bar (main.c:6)
13: ==4252== by 0x4005FD: main (main.c:21)
14: ==4252==
15: KOKO
...

I have also tried with the different length of string:
"" -> error
"1" -> error
"12" -> error
"123" -> OK:CLEAR
"1234" -> error
"12345" -> error
"123456" -> error
"1234567" -> OK:CLEAR
"12345678" -> failed

It seems that strlen reads 4 bytes-words.
I assume that it is illegal.

Yes after posting I refreshed my news-reader and saw that you suspected
'strlen'. That makes more sense, given the valgrind output about the
allocated storage.
 
G

glen herrmannsfeldt

Hmm, in such case violation should take place before 'printf'.
I have modified code:
...
strcpy(buf, str);
printf("buf=%p\r\n", (void *)buf);fflush(stdout);
len = strlen(buf);
printf("KOKO\r\n");fflush(stdout);
(snip)
It seems that strlen reads 4 bytes-words.
I assume that it is illegal.

Seems to me that if an implementation knew how it allocated strings,
such that reading the three bytes after one didn't cause any problems,
it should be able to get away with it.

For many systems, unless you are at the end of addressing space and
doing unaligned reads, it should not cause a problem.

In that case, I would call it a valgrind bug. That is, it isn't
consistent with the access patterns of the underlying implementation.

-- glen
 
N

Nick Bowler

(e-mail address removed) wrote: [...]
It seems that strlen reads 4 bytes-words.
I assume that it is illegal.

Seems to me that if an implementation knew how it allocated strings,
such that reading the three bytes after one didn't cause any problems,
it should be able to get away with it.

For many systems, unless you are at the end of addressing space and
doing unaligned reads, it should not cause a problem.

In that case, I would call it a valgrind bug. That is, it isn't
consistent with the access patterns of the underlying implementation.

I doubt there's any bug in the tools (valgrind or C implementation) here.
Most likely, the OP's configured suppressions are incomplete or out of
date, perhaps due to a botched installation of valgrind.

Cheers,
Nick
 
B

Barry Schwarz

Unrelated to your problem but:

Line 04 serves no purpose. A prototype is needed only if the
definition is not in scope when the function is called. Since the
definition of bar() follows immediately, this can never be the case.

The cast on line 6 serves no purpose. malloc() returns a void*
and there is an implicit conversion to any "pointer to object" type.

The \r on line 10 may not do what you think. The C library
combined with your operating system will convert the \n to the
appropriate "end of line" character(s). The extra \r may just confuse
any future attempt to read the data.

The cast in line 20 serves no purpose. A prototype for free() is
in scope so the compiler knows to convert the argument to void*.
 
B

Ben Bacarisse

Hmm, in such case violation should take place before 'printf'.

I have modified code:
...
strcpy(buf, str);
printf("buf=%p\r\n", (void *)buf);fflush(stdout);
len = strlen(buf);
printf("KOKO\r\n");fflush(stdout);
...
Result:
...
06: buf=0x51f1040
07: ==4252== Invalid read of size 4
08: ==4252== at 0x400784: bar (main.c:11)
09: ==4252== by 0x4005FD: main (main.c:21)
10: ==4252== Address 0x51f1040 is 0 bytes inside a block of size 1 alloc'd
11: ==4252== at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
12: ==4252== by 0x400746: bar (main.c:6)
13: ==4252== by 0x4005FD: main (main.c:21)
14: ==4252==
15: KOKO
...

I have also tried with the different length of string:
"" -> error
"1" -> error
"12" -> error
"123" -> OK:CLEAR
"1234" -> error
"12345" -> error
"123456" -> error
"1234567" -> OK:CLEAR
"12345678" -> failed

It seems that strlen reads 4 bytes-words.
I assume that it is illegal.

You used -O2, yes? When I do that the second strlen is inlined and does
indeed check in 4-byte chunks:

movq %rbx, %rsi
..L3:
movl (%rsi), %eax
addq $4, %rsi
leal -16843009(%rax), %ecx
notl %eax
andl %eax, %ecx
andl $-2139062144, %ecx
je .L3

The first strlen is not inlined, because gcc can't assume that 4-byte
accesses are safe -- it can assume that after the malloc.

At least one source says "using Valgrind with code that has been
compiled with optimisation options could give incorrect results" and
recommends use -O0. This seems reasonable since I don't think valgrind
can do anything to avoid reporting an error in cases like this.
 
J

Jorgen Grahn

You may add -g3 to get some debugging infos even with O2.

I'd formulate that a bit differently. In gcc, there's no connection
between -g and optimization; the problem here is simply that -g is
missing. The symptom is that valgrind cannot report the source file
and line for the violations it finds.

(-g versus -g3 should be irrelevant in this case. I had to look it
up: -g3 is like -g, but also includes macro definitions. Unclear to
me when that's useful.)

/Jorgen
 
J

Jorgen Grahn

(e-mail address removed) wrote: [...]
It seems that strlen reads 4 bytes-words.
I assume that it is illegal.

Seems to me that if an implementation knew how it allocated strings,
such that reading the three bytes after one didn't cause any problems,
it should be able to get away with it.

For many systems, unless you are at the end of addressing space and
doing unaligned reads, it should not cause a problem.

In that case, I would call it a valgrind bug. That is, it isn't
consistent with the access patterns of the underlying implementation.

I doubt there's any bug in the tools (valgrind or C implementation) here.
Most likely, the OP's configured suppressions are incomplete or out of
date, perhaps due to a botched installation of valgrind.

Yes. The "suppressions" are Valgrind's tool for letting the library
get away with making assumptions about its own internals.

/Jorgen
 
X

Xavier Roche

Le 18/01/2013 12:02, (e-mail address removed) a écrit :
It seems that this is an 'strlen' bug in [gcc+valgrind].

Not really a bug, but a careful optimization.

What I understand (not perfectly, I did not dig into the actual code) is
that strlen() reads chunks of bytes (4 or 8 bytes) at once to find the
ending \0. This allows to speed up the whole process.

This is illegal strictly speaking, except that the low-level
implementation knows that if you stay on the same address page, reading
few bytes after the ending is harmless. Technically, you will read the
terminating \0 AND few garbage bytes, but theses ones will be simply
ignored since the \0 is seen.

Typically, if you reserve one (blank) memory page and ask strlen() on
the last byte of the page, you should not end up with a SEGV.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,954
Messages
2,570,116
Members
46,704
Latest member
BernadineF

Latest Threads

Top