SIGKILL

cerr

Hi There,

I had to add a certain portion of code to an application which had
been considered to run stable (before my addition). Now the QA guy
came back to me saying that he's seeing a SIGKILL after a while
(several hours) since my code addition. The code I added simply writes
a string (PIDMessageBuf - declared private) and a timestamp generated
at runtime into a text file, a la:
Code:
		    if(gpsDataObj->getGPSLatitude() < 49.1937 && gpsDataObj->getGPSLatitude() > 49.1292){
		      struct tm *ptr;
		      time_t sec;
		      time(&sec);
		      char tmpFileName[500];

		      ptr = localtime(&sec);
		      sprintf(tmpFileName, "/var/log/PIDdata%d%d%d",
			      (ptr->tm_year + 1900),
			      (ptr->tm_mon + 1),
			      ptr->tm_mday);
		      ofstream PIDfile (tmpFileName,ios::app);
		      if (PIDfile.is_open()){
			PIDfile << PIDMessageBuf << ", " << ptr->tm_year + 1900 << "/" <<
ptr->tm_mon + 1 << "/" << ptr->tm_mday << ", " << ptr->tm_hour << ":"
<< ptr->tm_min << ":" << ptr->tm_sec << endl;
			PIDfile.close();
		      }
		      else{
			cout<< "Couldn't open " << tmpFileName << endl;
			}
		    }
I cannot see how this code would lead to a SIGKILL, anyone?
Oh by the way, this is running in a threaded while(1) loop that comes
around once a second.
Any hints or suggestions are greatly appreciated!
Thank you!
 
Ian Collins

Get them to either a) send you a core or b) run the code in a debugger
and give you a shout when it aborts.
 
cerr

Well, it was run in gdb, but gdb doesn't say anything other than SIGKILL
either, and after the SIGKILL you can't get a backtrace because the app
has terminated...
How would I get a core dump that I can do something with?
 
Fred

What is PIDMessageBuf? Where is it defined? How is it defined?
Is it defined?

What happens if you output to cout instead of PIDfile?
 
cerr

PIDMessageBuf is declared as a private std::string in the header.
As for writing to cout instead of PIDfile: worth a try....
 
cerr

Now I got a "child terminated with signal 9" on the shell... What
does that mean, any clues?
Seems to be something from pthread, but uhm...
Thanks,
 
AnonMail2005

Since you mentioned threads, I do believe that localtime is not thread
safe. It uses an internal static variable and returns a pointer to
that. Check out the localtime_r function instead.
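
A minimal sketch of that suggestion, assuming a POSIX system where localtime_r is available. The helper name dailyLogName and the zero-padded date format are illustrative choices, not taken from the original program:

```cpp
#include <cstdio>
#include <string>
#include <time.h>

// Hypothetical helper: builds the daily log file name for a given timestamp.
// localtime_r (POSIX) fills a caller-owned struct tm instead of returning a
// pointer to shared static storage, so concurrent callers cannot overwrite
// each other's result.
std::string dailyLogName(time_t when) {
    struct tm parts;
    if (localtime_r(&when, &parts) == nullptr) {
        return std::string();  // conversion failed; the caller decides what to do
    }
    char name[64];
    // Zero padding keeps the date unambiguous (20100318 rather than 2010318).
    std::snprintf(name, sizeof name, "/var/log/PIDdata%04d%02d%02d",
                  parts.tm_year + 1900, parts.tm_mon + 1, parts.tm_mday);
    return std::string(name);
}
```

Calling dailyLogName(time(nullptr)) from the once-a-second loop would replace the localtime/sprintf pair. Note that this only removes one data race; it does not by itself explain a SIGKILL.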

HTH
 
cerr

Hey, thanks for pointing that out! I replaced all occurrences of
localtime() (there are plenty of them) with localtime_r() and back it
goes to QA; we'll see if that leads to an improvement. *crossing my
fingers*
Thanks dude!
 
Michael Doubez

You make a random change and send the program back to QA?

Wouldn't you prefer to locate the bug and be sure you made the right
change?
I expect a good valgrind run would have given you the answer.
 
James Kanze

On 03/18/10 09:20 AM, cerr wrote:
[snip]

Ian Collins wrote:
Get them to either a) send you a core or b) run the code in a debugger
and give you a shout when it aborts.

SIGKILL doesn't give a core. I don't even know if you can do
anything with it in the debugger. And it almost always comes
from outside the process. I'd guess that there's something
monitoring the processes in the environment, which decides that
his process is up to no good, so kills it. Maybe his
modification makes some monitoring software think it's a virus.
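
The reason the process cannot report anything itself is that SIGKILL can be neither caught nor ignored. A small sketch that checks this, assuming a POSIX system; the function names are made up for illustration:

```cpp
#include <cerrno>
#include <signal.h>

// Do-nothing handler, used only for the installation attempt below.
extern "C" void ignoreSignal(int) {}

// Returns true if the system refuses to install a handler for SIGKILL.
// POSIX specifies that sigaction() fails with EINVAL in that case.
bool sigkillIsUncatchable() {
    struct sigaction action = {};
    action.sa_handler = ignoreSignal;
    sigemptyset(&action.sa_mask);
    errno = 0;
    const int rc = sigaction(SIGKILL, &action, nullptr);
    return rc == -1 && errno == EINVAL;
}
```

Since no handler can run, the process gets no chance to log or dump state when the signal arrives, which is why the evidence has to come from whatever sent it.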
 
James Kanze

It means SIGKILL, what your QA guy is telling you. SIGKILL is
signal 9. man pages are your friends. "man 7 signal",
or http://manpages.courier-mta.org/htmlman7/signal.7.html

According to Posix, it's not guaranteed. (But practically all
Unix do agree here.)
As you can see in the convenient table in the middle of the
man page, signal 9 is called … drumroll … SIGKILL. Who woulda
thunk it?
One thing about C++ -- bugs may not necessarily manifest
themselves right away. For various reasons, which are too
boring to go into, stomping on a few random bytes of memory,
or dereferencing an uninitialized pointer, may go completely
unnoticed at first. But, a long time later, an innocuous
change elsewhere in the program -- a few lines of added code,
or a few lines of removed code -- subtly changes the contents
of your compiled program in such a manner that the new
internal memory layout of the code, or a slightly different
heap allocation pattern, suddenly makes those few stomped
bytes of memory be something important. Result: an ugly crash,
and you're staring at the innocent bit of code that you just
changed, and wondering how the FRAK could that possibly change
anything?

A SIGKILL is *not* a crash, at least not on a normal Unix. A
SIGKILL is what you send when you want a program to terminate,
and it refuses to do so otherwise. Of course, it's quite
possible to generate a signal 9 from your own code. It's just
very unlikely to happen accidentally.
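
To make the distinction concrete: a message like "child terminated with signal 9" is what a parent process learns from waitpid() when something sent the child a SIGKILL. The sketch below reproduces that on purpose, assuming a POSIX system; signalThatKilledChild is a hypothetical name:

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks a child that only waits for signals, kills it with SIGKILL from the
// parent, and returns the signal number reported by waitpid(). Returns -1 on
// any failure, or if the child did not die from a signal.
int signalThatKilledChild() {
    const pid_t child = fork();
    if (child < 0) return -1;
    if (child == 0) {
        for (;;) pause();  // child: sleep until a signal arrives
    }
    if (kill(child, SIGKILL) != 0) return -1;
    int status = 0;
    if (waitpid(child, &status, 0) != child) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

The child here did nothing wrong; it was killed from outside. That is the situation to rule in or out before hunting for a memory bug.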
 
Jorgen Grahn

You make a random change and send back the program to QA?
Wouldn't you prefer to locate the bug and be sure you made the right
change?

Depends on how his QA is organized, I guess. But personally I'd rather
waste my own time than someone else's with long shots.

I expect a good valgrind would have given you the answer.

Except for the special meaning of SIGKILL mentioned elsewhere in the
thread. There is almost certainly a watchdog of some kind somewhere
in the system, and his job is to (a) find it, (b) check its logs, (c)
find out what it assumes about the processes it watches.

/Jorgen
 
cerr

I got valgrind going, finally... but I'm having trouble reading its
output. I get stuff back like this, e.g.:
==18816==    at 0x41C37DF: ??? (in /lib/tls/i686/cmov/libc-2.10.1.so)
==18816==    by 0x41C33CF: strtol (in /lib/tls/i686/cmov/libc-2.10.1.so)
==18816==    by 0x41C0740: atoi (in /lib/tls/i686/cmov/libc-2.10.1.so)
==18816==    by 0x8058469: gpsnmeareader::storeGPSdata(gpsnmeareader::NMEA_STR) (gpsnmeareader.cpp:306)
==18816==    by 0x8057F89: gpsnmeareader::run() (gpsnmeareader.cpp:257)
==18816==    by 0x804E9F9: TSPThread::StartThread(void*) (tspthread.cpp:37)
==18816==    by 0x417F80D: start_thread (in /lib/tls/i686/cmov/libpthread-2.10.1.so)
==18816==    by 0x425F8DD: clone (in /lib/tls/i686/cmov/libc-2.10.1.so)
==18816==  Address 0x436d29d is 13 bytes inside a block of size 15 free'd
==18816==    at 0x402454D: operator delete(void*) (vg_replace_malloc.c:346)
==18816==    by 0x40D935C: std::string::_Rep::_M_destroy(std::allocator<char> const&) (in /usr/lib/libstdc++.so.6.0.13)
==18816==    by 0x40DAD6B: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (in /usr/lib/libstdc++.so.6.0.13)
==18816==    by 0x80583B1: gpsnmeareader::storeGPSdata(gpsnmeareader::NMEA_STR) (gpsnmeareader.cpp:302)
==18816==    by 0x8057F89: gpsnmeareader::run() (gpsnmeareader.cpp:257)
==18816==    by 0x804E9F9: TSPThread::StartThread(void*) (tspthread.cpp:37)
==18816==    by 0x417F80D: start_thread (in /lib/tls/i686/cmov/libpthread-2.10.1.so)
==18816==    by 0x425F8DD: clone (in /lib/tls/i686/cmov/libc-2.10.1.so)

Now what does that mean? All I can read is "Address 0x436d29d is 13
bytes inside a block of size 15 free'd" but I have no clue where
0x436d29d is... Well, there clearly seem to be issues in
gpsnmeareader.cpp, but how do I dig further?

Thanks,
 
cerr

On Thu, 2010-03-18, Michael Doubez wrote:
[snip]
I got valgrind going, finally... but am having trouble reading its
output. I get stuff back like this, e.g.:
==18816==    at 0x41C37DF: ??? (in /lib/tls/i686/cmov/libc-2.10.1.so)
==18816==    by 0x41C33CF: strtol (in /lib/tls/i686/cmov/libc-2.10.1.so)
==18816==    by 0x41C0740: atoi (in /lib/tls/i686/cmov/libc-2.10.1.so)
==18816==    by 0x8058469: gpsnmeareader::storeGPSdata(gpsnmeareader::NMEA_STR) (gpsnmeareader.cpp:306)

It seems you try to access some std::string data buffer here

[snip]
==18816==  Address 0x436d29d is 13 bytes inside a block of size 15 free'd
==18816==    at 0x402454D: operator delete(void*) (vg_replace_malloc.c:346)
==18816==    by 0x40D935C: std::string::_Rep::_M_destroy(std::allocator<char> const&) (in /usr/lib/libstdc++.so.6.0.13)
==18816==    by 0x40DAD6B: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (in /usr/lib/libstdc++.so.6.0.13)
==18816==    by 0x80583B1: gpsnmeareader::storeGPSdata(gpsnmeareader::NMEA_STR) (gpsnmeareader.cpp:302)

Which has been destroyed here (possibly at scope exit). I would take a
very careful look at the function residing in gpsnmeareader.cpp lines
302-306.

[snip]

The diagnostics seem quite good, so the error should be obvious when you
look at the source code. Of course, there is no guarantee valgrind has
actually spotted the problem, but in general I have found it quite up to the
task.

Hm okay, found out that atoi() isn't thread safe and replaced all
occurrences with strtol(), and now I'm seeing things like
==27829== 7 errors in context 5 of 13:
==27829== Thread 5:
==27829== Invalid read of size 1
==27829==    at 0x41C36BD: ??? (in /lib/tls/i686/cmov/libc-2.10.1.so)
==27829==    by 0x41C33CF: strtol (in /lib/tls/i686/cmov/libc-2.10.1.so)

What's that about? strtol is thread safe, it says... I'm not sure...
 
Michael Doubez

IMO you got it wrong. This kind of error has nothing to do with atoi()
not being thread safe in your implementation (I wonder why atoi is not
thread safe; because of the locale?).

What valgrind told you is that you are reading a memory location that
has already been freed. If it is a string you may have an issue with COW
but I doubt it.
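
One pattern that would produce a report of this shape, offered only as a guess since the code around gpsnmeareader.cpp:302-306 was never posted: keeping the c_str() pointer of a std::string that has already been destroyed, then handing it to atoi(). The sketch below shows the safe variant and describes the risky one in a comment; nmeaField and satelliteCount are hypothetical names.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the index-th comma-separated field by value.
std::string nmeaField(const std::string& sentence, std::size_t index) {
    std::size_t start = 0;
    for (std::size_t i = 0; i < index; ++i) {
        start = sentence.find(',', start);
        if (start == std::string::npos) return std::string();
        ++start;  // step past the comma
    }
    const std::size_t end = sentence.find(',', start);
    return sentence.substr(
        start, end == std::string::npos ? std::string::npos : end - start);
}

// Risky pattern (shown only as a comment, do not copy):
//   const char* p = nmeaField(s, 7).c_str();  // temporary dies at the ';'
//   int n = atoi(p);                          // reads freed memory
// Safe pattern: keep a named std::string alive for as long as the pointer
// obtained from c_str() is in use.
int satelliteCount(const std::string& sentence) {
    const std::string field = nmeaField(sentence, 7);
    return std::atoi(field.c_str());
}
```

If the real code looks like the commented-out lines, switching from atoi() to strtol() changes nothing, because the pointer being parsed is already dangling.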

Try other options of valgrind or even get gdb to break on the
suspicious line.

[snip]
 
cerr

Okay, now I get tons of messages from valgrind, which really worries me.
No one has ever run this piece of code through valgrind, as valgrind
hadn't compiled on the target platform till the other day. That's
scary.
Ok, now the first error looks like this (the messages above it seem to be
init messages where stuff is read from or redirected to):
==30794== Conditional jump or move depends on uninitialised value(s)
==30794==    at 0x40C0F88: std::ostreambuf_iterator<char, std::char_traits<char> > std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_int<long>(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, long) const (in /usr/lib/libstdc++.so.6.0.13)
==30794==    by 0x40C120C: std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::do_put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, long) const (in /usr/lib/libstdc++.so.6.0.13)
==30794==    by 0x40D1434: std::ostream& std::ostream::_M_insert<long>(long) (in /usr/lib/libstdc++.so.6.0.13)
==30794==    by 0x40D15C3: std::ostream::operator<<(int) (in /usr/lib/libstdc++.so.6.0.13)
==30794==    by 0x804CFFD: BlackBox::prepareFileandHeader() (blackbox.cpp:208)
==30794==    by 0x804D98F: BlackBox::start(std::string, logger*, int, int, char const*) (blackbox.cpp:335)
==30794==    by 0x806AE9E: PRGDaemon::work() (prgdaemon.cpp:359)
==30794==    by 0x806ADB3: PRGDaemon::runWork() (prgdaemon.cpp:339)
==30794==    by 0x804C608: main (prg.cpp:74)
Now, I understand "Conditional jump or move depends on uninitialised
value(s)" but how would I figure out where this is happening? This is
just a whole lot of information and I'm not quite clear on how to
read/interpret this... a little support is appreciated!

Thanks a lot!
 
Jorgen Grahn

This message comes from deep inside the standard library. You are calling it
through std::ostream::operator<<(int) at blackbox.cpp line 208. You can
safely assume that the standard library works correctly, so all errors,
if any, must be in your code. In this case about the only thing which can
be uninitialized is the int you are passing. Find out where it is coming
from, and make sure it is initialized properly. For example, if it is
from a memory block allocated with malloc(), you can change the malloc()
call to calloc() (after finding out why this memory part is unused, of
course).
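
As an illustration of the most common source of such an int in C++ code, here is a sketch with a made-up class (HeaderWriter is hypothetical, not the BlackBox class from the trace). An int member that is left out of the constructor's initialiser list holds an indeterminate value until something assigns it, and streaming it can trigger this kind of valgrind message:

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for a class that writes a header containing an int.
class HeaderWriter {
public:
    // Every member is initialised here. Dropping sequence_ from this list
    // would leave it indeterminate, and operator<< would then branch on it.
    HeaderWriter() : sequence_(0) {}

    void next() { ++sequence_; }

    // Formats the header line; this is the call that reaches operator<<(int).
    std::string header() const {
        std::ostringstream out;
        out << "seq=" << sequence_;
        return out.str();
    }

private:
    int sequence_;
};
```

Memcheck's --track-origins=yes option can also help here: it reports where the uninitialised value was created, at the cost of a slower run.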

A few things to keep in mind:

- It's not a given that valgrind, on a platform to which it was ported
"the other day", works perfectly, or that the libraries don't give
any warnings. I believe it comes with a big database of warnings which
are harmless and should not be shown to the user -- for a certain
libc, processor, OS and so on.

- I note that you've lost sight of the original bug -- the SIGKILL.
Fixing bugs valgrind shows you is honorable work, but it is rarely top
priority.

/Jorgen
 
