Real pain with segfault

Cubanan

Hi guys,

Actually I'm facing a REALLY big problem with a program for my M.Sc.
For three days I've been looking for a bug, but I'm not able to find it. It
is a real disaster for me because it is one of those that appear only
sometimes, on some machines and with some parameters, making it IMPOSSIBLE
to debug. Without going into details, let me tell you that I'm developing it
on my home machine (Gentoo, AMD64) and trying it from time to time on
a remote big gun - Solaris (2.9 sparc). As you can predict, it (almost)
always works fine at home but on Solaris it gives me Segmentation Faults
and Bus Errors (I never knew there was such a thing.. till now
:). I'm doing a lot of malloc/realloc/free stuff so I think there have
to be some memory leaks...

I've googled a lot and I've tried some stuff:

1. It seems that some time ago (when gcc was at 2.95) there was a
-fcheck-memory-usage parameter that added instructions to the compiled
program which check for memory leaks, out-of-bounds errors and so on...
I believe it would help me very much, but I think it is disabled now.
Was that mechanism removed, or was the name of the parameter changed?

2. Realizing that '-fcheck-memory-usage' is not going to help me, I
started to look for external software. First was ccmalloc... but I found it
useless.
Not only does it not compile on my Solaris, it also causes problems on
my Linux box when linking my stuff, producing complaints such as:
b.c: In function 'mkstr':
b.c:5: warning: incompatible implicit declaration of built-in function
'strcpy'
b.c:5: warning: incompatible implicit declaration of built-in function
'strlen'
/usr/lib/ccmalloc-gcc.o:(.eh_frame+0x12): undefined reference to
`__gxx_personality_v0'
collect2: ld returned 1 exit status
... so I gave up on ccmalloc

3. Next was valgrind. It seems to work fine, but only on my home Linux.
It can't be compiled on the sparc machine.
Thanks to it, I've fixed some minor bugs, but my stuff still segfaults.
I can't do anything more with that tool.

So, at the moment, I've tried everything I can find on Google but I
haven't solved my leaking problem.

Do you have any experience with it? Do you know any other stuff which
may help me?

Thanks for ANY help

P.S. I've also heard something about mpatrol, and maybe I'll try it later.
 
Richard Heathfield

Cubanan said:
Hi guys,

Actually I'm facing a REALLY big problem with a program for my M.Sc.
For three days I've been looking for a bug, but I'm not able to find it. It
is a real disaster for me because it is one of those that appear only
sometimes, on some machines and with some parameters, making it
IMPOSSIBLE to debug.

"It's not impossible." - Luke Skywalker.

Step 1: crank up your warning level.
Step 2: look at every cast - most of them will be wrong, and you should
remove them. Only leave the correct casts in place. Possibly this will
leave no casts in your program.
Step 3: recompile.
Step 4: look at every warning - most of them will highlight problems in
your code, and you should fix the code so as not to get these warnings,
without adding any casts in the process.
Step 5: look at every single pointer usage. A pointer can be in any one
of three states: (a) points to a valid object or function; (b) is a
null pointer; (c) has an indeterminate value. Make sure you never
dereference a pointer that is in state (b) or state (c).
Step 6: look at every call to free(). You should never pass a value to
free() that has not been returned to you by malloc(), calloc(), or
realloc(); having passed it that value, you should not use that value
again (unless of course it is handed to you by a subsequent call to
malloc(), calloc(), or realloc()).
Step 7: look at every array access. If you have an array with N
elements, you should only ever be accessing elements 0 through N-1.
Step 8: look for assumptions that your program makes about itself, about
the way that it works. For example, you may have a function that
assumes a pointer passed to it may be dereferenced. If so, assert the
assumption! assert(p != NULL); before using that pointer. If the
assumption is correct, the assertion will never, ever fire. If the
assumption is incorrect, the assertion may fire, and then you'll know
where your code needs to be fixed.
Step 9: all of the above is general debugging advice, with steps 5
through 8 applying specifically to your immediate problem (although
you'll get much more benefit from them if you have first followed steps
1 through 4). If you still can't find the bug, start chopping bits out
of your program (#if 0 is useful for this), with the intent of reducing
the program to the minimal possible program that still manifests the
bug.
Step 10: if step 9 didn't make your bug obvious to you, post this
minimal program to comp.lang.c and ask again.
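Steps 5 through 8 can be sketched in a few lines of C. This is only an illustration, not the original poster's code: the `struct buffer` type and the `buffer_init`, `buffer_append` and `buffer_destroy` functions are hypothetical names invented for the example. It shows a pointer that is always either null or valid, assertions on every assumption, and a `realloc()` call that keeps the old pointer until the new one is known to be good.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer, used only to illustrate the steps. */
struct buffer {
    char  *data;   /* NULL, or a pointer returned by realloc() */
    size_t len;    /* bytes in use */
    size_t cap;    /* bytes allocated */
};

/* Step 5: start in a known state (null pointer), never indeterminate. */
static void buffer_init(struct buffer *b)
{
    assert(b != NULL);              /* step 8: assert the assumption */
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}

/* Returns 0 on success, -1 if memory could not be obtained. */
static int buffer_append(struct buffer *b, const char *src, size_t n)
{
    assert(b != NULL);
    assert(src != NULL || n == 0);
    assert(b->len <= b->cap);       /* step 7: stay within 0..cap-1 */

    if (n > b->cap - b->len) {
        size_t newcap = b->cap ? b->cap : 16;
        char *tmp;

        while (newcap - b->len < n)
            newcap *= 2;            /* overflow is not handled in this sketch */
        /* Keep the old pointer until realloc() is known to have succeeded. */
        tmp = realloc(b->data, newcap);
        if (tmp == NULL)
            return -1;
        b->data = tmp;
        b->cap = newcap;
    }
    if (n > 0)
        memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* Step 6: free only what the allocator returned, then forget the value. */
static void buffer_destroy(struct buffer *b)
{
    assert(b != NULL);
    free(b->data);                  /* free(NULL) is a harmless no-op */
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}
```

Resetting the pointer to NULL after `free()` puts it back in state (b), so a later misuse trips an assertion instead of silently damaging the heap.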
Without going into details,

That's a mistake. See steps 1 through 10, especially 9 and 10.
:). I'm doing a lot of malloc/realloc/free stuff so I think there have
to be some memory leaks...

It's not a given, but yes, it sounds depressingly likely in this case.
b.c: In function 'mkstr':
b.c:5: warning: incompatible implicit declaration of built-in function
'strcpy'
b.c:5: warning: incompatible implicit declaration of built-in function
'strlen'

#include <string.h>
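For illustration only: the source of b.c was not posted, so the function below is a guess at what a `mkstr` that calls `strlen` and `strcpy` might look like, with the headers that silence those two warnings. Everything except the names `mkstr`, `strcpy` and `strlen` is an assumption.

```c
#include <assert.h>
#include <stdlib.h>   /* declares malloc() and free() */
#include <string.h>   /* declares strlen() and strcpy() */

/* Hypothetical reconstruction: returns a heap copy of s, or NULL on failure.
   The caller is responsible for passing the result to free(). */
char *mkstr(const char *s)
{
    char *copy;

    assert(s != NULL);
    copy = malloc(strlen(s) + 1);   /* +1 for the terminating '\0' */
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}
```

With the headers in place the compiler sees the real prototypes instead of assuming that an undeclared function returns int.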
 
Kenneth Brody

Richard said:
Cubanan said:


"It's not impossible." - Luke Skywalker.

I thought Luke was the one who said "that's impossible" after Yoda
lifts the X-wing out of the swamp?
Step 1: crank up your warning level. [... snip ...]
:). I'm doing a lot of malloc/realloc/free stuff so I think there have
to be some memory leaks...

It's not a given, but yes, it sounds depressingly likely in this case.
[...]

Some platforms come with (or have available) debug versions of malloc
and friends, which can sanity check the heap and report as soon as a
corruption is found. (Assuming a corrupt heap is the cause, of course.)
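As a rough idea of what such a debug allocator does, here is a minimal sketch that surrounds each block with guard bytes and checks them later. It is not any platform's actual debug malloc: the names `dbg_malloc`, `dbg_check` and `dbg_free`, the guard size and the guard pattern are all assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Guard bytes on each side; assumed to be a multiple of the strictest
   alignment on the target. */
#define GUARD_SIZE ((size_t)16)
#define GUARD_BYTE 0xA5          /* arbitrary pattern written into the guards */

/* Header kept in front of every block. The extra members exist only to
   force the union to a strict alignment. */
union dbg_header {
    size_t      size;            /* size the caller asked for */
    long double align_ld;
    long long   align_ll;
    void       *align_p;
};

/* Layout: [header][front guard][user bytes][back guard] */
void *dbg_malloc(size_t n)
{
    unsigned char *base = malloc(sizeof(union dbg_header) + 2 * GUARD_SIZE + n);
    unsigned char *user;

    if (base == NULL)
        return NULL;
    ((union dbg_header *)base)->size = n;
    user = base + sizeof(union dbg_header) + GUARD_SIZE;
    memset(user - GUARD_SIZE, GUARD_BYTE, GUARD_SIZE);
    memset(user + n, GUARD_BYTE, GUARD_SIZE);
    return user;
}

/* Returns 0 if both guards are intact, 1 for damage just before the
   block, 2 for damage just after it. */
int dbg_check(const void *p)
{
    const unsigned char *user = p;
    const unsigned char *front, *back;
    const union dbg_header *h;
    size_t i;

    assert(p != NULL);
    front = user - GUARD_SIZE;
    h = (const union dbg_header *)(front - sizeof(union dbg_header));
    for (i = 0; i < GUARD_SIZE; i++)
        if (front[i] != GUARD_BYTE)
            return 1;
    back = user + h->size;
    for (i = 0; i < GUARD_SIZE; i++)
        if (back[i] != GUARD_BYTE)
            return 2;
    return 0;
}

/* Checks the guards, reports and aborts on damage, otherwise frees. */
void dbg_free(void *p)
{
    int damage;

    if (p == NULL)
        return;
    damage = dbg_check(p);
    if (damage != 0) {
        fprintf(stderr, "heap corruption %s block %p\n",
                damage == 1 ? "before" : "after", p);
        abort();
    }
    free((unsigned char *)p - GUARD_SIZE - sizeof(union dbg_header));
}
```

Calling `dbg_check()` at suspicious points narrows down where an overrun happens, instead of learning about it only when a later `free()` crashes.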

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody | www.hvcomputer.com | #include |
| kenbrody/at\spamcop.net | www.fptech.com | <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------+
Don't e-mail me at: <mailto:[email protected]>
 
Richard Heathfield

Kenneth Brody said:
I thought Luke was the one who said "that's impossible" after Yoda
lifts the X-wing out of the swamp?

Yes indeed, but when he was still New, he was more Hopeful:

"It's not impossible. I used to bullseye womp rats in my T-16 back home,
they're not much bigger than two meters."
 
jacob navia

Cubanan said:
Hi guys,

Actually I'm facing a REALLY big problem with a program for my M.Sc.
For three days I've been looking for a bug, but I'm not able to find it. It
is a real disaster for me because it is one of those that appear only
sometimes, on some machines and with some parameters, making it IMPOSSIBLE
to debug.

This kind of bug is just a bit more difficult to debug
than the others, but basically it is NOT very difficult:
you have a segmentation fault, which greatly SIMPLIFIES
debugging. You have a starting point.

Much more difficult to debug are WRONG RESULTS, without any trap
at all. THOSE are nightmares.

Without going into details, let me tell you that I'm developing it
on my home machine (Gentoo, AMD64) and trying it from time to time on
a remote big gun - Solaris (2.9 sparc).

You should go into the details if you want help here.

As you can predict, it (almost)
always works fine at home but on Solaris it gives me Segmentation Faults
and Bus Errors (I never knew there was such a thing.. till now
:). I'm doing a lot of malloc/realloc/free stuff so I think there have
to be some memory leaks...


Memory leaks do not provoke a segmentation fault, not immediately anyhow.
I've googled a lot and I've tried some stuff:

1. It seems that some time ago (when gcc was at 2.95) there was a
-fcheck-memory-usage parameter that added instructions to the compiled
program which check for memory leaks, out-of-bounds errors and so on...
I believe it would help me very much, but I think it is disabled now.
Was that mechanism removed, or was the name of the parameter changed?

Do not use obsolete tools. gcc 2.95 has not been maintained for
ages.
2. Realizing that '-fcheck-memory-usage' is not going to help me, I
started to look for external software. First was ccmalloc... but I found it
useless.
Not only does it not compile on my Solaris, it also causes problems on
my Linux box when linking my stuff, producing complaints such as:
b.c: In function 'mkstr':
b.c:5: warning: incompatible implicit declaration of built-in function
'strcpy'
b.c:5: warning: incompatible implicit declaration of built-in function
'strlen'
/usr/lib/ccmalloc-gcc.o:(.eh_frame+0x12): undefined reference to
`__gxx_personality_v0'
collect2: ld returned 1 exit status

This means that you are mixing C and C++ code.

Link with g++ or eliminate the C++ code.
... so I gave up on ccmalloc

Yes, the obvious reason is that you do not debug. You just try stuff,
without putting in the effort to find out WHY things fail.

Type __gxx_personality_v0 into Google and see what happens.
3. Next was valgrind. It seems to work fine, but only on my home Linux.
It can't be compiled on the sparc machine.

It is Intel architecture specific as far as I remember.
Thanks to it, I've fixed some minor bugs, but my stuff still segfaults.
I can't do anything more with that tool.

So, at the moment, I've tried everything I can find on Google but I
haven't solved my leaking problem.

You haven't tried the debugger.
Do you have any experience with it?

Yes, I have used a debugger a lot. I even wrote one.
First, start your program under debugger control. Wait till
the crash happens. THEN:

See what is happening, examine the values of the variables,
DEBUG!!!

If the program runs under Linux and crashes under Solaris
it could be an alignment problem. Do you cast pointers anywhere
and then read ints through them at positions that are not correctly aligned?

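char *pData;  /* assume this points into a correctly aligned buffer */

int *pintData = (int *)(pData + 3);  /* cast required; address is now misaligned for int */

int m = *pintData; // should crash in Solaris... if you use Sparc.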
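One portable way to avoid this kind of bus error is to copy the bytes instead of dereferencing a misaligned pointer. The sketch below is an illustration rather than part of the original advice; the helper names `read_int_unaligned` and `write_int_unaligned` are made up for the example.

```c
#include <assert.h>
#include <string.h>

/* Read an int stored at an arbitrary, possibly misaligned, address.
   memcpy() copies byte by byte, so it has no alignment requirement. */
int read_int_unaligned(const void *src)
{
    int value;

    assert(src != NULL);
    memcpy(&value, src, sizeof value);
    return value;
}

/* Store an int at an arbitrary, possibly misaligned, address. */
void write_int_unaligned(void *dst, int value)
{
    assert(dst != NULL);
    memcpy(dst, &value, sizeof value);
}
```

With these helpers the crashing line would become `int m = read_int_unaligned(pData + 3);`, which behaves the same on x86 and on Sparc.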
Do you know any other stuff which
may help me?

Yes. Use the Solaris debugger. Read the debugger documentation and
DEBUG!
Thanks for ANY help

P.S. I've also heard something about mpatrol, and maybe I'll try it later.

Stop wasting your time.

Debug your code. You will have to do it anyway.

jacob
 
Old Wolf

As you can predict, it (almost) always works fine
at home but on Solaris it gives me Segmentation Faults
and Bus Errors (I never knew there was such a thing..

That could be due to alignment errors, e.g.
incorrect code like:
char *ptr = malloc(100);
[...]
int x = * (int *) (ptr + 2);

Of course, it could be something else entirely.
First order of business would be to decide
whether your code is C or C++, and make sure
you are using the appropriate compiler or linker.
Don't compile C code with g++ !
 
Ian Collins

Cubanan said:
Hi guys,

Actually I'm facing a REALLY big problem with a program for my M.Sc.
For three days I've been looking for a bug, but I'm not able to find it. It
is a real disaster for me because it is one of those that appear only
sometimes, on some machines and with some parameters, making it IMPOSSIBLE
to debug. Without going into details, let me tell you that I'm developing it
on my home machine (Gentoo, AMD64) and trying it from time to time on
a remote big gun - Solaris (2.9 sparc). As you can predict, it (almost)
always works fine at home but on Solaris it gives me Segmentation Faults
and Bus Errors (I never knew there was such a thing.. till now
:). I'm doing a lot of malloc/realloc/free stuff so I think there have
to be some memory leaks...
Best to ask on a platform specific group for the available tools.

<OT> Use the native Sparc compiler and run the program under dbx with
access checking enabled.</OT>
 
Cubanan

Thanks for your reply.

It seems that I've been following your steps 1 to 8 ever since the
problem occurred. I've even been using the -Wall and -pedantic
parameters, which give me no warnings (I'm not kidding)! Can I crank
up my warning level even more?

Now, the only hope is chopping up my software as you described in step 9...

Wish me luck :)

Cubanan

Richard Heathfield wrote:
 
Cubanan

Thanks for your advice, Jacob.

Of course I AM debugging my program all the time. In fact, I've fixed a
lot of minor bugs that way. Unfortunately, there are two problems
that make my big bug impossible to find, as follows:

1. My program behaves in many different ways depending on the parameters I
use when compiling and at runtime. Passing options like -g or -Ox to g++
and feeding my program different data makes it either run
normally and produce fine output, or cause segfaults or bus errors at
different places. Sometimes it is enough to add -g to g++ to make my program
work fine... which makes debugging useless.

Yes, of course... there are always some parameters and input data which
cause a segfault while running under gdb. BUT (and here comes the real pain):

2. All such segfaults are uncatchable because they appear inside the free()
procedure. They don't appear in my code. I can only follow the computation
to find the right free(). It is obvious that the real bug happens
somewhere earlier... that's why I'm looking for something that
detects memory leaks and all the problems associated with them rather than
debugging it one more time.

Any hint may help...

Cubanan


jacob navia wrote:
 
Ian Collins

Cubanan wrote:

Please don't top-post.
2. All such segfaults are uncatchable because they appear inside the free()
procedure. They don't appear in my code. I can only follow the computation
to find the right free(). It is obvious that the real bug happens
somewhere earlier... that's why I'm looking for something that
detects memory leaks and all the problems associated with them rather than
debugging it one more time.
You are either freeing something twice, freeing something that wasn't
returned by malloc, or damaging the allocator's internals by writing
beyond a block or through a freed pointer.
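The first two causes can be caught with a small bookkeeping layer around the allocator. The following is a sketch under stated assumptions, not a tool mentioned in this thread: `tracked_malloc`, `tracked_free`, the `live` table and its `MAX_LIVE` capacity are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_LIVE 1024            /* capacity of the illustrative table */

static void  *live[MAX_LIVE];    /* pointers currently owned by the program */
static size_t live_count;

/* Allocate a block and remember it. Returns NULL if malloc() fails or
   the table is full. */
void *tracked_malloc(size_t n)
{
    void *p;

    assert(live_count <= MAX_LIVE);
    if (live_count == MAX_LIVE)
        return NULL;
    p = malloc(n);
    if (p != NULL)
        live[live_count++] = p;
    return p;
}

/* Returns 0 if p was released (or was NULL), -1 if p is not a live
   allocation: either it was already freed or malloc() never returned it. */
int tracked_free(void *p)
{
    size_t i;

    if (p == NULL)
        return 0;
    for (i = 0; i < live_count; i++) {
        if (live[i] == p) {
            live[i] = live[--live_count];   /* remove from the table */
            free(p);
            return 0;
        }
    }
    return -1;                              /* do NOT hand this to free() */
}
```

A second `tracked_free()` on the same value takes the -1 path in practice, which is the point of the exercise, although strictly speaking the C standard does not guarantee that a freed pointer's value can even be inspected, so this is a debugging aid rather than portable production code. Writes beyond a block, the third cause, need guard bytes or a run-time access checker instead.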

The course of action I posted earlier will help identify which one.

Tools will help identify the type of crash, but you will still have to
find the root cause.
 
Richard Heathfield

Cubanan said:
Thanks for your reply.

It seems that I've been following your steps 1 to 8 ever since the
problem occurred. I've even been using the -Wall and -pedantic
parameters, which give me no warnings (I'm not kidding)! Can I crank
up my warning level even more?

You can. The warning switches I use on gcc are:

-W -Wall -ansi -pedantic -Wformat-nonliteral -Wcast-align
-Wpointer-arith -Wbad-function-cast -Wmissing-prototypes
-Wstrict-prototypes -Wmissing-declarations -Winline -Wundef
-Wnested-externs -Wcast-qual -Wshadow -Wconversion -Wwrite-strings
-ffloat-store -O2

If you wish to discuss gcc further (e.g. to comment on the above), I
suggest that you take up that sub-discussion in a gcc newsgroup, since
the details of particular compilers are off-topic here.
Now, the only hope is chopping up my software as you described in step 9...

If you do this properly, one of two things will happen:

1) you will find and fix your problem;
2) you will end up with the smallest possible C program that still
compiles correctly and exhibits the problem.

If it's (2), post that program here, and we'll take a look.
 
Carramba

Richard Heathfield wrote:
Cubanan said:


You can. The warning switches I use on gcc are:

-W -Wall -ansi -pedantic -Wformat-nonliteral -Wcast-align
-Wpointer-arith -Wbad-function-cast -Wmissing-prototypes
-Wstrict-prototypes -Wmissing-declarations -Winline -Wundef
-Wnested-externs -Wcast-qual -Wshadow -Wconversion -Wwrite-strings
-ffloat-store -O2

[Off-topic]
just got curious... do you use all these switches at once?
 
Richard Heathfield

Carramba said:

[Off-topic]
just got curious... do you use all these switches at once?

Yes.

If you don't find your bug as a result of creating a minimal compilable
program that still exhibits the problem, post that minimal program here
for analysis.
 
santosh

Carramba said:
Richard Heathfield wrote:
Cubanan said:


You can. The warning switches I use on gcc are:

-W -Wall -ansi -pedantic -Wformat-nonliteral -Wcast-align
-Wpointer-arith -Wbad-function-cast -Wmissing-prototypes
-Wstrict-prototypes -Wmissing-declarations -Winline -Wundef
-Wnested-externs -Wcast-qual -Wshadow -Wconversion -Wwrite-strings
-ffloat-store -O2

[Off-topic]
just got curious... do you use all these switches at once?

I would presume that using pieces of them dilutes their effectiveness.
 
Flash Gordon

jacob navia wrote, On 25/07/07 18:31:

Default User wrote:
> For short replies top posting is better.

It is only better if you want people to ignore you.
> You see? Didn't have to scroll down a lot :)

No, I had to see your comment was at the top, look down to see what it
was in response to, then look back up to see what you had said, and
finally I had to fix your rude behaviour. A lot *more* work.

You have two choices, either conform to the norms as has been requested
and have people help you, or don't and find that the most knowledgeable
people decide to ignore you.
 
Keith Thompson

jacob navia said:
For short replies top posting is better.
Wrong.

You see? Didn't have to scroll down a lot :)

Since the whole thing fit on one screen, I didn't have to scroll at
all.
 
Kelsey Bjarnason

[snips]

-W -Wall -ansi -pedantic -Wformat-nonliteral -Wcast-align
-Wpointer-arith -Wbad-function-cast -Wmissing-prototypes
-Wstrict-prototypes -Wmissing-declarations -Winline -Wundef
-Wnested-externs -Wcast-qual -Wshadow -Wconversion -Wwrite-strings
-ffloat-store -O2

[Off-topic]
just got curious... do you use all these switches at once?

Using them one at a time requires a lot of extra compilation time. :)
 
