Seriously struggling with C

Flash Gordon

Dik said:
Again I did not receive the original, which is why I am responding to this.



What I am missing here is that debugging code using breakpoints can be
*more* time-consuming than using printf's to show the state. I once
had to debug a program I had written (80k+ lines of code). On one
machine it did not work. It appeared that on the umpteenth occurrence
of some call to some routine something was wrong. It is impossible
to detect such a thing using breakpoints or watchpoints. Using proper
printf's and scrutinising the output will get you the answer to why it
did not work much faster.

Personally I find debuggers extremely useful under *some* conditions. In
other situations, I find printf (or a more sophisticated logging system)
far more useful.

One true, but extreme, example where a debugger and ICE combination was
invaluable was when trying to find what was causing all units to crash
when taken out of storage and powered up. Every unit crashed in about
the same place in its power-up tests, all wrote garbage over the
display, and generally gave all the symptoms of the processor having run
off into the wild blue yonder for some reason. I had examined the code
(which I did not write) on a few occasions trying to find any possible
reason for the crashes. I could find none. After many attempts at
playing with various break conditions I eventually caught the problem. I
could see quite clearly in the trace that just before it all went to pot
the processor had read a *different* instruction than the ROM actually
contained. Before anyone convinces me that debuggers are of no use (most
have said limited use) they will have to explain to me how I could have
found that and proved it to anyone else *without* the use of the debugger.

Before anyone says ah, but that is a once in a lifetime situation, I've
also managed to catch other "impossible" crashes in debuggers and
demonstrate to people that it was actually the hardware doing something
screwy.

I've also used logic analysers in the same way I might use a debugger to
see what a program is doing where I had no way of capturing realistic
input data. The code was actually implementing a control loop, so the
input for one loop depended on the output of the previous loop *and* the
outside world. By capturing selected data with some very clever triggers
(as complex as you can set up with many debuggers), I could then use the
information to work out how the algorithm was failing. I actually used
this method on at least three different algorithms on the same system,
and also used it to prove to the HW engineers yet again when the
hardware was faulty.
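When no logic analyser is available, the trigger-and-capture idea can be imitated in software. The sketch below is only an analogue, not what the post describes: a small ring buffer (the names `struct trace`, `trace_record`, `trace_dump` and the depth of 8 are hypothetical) keeps the most recent samples of a control-loop error and stops capturing when a trigger condition fires, so the history leading up to the failure can be inspected afterwards.

```c
#include <stddef.h>

/* Hypothetical depth: keep the last 8 samples before the trigger. */
#define TRACE_DEPTH 8

/* A software stand-in for a logic analyser's capture memory. */
struct trace {
    int    samples[TRACE_DEPTH];
    size_t next;    /* slot the next sample will be written to */
    size_t count;   /* valid samples held, at most TRACE_DEPTH */
    int    frozen;  /* set once the trigger has fired */
};

/* Records one control-loop error sample unless capture has stopped.
   The trigger (|error| > threshold) is purely illustrative.
   Returns 1 if this sample fired the trigger, otherwise 0. */
int trace_record(struct trace *t, int error, int threshold)
{
    if (t->frozen)
        return 0;
    t->samples[t->next] = error;
    t->next = (t->next + 1) % TRACE_DEPTH;
    if (t->count < TRACE_DEPTH)
        t->count++;
    if (error > threshold || error < -threshold) {
        t->frozen = 1;
        return 1;
    }
    return 0;
}

/* Copies the captured samples, oldest first, into out (room for
   TRACE_DEPTH ints) and returns how many were copied. */
size_t trace_dump(const struct trace *t, int *out)
{
    size_t start = (t->next + TRACE_DEPTH - t->count) % TRACE_DEPTH;
    for (size_t i = 0; i < t->count; i++)
        out[i] = t->samples[(start + i) % TRACE_DEPTH];
    return t->count;
}
```

Feeding it the loop error on every iteration and calling `trace_dump` after the trigger gives the samples that led up to the failure, much as a pre-trigger capture does on an analyser.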

A bigger use for debuggers for me is when we have built a beta test or
production version of the software (a lot of which is not written by me)
and someone doing testing can easily crash the software but I can't (or
a customer has crashed it when coming in to do testing for us). Then I
attach the debugger to examine what state they have got the program
into. Sometimes the call stack is sufficient to point in the right
direction, sometimes examining the states of variables provides a big
insight, and often I just pass the information on to another developer
who then examines the code and finds the problem.

Sometimes I use a debugger to break the code at specific points and see
what the state is because I am too lazy to add in the printf statements
and rebuild.

However, I am gradually extending the logging throughout the code in a
way that can easily be enabled at runtime (by setting an environment
variable) and as I extend it to cover more of the functionality of the
program I am gradually finding it of more and more use.
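A minimal sketch of logging that is switched on at runtime through an environment variable, with a timestamp on each line. The variable name `APP_DEBUG` and the function names are assumptions for illustration; the post does not show the actual logging system.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical variable name: the post does not say which one is used. */
#define DEBUG_ENV_VAR "APP_DEBUG"

/* Interprets the variable's value: unset, empty or "0" means off. */
int debug_flag_from_string(const char *value)
{
    return value != NULL && value[0] != '\0' && strcmp(value, "0") != 0;
}

/* Reads the environment once and caches the answer, so a disabled
   log call costs little more than a test of a static variable. */
int debug_enabled(void)
{
    static int cached = -1;   /* -1: not looked up yet */
    if (cached < 0)
        cached = debug_flag_from_string(getenv(DEBUG_ENV_VAR));
    return cached;
}

/* printf-style logging to stderr with a timestamp; no-op when disabled. */
void debug_log(const char *fmt, ...)
{
    char stamp[32] = "unknown time";
    time_t now;
    struct tm *tm;
    va_list ap;

    if (!debug_enabled())
        return;
    now = time(NULL);
    tm = localtime(&now);
    if (tm != NULL)
        strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", tm);
    fprintf(stderr, "[%s] ", stamp);
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
}
```

With a POSIX shell the logging could then be switched on for a single run with `APP_DEBUG=1 ./program`, without rebuilding.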

So my position is that both tools have their uses, and which you use more
will depend on a lot of things outside your control, such as the quality
of the HW, the quality of code written by others, the variability of
external inputs, how reproducible problems are, etc.

I almost forgot: another time when a debugger was invaluable was with a
highly complex processor. I had thought a particular combination of
options on an assembler instruction was valid, and the assembler accepted
it, but when I stepped through in the debugger, because I could not see
how the code was failing, I saw that the disassembly showed a roll where
I had specified a shift. Not C, but a case where a debugger worked when
other tools had failed and neither I nor another software developer could
see anything wrong with the code. On this code we really were after
every clock cycle we could get, and it was sometimes worth the half hour
it took to work out whether a particularly complex instruction was allowed.
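To make the roll/shift difference concrete, here is a C sketch of the two operations on a 32-bit value. The processor in the post is not named, so this only illustrates the semantics, not its instruction set. A logical shift loses the bits moved out at the top, while a roll (rotate) feeds them back in at the bottom, so the two only agree when the bits shifted out are all zero.

```c
#include <stdint.h>

/* Logical left shift: bits moved out of the top are lost, zeros come in. */
uint32_t shl32(uint32_t x, unsigned n)
{
    return (uint32_t)(x << (n & 31u));
}

/* Rotate ("roll") left: bits moved out of the top re-enter at the bottom.
   The (32 - n) & 31 form avoids shifting by 32 (undefined) when n is 0. */
uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (uint32_t)((x << n) | (x >> ((32u - n) & 31u)));
}
```

For example, `shl32(0x80000001u, 1)` gives `0x00000002` but `rotl32(0x80000001u, 1)` gives `0x00000003`: exactly the kind of difference a disassembly showing the wrong mnemonic would explain.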
--
Flash Gordon
Living in interesting times.
Web site - http://home.flash-gordon.me.uk/
comp.lang.c posting guidlines and intro -
http://clc-wiki.net/wiki/Intro_to_clc
 
Herbert Rosenau

Also when the problem occurs on something like the millionth call to some
routine?

Yes. A conditional breakpoint does the trick. Let the call pass
undebugged 999,999 times and trace the 1,000,000th run through.

A debugger is a nice tool for catching bugs of nearly all kinds, but you
have to use your brain to use it right.

You may fiddle around with your code in the debugger for weeks when you
have no plan for how to reach the point where it fails. You will spend
only minutes debugging through some hundred million lines of recursive
code when you know what you need. Using a debugger requires knowing how
to use the tool properly and having a plan for reaching the corner where
the cause of the bug sits. Then inspect the code and data until you see
what goes wrong and why. Often enough you can patch the data while
editing the source and go on to the next flaw.

A debug session of less than 30 minutes will save days of
implementing debug printfs, running the program, changing the printfs to
show other data, endless recompiles to get the output you need, only to
see that the data you printed is absolutely useless, and then restarting
the cycle of edit and compile, only to fail again and again.

When you have learned how to use your debugger effectively, you will start
your debuggee under the control of the debugger, set conditional and
unconditional breakpoints and watchpoints, and then start the run to get a
picture of the variables at the critical points, tracing through the
suspicious statements and stepping over uncritical functions until you
have it.

Then, when all bugs seem to be fixed, you will use other tools for
automated regression tests, falling back to the debugger whenever there
is a need, until all test conditions complete flawlessly and the
application is ready for either public beta or GA.

Having code inspections of any kind will not protect you from having
bugs in the code, even though they will reduce them significantly if the
inspectors are excellent programmers.

--
Tschau/Bye
Herbert

Visit http://www.ecomstation.de the home of german eComStation
eComStation 1.2 in German is here!
 
Ian Collins

Micah said:
I've only recently started using this model as much as possible, and
it has turned out to be quite helpful for me.
Good, spread the word!
However, I personally still use a debugger quite a lot. If for no
other reason, I frequently like to reassure my paranoid self that the
unit-test itself is actually testing what I think it is. As a
side-benefit, stepping through the code this way (which doesn't
technically require the aid of a debugger) can sometimes reveal
further unit tests that I have neglected to write.
This is typical of someone starting out in TDD; I was exactly the same.
Over time your tests will improve and you will learn to trust the
process and your tests.
 
Flash Gordon

Ian said:
Richard G. Riley wrote:

The sense isn't false if the tests are good and written first. In my
opinion, tests added after the code is written are second rate.

On one of the projects I worked on early in my career (written in
Pascal) I don't think any of us ever used the debugger and we did not
write the tests until after we wrote the code, and even then the tests
were system level. However, when we did write the tests we went a long
way out of our way to try to think of every single way we could break
the system. Running our software with some of the kit switched off,
unplugging cables whilst it was running, swapping 525-line cards into
what was meant to be a 625-line system (we had a 525-line variant with
the same code base), etc. During that testing we found a significant
number of bugs. During the remaining 10 years of my time in the company,
through many versions of the SW, including getting fresh graduates with no
experience or domain knowledge to make changes, the customers found very
few bugs. However, rerunning these manual system level tests after
changes *did* find problems, and each time the customer found a bug we
extended our tests to catch the bug.

So tests written after the code can be *very* effective, but you have to
actively *try* to break the code in your testing rather than trying to
prove that it is correct.

I've seen far more problems with testing written either by an
independent team or where the tests have been designed before the coding
where people have been trying to prove the code correct than I have with
tests written after the fact with people actively trying to prove the
code is wrong. Of course, the ideal would probably be to write the tests
first but to write them as an active attempt to prove the software *wrong*.

I still stand by the opinion that in some situations debuggers are
invaluable, even though in this project we didn't use them and generally
did not even need debugging output apart from a debug output on crash.
Highly partitioned SW and destructive testing proved more than sufficient.
 
Flash Gordon

Chris said:
This is where an ICE is essential. Let it run the system using filters
and watchpoints. Also, most good ones will let you put timing constraints
and bands on. Also conditional breakpoints with actions, etc. It's
virtually the only way to find this sort of problem (assuming you have
used static analysis to get rid of the silly stuff first).

OK, I'll just open up the PC and hook up an ICE. Now, I wonder what
location Linux will load the application at this time...
However, printf is just about the worst thing you can use in this case,
as it changes the memory map and the timing.

The rest of what the server is doing (in my case these days) also
changes the timings. For example, if the 201st user is running a complex
report it will slow down the daemon that is making a SOAP request. As to
the memory map, these days I have a logging system (which wraps up
printf calls) which allows me to enable debug logging when required even
on a production build with no change to the SW and so minimal change to
the memory map. Disabling the optimiser so the debugger is more useful
sometimes prevents the program from crashing, so using a debugger can be
*harder* than using printf statements in at least some situations.

With one recent problem, adding calls to the debug logging framework
in more sections of the code allowed me to eliminate one of the two
programs as the cause of a timing-related problem. I could see quite
clearly from the log that this program was doing the correct things in
the right order. I then went on to use more printfs (well, calls to
a similar debugging system implemented in Java) and it showed exactly
what the bug was that allowed things to go wrong when the timing was
exactly wrong.

On an embedded system I worked on, on the other hand, being able to
break all 20 processors approximately simultaneously allowed me to
examine the state of various parts of the system (often only looking at
a few of the 20 debuggers that were running in synchronisation) and then
resume running without having had things get seriously out of step. This
was invaluable. Sometimes on this system I also literally had to sit
down and count clock cycles on printouts to work out what the timing
relationship would be.
 
Ian Collins

Flash said:
On one of the projects I worked on early in my career (written in
Pascal) I don't think any of us ever used the debugger and we did not
write the tests until after we wrote the code, and even then the tests
were system level. However, when we did write the tests we went a long
way out of our way to try to think of every single way we could break
the system. Running our software with some of the kit switched off,
unplugging cables whilst it was running, swapping 525-line cards into
what was meant to be a 625-line system (we had a 525-line variant with
the same code base), etc. During that testing we found a significant
number of bugs. During the remaining 10 years of my time in the company,
through many versions of the SW, including getting fresh graduates with no
experience or domain knowledge to make changes, the customers found very
few bugs. However, rerunning these manual system level tests after
changes *did* find problems, and each time the customer found a bug we
extended our tests to catch the bug.
What you are describing are what I'd call acceptance tests, working on
the code from the outside, confirming that the system behaves as expected
by the customer. Unit tests written as part of the TDD process test
individual components of the system, down to individual function level,
from the inside. They test the logic according to the programmer's
understanding.

I'd always recommend both types of testing.
So tests written after the code can be *very* effective, but you have to
actively *try* to break the code in your testing rather than trying to
prove that it is correct.
Indeed. But not unit tests written after the code.
I've seen far more problems with testing written either by an
independent team or where the tests have been designed before the coding
where people have been trying to prove the code correct than I have with
tests written after the fact with people actively trying to prove the
code is wrong. Of course, the ideal would probably be to write the tests
first but to write them as an active attempt to prove the software *wrong*.
When doing TDD, every unit test fails until the code to make it pass is
written.
 
CBFalconer

Ben said:
.... snip ...

I doubt that anyone here is trying to say that a debugger cannot
be useful for finding bugs. I personally would take the position
that a debugger is a tool that *can* be used for finding bugs.
It is more of a personal preference whether the debugger *should*
be the first avenue of attack for hunting a bug. For me,
personally, it isn't; for you, I can see that it is.

This interminable thread was launched when one luser troll (since
plonked for refusal to stay on topic) insisted that code should be
formatted to ease debugger use, and I responded with the fact that
I have not used a debugger in anger for years. The troll then made
nasty noises and became even more objectionable. It has since been
seen here only in quotes.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>
 
Flash Gordon

Herbert said:
Yes. A conditional breakpoint does the trick. Let the call pass
undebugged 999,999 times and trace the 1,000,000th run through.

A debugger is a nice tool for catching bugs of nearly all kinds, but you
have to use your brain to use it right.

<snip>

Oh look, my debugger broke on the file open failing. <fx: looks at file
system> Odd, the file is there and with correct permissions. <fx: Looks
at nicely timestamped logs from system here and system 200 miles away>
Ah, I see, that server 200 miles away is taking longer than the
specified time to put the file on my server.

There are times when debug logs can be *far* easier to use than a
debugger. Not always in my opinion, but they do exist.

Or another situation that really has happened to me. Two different
programs, one in Java and the other in C, are interacting in an incorrect
manner. However, the system fails no more than once every few weeks, and
only when under heavy load at customer sites in the week before month end,
when they are running hundreds of invoices through the system hourly. Do
I ask my customer to let me know each time they start a new session so I
can attach a debugger to it (I can't wait the weeks it would likely take
at minimum for me to replicate the problem once, and I don't know
*exactly* what the customer's 50 users in that department are doing to
cause it), or do I spend an hour putting in some debug logging and ask
them to run with that build for a bit?

I chose putting in some debug logging. I had a significant piece of the
puzzle a couple of days later (they could not install the debug build
immediately) and the next day I had the solution. Far less work than
using a debugger would have been.

It is quite common for those reporting bugs to miss out some critical
piece of information in what they are doing. The addition of easily
enabled logging is gradually making it far easier for us to see exactly
what the customer is doing to cause the failure, something a debugger
can never do.
 
Flash Gordon

Ian said:
What you are describing are what I'd call acceptance tests, working on
the code form the outside confirming that the system behaves as expected
by the customer.

No, we were not trying to confirm it behaved as expected. Quite the
reverse, we were being as devious as possible in trying to prove it did
*not* work. We succeeded in that. Our customer acceptance tests, on the
other hand, were designed to demonstrate it worked as expected and took
far less time.
Unit tests written as part of the TDD process test
individual components of the system, down to individual function level,
from the inside. They test the logic according to the programmer's
understanding.

We did tests at the system level designed to exercise the specific
functions, and sometimes specific if statements or exception traps.
I'd always recommend both types of testing.

As would I.
Indeed. But not unit tests written after the code.

In this instance they proved highly effective. In terms of customer
reported bugs and customer perception of the quality of the code, it is
probably about the most successful production project I have come
across. 50000 lines of code and I think under 10 customer reported bugs
in 10 years. Even if I am a factor of 10 out that is still only 1 bug
per 500 lines of code over a 10 year period, or 1 bug per 5000 lines per
year.
When doing TDD, every unit test fails until the code to make it pass is
written.

That could be said of any test. If the test checks if 5 is returned when
the input is 3, then until you have put something inside the function
body of course it will fail.
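In C the point can be seen with a minimal TDD-style test. The function name `transform` and its "add two" behaviour are hypothetical, chosen only so that an input of 3 gives 5; with an empty stub body the same test fails, as any test would.

```c
/* Hypothetical function under test: the post only gives the expectation
   "input 3 -> 5", so "add two" is an assumed specification. */
int transform(int input)
{
    return input + 2;   /* with a stub such as "return 0;" the test fails */
}

/* A minimal test in the TDD style: returns 1 on pass, 0 on fail. */
int test_transform_3_gives_5(void)
{
    return transform(3) == 5;
}
```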

When we were testing this code we tried to get the code to fail by
having devices absent that should always be there, intermittent
communications over what we knew were definitely reliable links, trying
to force it to do division by zero (faking it so that it missed the
reference frequency in a frequency response test) and so on. This is
something you can apply to unit testing, system level testing, or any
other form of testing, but it is also something that in my experience
many people do *not* do whatever form of testing they are doing.
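One way to apply the "fake a missed reference" idea is to make the division guard explicit and then write a test that deliberately supplies a zero reference. The sketch below assumes the response is computed as a simple amplitude ratio; the names `response_ratio` and `enum ratio_status` are hypothetical.

```c
/* Hypothetical result codes for this sketch. */
enum ratio_status { RATIO_OK, RATIO_NO_REFERENCE };

/* One frequency response point: measured amplitude relative to the
   reference. If the reference was missed its amplitude may be zero (or
   garbage), so check before dividing instead of assuming it "cannot
   happen". !(reference > 0.0) is also true when reference is NaN.
   On failure *ratio is left unchanged. */
enum ratio_status response_ratio(double measured, double reference,
                                 double *ratio)
{
    if (!(reference > 0.0))
        return RATIO_NO_REFERENCE;
    *ratio = measured / reference;
    return RATIO_OK;
}
```

A destructive test then calls it with a reference of `0.0` and checks that `RATIO_NO_REFERENCE` comes back rather than an infinity or a crash.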

I'm not disputing the benefits of TDD, nor saying that today I would do
things the same way we did in 1990 in the Test Engineering Department
(making test equipment) where I used to work. I'm saying that:
1) Methods other than TDD can be successful in the right situation
2) Whatever testing you are doing the tests should be designed to prove
in every conceivable way *and* in inconceivable ways that the code is
*wrong*.

Part of 2 is testing boundary conditions, part is forgetting what the
requirements on external systems are and what is possible for them (you
know that a user can't press a key 1000 times a second, don't you? Forget
that, because you have forgotten that a HW fault could have the same
effect as a user pressing the key 1000 times a second), and part is
working on the assumption that you know damn well there is a bug
somewhere that you couldn't possibly conceive of, so you need to test the
inconceivable.

BTW, I've seen a HW fault causing the same effect as a user pressing a
key at a stupidly high rate. The logic circuit basically became an
oscillator for as long as a key was held down, so I know damn well that
what most would consider inconceivable is not only possible
theoretically, but sometimes actually happens in real life.

I also spent time as I say working in the Test Engineering Department,
and since we were building test equipment (which also had to test
itself) we developed the attitude of assuming that the SW has to survive
and continue working properly as much as possible even if fundamental
parts of the system are failing in ways you can't conceive of, a
philosophy that I find very useful.
 
Dik T. Winter

>
> Of course. That's why breakpoints exist. I would do the following:
>
> 1) Isolate as close as possible to where/why the bug is happening.

And how exactly do you do that? Once I have isolated the problem as
close as possible, I know fairly fast what the problem is. And it does
not matter whether it is a compiler bug, a hardware bug, or a program
bug. I have encountered all three.

But one of my programs, in its initial version, reported that there were no
solutions to the problem. However, I did know that, mathematically,
solutions should exist, I only wrote the program to determine what the
actual solutions were. Where to start?

So what I did was insert printf's that showed whether the input was
properly stored. Next I removed those printf's and inserted printf's
that showed that the input data was actually interpreted correctly in all
cases. Going on this way I could determine the problem with only three
runs of the program. (Somewhere an increment was inside a loop rather
than outside the loop. The net effect was that many cases were missing,
and amongst them the cases that should provide solutions.)
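Dik's code is not shown, so the following is only a hypothetical illustration of the kind of bug he describes: an increment that belongs after the inner loop placed inside it. In the buggy version the outer loop runs only once, so most cases are never generated; a single printf of the case count after the loop (4 instead of 16 for n = 4) would expose the gap in one run.

```c
/* Correct: the candidate index c advances once per outer iteration,
   so all n * n cases (c, j) are examined. */
int count_cases_correct(int n)
{
    int cases = 0;
    int c = 0;
    while (c < n) {
        for (int j = 0; j < n; j++)
            cases++;          /* examine case (c, j) */
        c++;                  /* increment outside the inner loop */
    }
    return cases;
}

/* Buggy: the same increment moved inside the inner loop. c reaches n
   after the first pass, the outer loop stops, and only n cases are
   examined. */
int count_cases_buggy(int n)
{
    int cases = 0;
    int c = 0;
    while (c < n) {
        for (int j = 0; j < n; j++) {
            cases++;          /* examine case (c, j) */
            c++;              /* wrong place */
        }
    }
    return cases;
}
```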
> What debuggers have you used? Maybe they were not suitable for the
> job in hand?

Oh well. I have used source code debuggers and instruction code debuggers
on many occasions, on more platforms than you can imagine. I do not
remember all their names.
 
Dik T. Winter

>
> The debugger would format the data as it was declared: it would be
> fairly obvious to anyone stepping the code that a DOUBLE was being
> forced into an INT for example.

Yes, once you suspect that was happening in the code, it would be obvious
through a debugger. But it would be just as obvious looking at the
source (as I did).

The only problem was that the program did not give results close to what
was (mathematically) expected, so where to start? Something is wrong,
but you have really no idea what. (And indeed, mathematics proved to
be right and the program was wrong.)
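A hypothetical example of the kind of silent double-to-int conversion being discussed (it is not the code from Dik's program). The conversion truncates toward zero with no diagnostic by default; GCC's `-Wconversion` option, for example, warns about the assignment marked below, which is another way to find it without either a debugger or printf's.

```c
/* Bug: the mean is stored in an int, so the fraction is silently
   dropped (double -> int truncates toward zero). n must be positive. */
int mean_truncated(const double *v, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += v[i];
    int mean = sum / n;          /* implicit double -> int conversion */
    return mean;
}

/* Fixed: keep the result as a double. n must be positive. */
double mean_fixed(const double *v, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += v[i];
    return sum / n;
}
```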
 
Dik T. Winter

>
> That is why I use an ICE.

What *is* an ICE? In my 30+ years of programming experience I never
have seen the acronym.
> Especially where it is something like a
> certain combination of interrupts causing a problem in timing every now
> and again.

The programs I write have nothing to do with interrupts and timing at all,
so what are you talking about?
> The ICE should be part of the unit and system test system. When things
> go wrong then it becomes the debugger.

What system are you talking about? I am doing a simple combinatorial
problem. A simple backtracking problem: it goes wrong when your
backtracking goes wrong.

BTW, changing memory layout can be beneficial when finding bugs. This
was one of the ways I detected an off-by-one error in the garbage
collector of the Algol 68 system we used quite a long time ago.
 
Chris Torek

[re tracing and debugging]
What *is* ICE? I have never heard of it. And I have been programming for
over 30 years.

"In-Circuit Emulator", typically a special piece of hardware you
put around a CPU (or entire card, or whatever) that captures the
signals at each pin (or card edge or whatever -- but note that
most card- or bus-level devices are not called ICEs but rather
called "logic analyzers", despite doing much the same thing).
Additional software interprets those and tells you what the CPU
(or other hardware) was doing at each clock cycle.

ICEs can be very useful in tracking down Heisenbugs. However, they
are not the be-all and end-all even at the hardware level. The
Heisenberg effect applies even there: sometimes adding wires adds
enough parasitic capacitance to change the behavior of a
supposedly-digital circuit.
 
Chris Torek

I also spent time as I say working in the Test Engineering Department,
and since we were building test equipment (which also had to test
itself) we developed the attitude of assuming that the SW has to survive
and continue working properly as much as possible even if fundamental
parts of the system are failing in ways you can't conceive of, a
philosophy that I find very useful.

Of course, this all has to be judged on a cost/benefit basis. It
is difficult to use the software to test itself if the CPU is not
even executing the boot instructions stored in the boot ROM, for
instance. :)

I often find it useful to check for "impossible" conditions in
low-level code, but I have to trade that against the lack (at that
point) of a strategy for dealing with such conditions, and the
slowdown effect of performing an "unnecessary" test.
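A common C idiom for the kind of "impossible" check described here is `assert`, which documents the assumption and checks it in debug builds. Defining `NDEBUG` removes the check and its run-time cost, which is one way of handling the slowdown trade-off, but the code still needs some defined fallback for when the impossible happens in a release build. The motor state machine is purely hypothetical.

```c
#include <assert.h>

/* Hypothetical state machine used only for illustration. */
enum motor_state { MOTOR_OFF, MOTOR_STARTING, MOTOR_RUNNING };

/* Returns the next state. The default branch is "impossible" unless the
   state variable has been corrupted. assert() checks that in debug
   builds; with -DNDEBUG the check disappears, so a fail-safe fallback
   is still returned. */
enum motor_state next_state(enum motor_state s)
{
    switch (s) {
    case MOTOR_OFF:      return MOTOR_STARTING;
    case MOTOR_STARTING: return MOTOR_RUNNING;
    case MOTOR_RUNNING:  return MOTOR_RUNNING;
    default:
        assert(!"impossible motor state");
        return MOTOR_OFF;    /* fallback strategy: fail safe */
    }
}
```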
 
Vladimir S. Oka

Dik said:
What *is* an ICE? In my 30+ years of programming experience I never
have seen the acronym.

ICE stands for In Circuit Emulator. It's often heavily used in
development and debugging of embedded systems. These, being non-hosted
environments, rarely, if ever, have an OS you can use to run
development applications.

Modern CPU manufacturers provide a (standardised) hardware interface
(e.g. JTAG) which exposes the CPU internals and allows you to poke into
it using a hardware plug-on connected to your PC which runs the
debugger. This then enables you to place breakpoints, read and write to
memory, stop and start execution, all from the comfort of your hosted
environment, and essentially without disturbing the system being
tested. The software tools provided usually (at least the good ones)
combine a debugger with at least an execution profiler, but may also
include other useful tools for analysing and debugging the system.

This is not a very exhaustive, and probably not pedantically correct
explanation, but I didn't want to be too lengthy (or my English fails
me this early morning).
The programs I write have nothing to do with interrupts and timing at
all, so what are you talking about?

The problem with interrupts is that you generally have no idea when (or
even if) they'll happen. If you allow for more than one type (e.g.
keyboard, mouse) they can have different priorities and be allowed to
interrupt each other, i.e. one interrupt service routine (ISR, here's
one more acronym) can easily itself be interrupted to let another run.

Even in a single threaded execution model getting the priorities and
various dependencies wrong can wreak havoc, and be hard to debug. If
you have a multitasking environment, especially with hard real time
requirements, and multiple asynchronous interrupts, it gets even worse
(e.g. a mobile phone will typically run at least half a dozen tasks,
and can be interrupted from at least as many sources; at least one of
the tasks will have hard real time requirements, and will be severely
short of time anyway).
What system are you talking about? I am doing a simple combinatorial
problem. A simple backtracking problem: it goes wrong when your
backtracking goes wrong.

Probably ones like a mobile phone. It's rarely possible to backtrack
your problem, especially with difficult bugs.

However, having been involved in just such development, and on the hard
real time part as well, I can attest that I had to resort to an ICE or
a debugger maybe three times in the past 5 years. Judicious debug
logging and analysing code at hand (third party legacy, no less) proved
to be very successful.

I found that ICEs and the like are of most use to the developers of
hardware drivers for embedded (and other) devices.

--
BR, Vladimir

It is wise to keep in mind that neither success nor failure is ever
final.
-- Roger Babson
 
Vladimir S. Oka

Flash said:
No, we were not trying to confirm it behaved as expected. Quite the
reverse, we were being as devious as possible in trying to prove it
did *not* work. We succeeded in that. Our customer acceptance tests,
on the other hand, were designed to demonstrate it worked as expected
and took far less time.

I think there's a misunderstanding on which side of the table you and
Ian are assuming you're sitting at. I believe Ian meant /he/ was a
customer running acceptance tests on, say, a piece of 3rd party
software. You seem to be looking at it from the 3rd party POV.

IMHO, both of you are correct. When we sell a device we run it through a
battery of standardised tests to prove to the customer it works. When
we buy in a piece of software to use in our device, we test it in all
sorts of nasty ways looking to break it.

Obviously, internally, we also have tests for /our/ code that try to
break it before it gets to the customer.

--
BR, Vladimir

The opposite of a correct statement is a false statement. But the
opposite of a profound truth may well be another profound truth.
-- Niels Bohr
 
Vladimir S. Oka

Herbert said:
Yes. A conditional breakpoint does the trick. Let the call pass
undebugged 999,999 times and trace the 1,000,000th run through.

Not if you're running a system with (hard) real time requirements.
Conditional breakpoints slow things down horribly, and almost
invariably give you a system that just simply does not work, as it
breaks all the timing constraints. It's much better to add a condition
to the code itself, and when it hits N-th execution stop and dump out
any system state you're interested in.
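A minimal sketch of that in-code alternative: count calls and dump state on the N-th one. The routine name, the `trigger_call` variable and the choice to print to `stderr` are assumptions for illustration; on a real-time target the "dump" might instead write to a RAM buffer or halt so a JTAG debugger can attach.

```c
#include <stdio.h>

/* Hypothetical: which call to trap (the thread's example is the
   millionth) and a running call counter. */
static unsigned long trigger_call = 1000000UL;
static unsigned long call_count = 0;

/* Stand-in for the routine that misbehaves on the N-th call. The check
   is one increment and one compare per call, which typically disturbs
   timing far less than a debugger evaluating a conditional breakpoint.
   Returns the number of this call. */
unsigned long suspect_routine(int input)
{
    ++call_count;
    if (call_count == trigger_call) {
        /* Dump whatever state matters here. */
        fprintf(stderr, "call %lu: input=%d\n", call_count, input);
    }
    /* ... the routine's normal work ... */
    return call_count;
}
```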
 
Herbert Rosenau

Flash Gordon wrote:
Oh look, my debugger broke on the file open failing. <fx: looks at file
system> Odd, the file is there and with correct permissions. <fx: Looks
at nicely timestamped logs from system here and system 200 miles away>
Ah, I see, that server 200 miles away is taking longer than the
specified time to put the file on my server.

There are times when debug logs can be *far* easier to use than a
debugger. Not always in my opinion, but they do exist.

I have never found such a situation. It always came down to at least
- knowing how to use the debugger
- exact knowledge of how the system works
(what exactly an API does under which conditions, and which side effects
are active before, during and after the system API is called)
Or another situation that really has happened to me. Two different
programs, one in Java and the other in C, are interacting in an incorrect
manner. However, the system fails no more than once every few weeks, and
only when under heavy load at customer sites in the week before month end,
when they are running hundreds of invoices through the system hourly. Do
I ask my customer to let me know each time they start a new session so I
can attach a debugger to it (I can't wait the weeks it would likely take
at minimum for me to replicate the problem once, and I don't know
*exactly* what the customer's 50 users in that department are doing to
cause it), or do I spend an hour putting in some debug logging and ask
them to run with that build for a bit?

Oh, for such remote environments we had set up our CORBA applications
to run on the same machine, as CORBA is designed to work locally or
remotely, even under another OS. Setting up a test environment did not
need even a single change to the code, only a slightly different
environment. So debugging server and client in parallel was nicer than
finding some technique to set up debug log files.
I chose putting in some debug logging. I had a significant piece of the
puzzle a couple of days later (they could not install the debug build
immediately) and the next day I had the solution. Far less work than
using a debugger would have been.

It is quite common for those reporting bugs to miss out some critical
piece of information in what they are doing. The addition of easily
enabled logging is gradually making it far easier for us to see exactly
what the customer is doing to cause the failure, something a debugger
can never do.

I had a similar situation. One of our customers constantly reported
an abnormal end of one application, but nobody and nothing was able
to reproduce it until we crippled a machine to exactly match the
hardware environment the user had. Then it was easy to produce the abort
in 9 out of 10 runs, but it was still impossible to find the cause.
Running it under a debugger to catch it meant we could no longer
reproduce the abort! Using the same debugger in a more advanced way, it
became absolutely clear what the cause was, and with some thought it was
easy to fix by manually synchronising the threads at that critical point,
against the standard rules. The next version of the OS brought a new
system API that did that synchronisation inside the kernel, making the
problem go away.

But anyway, once the cause was found, my colleagues at the company and I
had learned how to avoid that problem by design. The cause was: when
the main thread died before all the other threads had fully died
(that is, before the kernel's thread table no longer held any entry
for them), the whole process crashed.
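Herbert's post does not name the OS or show its thread API (his signature points to eComStation, an OS/2 distribution). As a hedged POSIX analogue only: returning from `main` ends the whole process, including any threads still running, so the usual design discipline is for the main thread to `pthread_join` every worker before it returns. The worker function and the thread count here are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 4   /* hypothetical thread count */

/* Hypothetical worker: just records that it ran. */
static void *worker(void *arg)
{
    int *done = arg;
    *done = 1;
    return NULL;
}

/* Starts the workers and waits for every one of them before returning,
   so the calling (main) thread never finishes while other threads are
   still alive. Returns the number of workers that completed, or -1 if
   not all of them could be started. */
int run_workers(void)
{
    pthread_t tid[NUM_WORKERS];
    int done[NUM_WORKERS] = {0};
    int started = 0, completed = 0;

    for (int i = 0; i < NUM_WORKERS; i++) {
        if (pthread_create(&tid[i], NULL, worker, &done[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);   /* wait for each thread to finish */
    for (int i = 0; i < started; i++)
        completed += done[i];
    return started == NUM_WORKERS ? completed : -1;
}
```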

 
