Neal Barney
I have a C program which runs on a device using a Zilog Z180
microprocessor. While it can address 1MB of RAM, it can only address
64KB at any given time. Of that, only 16KB can be used for "stack
and heap space". So I'm running in a very memory-constrained
environment.
The program "speaks" a proprietary protocol which sends ASCII strings
back and forth from the device to the server. Within the past couple of
months we've been noticing errors in our server logs and people have
been complaining of failures. Upon viewing server logs we've noticed the
string the device is sending is missing its header.
The string gets built like so:
/* Global variable */
char Message[512];
/* Local variable */
char tmpMsg[512];
/* var1 - var4 (local variables) get populated with values... */
/* Build the header */
sprintf( tmpMsg, "|XX|%s|0|00.00|XXXX:XXXX|XXXX:%s|XXXX:%s|XXXX:%s",
         var1, var2, var3, var4 );
/* X's are alphas and zeros are numbers */
if ( conditionTrue )
    strcat( tmpMsg, ",S" );
else
    strcat( tmpMsg, ",H" );
/* Continue building the rest of the string... */
/* Put msg length at beginning of string */
len = strlen( tmpMsg );
sprintf( Message, "%04d%s", len, tmpMsg );
return;
Now when the device transmits the string to the server, the server logs
something similar to:
0089,S|XXXX:XXXX|XXXX:XXXX|... (and so on)
or sometimes something like
0093 ,S|XXXX:XXXX|XXXX:XXXX|... (and so on)
The entire header appears to be getting lost. The calculated message
length matches the size of the string without the header, which may
suggest:
a) The sprintf that places the header in tmpMsg is failing (but why?).
b) A bug in strcat in the C library I'm using is wiping out the
string or appending at the beginning of it. (It doesn't do this later
in the same function or elsewhere in the program.)
c) "Something else" is walking on that part of memory.
d) ...?
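To illustrate why I suspect (a) or (c), here is a small sketch (not my production code) showing that a single zero byte at tmpMsg[0] is enough to produce this symptom: strcat then finds the end of the string at offset 0, and the computed length covers only what was appended. The function name build_demo, the clobber flag, and the shortened stand-in header are made up for the example. It does not explain the variant with the stray space.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical demo: builds a message the same way as above, with a
   shortened stand-in header. If clobber_first_byte is nonzero, the first
   byte of tmpMsg is zeroed to simulate a failed sprintf or a stray write. */
static void build_demo(char *out, int clobber_first_byte)
{
    char tmpMsg[512];
    int len;

    sprintf(tmpMsg, "|XX|%s|0|00.00", "AAAA");  /* stand-in header */

    if (clobber_first_byte)
        tmpMsg[0] = '\0';                        /* simulated corruption */

    /* strcat scans for the first '\0'; after the clobber that is offset 0,
       so the appended text overwrites the header from the beginning. */
    strcat(tmpMsg, ",S");
    strcat(tmpMsg, "|XXXX:XXXX");

    len = (int)strlen(tmpMsg);
    sprintf(out, "%04d%s", len, tmpMsg);
}
```

Called with clobber_first_byte set to 0, the output buffer holds the full message with its header. Called with 1, it holds only the length and the part starting at ",S", and the length matches the headerless string, which is what the server logs show.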
This is not 100% reproducible (in fact, in the past 9 months I have
only personally seen it happen once on my test device, and I can't
reproduce it there). But we are seeing this happen in production
multiple times a day from different devices. Unplugging the device and
plugging it back in will often "fix" the problem (though it may occur
again in the future).
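Since I can't reproduce it on demand, one thing I could do is have the device check the finished message before transmitting and record when the header is missing. Below is a rough sketch of such a check. The name message_looks_sane is hypothetical, and it assumes the header always starts with the literal "|XX|" from the masked example above, which would have to be replaced with the real prefix.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sanity check: returns 1 if msg looks like "NNNN|XX|...",
   where NNNN equals the length of everything after the four digits.
   Returns 0 otherwise. The "|XX|" prefix is an assumption taken from the
   masked example; substitute the real header prefix. */
static int message_looks_sane(const char *msg)
{
    char digits[5];
    int i;

    /* The first four characters must be the zero-padded length field.
       A short string fails here because '\0' is not a digit. */
    for (i = 0; i < 4; i++) {
        if (!isdigit((unsigned char)msg[i]))
            return 0;
        digits[i] = msg[i];
    }
    digits[4] = '\0';

    /* The header must follow the length field immediately. */
    if (strncmp(msg + 4, "|XX|", 4) != 0)
        return 0;

    /* The length field must match the actual payload length. */
    return atoi(digits) == (int)strlen(msg + 4);
}
```

Calling this on Message right after the final sprintf, and again just before the transmit, would at least show whether the header is already gone when the string is built or disappears afterwards.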
What are the most likely causes for this problem? Solutions?