This post contains the promised C code version of the infix to Polish
notation convertor. When run from the command line it should execute
all cases used to test the C Sharp version correctly. Its string
handling is simpler owing to the limitations of C.
Okay.
#include <stdio.H>
#include <stdlib.H>
Don't capitalize the "H" -- while it may happen to work on your system,
it means the code won't compile on others. Even an easily-fixed compile
error makes people likely to discount your code.
#define MAX_POLISH_LENGTH 100
int errorHandler(char *strMessage)
{
printf("\n%s\n", strMessage); return 0;
}
This always returns the same value, why does it have a return value at
all?
If this is for an error message, why isn't it using stderr for its output?
The double-\n thing is a bit unusual; normally a convention is adopted
where messages should either end with (normally) or start with (less
often) a newline, rather than providing two. In particular, if there are
two consecutive error messages for some reason, this would produce a blank
line between them.
It would seem much more useful to implement this function using ... arguments
and the underlying vfprintf() to allow formatted messages to be displayed.
If I were writing a function to this effect, I'd probably do:
void
errorHandler(char *msg, ...) {
va_list ap;
va_start(ap, fmt);
vfprintf(stderr, fmt, ap);
va_end(ap);
}
You prefix the variable's name with a type. While that certainly CAN be
a reasonable thing, it's important to ensure that the type used is a real
abstract type, not merely a language type -- language types are already
type checked.
int errorHandlerSyntax(int intIndex, char *strMessage)
You're using Systems Hungarian here, which is crap. Don't. Apps Hungarian
is what Simonyi proposed and demonstrated to be useful. Systems Hungarian
isn't useful, and leads to bugs later.
{
printf("\nError at character %d: %s\n",
intIndex,
strMessage);
return 0;
}
This is a useless message. It doesn't tell the user what the character
IS, nor does it give the user any feedback as to the nature of the message.
And, again, suggest that you implement format arguments.
int stringLength(char *strInstring)
{
int intIndex1;
for (intIndex1 = 0;
strInstring[intIndex1] != '\0';
intIndex1++) { }
return intIndex1;
}
Why the **** did you even write this? strlen() exists for a reason.
And just for extra credit: This name is reserved for use exclusively
by the implementation, since it starts with "str" followed by a lowercase
letter. In practice it's likely harmless in modern systems because
names are usually case-sensitive, and the standard doesn't define library
functions with camelCase.
But it's still stupid.
int stringAppendChar(char *strInstring,
char chrNew,
int intMaxLength)
{
int intLength = stringLength(strInstring);
if (intLength >= intMaxLength - 1)
{
errorHandler
("Cannot append character: string too long");
return 0;
}
strInstring[intLength] = chrNew;
strInstring[intLength + 1] = '\0';
return -1;
}
int
stringAppendChar(char *s, char c, size_t max) {
char *end = strchr(s, '\0');
if (end - s < max) {
*end++ = c;
*end = '\0';
return end - s;
} else {
return -1;
}
}
Note that I've changed the sense of the return value. This is because
by convention -1 indicates a failure. Returning the actual length,
rather than a meaningless value, gives the caller some additional
information.
Note that I've preserved the probable-bug, which is that if you have an
array of 15 characters, and you call this function with the limit 15 on
an array containing 14 non-zero characters and a null byte,
it'll write the 16th character of the 15-character array. That's almost
always a bad choice.
I've omitted the ridiculous error message call, for reasons which shall
become apparent.
int appendPolish(char *strPolish, char chrNext)
{
return
((*strPolish == '\0' || *strPolish == ' ')
?
-1
:
stringAppendChar(strPolish,
' ',
MAX_POLISH_LENGTH))
&&
stringAppendChar(strPolish,
chrNext,
MAX_POLISH_LENGTH)
&&
stringAppendChar(strPolish, ' ', MAX_POLISH_LENGTH);
}
This is just plain horrid. The layout is surreal, and the logic
opaque.
I think you mean:
int appendPolish(char *strPolish, char chrNext)
{
if (*strPolish == ' ' || *strPolish == '\0')
return -1;
... wait, this doesn't even make sense.
}
This function aborts if the FIRST character of strPolish is a space or
a null, but then appends characters off at the end of strPolish. This
makes no sense. You'll have to put in some kind of explanatory comment
to justify that.
There's also the fact that calling stringAppendChar three times in a row
is stupid. Figure out how many characters are left, and if there's enough
room for three, append all three at once. Appending one or two of them
is not useful.
int mulFactor(char *strInfix,
char *strPolish,
int *intPtrIndex,
int intEnd)
{ /* mulFactor = LETTER | NUMBER | '(' expression ')' */
int intIndexStart = 0;
int intLevel = 0;
char chrNext = ' ';
int intInner;
if (*intPtrIndex > intEnd)
return errorHandlerSyntax(*intPtrIndex,
"mulFactor unavailable");
This is a crappy error message. The problem has nothing at all to do with
mulFactor. It has to do with you having, apparently, run out of room
or something. So say so; give a message like "string too long".
This is the point where you should be calling a different function that
handles parenthesized expressions, IMHO.
if (!expression(strInfix,
strPolish,
&intInner,
*intPtrIndex - 2))
Ugh.
Seems like this would be a great time to refactor this more sanely. In
particular, you're having the higher-level code (mulfactor) try to guess
the range of the substring.
What happens when someone asks you to support strings? Then suddenly
mulfactor() has to be smart enough to find the end of:
(a + ")")
and you're making a ton of work. Wrong. When you see the (, pass it off
to the function which tries to find an expression followed by a trailing ),
and have the expression parser hand what it found so far back up when it
sees the ) -- at which point the parens function finds its trailing ) and is
happy.
char *infix2Polish(char *strInfix)
{
int intIndex = 0;
char *strPolish = malloc(MAX_POLISH_LENGTH);
Sure enough, you did what I thought you'd do. You allocate MAX_POLISH_LENGTH
characters, but then, you use MAX_POLISH_LENGTH as the cap you pass to
strAppendChar, meaning that if your Polish string contains MAX_POLISH_LENGTH-1
characters, you will end up writing one past the end of the array.
It turns out that the fencepost conventions people use in C are there for
a good reason.
if (strPolish == 0)
errorHandler("Can't get storage");
strPolish[0] = '\0';
Since errorHandler doesn't exit in any way, you're still going to segfault
(or whatever) when you reach this line.
if (!expression(strInfix,
strPolish,
&intIndex,
stringLength(strInfix) - 1))
{
errorHandler("Can't parse expression");
return "";
Oh, that's great. You are now returning pointers from this function which
MAY OR MAY NOT be the results of malloc(), so you can't free them safely.
This should be:
strPolish[0] = '\0';
return strPolish;
printf("Expect 10 113 + a *: %s\n",
infix2Polish("(10+113)*a"));
Because infix2Polish allocates memory, but you never free the results, this
is a memory leak.
int main(int intArgCount, char **strArgs)
Renaming the args to main, not useful. Declaring them when you're not using
them, also not useful.
return without a value in function not returning void
Okay, I will concede one point: You would actually be better off reading
Schildt's books than muddling along without. You'd be even better off reading
a better C book, but really, any port in a storm.
-s