Optimisable?

C

craigbeanhead

Hi clc readers,

I'm working on a problem, and I've got some working code, but I think
it can be improved. In particular I'd like to get rid of the nasty
goto statement that I've used, but I'm unsure as to how to go about
rewriting that section.

The problem is to find a word of x length, where the sum of the
characters (using 1 for A, 2 for B, etc) is equal to y.

I'm calling the program like this: "myprog.exe 5 39" - this means
"find me all the words of length 5, where the characters add up to
39". I've got two solutions to the problem, one version creates
strings such as AAAAA, AAAAB, and the second version searches a
newline-delimited dictionary of words.

It seems that each time I post code here (or someone else posts code
here), it ends up being improved :) I hope someone here can help me
improve the code I've written so far.

Thanks,
Craig

Code follows:

/*Cut here*/

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINE_LENGTH 100

int findstring(int nLength, int nSum, char* sFile);
int makewords(int nLength, int nSum);
char tochar(int i);
int tonum(char c);
int strsum(char* s);
void ucase(char *s);

int main(int argc, char *argv[])
{
int nLength, nSum, nMatches;
char *stop = NULL;
char** words = NULL;

if(argc!=3)
{
puts("Please supply two arguments, the word length, and the desired
sum of the characters");
exit(EXIT_FAILURE);
}

nLength = (int)strtol(argv[1], &stop, 10);
nSum = (int)strtol(argv[2], &stop, 10);

/* nMatches = makewords(nLength, nSum, words); */
nMatches = findstring(nLength, nSum, "dictionary.txt");

printf("%d matches found\n", nMatches);

exit(EXIT_SUCCESS);
}

int findstring(int nLength, int nSum, char* sFile)
{
FILE* fp = fopen(sFile, "r");
char line[MAX_LINE_LENGTH + 1];
int i = 0;

while(fgets(line, MAX_LINE_LENGTH, fp) != NULL)
{
line[strlen(line)-1] = '\0';
upper(line);

if(strlen(line) != nLength) continue;
/* printf("Sum of %s is %d, Aiming for %d\n", line, strsum(line),
nSum); */
if(strsum(line) != nSum) continue;
printf("%s\n", line);
i++;
}

fclose(fp);
return i;
}

int makewords(int nLength, int nSum)
{
char *chars = malloc(nLength+1);
int i,j=0;

/* reset the memory to all 'A's */
for(i = 0; i<nLength; i++)
chars = tochar(1);
chars[nLength] = '\0';

/* forever */
while(1)
{
int i;

/* test for string sum */
if(strsum(chars) == nSum)
{
printf("%s\n", chars);
j++;
}

chars[nLength-1]++;

start_again:
for(i=nLength-1; i>=0; i--)
if(tonum(chars) >= 27)
{
if(i==0) return 0;
chars = 'A';
chars[i-1]++;
goto start_again;
}
}

free(chars);
return j;
}

/* Takes 1, returns 'A', takes 2, returns 'B' */
char tochar(int i)
{
return 'A' + i - 1;
}

/* Takes 'A', returns 1, takes 'B', returns 2 */
int tonum(char c)
{
return c - 'A' + 1;
}

/* Takes "ABC", returns 1 + 2 + 3 = 6 */
int strsum(char *s)
{
int i=0;
char *c;

while(*(c = s++))
i += tonum(*c);

return i;
}

/* takes "a WordOrTwo", returns "A WORDORTWO" */
void ucase(char *s)
{
while(*s++ = toupper(*s))
;
}
 
P

Pieter Droogendijk

On 4 Aug 2003 05:45:25 -0700
Hi clc readers,

I'm working on a problem, and I've got some working code, but I think
it can be improved. In particular I'd like to get rid of the nasty
goto statement that I've used, but I'm unsure as to how to go about
rewriting that section.

int makewords(int nLength, int nSum)
{
char *chars = malloc(nLength+1);
int i,j=0;

/* reset the memory to all 'A's */
for(i = 0; i<nLength; i++)
chars = tochar(1);
chars[nLength] = '\0';

/* forever */
while(1)
{
int i;

/* test for string sum */
if(strsum(chars) == nSum)
{
printf("%s\n", chars);
j++;
}

chars[nLength-1]++;

start_again:
for(i=nLength-1; i>=0; i--)
if(tonum(chars) >= 27)
{
if(i==0) return 0;
chars = 'A';
chars[i-1]++;
goto start_again;


instead of goto, try simply resetting the counter to it's initial state:
i=nLength;
continue;
note that the length is decremented by the loop, so leaving out '-1' is legal in
this case.
}
}

free(chars);
return j;
}

<snip>
 
A

Arthur J. O'Dwyer

I'm working on a problem, and I've got some working code, but I think
it can be improved. In particular I'd like to get rid of the nasty
goto statement that I've used, but I'm unsure as to how to go about
rewriting that section.

The problem is to find a word of x length, where the sum of the
characters (using 1 for A, 2 for B, etc) is equal to y.

I'm calling the program like this: "myprog.exe 5 39" - this means
"find me all the words of length 5, where the characters add up to
39". I've got two solutions to the problem, one version creates
strings such as AAAAA, AAAAB, and the second version searches a
newline-delimited dictionary of words.

A good study of the possible algorithms involved might help.
Here's a hint: for values 5 39, there is a mapping between the
strings you want and the bit-patterns

111100000000000000000000000000000000000000
111010000000000000000000000000000000000000
111001000000000000000000000000000000000000
....
110110000000000000000000000000000000000000
110101000000000000000000000000000000000000
....
000000000000000000000000100010000000001100
....
000000000000000000000000000000000000001111

So, if you can find an algorithm that produces those bit-strings,
you can find an algorithm that produces the strings you want.
Also, you can easily determine from what I've written above *how*
*many* strings there are for 5 39, how many for 2 3, etc., and
make sure that your current program is getting the right answers.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINE_LENGTH 100

int findstring(int nLength, int nSum, char* sFile);
int makewords(int nLength, int nSum);
char tochar(int i);
int tonum(char c);
int strsum(char* s);
void ucase(char *s);

int main(int argc, char *argv[])
{
int nLength, nSum, nMatches;

Please don't use tabs in Usenet posts. Set your editor
to use regular spaces instead. (Don't use Hungarian
notation either, but that's *slightly* less of a PITA
to decipher than badly-indented code.)
char *stop = NULL;
char** words = NULL;

if(argc!=3)
{
puts("Please supply two arguments, the word length, and the"
" desired sum of the characters");

Break lines at 65-79 characters. This is why string concatenation was
invented. :)
exit(EXIT_FAILURE);
}

nLength = (int)strtol(argv[1], &stop, 10);
nSum = (int)strtol(argv[2], &stop, 10);

Spurious casts. Remove them.
Also, check 'stop' for error conditions. You
went ahead and declared 'stop' (when you could've
just used NULL, for all the error-checking you do);
you might as well make sure nLength and nSum have
valid values before you continue.
/* nMatches = makewords(nLength, nSum, words); */
nMatches = findstring(nLength, nSum, "dictionary.txt");

printf("%d matches found\n", nMatches);

exit(EXIT_SUCCESS);
}

int findstring(int nLength, int nSum, char* sFile)
{
FILE* fp = fopen(sFile, "r");
char line[MAX_LINE_LENGTH + 1];
int i = 0;

Again, no error-checking on 'fp'. What I will usually do
in a case like this is open and close the file inside 'main',
and pass the (FILE *) to the processing function. That way,
'main' can use the argument list without passing chunks of
it to other functions, and my algorithm-type functions aren't
cluttered up with I/O handling.

Check that 'fp' is not NULL. What should you do if it is?
while(fgets(line, MAX_LINE_LENGTH, fp) != NULL)
{
line[strlen(line)-1] = '\0';

Silly. Although I see what you're trying to do here - you
want to remove any trailing '\n' from the input buffer.
So why not *say* that?

char *p;
if (( p = strchr(line,'\n') ))
*p = '\0';

(Double parentheses to silence gcc warnings, not for any
syntactical reason.)
upper(line);

if(strlen(line) != nLength) continue;

And then, since you don't use the '\n' anyway, wouldn't it be
easier to skip the whole '\n'-stripping thing and just write

if (strlen(line) != nLength+1) continue;

Unless you're actually going to try to make this a *real*
program, in which case you'll want to strip *all* whitespace
from the front and back of the line, not just a single character
from the back.
/* printf("Sum of %s is %d, Aiming for %d\n", line,
strsum(line), nSum); */

BTW: strsum() is not your identifier. It belongs to the compiler
implementation, because it begins with 'str' and then a lower-case
letter. Use 'str_sum' or 'sumLetters' or 'countalpha' or something.
if(strsum(line) != nSum) continue;
printf("%s\n", line);
i++;
}

fclose(fp);
return i;
}

int makewords(int nLength, int nSum)
{
char *chars = malloc(nLength+1);
int i,j=0;

Bad practice. Write

int i;
int j = 0;

to make it clear *which* variable is initialized and which
isn't. Note also the use of whitespace in my version.
/* reset the memory to all 'A's */
for(i = 0; i<nLength; i++)
chars = tochar(1);


Your comment doesn't match your code. The code
is setting everything equal to tochar(1), which
*happens* to be 'A', but the implication is that
that might change (otherwise, you'd have written
'A' literally in the code). I'd write:

for (i=0; i < nLength; ++i)
chars = 'A';

or, if I really cared enough to think about it,

memset(chars, 'A', nLength);

memset(chars, tonum(1), nLength);

would also work, if you wanted to keep the old
semantics for some reason - and would be faster.
chars[nLength] = '\0';

/* forever */
while(1)
{
int i;

/* test for string sum */
if(strsum(chars) == nSum)
{
printf("%s\n", chars);
j++;
}

chars[nLength-1]++;

start_again:
for(i=nLength-1; i>=0; i--)
if(tonum(chars) >= 27)
{
if(i==0) return 0;
chars = 'A';
chars[i-1]++;
goto start_again;
}


This should be a FAQ (in fact, I thought it was!).
Consider the mapping between the following values:

0 -> AAAAAAAAA
1 -> AAAAAAAAB
2 -> AAAAAAAAC
...
25 -> AAAAAAAAZ
26 -> AAAAAAABA
...

Now write a program that incorporates that mapping.
Consider how to convert from a number to its base-26
representation.
}

free(chars);
return j;
}

/* Takes 1, returns 'A', takes 2, returns 'B' */
char tochar(int i)
{
return 'A' + i - 1;
}

/* Takes 'A', returns 1, takes 'B', returns 2 */
int tonum(char c)
{
return c - 'A' + 1;
}

These two functions have terrible names (they sound like
they're meant to map 'A'->65 or whatever), and the functions
themselves don't work on EBCDIC. IOW, they're not completely
portable. You could write

int tonum(char c)
{
static const char letters[] = "0ABCDEFGHIJKLMNOPQRSTUVWXYZ";
c = toupper(c);
if (isalpha(c)) return strchr(letters, c) - letters;
return 0;
}

for example.

/* Takes "ABC", returns 1 + 2 + 3 = 6 */
int strsum(char *s)
{
int i=0;
char *c;

while(*(c = s++))
i += tonum(*c);

return i;
}

Yuck! Did you consider

int sum;
for (sum = 0; *s; ++s)
sum += tonum(*s);
return sum;

It's much easier to read than that nested
assignment thingie you've got there, and
doesn't need the extra pointer variable.
Also, 'i' has been changed to 'sum', reflecting
its purpose rather than its data type.
/* takes "a WordOrTwo", returns "A WORDORTWO" */
void ucase(char *s)

You called this function 'upper' in the code that
called it. I'm surprised it linked.
{
while(*s++ = toupper(*s))
;
}

Undefined behavior. You have written the pointerified
equivalent of

while (i++ = f(i)) ...

Diagnosis: You're still too enamored of obfuscation to
write good code. Try again, and this time *don't* *do*
*anything* *clever*!

HTH,
-Arthur
 
D

David Rubin

craigbeanhead wrote:

[snip]
The problem is to find a word of x length, where the sum of the
characters (using 1 for A, 2 for B, etc) is equal to y.

I'm calling the program like this: "myprog.exe 5 39" - this means
"find me all the words of length 5, where the characters add up to
39". I've got two solutions to the problem, one version creates
strings such as AAAAA, AAAAB, and the second version searches a
newline-delimited dictionary of words.

This is the same as solving "how many ways can I generate a monotonic
sequence of positive integers which sums to X?"

The first solution is to create a routine which finds sequences by
(recursively) adding subsequences. For example, X=5 produces

1+1+1+1+1
2+1+1+1
2+2+1
2+3
4+1
3+1+1
5
etc

The second solution, using a dictionary, is simply providing you can
pre-process the dictionary. Then, you just "sign" each word with its
associated value, sort, (store,) and search.

/david
 
B

Bjorn Reese

Arthur J. O'Dwyer said:
These two functions have terrible names (they sound like
they're meant to map 'A'->65 or whatever), and the functions
themselves don't work on EBCDIC. IOW, they're not completely
portable. You could write

int tonum(char c)
{
static const char letters[] = "0ABCDEFGHIJKLMNOPQRSTUVWXYZ";
c = toupper(c);
if (isalpha(c)) return strchr(letters, c) - letters;
return 0;
}

This will fail for international locales, as the letters array
only contains ASCII characters.
 
A

Arthur J. O'Dwyer

Arthur J. O'Dwyer said:
These two functions have terrible names (they sound like
they're meant to map 'A'->65 or whatever), and the functions
themselves don't work on EBCDIC. IOW, they're not completely
portable. You could write

int tonum(char c)
{
static const char letters[] = "0ABCDEFGHIJKLMNOPQRSTUVWXYZ";
c = toupper(c);
if (isalpha(c)) return strchr(letters, c) - letters;
return 0;
}

This will fail for international locales, as the letters array
only contains ASCII characters.

They won't fail; they just won't include international characters
in the letters list - which is perfectly fine, for the OP's purposes.
He's mapping 1..26 -> A..Z, not generically "numbers to letters."

All C implementations everywhere include the letters A to Z, no
matter what character set they use.

-Arthur
 
J

Jack Klein

Arthur J. O'Dwyer said:
These two functions have terrible names (they sound like
they're meant to map 'A'->65 or whatever), and the functions
themselves don't work on EBCDIC. IOW, they're not completely
portable. You could write

int tonum(char c)
{
static const char letters[] = "0ABCDEFGHIJKLMNOPQRSTUVWXYZ";
c = toupper(c);
if (isalpha(c)) return strchr(letters, c) - letters;
return 0;
}

This will fail for international locales, as the letters array
only contains ASCII characters.

This might well fail for International locales, but that has nothing
to do with whether or not the implementation uses the ASCII character
set. On an EBCDIC platform, the letter array will contain EBCDIC
characters, not ASCII at all.

Consider a platform where plain char is signed, and passing a negative
char value to this function. Undefined behavior calling toupper() and
isalpha(). And undefined behavior being what it is, if isalpha()
returns a non-zero value, the code will happily apply a negative index
outside the bounds of the array.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ ftp://snurse-l.org/pub/acllc-c++/faq
 
B

Bjorn Reese

Arthur J. O'Dwyer said:
They won't fail; they just won't include international characters
in the letters list - which is perfectly fine, for the OP's purposes.
He's mapping 1..26 -> A..Z, not generically "numbers to letters."

My comment was not so much directed towards the purpose of the
function, but rather that isalpha() was used to guard the first
return statement. A safer method would have been to use

switch (toupper(c)) {
case 'A': case 'B': /* etc */
return strchr(letters, c) = letters;
default:
return 0;
}
 
T

Twirlip

Bjorn said:
My comment was not so much directed towards the purpose of the
function, but rather that isalpha() was used to guard the first
return statement.

Meaning that isalpha() might return non-zero for something
other than letters A-Z, in which case strchr would return
NULL, and the return would subtract 'letters' from NULL.
A safer method would have been to use

switch (toupper(c)) {
case 'A': case 'B': /* etc */
return strchr(letters, c) = letters;
default:
return 0;
}

That's 26 case statements when you can just use the NULL return of
strchr() (untested):

int alp2num(char c)
{
static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
char *ptr;

ptr = strchr(letters, toupper((unsigned char)c));
return ptr != NULL && c != '\0' ? ptr - letters + 1 : 0;
}
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,077
Messages
2,570,566
Members
47,202
Latest member
misc.

Latest Threads

Top