Optimisable?

craigbeanhead · Aug 4, 2003

Hi clc readers,

I'm working on a problem, and I've got some working code, but I think
it can be improved. In particular I'd like to get rid of the nasty
goto statement that I've used, but I'm unsure as to how to go about
rewriting that section.

The problem is to find a word of x length, where the sum of the
characters (using 1 for A, 2 for B, etc) is equal to y.

I'm calling the program like this: "myprog.exe 5 39" - this means
"find me all the words of length 5, where the characters add up to
39". I've got two solutions to the problem, one version creates
strings such as AAAAA, AAAAB, and the second version searches a
newline-delimited dictionary of words.

It seems that each time I post code here (or someone else posts code
here), it ends up being improved

I hope someone here can help me
improve the code I've written so far.

Thanks,
Craig

Code follows:

/*Cut here*/

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINE_LENGTH 100

int findstring(int nLength, int nSum, char* sFile);
int makewords(int nLength, int nSum);
char tochar(int i);
int tonum(char c);
int strsum(char* s);
void ucase(char *s);

int main(int argc, char *argv[])
{
int nLength, nSum, nMatches;
char *stop = NULL;
char** words = NULL;

if(argc!=3)
{
puts("Please supply two arguments, the word length, and the desired
sum of the characters");
exit(EXIT_FAILURE);
}

nLength = (int)strtol(argv[1], &stop, 10);
nSum = (int)strtol(argv[2], &stop, 10);

/* nMatches = makewords(nLength, nSum, words); */
nMatches = findstring(nLength, nSum, "dictionary.txt");

printf("%d matches found\n", nMatches);

exit(EXIT_SUCCESS);
}

int findstring(int nLength, int nSum, char* sFile)
{
FILE* fp = fopen(sFile, "r");
char line[MAX_LINE_LENGTH + 1];
int i = 0;

while(fgets(line, MAX_LINE_LENGTH, fp) != NULL)
{
line[strlen(line)-1] = '\0';
upper(line);

if(strlen(line) != nLength) continue;
/* printf("Sum of %s is %d, Aiming for %d\n", line, strsum(line),
nSum); */
if(strsum(line) != nSum) continue;
printf("%s\n", line);
i++;
}

fclose(fp);
return i;
}

int makewords(int nLength, int nSum)
{
char *chars = malloc(nLength+1);
int i,j=0;

/* reset the memory to all 'A's */
for(i = 0; i<nLength; i++)
chars = tochar(1);
chars[nLength] = '\0';

/* forever */
while(1)
{
int i;

/* test for string sum */
if(strsum(chars) == nSum)
{
printf("%s\n", chars);
j++;
}

chars[nLength-1]++;

start_again:
for(i=nLength-1; i>=0; i--)
if(tonum(chars) >= 27)
{
if(i==0) return 0;
chars = 'A';
chars[i-1]++;
goto start_again;
}
}

free(chars);
return j;
}

/* Takes 1, returns 'A', takes 2, returns 'B' */
char tochar(int i)
{
return 'A' + i - 1;
}

/* Takes 'A', returns 1, takes 'B', returns 2 */
int tonum(char c)
{
return c - 'A' + 1;
}

/* Takes "ABC", returns 1 + 2 + 3 = 6 */
int strsum(char *s)
{
int i=0;
char *c;

while(*(c = s++))
i += tonum(*c);

return i;
}

/* takes "a WordOrTwo", returns "A WORDORTWO" */
void ucase(char *s)
{
while(*s++ = toupper(*s))
;
}

Pieter Droogendijk · Aug 4, 2003

On 4 Aug 2003 05:45:25 -0700

Hi clc readers,

I'm working on a problem, and I've got some working code, but I think
it can be improved. In particular I'd like to get rid of the nasty
goto statement that I've used, but I'm unsure as to how to go about
rewriting that section.

int makewords(int nLength, int nSum)
{
char *chars = malloc(nLength+1);
int i,j=0;

/* reset the memory to all 'A's */
for(i = 0; i<nLength; i++)
chars = tochar(1);
chars[nLength] = '\0';

/* forever */
while(1)
{
int i;

/* test for string sum */
if(strsum(chars) == nSum)
{
printf("%s\n", chars);
j++;
}

chars[nLength-1]++;

start_again:
for(i=nLength-1; i>=0; i--)
if(tonum(chars) >= 27)
{
if(i==0) return 0;
chars = 'A';
chars[i-1]++;
goto start_again;

instead of goto, try simply resetting the counter to it's initial state:
i=nLength;
continue;
note that the length is decremented by the loop, so leaving out '-1' is legal in
this case.

}
}

free(chars);
return j;
}

Click to expand...

<snip>

Arthur J. O'Dwyer · Aug 4, 2003

I'm working on a problem, and I've got some working code, but I think
it can be improved. In particular I'd like to get rid of the nasty
goto statement that I've used, but I'm unsure as to how to go about
rewriting that section.

The problem is to find a word of x length, where the sum of the
characters (using 1 for A, 2 for B, etc) is equal to y.

I'm calling the program like this: "myprog.exe 5 39" - this means
"find me all the words of length 5, where the characters add up to
39". I've got two solutions to the problem, one version creates
strings such as AAAAA, AAAAB, and the second version searches a
newline-delimited dictionary of words.

A good study of the possible algorithms involved might help.
Here's a hint: for values 5 39, there is a mapping between the
strings you want and the bit-patterns

111100000000000000000000000000000000000000
111010000000000000000000000000000000000000
111001000000000000000000000000000000000000
....
110110000000000000000000000000000000000000
110101000000000000000000000000000000000000
....
000000000000000000000000100010000000001100
....
000000000000000000000000000000000000001111

So, if you can find an algorithm that produces those bit-strings,
you can find an algorithm that produces the strings you want.
Also, you can easily determine from what I've written above *how*
*many* strings there are for 5 39, how many for 2 3, etc., and
make sure that your current program is getting the right answers.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINE_LENGTH 100

int findstring(int nLength, int nSum, char* sFile);
int makewords(int nLength, int nSum);
char tochar(int i);
int tonum(char c);
int strsum(char* s);
void ucase(char *s);

int main(int argc, char *argv[])
{
int nLength, nSum, nMatches;

Please don't use tabs in Usenet posts. Set your editor
to use regular spaces instead. (Don't use Hungarian
notation either, but that's *slightly* less of a PITA
to decipher than badly-indented code.)

char *stop = NULL;
char** words = NULL;

if(argc!=3)
{

puts("Please supply two arguments, the word length, and the"
" desired sum of the characters");

Break lines at 65-79 characters. This is why string concatenation was
invented.

exit(EXIT_FAILURE);
}

nLength = (int)strtol(argv[1], &stop, 10);
nSum = (int)strtol(argv[2], &stop, 10);

Spurious casts. Remove them.
Also, check 'stop' for error conditions. You
went ahead and declared 'stop' (when you could've
just used NULL, for all the error-checking you do);
you might as well make sure nLength and nSum have
valid values before you continue.

/* nMatches = makewords(nLength, nSum, words); */
nMatches = findstring(nLength, nSum, "dictionary.txt");

printf("%d matches found\n", nMatches);

exit(EXIT_SUCCESS);
}

int findstring(int nLength, int nSum, char* sFile)
{
FILE* fp = fopen(sFile, "r");
char line[MAX_LINE_LENGTH + 1];
int i = 0;

Again, no error-checking on 'fp'. What I will usually do
in a case like this is open and close the file inside 'main',
and pass the (FILE *) to the processing function. That way,
'main' can use the argument list without passing chunks of
it to other functions, and my algorithm-type functions aren't
cluttered up with I/O handling.

Check that 'fp' is not NULL. What should you do if it is?

while(fgets(line, MAX_LINE_LENGTH, fp) != NULL)
{
line[strlen(line)-1] = '\0';

Silly. Although I see what you're trying to do here - you
want to remove any trailing '\n' from the input buffer.
So why not *say* that?

char *p;
if (( p = strchr(line,'\n') ))
*p = '\0';

(Double parentheses to silence gcc warnings, not for any
syntactical reason.)

upper(line);

if(strlen(line) != nLength) continue;

And then, since you don't use the '\n' anyway, wouldn't it be
easier to skip the whole '\n'-stripping thing and just write

if (strlen(line) != nLength+1) continue;

Unless you're actually going to try to make this a *real*
program, in which case you'll want to strip *all* whitespace
from the front and back of the line, not just a single character
from the back.

/* printf("Sum of %s is %d, Aiming for %d\n", line,
strsum(line), nSum); */

BTW: strsum() is not your identifier. It belongs to the compiler
implementation, because it begins with 'str' and then a lower-case
letter. Use 'str_sum' or 'sumLetters' or 'countalpha' or something.

if(strsum(line) != nSum) continue;
printf("%s\n", line);
i++;
}

fclose(fp);
return i;
}

int makewords(int nLength, int nSum)
{
char *chars = malloc(nLength+1);
int i,j=0;

Bad practice. Write

int i;
int j = 0;

to make it clear *which* variable is initialized and which
isn't. Note also the use of whitespace in my version.

/* reset the memory to all 'A's */
for(i = 0; i<nLength; i++)
chars = tochar(1);

Your comment doesn't match your code. The code
is setting everything equal to tochar(1), which
*happens* to be 'A', but the implication is that
that might change (otherwise, you'd have written
'A' literally in the code). I'd write:

for (i=0; i < nLength; ++i)
chars = 'A';

or, if I really cared enough to think about it,

memset(chars, 'A', nLength);

memset(chars, tonum(1), nLength);

would also work, if you wanted to keep the old
semantics for some reason - and would be faster.

chars[nLength] = '\0';

/* forever */
while(1)
{
int i;

/* test for string sum */
if(strsum(chars) == nSum)
{
printf("%s\n", chars);
j++;
}

chars[nLength-1]++;

start_again:
for(i=nLength-1; i>=0; i--)
if(tonum(chars) >= 27)
{
if(i==0) return 0;
chars = 'A';
chars[i-1]++;
goto start_again;
}

Click to expand...

This should be a FAQ (in fact, I thought it was!).
Consider the mapping between the following values:

0 -> AAAAAAAAA
1 -> AAAAAAAAB
2 -> AAAAAAAAC
...
25 -> AAAAAAAAZ
26 -> AAAAAAABA
...

Now write a program that incorporates that mapping.
Consider how to convert from a number to its base-26
representation.

}

free(chars);
return j;
}

/* Takes 1, returns 'A', takes 2, returns 'B' */
char tochar(int i)
{
return 'A' + i - 1;
}

/* Takes 'A', returns 1, takes 'B', returns 2 */
int tonum(char c)
{
return c - 'A' + 1;
}

Click to expand...

These two functions have terrible names (they sound like
they're meant to map 'A'->65 or whatever), and the functions
themselves don't work on EBCDIC. IOW, they're not completely
portable. You could write

int tonum(char c)
{
static const char letters[] = "0ABCDEFGHIJKLMNOPQRSTUVWXYZ";
c = toupper(c);
if (isalpha(c)) return strchr(letters, c) - letters;
return 0;
}

for example.

/* Takes "ABC", returns 1 + 2 + 3 = 6 */
int strsum(char *s)
{
int i=0;
char *c;

while(*(c = s++))
i += tonum(*c);

return i;
}

Click to expand...

Yuck! Did you consider

int sum;
for (sum = 0; *s; ++s)
sum += tonum(*s);
return sum;

It's much easier to read than that nested
assignment thingie you've got there, and
doesn't need the extra pointer variable.
Also, 'i' has been changed to 'sum', reflecting
its purpose rather than its data type.

/* takes "a WordOrTwo", returns "A WORDORTWO" */
void ucase(char *s)

Click to expand...

You called this function 'upper' in the code that
called it. I'm surprised it linked.

{
while(*s++ = toupper(*s))
;
}

Click to expand...

Undefined behavior. You have written the pointerified
equivalent of

while (i++ = f(i)) ...

Diagnosis: You're still too enamored of obfuscation to
write good code. Try again, and this time *don't* *do*
*anything* *clever*!

HTH,
-Arthur

David Rubin · Aug 4, 2003

craigbeanhead wrote:

[snip]

The problem is to find a word of x length, where the sum of the
characters (using 1 for A, 2 for B, etc) is equal to y.

I'm calling the program like this: "myprog.exe 5 39" - this means
"find me all the words of length 5, where the characters add up to
39". I've got two solutions to the problem, one version creates
strings such as AAAAA, AAAAB, and the second version searches a
newline-delimited dictionary of words.

This is the same as solving "how many ways can I generate a monotonic
sequence of positive integers which sums to X?"

The first solution is to create a routine which finds sequences by
(recursively) adding subsequences. For example, X=5 produces

1+1+1+1+1
2+1+1+1
2+2+1
2+3
4+1
3+1+1
5
etc

The second solution, using a dictionary, is simply providing you can
pre-process the dictionary. Then, you just "sign" each word with its
associated value, sort, (store,) and search.

/david

Bjorn Reese · Aug 4, 2003

Arthur J. O'Dwyer said:
These two functions have terrible names (they sound like
they're meant to map 'A'->65 or whatever), and the functions
themselves don't work on EBCDIC. IOW, they're not completely
portable. You could write

int tonum(char c)
{
static const char letters[] = "0ABCDEFGHIJKLMNOPQRSTUVWXYZ";
c = toupper(c);
if (isalpha(c)) return strchr(letters, c) - letters;
return 0;
}

This will fail for international locales, as the letters array
only contains ASCII characters.

Arthur J. O'Dwyer · Aug 4, 2003

Arthur J. O'Dwyer said:
Arthur J. O'Dwyer said:

These two functions have terrible names (they sound like
they're meant to map 'A'->65 or whatever), and the functions
themselves don't work on EBCDIC. IOW, they're not completely
portable. You could write

int tonum(char c)
{
static const char letters[] = "0ABCDEFGHIJKLMNOPQRSTUVWXYZ";
c = toupper(c);
if (isalpha(c)) return strchr(letters, c) - letters;
return 0;
}

Click to expand...

This will fail for international locales, as the letters array
only contains ASCII characters.

They won't fail; they just won't include international characters
in the letters list - which is perfectly fine, for the OP's purposes.
He's mapping 1..26 -> A..Z, not generically "numbers to letters."

All C implementations everywhere include the letters A to Z, no
matter what character set they use.

-Arthur

Jack Klein · Aug 5, 2003

Arthur J. O'Dwyer said:
Arthur J. O'Dwyer said:

These two functions have terrible names (they sound like
they're meant to map 'A'->65 or whatever), and the functions
themselves don't work on EBCDIC. IOW, they're not completely
portable. You could write

int tonum(char c)
{
static const char letters[] = "0ABCDEFGHIJKLMNOPQRSTUVWXYZ";
c = toupper(c);
if (isalpha(c)) return strchr(letters, c) - letters;
return 0;
}

Click to expand...

This will fail for international locales, as the letters array
only contains ASCII characters.

This might well fail for International locales, but that has nothing
to do with whether or not the implementation uses the ASCII character
set. On an EBCDIC platform, the letter array will contain EBCDIC
characters, not ASCII at all.

Consider a platform where plain char is signed, and passing a negative
char value to this function. Undefined behavior calling toupper() and
isalpha(). And undefined behavior being what it is, if isalpha()
returns a non-zero value, the code will happily apply a negative index
outside the bounds of the array.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ ftp://snurse-l.org/pub/acllc-c++/faq

Bjorn Reese · Aug 5, 2003

Arthur J. O'Dwyer said:
They won't fail; they just won't include international characters
in the letters list - which is perfectly fine, for the OP's purposes.
He's mapping 1..26 -> A..Z, not generically "numbers to letters."

My comment was not so much directed towards the purpose of the
function, but rather that isalpha() was used to guard the first
return statement. A safer method would have been to use

switch (toupper(c)) {
case 'A': case 'B': /* etc */
return strchr(letters, c) = letters;
default:
return 0;
}

Twirlip · Aug 5, 2003

Bjorn said:
My comment was not so much directed towards the purpose of the
function, but rather that isalpha() was used to guard the first
return statement.

Meaning that isalpha() might return non-zero for something
other than letters A-Z, in which case strchr would return
NULL, and the return would subtract 'letters' from NULL.

A safer method would have been to use

switch (toupper(c)) {
case 'A': case 'B': /* etc */
return strchr(letters, c) = letters;
default:
return 0;
}

That's 26 case statements when you can just use the NULL return of
strchr() (untested):

int alp2num(char c)
{
static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
char *ptr;

ptr = strchr(letters, toupper((unsigned char)c));
return ptr != NULL && c != '\0' ? ptr - letters + 1 : 0;
}

Fibonacci	0	May 13, 2023
C pipe	1	Dec 9, 2021
Command Line Arguments	0	Mar 7, 2023
Comparison of Integer and Pointer (that's supposed to be an Integer). Where did I go wrong?	0	Nov 19, 2022
Adding adressing of IPv6 to program	1	Feb 16, 2023
A process take input from /proc/<pid>/fd/0, but won't process it	0	Oct 29, 2023
Scanf is being prioritized over printf ?	1	Nov 5, 2023
STRING - Remove small letters from string	1	Jan 20, 2023

Optimisable?

craigbeanhead

Pieter Droogendijk

Arthur J. O'Dwyer

David Rubin

Bjorn Reese

Arthur J. O'Dwyer

Jack Klein

Bjorn Reese

Twirlip

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads