My code to determine whether two words are anagrams won't work.

F

Felipe Ribeiro

Hello everyone!

So, I was trying to write a program that checks whether two words are
anagrams. I wrote the code but for some reason I can't figure out, the
program won't behave as it should.

Here goes the code:
-----------------------------------------------------------------------------------------
#include <stdio.h>
#include <ctype.h>

#define LEN 26
#define TRUE 1
#define FALSE 0

typedef int bool;

void read_word(int letters_in_word[], int array_length);
bool are_anagrams(int letters_in_word1[], int letters_in_word2[],
int array_length);

int main(void)
{
/* The number of times each letter appears in the words */
int letters_word1[LEN] = {0}, letters_word2[LEN] = {0}, i;

printf("Enter first word: ");
read_word(letters_word1, LEN);

printf("Enter second word: ");
read_word(letters_word2, LEN);

if (are_anagrams(letters_word1, letters_word2, LEN))
printf("The words are anagrams.\n");
else
printf("The words are not anagrams.\n");

return 0;
}

void read_word(int word[], int length)
{
char ch;

while ((ch = getchar() != '\n')) {
ch = toupper(ch);
word[ch - 'A']++;
}
}

/*
* If the number of times each letter appears in each word is the
same,
* then the words are anagrams.
*/
bool are_anagrams(int word1[], int word2[], int length)
{
int i;

for (i = 0; i < length; i++)
if (word1 != word2)
return FALSE;
return TRUE;
}
-----------------------------------------------------------------------------------------

It might be some obvious mistake; I'm just a beginner.
I've been looking at it for some time and can't find out what the
error is. It seems to me that are_anagrams() always returns TRUE and I
can't understand why.

I'd appreciate any help you could give me.
Thanks in advance.
 
B

Ben Bacarisse

Felipe Ribeiro said:
So, I was trying to write a program that checks whether two words are
anagrams. I wrote the code but for some reason I can't figure out, the
program won't behave as it should.

You need to develop some way to find out what your programs are really
doing. Posting to Usenet will be a very slow way to learn. What have
you done to try to find out what us happening? Do you have a debugger?
#include <stdio.h>
#include <ctype.h>

#define LEN 26
#define TRUE 1
#define FALSE 0

typedef int bool;

void read_word(int letters_in_word[], int array_length);
bool are_anagrams(int letters_in_word1[], int letters_in_word2[],
int array_length);

int main(void)
{
/* The number of times each letter appears in the words */
int letters_word1[LEN] = {0}, letters_word2[LEN] = {0}, i;

printf("Enter first word: ");
read_word(letters_word1, LEN);

printf("Enter second word: ");
read_word(letters_word2, LEN);

if (are_anagrams(letters_word1, letters_word2, LEN))
printf("The words are anagrams.\n");
else
printf("The words are not anagrams.\n");

return 0;
}

void read_word(int word[], int length)

length is no used (but it should be!).
{
char ch;

It is usually better to read into an int. That way, you can be
reliably told about the end of the input. The C FAQ is worth
bookmarking: http://c-faq.com/
while ((ch = getchar() != '\n')) {

This does not do what you think. It sets ch to the result (0 or 1) of
the test getchar() != '\n'.
ch = toupper(ch);
word[ch - 'A']++;

And the weakness of this line is now revealed. You were rather
unlucky that using 1 - 'A' as an index did not cause a more dramatic
end to the execution. You need to be sure that ch - 'A' is in range.
You might also think how you'd write this if you needed it to work on
systems that don't have 'A' to 'Z' consecutive and increasing.

<snip>
 
D

Doug Miller

So, I was trying to write a program that checks whether two words are
anagrams. I wrote the code but for some reason I can't figure out, the
program won't behave as it should.

This is due, at least in part, to the program not being *designed* as it
should.
Here goes the code: [snipped]
* If the number of times each letter appears in each word is the same,
* then the words are anagrams.

*Far* too complicated. Sort the letters in each word in alphabetical order,
then compare the sorted lists.
 
D

dee

Here goes the code:
[snipped]
* If the number of times each letter appears in each word is the same,
* then the words are anagrams.

*Far* too complicated. Sort the letters in each word in alphabetical order,
then compare the sorted lists.

I don't see a problem with the first approach. Infact the first method
works in O(n), while sorting the arrays itself will take more time (of
the order of n sqr or nlogn).

The problem is somewhere in the implementation while the logic seems
fine. As told by Ben, merely putting the problem on a forum won't be
of much help for a learner. Try using a debugger.
 
D

Doug Miller

Here goes the code: [snipped]
* If the number of times each letter appears in each word is the same,
* then the words are anagrams.

*Far* too complicated. Sort the letters in each word in alphabetical order,
then compare the sorted lists.

I don't see a problem with the first approach. Infact the first method
works in O(n), while sorting the arrays itself will take more time (of
the order of n sqr or nlogn).

For strings as short as a word (typically less than ten bytes), the difference
in execution time between O(n) and O(n^2) algorithms is irrelevant, as both
will be measured in microseconds. For data sets this small, ease of
implementation trumps execution speed.
 
K

Keith Thompson

Here goes the code:
[snipped]
* If the number of times each letter appears in each word is the same,
* then the words are anagrams.

*Far* too complicated. Sort the letters in each word in
alphabetical order, then compare the sorted lists.

I don't see a problem with the first approach. Infact the first
method works in O(n), while sorting the arrays itself will take more
time (of the order of n sqr or nlogn).

For strings as short as a word (typically less than ten bytes), the
difference in execution time between O(n) and O(n^2) algorithms is
irrelevant, as both will be measured in microseconds. For data sets
this small, ease of implementation trumps execution speed.

Perhaps, but I don't think sorting the strings is any easier than
traversing them and storing the character counts in an array. Setting
up the comparison routine for qsort isn't difficult to get right, but
it's also not difficult to get wrong.
 
L

Lew Pitcher

Here goes the code:
[snipped]
* If the number of times each letter appears in each word is the
same, * then the words are anagrams.

*Far* too complicated. Sort the letters in each word in
alphabetical order, then compare the sorted lists.

I don't see a problem with the first approach. Infact the first
method works in O(n), while sorting the arrays itself will take more
time (of the order of n sqr or nlogn).

For strings as short as a word (typically less than ten bytes), the
difference in execution time between O(n) and O(n^2) algorithms is
irrelevant, as both will be measured in microseconds. For data sets
this small, ease of implementation trumps execution speed.

Perhaps, but I don't think sorting the strings is any easier than
traversing them and storing the character counts in an array. Setting
up the comparison routine for qsort isn't difficult to get right, but
it's also not difficult to get wrong.

Funny enough, I once tackled an anagram/scrabble word problem exactly this
way. Here's the relevant code:

/*
** CharComp(p1, p2): return p1:p2 comparison
** Accepts: two pointers to characters
** Returns: an integer comparison value
*/
int CharComp(const void *p1, const void *p2)
{
return (*(char *)p1 - *(char *)p2);
}

char *ScrabbleSort(char *word)
{
char *temp = NULL;
size_t templen = strlen(word);

if ((temp = malloc(templen+1)) == NULL)
return NULL;

strcpy(temp,word);
qsort(temp, templen, 1, CharComp);
return temp;
}


--
Lew Pitcher

Master Codewright & JOAT-in-training | Registered Linux User #112576
http://pitcher.digitalfreehold.ca/ | GPG public key available by request
---------- Slackware - Because I know what I'm doing. ------
 
K

Keith Thompson

Malcolm McLean said:
If CHAR_BIT is 32 and the character set includes Chinese, how easy would yur
array method be to implement?

Quite easy, since as long as we're making assumptions I'll also assume
that I have about a terabyte of memory.

:cool:}

But yeah, I see your point. There certainly are cases where
O(n log n) can beat O(n) if the scale factors are sufficiently
unfriendly.
 
N

Nick Keighley

If CHAR_BIT is 32 and the character set includes Chinese, how easy would yur
array method be to implement?

is an anagram a meaningful concept in chinese?
 
J

James Kuyper

Nick said:
is an anagram a meaningful concept in chinese?

Since most Chinese words are at least two characters long, the concept
is at least potentially meaningful. However, since there's few Chinese
words with more than two characters, and even fewer with more than three
(offhand, I can't think of any, but I've only spent a year studying
Mandarin), searching for anagrams is too trivial a task to be interesting.
 
F

Felipe Ribeiro

You need to develop some way to find out what your programs are really
doing.  Posting to Usenet will be a very slow way to learn.  What have
you done to try to find out what us happening?  Do you have a debugger?

I had inserted some code and could see that the array was being
assigned strange values but couldn't find out why.
I hadn't been using any program to help me with my code, mainly
because I thought they were meant to help with more complex code.
Since my programs are so simple I imagined they woudn't be of any
help. I see that I was wrong: splint found the very same mistake you
pointed out below. I'll make sure I don't come back here before
analyzing my code more deeply. =)
#include <stdio.h>
#include <ctype.h>
#define LEN 26
#define TRUE 1
#define FALSE 0
typedef int bool;
void read_word(int letters_in_word[], int array_length);
bool are_anagrams(int letters_in_word1[], int letters_in_word2[],
                             int array_length);
int main(void)
{
   /* The number of times each letter appears in the words */
   int letters_word1[LEN] = {0}, letters_word2[LEN] = {0}, i;
   printf("Enter first word: ");
   read_word(letters_word1, LEN);
   printf("Enter second word: ");
   read_word(letters_word2, LEN);
   if (are_anagrams(letters_word1, letters_word2, LEN))
           printf("The words are anagrams.\n");
   else
           printf("The words are not anagrams.\n");
   return 0;
}
void read_word(int word[], int length)

length is no used (but it should be!).

I'm a bit confused with this. length is not necessary in the function.
If the length of an array isn't used in a function, do I still have to
require it as a parameter?
It is usually better to read into an int.  That way, you can be
reliably told about the end of the input.  The C FAQ is worth
bookmarking:http://c-faq.com/

I didn't quite understand what you said but will read through the faq
to see if I can find something that helps me in understanding.
This does not do what you think.  It sets ch to the result (0 or 1) of
the test getchar() != '\n'.

Thanks a lot for pointing this out. As I said before, splint found the
same mistake. I'll use it from now on. =)

<snip>
 
B

Ben Bacarisse

Felipe Ribeiro said:
void read_word(int word[], int length)

length is no used (but it should be!).

I'm a bit confused with this. length is not necessary in the function.
If the length of an array isn't used in a function, do I still have to
require it as a parameter?[/QUOTE]

I was saying that you don't use so why is it there as a parameter?
The bit in parentheses was saying that in fact the real problem is not
that pass it but that you don't use it. You can use it to be sure
that you don't access beyond the end of the array.
I didn't quite understand what you said but will read through the faq
to see if I can find something that helps me in understanding.

See, specifically, Q 12.1: http://c-faq.com/stdio/getcharc.html
Thanks a lot for pointing this out. As I said before, splint found the
same mistake. I'll use it from now on. =)

That's not a bad idea but be warned that splint can come out with some
very odd messages at times!
 
L

Lie Ryan

Nick said:
is an anagram a meaningful concept in chinese?

Maybe... if you define anagram as all the valid characters a set of
strokes could produce

but they aren't related to character set.
 
P

Paul Hsieh

Here goes the code:
[snipped]
* If the number of times each letter appears in each word is the same,
* then the words are anagrams.
*Far* too complicated. Sort the letters in each word in alphabetical order,
then compare the sorted lists.
I don't see a problem with the first approach. Infact the first method
works in O(n), while sorting the arrays itself will take more time (of
the order of n sqr or nlogn).

For strings as short as a word (typically less than ten bytes), the difference
in execution time between O(n) and O(n^2) algorithms is irrelevant, as both
will be measured in microseconds. For data sets this small, ease of
implementation trumps execution speed.

Using qsort() is incredibly slow in this scenario as you have to call
a callback function on each comparison. To escape this you would have
to write your own sort, which is a lot more complicated than the
originally proposed solution.

Actually I believe this can be done in one pass per string with the
character table approach. Zero out the character table, and just
count +1 per character from the first string. Also keep count of the
number of times any count transitions from 0 to 1. So:

memset (table, 0, sizeof (table));
transition = 0;
for (c=(unsigned char *) str0; *c; c++) {
transition += 0 == table[*c];
table[*c]++;
}

Then for the second string, subtract 1 per character, and subtract 1
for the number of transitions from 1 to 0. If there is a transition
from 0 to -1, we know we don't have an anagram.

for (c=(unsigned char *) str1; *c; c++) {
if (0 > --table[*c]) return NOT_AN_ANAGRAM;
transition -= 0 == table[*c];
}

Then we would expect the number of "transitions" should be exactly
balanced at 0.

return transition ? NOT_AN_ANAGRAM : IS_AN_ANAGRAM;

The whole point of this is that you don't need to check all 256
entries of table[] but in fact only the entries that you touch.

And you wanted to call qsort()?
 
D

Doug Miller

Using qsort() is incredibly slow in this scenario as you have to call
a callback function on each comparison.

Nonsense. Unless the process will be repeated a large number of times for
different pairs of words, that doesn't matter. One iteration, checking two
words, will run so fast regardless of the algorithm that there is simply no
point in optimizing for execution speed. Rather, it should be optimized for
ease of implementation.
 
S

Squeamizh

Nonsense. Unless the process will be repeated a large number of times for
different pairs of words, that doesn't matter. One iteration, checking two
words, will run so fast regardless of the algorithm that there is simply no
point in optimizing for execution speed. Rather, it should be optimized for
ease of implementation.

Your point is valid, but not all that interesting. You didn't need to
say it twice, and it certainly does not render Paul's post as
nonsense. You seem to agree with him anyway: relative to his
implementation, yours is incredible slow.
 
B

BartC

Doug Miller said:
Nonsense. Unless the process will be repeated a large number of times for
different pairs of words, that doesn't matter. One iteration, checking two
words, will run so fast regardless of the algorithm that there is simply
no
point in optimizing for execution speed. Rather, it should be optimized
for
ease of implementation.

I did some tests and a qsort solution took about 9000ms.

My own algorithm using a table of flags, loads of strlens everywhere and
with O(n^2) performance, took 2500ms.

Paul's algorithm took 900ms.

(The sort method needed to duplicate the strings, the other methods were
non-destructive.)

It seems to me that a 10x speed factor is significant.
 
D

Doug Miller

I did some tests and a qsort solution took about 9000ms.

My own algorithm using a table of flags, loads of strlens everywhere and
with O(n^2) performance, took 2500ms.

Paul's algorithm took 900ms.

(The sort method needed to duplicate the strings, the other methods were
non-destructive.)

It seems to me that a 10x speed factor is significant.
The difference between 9 seconds and 0.9 seconds is not.
 
B

BartC

Doug Miller said:
The difference between 9 seconds and 0.9 seconds is not.

OK call it 90 seconds and 9 seconds.

Or 0.1 seconds (acceptable user interface delay) and 1 second (unacceptable
delay).

If it wasn't significant, everyone here would be using something like
Python.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,233
Members
46,820
Latest member
GilbertoA5

Latest Threads

Top