My code to determine whether two words are anagrams won't work.

Felipe Ribeiro · Jun 21, 2009

Hello everyone!

So, I was trying to write a program that checks whether two words are
anagrams. I wrote the code but for some reason I can't figure out, the
program won't behave as it should.

Here goes the code:
-----------------------------------------------------------------------------------------
#include <stdio.h>
#include <ctype.h>

#define LEN 26
#define TRUE 1
#define FALSE 0

typedef int bool;

void read_word(int letters_in_word[], int array_length);
bool are_anagrams(int letters_in_word1[], int letters_in_word2[],
int array_length);

int main(void)
{
/* The number of times each letter appears in the words */
int letters_word1[LEN] = {0}, letters_word2[LEN] = {0}, i;

printf("Enter first word: ");
read_word(letters_word1, LEN);

printf("Enter second word: ");
read_word(letters_word2, LEN);

if (are_anagrams(letters_word1, letters_word2, LEN))
printf("The words are anagrams.\n");
else
printf("The words are not anagrams.\n");

return 0;
}

void read_word(int word[], int length)
{
char ch;

while ((ch = getchar() != '\n')) {
ch = toupper(ch);
word[ch - 'A']++;
}
}

/*
* If the number of times each letter appears in each word is the
same,
* then the words are anagrams.
*/
bool are_anagrams(int word1[], int word2[], int length)
{
int i;

for (i = 0; i < length; i++)
if (word1 != word2)
return FALSE;
return TRUE;
}
-----------------------------------------------------------------------------------------

It might be some obvious mistake; I'm just a beginner.
I've been looking at it for some time and can't find out what the
error is. It seems to me that are_anagrams() always returns TRUE and I
can't understand why.

I'd appreciate any help you could give me.
Thanks in advance.

Ben Bacarisse · Jun 21, 2009

Felipe Ribeiro said:
So, I was trying to write a program that checks whether two words are
anagrams. I wrote the code but for some reason I can't figure out, the
program won't behave as it should.

You need to develop some way to find out what your programs are really
doing. Posting to Usenet will be a very slow way to learn. What have
you done to try to find out what us happening? Do you have a debugger?

#include <stdio.h>
#include <ctype.h>

#define LEN 26
#define TRUE 1
#define FALSE 0

typedef int bool;

void read_word(int letters_in_word[], int array_length);
bool are_anagrams(int letters_in_word1[], int letters_in_word2[],
int array_length);

int main(void)
{
/* The number of times each letter appears in the words */
int letters_word1[LEN] = {0}, letters_word2[LEN] = {0}, i;

printf("Enter first word: ");
read_word(letters_word1, LEN);

printf("Enter second word: ");
read_word(letters_word2, LEN);

if (are_anagrams(letters_word1, letters_word2, LEN))
printf("The words are anagrams.\n");
else
printf("The words are not anagrams.\n");

return 0;
}

void read_word(int word[], int length)

length is no used (but it should be!).

{
char ch;

It is usually better to read into an int. That way, you can be
reliably told about the end of the input. The C FAQ is worth
bookmarking: http://c-faq.com/

while ((ch = getchar() != '\n')) {

This does not do what you think. It sets ch to the result (0 or 1) of
the test getchar() != '\n'.

ch = toupper(ch);
word[ch - 'A']++;

And the weakness of this line is now revealed. You were rather
unlucky that using 1 - 'A' as an index did not cause a more dramatic
end to the execution. You need to be sure that ch - 'A' is in range.
You might also think how you'd write this if you needed it to work on
systems that don't have 'A' to 'Z' consecutive and increasing.

}
}

<snip>

Doug Miller · Jun 21, 2009

So, I was trying to write a program that checks whether two words are
anagrams. I wrote the code but for some reason I can't figure out, the
program won't behave as it should.

This is due, at least in part, to the program not being *designed* as it
should.

Here goes the code: [snipped]
* If the number of times each letter appears in each word is the same,
* then the words are anagrams.

*Far* too complicated. Sort the letters in each word in alphabetical order,
then compare the sorted lists.

dee · Jun 21, 2009

Here goes the code:

[snipped]

* If the number of times each letter appears in each word is the same,
* then the words are anagrams.

Click to expand...

*Far* too complicated. Sort the letters in each word in alphabetical order,
then compare the sorted lists.

I don't see a problem with the first approach. Infact the first method
works in O(n), while sorting the arrays itself will take more time (of
the order of n sqr or nlogn).

The problem is somewhere in the implementation while the logic seems
fine. As told by Ben, merely putting the problem on a forum won't be
of much help for a learner. Try using a debugger.

Doug Miller · Jun 21, 2009

Here goes the code: [snipped]
* If the number of times each letter appears in each word is the same,
* then the words are anagrams.

Click to expand...

*Far* too complicated. Sort the letters in each word in alphabetical order,
then compare the sorted lists.

Click to expand...

I don't see a problem with the first approach. Infact the first method
works in O(n), while sorting the arrays itself will take more time (of
the order of n sqr or nlogn).

For strings as short as a word (typically less than ten bytes), the difference
in execution time between O(n) and O(n^2) algorithms is irrelevant, as both
will be measured in microseconds. For data sets this small, ease of
implementation trumps execution speed.

Keith Thompson · Jun 21, 2009

Here goes the code:
[snipped]
* If the number of times each letter appears in each word is the same,
* then the words are anagrams.

*Far* too complicated. Sort the letters in each word in
alphabetical order, then compare the sorted lists.

Click to expand...

I don't see a problem with the first approach. Infact the first
method works in O(n), while sorting the arrays itself will take more
time (of the order of n sqr or nlogn).

Click to expand...

For strings as short as a word (typically less than ten bytes), the
difference in execution time between O(n) and O(n^2) algorithms is
irrelevant, as both will be measured in microseconds. For data sets
this small, ease of implementation trumps execution speed.

Perhaps, but I don't think sorting the strings is any easier than
traversing them and storing the character counts in an array. Setting
up the comparison routine for qsort isn't difficult to get right, but
it's also not difficult to get wrong.

Lew Pitcher · Jun 21, 2009

Here goes the code:
[snipped]
* If the number of times each letter appears in each word is the
same, * then the words are anagrams.

*Far* too complicated. Sort the letters in each word in
alphabetical order, then compare the sorted lists.

I don't see a problem with the first approach. Infact the first
method works in O(n), while sorting the arrays itself will take more
time (of the order of n sqr or nlogn).

Click to expand...

For strings as short as a word (typically less than ten bytes), the
difference in execution time between O(n) and O(n^2) algorithms is
irrelevant, as both will be measured in microseconds. For data sets
this small, ease of implementation trumps execution speed.

Click to expand...

Perhaps, but I don't think sorting the strings is any easier than
traversing them and storing the character counts in an array. Setting
up the comparison routine for qsort isn't difficult to get right, but
it's also not difficult to get wrong.

Funny enough, I once tackled an anagram/scrabble word problem exactly this
way. Here's the relevant code:

/*
** CharComp(p1, p2): return p1

2 comparison
** Accepts: two pointers to characters
** Returns: an integer comparison value
*/
int CharComp(const void *p1, const void *p2)
{
return (*(char *)p1 - *(char *)p2);
}

char *ScrabbleSort(char *word)
{
char *temp = NULL;
size_t templen = strlen(word);

if ((temp = malloc(templen+1)) == NULL)
return NULL;

strcpy(temp,word);
qsort(temp, templen, 1, CharComp);
return temp;
}

--
Lew Pitcher

Master Codewright & JOAT-in-training | Registered Linux User #112576
http://pitcher.digitalfreehold.ca/ | GPG public key available by request
---------- Slackware - Because I know what I'm doing. ------

Keith Thompson · Jun 22, 2009

Malcolm McLean said:
If CHAR_BIT is 32 and the character set includes Chinese, how easy would yur
array method be to implement?

Quite easy, since as long as we're making assumptions I'll also assume
that I have about a terabyte of memory.

}

But yeah, I see your point. There certainly are cases where
O(n log n) can beat O(n) if the scale factors are sufficiently
unfriendly.

Nick Keighley · Jun 22, 2009

If CHAR_BIT is 32 and the character set includes Chinese, how easy would yur
array method be to implement?

is an anagram a meaningful concept in chinese?

James Kuyper · Jun 22, 2009

Nick said:
is an anagram a meaningful concept in chinese?

Since most Chinese words are at least two characters long, the concept
is at least potentially meaningful. However, since there's few Chinese
words with more than two characters, and even fewer with more than three
(offhand, I can't think of any, but I've only spent a year studying
Mandarin), searching for anagrams is too trivial a task to be interesting.

Felipe Ribeiro · Jun 23, 2009

You need to develop some way to find out what your programs are really
doing. Posting to Usenet will be a very slow way to learn. What have
you done to try to find out what us happening? Do you have a debugger?

I had inserted some code and could see that the array was being
assigned strange values but couldn't find out why.
I hadn't been using any program to help me with my code, mainly
because I thought they were meant to help with more complex code.
Since my programs are so simple I imagined they woudn't be of any
help. I see that I was wrong: splint found the very same mistake you
pointed out below. I'll make sure I don't come back here before
analyzing my code more deeply. =)

#include <stdio.h>
#include <ctype.h>

Click to expand...

#define LEN 26
#define TRUE 1
#define FALSE 0

Click to expand...

typedef int bool;

Click to expand...

void read_word(int letters_in_word[], int array_length);
bool are_anagrams(int letters_in_word1[], int letters_in_word2[],
int array_length);

Click to expand...

int main(void)
{
/* The number of times each letter appears in the words */
int letters_word1[LEN] = {0}, letters_word2[LEN] = {0}, i;

Click to expand...

printf("Enter first word: ");
read_word(letters_word1, LEN);

Click to expand...

printf("Enter second word: ");
read_word(letters_word2, LEN);

Click to expand...

if (are_anagrams(letters_word1, letters_word2, LEN))
printf("The words are anagrams.\n");
else
printf("The words are not anagrams.\n");

Click to expand...

return 0;
}

Click to expand...

void read_word(int word[], int length)

Click to expand...

length is no used (but it should be!).

I'm a bit confused with this. length is not necessary in the function.
If the length of an array isn't used in a function, do I still have to
require it as a parameter?

It is usually better to read into an int. That way, you can be
reliably told about the end of the input. The C FAQ is worth
bookmarking:http://c-faq.com/

I didn't quite understand what you said but will read through the faq
to see if I can find something that helps me in understanding.

This does not do what you think. It sets ch to the result (0 or 1) of
the test getchar() != '\n'.

Thanks a lot for pointing this out. As I said before, splint found the
same mistake. I'll use it from now on. =)

<snip>

Ben Bacarisse · Jun 23, 2009

Felipe Ribeiro said:
void read_word(int word[], int length)

Click to expand...

length is no used (but it should be!).

I'm a bit confused with this. length is not necessary in the function.
If the length of an array isn't used in a function, do I still have to
require it as a parameter?[/QUOTE]

I was saying that you don't use so why is it there as a parameter?
The bit in parentheses was saying that in fact the real problem is not
that pass it but that you don't use it. You can use it to be sure
that you don't access beyond the end of the array.

I didn't quite understand what you said but will read through the faq
to see if I can find something that helps me in understanding.

See, specifically, Q 12.1: http://c-faq.com/stdio/getcharc.html

Thanks a lot for pointing this out. As I said before, splint found the
same mistake. I'll use it from now on. =)

That's not a bad idea but be warned that splint can come out with some
very odd messages at times!

Lie Ryan · Jun 24, 2009

Nick said:
is an anagram a meaningful concept in chinese?

Maybe... if you define anagram as all the valid characters a set of
strokes could produce

but they aren't related to character set.

Paul Hsieh · Jun 24, 2009

Here goes the code:
[snipped]
* If the number of times each letter appears in each word is the same,
* then the words are anagrams.
*Far* too complicated. Sort the letters in each word in alphabetical order,
then compare the sorted lists.

Click to expand...

Click to expand...

I don't see a problem with the first approach. Infact the first method
works in O(n), while sorting the arrays itself will take more time (of
the order of n sqr or nlogn).

Click to expand...

For strings as short as a word (typically less than ten bytes), the difference
in execution time between O(n) and O(n^2) algorithms is irrelevant, as both
will be measured in microseconds. For data sets this small, ease of
implementation trumps execution speed.

Using qsort() is incredibly slow in this scenario as you have to call
a callback function on each comparison. To escape this you would have
to write your own sort, which is a lot more complicated than the
originally proposed solution.

Actually I believe this can be done in one pass per string with the
character table approach. Zero out the character table, and just
count +1 per character from the first string. Also keep count of the
number of times any count transitions from 0 to 1. So:

memset (table, 0, sizeof (table));
transition = 0;
for (c=(unsigned char *) str0; *c; c++) {
transition += 0 == table[*c];
table[*c]++;
}

Then for the second string, subtract 1 per character, and subtract 1
for the number of transitions from 1 to 0. If there is a transition
from 0 to -1, we know we don't have an anagram.

for (c=(unsigned char *) str1; *c; c++) {
if (0 > --table[*c]) return NOT_AN_ANAGRAM;
transition -= 0 == table[*c];
}

Then we would expect the number of "transitions" should be exactly
balanced at 0.

return transition ? NOT_AN_ANAGRAM : IS_AN_ANAGRAM;

The whole point of this is that you don't need to check all 256
entries of table[] but in fact only the entries that you touch.

And you wanted to call qsort()?

Doug Miller · Jun 24, 2009

Using qsort() is incredibly slow in this scenario as you have to call
a callback function on each comparison.

Nonsense. Unless the process will be repeated a large number of times for
different pairs of words, that doesn't matter. One iteration, checking two
words, will run so fast regardless of the algorithm that there is simply no
point in optimizing for execution speed. Rather, it should be optimized for
ease of implementation.

Squeamizh · Jun 24, 2009

Nonsense. Unless the process will be repeated a large number of times for
different pairs of words, that doesn't matter. One iteration, checking two
words, will run so fast regardless of the algorithm that there is simply no
point in optimizing for execution speed. Rather, it should be optimized for
ease of implementation.

Your point is valid, but not all that interesting. You didn't need to
say it twice, and it certainly does not render Paul's post as
nonsense. You seem to agree with him anyway: relative to his
implementation, yours is incredible slow.

BartC · Jun 24, 2009

Doug Miller said:
Nonsense. Unless the process will be repeated a large number of times for
different pairs of words, that doesn't matter. One iteration, checking two
words, will run so fast regardless of the algorithm that there is simply
no
point in optimizing for execution speed. Rather, it should be optimized
for
ease of implementation.

I did some tests and a qsort solution took about 9000ms.

My own algorithm using a table of flags, loads of strlens everywhere and
with O(n^2) performance, took 2500ms.

Paul's algorithm took 900ms.

(The sort method needed to duplicate the strings, the other methods were
non-destructive.)

It seems to me that a 10x speed factor is significant.

Doug Miller · Jun 25, 2009

I did some tests and a qsort solution took about 9000ms.

My own algorithm using a table of flags, loads of strlens everywhere and
with O(n^2) performance, took 2500ms.

Paul's algorithm took 900ms.

(The sort method needed to duplicate the strings, the other methods were
non-destructive.)

It seems to me that a 10x speed factor is significant.

The difference between 9 seconds and 0.9 seconds is not.

Keith Thompson · Jun 25, 2009

The difference between 9 seconds and 0.9 seconds is not.

Depending on the context, it could be *very* significant.

BartC · Jun 25, 2009

Doug Miller said:
The difference between 9 seconds and 0.9 seconds is not.

OK call it 90 seconds and 9 seconds.

Or 0.1 seconds (acceptable user interface delay) and 1 second (unacceptable
delay).

If it wasn't significant, everyone here would be using something like
Python.

C language. work with text	3	Dec 10, 2021
Qsort() messing with my entire Code	0	Apr 25, 2022
Prompt was to have user enter sentence and output should be the number of words, this code's correct but idk How "prevChar = currentChar"work?	2	Mar 8, 2024
I have to finish this code for my assignment but I cant figure out how to solve it	1	Jun 27, 2023
SENTINEL CONTROL LOOP WHEN DEALING WITH TWO ARRAYS	1	Oct 26, 2023
Anagrams	9	Oct 23, 2007
Addition and substraction of polynomials is working fine but the multiplication isn't; what's wrong with my code	1	Nov 22, 2022
Could you fix my code please? (i get no output after inputing the number)	1	Oct 16, 2023

My code to determine whether two words are anagrams won't work.

Felipe Ribeiro

Ben Bacarisse

Doug Miller

dee

Doug Miller

Keith Thompson

Lew Pitcher

Keith Thompson

Nick Keighley

James Kuyper

Felipe Ribeiro

Ben Bacarisse

Lie Ryan

Paul Hsieh

Doug Miller

Squeamizh

BartC

Doug Miller

Keith Thompson

BartC

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads