Buiding anamgrams

andrea · Dec 26, 2008

I'm studying ruby (and I really love it even more than python) and I
want to do a little exercise.

I want to create an efficient module to generate every possible
anagrams of a word, of any given size...

I already done this in python, I want to see how can it be elegant in
ruby...
I need a good permutation algorithm (this
http://www.cut-the-knot.org/Curriculum/Combinatorics/JohnsonTrotter.shtml
works fine) and I'd like to use enumerators in the best and most
efficient way.

I also looked at callcc which is great for backtracking, could be used
in this case??
thanks for any help

A basic structure like that could be fine?
class Anagrams
attr :word
include Enumerable
include Comparable

def <=>(other)
self.word <=> other.word
end

def each(&block)
@word.chars
end

def initialize(word)
@word = word
end

end

Robert Klemme · Dec 26, 2008

I'm studying ruby (and I really love it even more than python) and I
want to do a little exercise.

Welcome to the pack! What a nice Christmas occupation.

I want to create an efficient module to generate every possible
anagrams of a word, of any given size...

Do you really mean "anagrams" or rather "permutations"? To decide on
anagrams you likely need some kind of dictionary which is significantly
more difficult than just permuting.

I already done this in python, I want to see how can it be elegant in
ruby...
I need a good permutation algorithm (this
http://www.cut-the-knot.org/Curriculum/Combinatorics/JohnsonTrotter.shtml
works fine) and I'd like to use enumerators in the best and most
efficient way.

From 1.8.7 on you can do this

irb(main):009:0> "abc".scan(/./).permutation {|x| p x.join}
"abc"
"acb"
"bac"
"bca"
"cab"
"cba"
=> ["a", "b", "c"]
irb(main):010:0> "abc".chars.to_a.permutation {|x| p x.join}
"abc"
"acb"
"bac"
"bca"
"cab"
"cba"
=> ["a", "b", "c"]
irb(main):011:0>

Now you only need a filter that properly decides which of those is a
proper word and use that to decide which words to print and which not.

I also looked at callcc which is great for backtracking, could be used
in this case??

I would try to avoid it as it makes code very hard to read - at least
for me.

thanks for any help

A basic structure like that could be fine?
class Anagrams
attr :word
include Enumerable
include Comparable

def <=>(other)
self.word <=> other.word
end

def each(&block)
@word.chars
end

def initialize(word)
@word = word
end

end

What exactly should be the responsibility of this class? Especially,
why do you call it "anagrams" but store only one word in it?

Cheers

robert

tjnogueira · Dec 26, 2008

That's it. In ruby 1.8.6 you cant use this form , only for 1.8.7 and prior
...
I tested here and the code works perfectly.
About your class:
class Anagramsi really dont understant what it means. Explain better.
King regards
tiago

I'm studying ruby (and I really love it even more than python) and I
want to do a little exercise.

Click to expand...

Welcome to the pack! What a nice Christmas occupation.

I want to create an efficient module to generate every possible
anagrams of a word, of any given size...

Click to expand...

Do you really mean "anagrams" or rather "permutations"? To decide on
anagrams you likely need some kind of dictionary which is significantly
more difficult than just permuting.

I already done this in python, I want to see how can it be elegant in
ruby...
I need a good permutation algorithm (this
http://www.cut-the-knot.org/Curriculum/Combinatorics/JohnsonTrotter.shtml
works fine) and I'd like to use enumerators in the best and most
efficient way.

Click to expand...

From 1.8.7 on you can do this

irb(main):009:0> "abc".scan(/./).permutation {|x| p x.join}
"abc"
"acb"
"bac"
"bca"
"cab"
"cba"
=> ["a", "b", "c"]
irb(main):010:0> "abc".chars.to_a.permutation {|x| p x.join}
"abc"
"acb"
"bac"
"bca"
"cab"
"cba"
=> ["a", "b", "c"]
irb(main):011:0>

Now you only need a filter that properly decides which of those is a
proper word and use that to decide which words to print and which not.

I also looked at callcc which is great for backtracking, could be used
in this case??

Click to expand...

I would try to avoid it as it makes code very hard to read - at least
for me.

thanks for any help

A basic structure like that could be fine?
class Anagrams
attr :word
include Enumerable
include Comparable

def <=>(other)
self.word <=> other.word
end

def each(&block)
@word.chars
end

def initialize(word)
@word = word
end

end

Click to expand...

What exactly should be the responsibility of this class? Especially,
why do you call it "anagrams" but store only one word in it?

Cheers

robert

Axel Etzold · Dec 27, 2008

Dear Andrea,

I want to create an efficient module to generate every possible
anagrams of a word, of any given size...
I already done this in python, I want to see how can it be elegant in
ruby...
I need a good permutation algorithm (this
http://www.cut-the-knot.org/Curriculum/Combinatorics/JohnsonTrotter.shtml
works fine) and I'd like to use enumerators in the best and most
efficient way.

I also looked at callcc which is great for backtracking, could be used
in this case??

Permutations become quite unwieldy very quickly ... maybe you'd have a look
at Hidden Markov Models instead. There are several algorithms associated
to these models. I think what you'd want to do is to apply the "forward algorithm"
to estimate how probable a given word is, given the probability of letter x2 succeeding
letter x1 in the language you are trying to form anagrams in ( ... download some novels
from project Gutenberg, split them into pairs of letters and count.....) .

For the forward algorithm, look at the concrete example here:

http://en.wikipedia.org/wiki/Viterbi_algorithm

There once was a Ruby quiz about Markov chains to help you with Ruby:

http://www.rubyquiz.com/quiz74.html

To test whether a given word exists in language X, you could use raspell in conjunction with
aspell:

http://blog.evanweaver.com/files/doc/fauna/raspell/files/README.html

Best regards,

Axel

andrea · Dec 27, 2008

Thanks a lot everyone, now I already posted but don't know why it
doesn't appear my message.
Anyway you're right Axel, this is the first "solution" to the problem
but it's really too slow for any reasonable large size of the string.

This is the code, any hint about style or whatever is welcome:
http://pastie.textmate.org/private/rp4ivw36vefuz07twjlw3g

I'll have a look to the Markov Chain (which I'm studying right now at
uni).

Dave Thomas · Dec 27, 2008

Thanks a lot everyone, now I already posted but don't know why it
doesn't appear my message.
Anyway you're right Axel, this is the first "solution" to the problem
but it's really too slow for any reasonable large size of the string.

This is the code, any hint about style or whatever is welcome:
http://pastie.textmate.org/private/rp4ivw36vefuz07twjlw3g

I think a faster approach, particularly to find multiple anagrams, is
to create a signature or hash with the property that each word in a
set of anagrams will have the same hash code. Finding anagrams then
becomes a problem of building lists of words with the hash hash code.

A convenient hash is simply to take all the letters in each work and
sort them, so that "dog" becomes "dgo" and "god" also becomes "dgo".
Because they have the same hash, they are anagrams.

Here's the code from the PickAxe[1] that uses this to find anagrams:

http://pastie.textmate.org/private/yttznlgzcf9czdkc7vxea

and here are some simple tests:

http://pastie.textmate.org/private/stv3sxaecc6z8uwxclldba

In fact, I use this as an example of Gem packaging, so you can
actually do

gem install anagram

and get the whole thing.

Cheers

Dave

[1] http://pragprog.com/titles/ruby3

Justin Collins · Dec 28, 2008

andrea said:
Thanks a lot everyone, now I already posted but don't know why it
doesn't appear my message.
Anyway you're right Axel, this is the first "solution" to the problem
but it's really too slow for any reasonable large size of the string.

This is the code, any hint about style or whatever is welcome:
http://pastie.textmate.org/private/rp4ivw36vefuz07twjlw3g

I'll have a look to the Markov Chain (which I'm studying right now at
uni).

Just for fun/reference, this is what I came up with after reading your
post earlier. Note that it doesn't do the "of any length" requirement,
though.

require 'rubygems'
require 'raspell'

def anagrams word
checker = Aspell.new
word.split(//).permutation.select { |w| checker.check w.join
}.map { |w| w.join }
end

puts anagrams "words"

-Justin

andrea · Dec 31, 2008

Thanks to everyone for your very nice answers, now I wrote another
version of the program.

http://pastie.textmate.org/private/ryxlzlqle8txaorswiynea

Which implments all 3 methods discussed here, the nicest of course is
the third.

It creates signature only if really needed, otherwise it just loads
the dict with Marshall. (other ways?)
Is correct the use of md5?? I think it is as it gives the same result
of the openssl md5, but it looks a little bit too fast for such a big
file.
dict_md5 = Digest::MD5.hexdigest(open(@dict).read)
conf_md5 = open($CONF).read
# then they must be equal and I have the correct
# signatures for my dictionary
if dict_md5 == conf_md5
puts "we already have the right signatures" if $DEBUG
# loading the precomputed signature file
@signatures = Marshal.load(open($SIGNATURE))
else

It's not bad at all, the only problem is that with a huge dictionary
it takes a long time to load it (around 38seconds for a almost 1-
million word dictionary) and takes a big amount of ram.

Could I maybe load it incrementally? Maybe divide the signs by size
and load just what I need to check?
Is it possible using only one file?

Thanks a lot

ruby1.9: lazy versions of Enumerator#select and friends?	11	Nov 2, 2008
Issue with textbox script?	0	Sep 5, 2022
A greenhand rubyist program -- ask for comments	1	Mar 27, 2011
Generator implementation	0	Feb 11, 2007
Binding scope	10	Jan 12, 2010
String#each_slice methods (like Enumerable#each_slice)	4	Apr 6, 2011
obj.attr(qualifier) = value -- possible?	8	May 15, 2009
Enumerable	4	Aug 1, 2006

Buiding anamgrams

andrea

Robert Klemme

tjnogueira

Axel Etzold

andrea

Dave Thomas

Justin Collins

andrea

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads