Buiding anamgrams

A

andrea

I'm studying ruby (and I really love it even more than python) and I
want to do a little exercise.

I want to create an efficient module to generate every possible
anagrams of a word, of any given size...

I already done this in python, I want to see how can it be elegant in
ruby...
I need a good permutation algorithm (this
http://www.cut-the-knot.org/Curriculum/Combinatorics/JohnsonTrotter.shtml
works fine) and I'd like to use enumerators in the best and most
efficient way.

I also looked at callcc which is great for backtracking, could be used
in this case??
thanks for any help

A basic structure like that could be fine?
class Anagrams
attr :word
include Enumerable
include Comparable

def <=>(other)
self.word <=> other.word
end

def each(&block)
@word.chars
end

def initialize(word)
@word = word
end

end
 
R

Robert Klemme

I'm studying ruby (and I really love it even more than python) and I
want to do a little exercise.

Welcome to the pack! What a nice Christmas occupation. :)
I want to create an efficient module to generate every possible
anagrams of a word, of any given size...

Do you really mean "anagrams" or rather "permutations"? To decide on
anagrams you likely need some kind of dictionary which is significantly
more difficult than just permuting.
I already done this in python, I want to see how can it be elegant in
ruby...
I need a good permutation algorithm (this
http://www.cut-the-knot.org/Curriculum/Combinatorics/JohnsonTrotter.shtml
works fine) and I'd like to use enumerators in the best and most
efficient way.

From 1.8.7 on you can do this

irb(main):009:0> "abc".scan(/./).permutation {|x| p x.join}
"abc"
"acb"
"bac"
"bca"
"cab"
"cba"
=> ["a", "b", "c"]
irb(main):010:0> "abc".chars.to_a.permutation {|x| p x.join}
"abc"
"acb"
"bac"
"bca"
"cab"
"cba"
=> ["a", "b", "c"]
irb(main):011:0>

Now you only need a filter that properly decides which of those is a
proper word and use that to decide which words to print and which not.
I also looked at callcc which is great for backtracking, could be used
in this case??

I would try to avoid it as it makes code very hard to read - at least
for me. :)
thanks for any help

A basic structure like that could be fine?
class Anagrams
attr :word
include Enumerable
include Comparable

def <=>(other)
self.word <=> other.word
end

def each(&block)
@word.chars
end

def initialize(word)
@word = word
end

end

What exactly should be the responsibility of this class? Especially,
why do you call it "anagrams" but store only one word in it?

Cheers

robert
 
T

tjnogueira

That's it. In ruby 1.8.6 you cant use this form , only for 1.8.7 and prior
...
I tested here and the code works perfectly.
About your class:
class Anagramsi really dont understant what it means. Explain better.
King regards
tiago




I'm studying ruby (and I really love it even more than python) and I
want to do a little exercise.

Welcome to the pack! What a nice Christmas occupation. :)
I want to create an efficient module to generate every possible
anagrams of a word, of any given size...

Do you really mean "anagrams" or rather "permutations"? To decide on
anagrams you likely need some kind of dictionary which is significantly
more difficult than just permuting.
I already done this in python, I want to see how can it be elegant in
ruby...
I need a good permutation algorithm (this
http://www.cut-the-knot.org/Curriculum/Combinatorics/JohnsonTrotter.shtml
works fine) and I'd like to use enumerators in the best and most
efficient way.

From 1.8.7 on you can do this

irb(main):009:0> "abc".scan(/./).permutation {|x| p x.join}
"abc"
"acb"
"bac"
"bca"
"cab"
"cba"
=> ["a", "b", "c"]
irb(main):010:0> "abc".chars.to_a.permutation {|x| p x.join}
"abc"
"acb"
"bac"
"bca"
"cab"
"cba"
=> ["a", "b", "c"]
irb(main):011:0>

Now you only need a filter that properly decides which of those is a
proper word and use that to decide which words to print and which not.
I also looked at callcc which is great for backtracking, could be used
in this case??

I would try to avoid it as it makes code very hard to read - at least
for me. :)
thanks for any help

A basic structure like that could be fine?
class Anagrams
attr :word
include Enumerable
include Comparable

def <=>(other)
self.word <=> other.word
end

def each(&block)
@word.chars
end

def initialize(word)
@word = word
end

end

What exactly should be the responsibility of this class? Especially,
why do you call it "anagrams" but store only one word in it?

Cheers

robert
 
A

Axel Etzold

Dear Andrea,
I want to create an efficient module to generate every possible
anagrams of a word, of any given size...
I already done this in python, I want to see how can it be elegant in
ruby...
I need a good permutation algorithm (this
http://www.cut-the-knot.org/Curriculum/Combinatorics/JohnsonTrotter.shtml
works fine) and I'd like to use enumerators in the best and most
efficient way.

I also looked at callcc which is great for backtracking, could be used
in this case??


Permutations become quite unwieldy very quickly ... maybe you'd have a look
at Hidden Markov Models instead. There are several algorithms associated
to these models. I think what you'd want to do is to apply the "forward algorithm"
to estimate how probable a given word is, given the probability of letter x2 succeeding
letter x1 in the language you are trying to form anagrams in ( ... download some novels
from project Gutenberg, split them into pairs of letters and count.....) .

For the forward algorithm, look at the concrete example here:

http://en.wikipedia.org/wiki/Viterbi_algorithm

There once was a Ruby quiz about Markov chains to help you with Ruby:

http://www.rubyquiz.com/quiz74.html

To test whether a given word exists in language X, you could use raspell in conjunction with
aspell:

http://blog.evanweaver.com/files/doc/fauna/raspell/files/README.html

Best regards,

Axel
 
A

andrea

Thanks a lot everyone, now I already posted but don't know why it
doesn't appear my message.
Anyway you're right Axel, this is the first "solution" to the problem
but it's really too slow for any reasonable large size of the string.

This is the code, any hint about style or whatever is welcome:
http://pastie.textmate.org/private/rp4ivw36vefuz07twjlw3g


I'll have a look to the Markov Chain (which I'm studying right now at
uni).
 
D

Dave Thomas

Thanks a lot everyone, now I already posted but don't know why it
doesn't appear my message.
Anyway you're right Axel, this is the first "solution" to the problem
but it's really too slow for any reasonable large size of the string.

This is the code, any hint about style or whatever is welcome:
http://pastie.textmate.org/private/rp4ivw36vefuz07twjlw3g

I think a faster approach, particularly to find multiple anagrams, is
to create a signature or hash with the property that each word in a
set of anagrams will have the same hash code. Finding anagrams then
becomes a problem of building lists of words with the hash hash code.

A convenient hash is simply to take all the letters in each work and
sort them, so that "dog" becomes "dgo" and "god" also becomes "dgo".
Because they have the same hash, they are anagrams.

Here's the code from the PickAxe[1] that uses this to find anagrams:

http://pastie.textmate.org/private/yttznlgzcf9czdkc7vxea

and here are some simple tests:

http://pastie.textmate.org/private/stv3sxaecc6z8uwxclldba

In fact, I use this as an example of Gem packaging, so you can
actually do

gem install anagram

and get the whole thing.


Cheers


Dave

[1] http://pragprog.com/titles/ruby3
 
J

Justin Collins

andrea said:
Thanks a lot everyone, now I already posted but don't know why it
doesn't appear my message.
Anyway you're right Axel, this is the first "solution" to the problem
but it's really too slow for any reasonable large size of the string.

This is the code, any hint about style or whatever is welcome:
http://pastie.textmate.org/private/rp4ivw36vefuz07twjlw3g


I'll have a look to the Markov Chain (which I'm studying right now at
uni).

Just for fun/reference, this is what I came up with after reading your
post earlier. Note that it doesn't do the "of any length" requirement,
though.


require 'rubygems'
require 'raspell'

def anagrams word
checker = Aspell.new
word.split(//).permutation.select { |w| checker.check w.join
}.map { |w| w.join }
end

puts anagrams "words"


-Justin
 
A

andrea

Thanks to everyone for your very nice answers, now I wrote another
version of the program.

http://pastie.textmate.org/private/ryxlzlqle8txaorswiynea

Which implments all 3 methods discussed here, the nicest of course is
the third.

It creates signature only if really needed, otherwise it just loads
the dict with Marshall. (other ways?)
Is correct the use of md5?? I think it is as it gives the same result
of the openssl md5, but it looks a little bit too fast for such a big
file.
dict_md5 = Digest::MD5.hexdigest(open(@dict).read)
conf_md5 = open($CONF).read
# then they must be equal and I have the correct
# signatures for my dictionary
if dict_md5 == conf_md5
puts "we already have the right signatures" if $DEBUG
# loading the precomputed signature file
@signatures = Marshal.load(open($SIGNATURE))
else

It's not bad at all, the only problem is that with a huge dictionary
it takes a long time to load it (around 38seconds for a almost 1-
million word dictionary) and takes a big amount of ram.

Could I maybe load it incrementally? Maybe divide the signs by size
and load just what I need to check?
Is it possible using only one file?


Thanks a lot
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,181
Messages
2,570,970
Members
47,536
Latest member
VeldaYoung

Latest Threads

Top