[SUMMARY] Panagrams (#86)

R

Ruby Quiz

First, let me clear up the naming issue, since I missed it when created the
quiz. The actual term for sentences that contain all the letters of the
alphabet is pangram (or Swallowsgram) as discussed in the linked article. The
quiz name is in error.

Now that we know what to call them, the question becomes how do we generate self
documenting pangrams? The linked article described a technique called
"Robbinsoning," which is a simple process. The idea is that you start with some
random distribution of letter counts, build the sentence, adjust the counts to
reflect the actual sentence counts, rebuild, adjust, etc. You can zero in on a
solution in this fashion and most of the submitted solutions used something
along these lines.

I want to have a look at Danial Martin's code below, but before we get into that
you need to know how Daniel's code tracks letter frequencies. Here's Daniel's
own description of the technique:

# I represented the letter frequencies of letters in a sentence
# as one huge bignum, such that if "freq" was a variable containing
# the number, then "freq & 0xFF" would be the number of "a"s in the
# sentence, "(freq>>8) & 0xFF" would be the number of "b"s, etc.
#
# This means that when I adjust a guess, changing the actual frequency
# is as simple as adding and subtracting from a single variable.

Now that we know what to expect, here's the first bit of solution code
(reformatted slighty):

def find_sentence(prefix, suffix, initial = {})
letterre = Regexp.new('(?i:[a-z])');
letters = ('a'..'z').to_a
letterpower = Hash.new {|h,k| h[k] = 1 << ((k[0]-?a)*8)}
lettershifts = letters.map {|x| ((x[0]-?a)*8)}
basesentence = prefix + letters.map {|x|
(x == 'z'? 'and ' : '') + "_ '#{x}'"
}.join(', ') + suffix
basefreq = 0
basesentence.scan(letterre) {|x| basefreq += letterpower[x.downcase]}

# ...

Obviously, we have a lot of setup work here. Let's take it line by line,
because there's a lot going on. This method is invoked with a sentence prefix
and suffix, and optionally the initial letter counts to try. When invoked, the
first line defines a letter Regexp that will match individual letters in upper
or lower case. The next line generates an Array of letters the code can iterate
over.

The next two variables are helpers for working with the Bignum frequencies.
letterpower will give you the number needed to add one count for the keyed
letter to the frequencies and letter shifts is the offset a given letter is
shifted into the Bigum. (Note that letterpower calculates its own shift, in the
same way lettershifts does.)

The next three lines are easier to swallow. First, the sentence is constructed
using the prefix, placeholders for counts, the word "and" as needed, and the
sentence suffix. The next two lines then calculate the letter frequencies of
this baseline using the Regexp to iterate over the letters and letterpower to
adjust the count.

Let's tackle the next chunk of code:

# ...

# enfreq holds the letter counts that spelling out that number adds to
# the sentence.
# E.g. enfreq[1] == letterpower['o'] + letterpower['n'] + letterpower['e']
enfreq = Hash.new {|h,k|
if k > 255 then
h[k] = h[k >> 8]
else
h[k] = 0
k.to_en.scan(letterre) {|x| h[k] += letterpower[x.downcase]}
h[k] += letterpower['s'] if k != 1
end
h[k]
}

# ...

This Hash is a typical memoization idiom in Ruby. Give the Hash the code to
calculate values from keys which it will invoke on the first call, then all
future calls use a simple Hash lookup. The lookup is much faster of course,
since it doesn't need to rerun the code to build it.

In this case, the code builds letter counts, to add to the overall frequency
counts, for the English word equivalents to the passed number. The else
statement is where that happens. The process is basically what we saw for
counting the base sentence frequency before. Note that the code accounts for
the s needed, should the count be plural.

The to_en() call in this code is provided by Glenn Parker's solution to Ruby
Quiz #25. I've discussed that code multiple times now, so I left it out of this
summary for brevity.

Time for a little more code (minus a not needed variable removed by me):

# ...

guessfreq = 0
letters.each{|x|
guessfreq += (initial[x]||0) * letterpower[x]
}
guessfreq = basefreq if guessfreq == 0
actualfreq = 0

# ...

Here we have the last bit of all this setup work. First, an initial guessfreq
is built for the letters based on the passed Hash. If no starting point was
given, the sentence uses the basefreq calculated earlier. Finally, a variable
is allocated to hold the actualfreq of the generated sentence.

OK, grab a deep breath and let's finally tackle the actual guessing loop that
zeros in on solutions (debugging code and comments removed by me):

# ...

until guessfreq == actualfreq do
if actualfreq > 0 then
lettershifts.each{ |y|
g = 0xFF & (guessfreq >> y)
a = 0xFF & (actualfreq >> y)
if (g != a)
d = (g-a).abs
r1 = rand(d+1)
r2 = rand(d+1)
r1=r2 if r1 < r2
r1=-r1 if a<g
if (r1 != 0) then
guessfreq += r1 << y
actualfreq += enfreq[g+r1] - enfreq[g]
end
end
}
else
actualfreq = basefreq
lettershifts.each {|y|
g = 0xFF & (guessfreq >> y)
actualfreq += enfreq[g]
}
end
end

# ...

This code cycles until our latest guess matches the actual count for the
sentence. On the first pass actualfreq will be zero, so the else clause is
executed. This sets actualfreq to the baseline and adds in our guess. After
that, each iteration should hit the if branch.

Each time through the if branch, every letter is compared for a distance from
its guess value and its actual value. A random number is selected (well two
actually with the high one favored) and added to our guess. Then the actual is
updated to reflect the change. The net effect is that they close in on each
other until our guess matches reality.

When they match, the final sentence can be constructed:

prefix + ('a'..'z').map {|x|
g = (guessfreq >> ((x[0]-?a)*8))%256
(x == 'z'? 'and ' : '') + "#{g.to_en} '#{x}'" +
(g==1 ? '' : 's')}.join(', ') + suffix
end

That works like the original sentence construction, but real numbers instead of
actual placeholders this time. That's returned as our final result.

Here's the code that starts the process, passing the initial prefix and suffix:

puts find_sentence(
"Daniel Martin's sallowsgram program produced a sentence with ", "."
)

Interestingly, it seems that some inputs never resolve to a solution. I have
not investigated this too deeply, but multiple quiz solvers reported the same
issue.

My thanks to all the clever solvers who found all those pangrams so quickly. I
learned a lot from reading the solutions, including how to use NArray from Simon
Kroeger (worth a look).

Tomorrow's quiz has us inventing time travel for Ruby, so the summary for that
could show up at any moment now...
 
J

James Edward Gray II

Ruby Quiz said:
Now that we know what to call them, the question becomes how do we
generate self documenting pangrams? The linked article described a
technique called "Robbinsoning," which is a simple process. The
idea is that you start with some random distribution of letter
counts, build the sentence, adjust the counts to reflect the actual
sentence counts, rebuild, adjust, etc. You can zero in on a
solution in this fashion and most of the submitted solutions used
something along these lines.

Note that the discussion of the different ways to solve this problem
pointed up two distinctly different "randomized Robbinsoning"
algorithms:

1) rebuild the sentence (or recalculate the frequencies) after each
letter adjustment. This was what my code did.

2) Go through each letter doing randomized adjustments then, after all
letters have been adjusted, rebuild/recalculate.

To have my program use this second algorithm, you can change the "if"
bit in the main loop to:

actual2 = 0
lettershifts.each{ |y|
g = 0xFF & (guessfreq >> y)
a = 0xFF & (actualfreq >> y)
if (g != a)
d = (g-a).abs
r1 = rand(d+1)
r2 = rand(d+1)
r1=r2 if r1 < r2
r1=-r1 if a<g
if (r1 != 0) then
guessfreq += r1 << y
g += r1
end
end
actual2 += enfreq[g]
}
actualfreq = actual2

But you *really* don't want to do that. Adjusting all letters before
recalculating the frequencies turns this consistently-under-30-seconds
program into one that takes over an hour to complete. Note that this
second algorithm is the one followed by the perl script on the page
linked to in the quiz description.

Also note that Simon Kroeger's (first posted) solution follows this
second algorithm but is still able to achieve phenomenally fast speed
because he is able to push all the operations in the lettershifts loop
above into NArray vector operations implemented in C. In his
alternate solution to this quiz, he uses the first algorithm but needs
to loop over all the letter positions in ruby. He reports that the
increased speed of the first algorithm and the comparative slowness of
doing the loop in ruby essentially cancel each other out, leading to
both solutions being roughly equivalent in terms of speed.

Great info all around.

I didn't clue in to the two different algorithms. Thanks for
bringing that up!

James Edward Gray II
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,995
Messages
2,570,236
Members
46,825
Latest member
VernonQuy6

Latest Threads

Top