R
Ruby Quiz
First, let me clear up the naming issue, since I missed it when created the
quiz. The actual term for sentences that contain all the letters of the
alphabet is pangram (or Swallowsgram) as discussed in the linked article. The
quiz name is in error.
Now that we know what to call them, the question becomes how do we generate self
documenting pangrams? The linked article described a technique called
"Robbinsoning," which is a simple process. The idea is that you start with some
random distribution of letter counts, build the sentence, adjust the counts to
reflect the actual sentence counts, rebuild, adjust, etc. You can zero in on a
solution in this fashion and most of the submitted solutions used something
along these lines.
I want to have a look at Danial Martin's code below, but before we get into that
you need to know how Daniel's code tracks letter frequencies. Here's Daniel's
own description of the technique:
# I represented the letter frequencies of letters in a sentence
# as one huge bignum, such that if "freq" was a variable containing
# the number, then "freq & 0xFF" would be the number of "a"s in the
# sentence, "(freq>>8) & 0xFF" would be the number of "b"s, etc.
#
# This means that when I adjust a guess, changing the actual frequency
# is as simple as adding and subtracting from a single variable.
Now that we know what to expect, here's the first bit of solution code
(reformatted slighty):
def find_sentence(prefix, suffix, initial = {})
letterre = Regexp.new('(?i:[a-z])');
letters = ('a'..'z').to_a
letterpower = Hash.new {|h,k| h[k] = 1 << ((k[0]-?a)*8)}
lettershifts = letters.map {|x| ((x[0]-?a)*8)}
basesentence = prefix + letters.map {|x|
(x == 'z'? 'and ' : '') + "_ '#{x}'"
}.join(', ') + suffix
basefreq = 0
basesentence.scan(letterre) {|x| basefreq += letterpower[x.downcase]}
# ...
Obviously, we have a lot of setup work here. Let's take it line by line,
because there's a lot going on. This method is invoked with a sentence prefix
and suffix, and optionally the initial letter counts to try. When invoked, the
first line defines a letter Regexp that will match individual letters in upper
or lower case. The next line generates an Array of letters the code can iterate
over.
The next two variables are helpers for working with the Bignum frequencies.
letterpower will give you the number needed to add one count for the keyed
letter to the frequencies and letter shifts is the offset a given letter is
shifted into the Bigum. (Note that letterpower calculates its own shift, in the
same way lettershifts does.)
The next three lines are easier to swallow. First, the sentence is constructed
using the prefix, placeholders for counts, the word "and" as needed, and the
sentence suffix. The next two lines then calculate the letter frequencies of
this baseline using the Regexp to iterate over the letters and letterpower to
adjust the count.
Let's tackle the next chunk of code:
# ...
# enfreq holds the letter counts that spelling out that number adds to
# the sentence.
# E.g. enfreq[1] == letterpower['o'] + letterpower['n'] + letterpower['e']
enfreq = Hash.new {|h,k|
if k > 255 then
h[k] = h[k >> 8]
else
h[k] = 0
k.to_en.scan(letterre) {|x| h[k] += letterpower[x.downcase]}
h[k] += letterpower['s'] if k != 1
end
h[k]
}
# ...
This Hash is a typical memoization idiom in Ruby. Give the Hash the code to
calculate values from keys which it will invoke on the first call, then all
future calls use a simple Hash lookup. The lookup is much faster of course,
since it doesn't need to rerun the code to build it.
In this case, the code builds letter counts, to add to the overall frequency
counts, for the English word equivalents to the passed number. The else
statement is where that happens. The process is basically what we saw for
counting the base sentence frequency before. Note that the code accounts for
the s needed, should the count be plural.
The to_en() call in this code is provided by Glenn Parker's solution to Ruby
Quiz #25. I've discussed that code multiple times now, so I left it out of this
summary for brevity.
Time for a little more code (minus a not needed variable removed by me):
# ...
guessfreq = 0
letters.each{|x|
guessfreq += (initial[x]||0) * letterpower[x]
}
guessfreq = basefreq if guessfreq == 0
actualfreq = 0
# ...
Here we have the last bit of all this setup work. First, an initial guessfreq
is built for the letters based on the passed Hash. If no starting point was
given, the sentence uses the basefreq calculated earlier. Finally, a variable
is allocated to hold the actualfreq of the generated sentence.
OK, grab a deep breath and let's finally tackle the actual guessing loop that
zeros in on solutions (debugging code and comments removed by me):
# ...
until guessfreq == actualfreq do
if actualfreq > 0 then
lettershifts.each{ |y|
g = 0xFF & (guessfreq >> y)
a = 0xFF & (actualfreq >> y)
if (g != a)
d = (g-a).abs
r1 = rand(d+1)
r2 = rand(d+1)
r1=r2 if r1 < r2
r1=-r1 if a<g
if (r1 != 0) then
guessfreq += r1 << y
actualfreq += enfreq[g+r1] - enfreq[g]
end
end
}
else
actualfreq = basefreq
lettershifts.each {|y|
g = 0xFF & (guessfreq >> y)
actualfreq += enfreq[g]
}
end
end
# ...
This code cycles until our latest guess matches the actual count for the
sentence. On the first pass actualfreq will be zero, so the else clause is
executed. This sets actualfreq to the baseline and adds in our guess. After
that, each iteration should hit the if branch.
Each time through the if branch, every letter is compared for a distance from
its guess value and its actual value. A random number is selected (well two
actually with the high one favored) and added to our guess. Then the actual is
updated to reflect the change. The net effect is that they close in on each
other until our guess matches reality.
When they match, the final sentence can be constructed:
prefix + ('a'..'z').map {|x|
g = (guessfreq >> ((x[0]-?a)*8))%256
(x == 'z'? 'and ' : '') + "#{g.to_en} '#{x}'" +
(g==1 ? '' : 's')}.join(', ') + suffix
end
That works like the original sentence construction, but real numbers instead of
actual placeholders this time. That's returned as our final result.
Here's the code that starts the process, passing the initial prefix and suffix:
puts find_sentence(
"Daniel Martin's sallowsgram program produced a sentence with ", "."
)
Interestingly, it seems that some inputs never resolve to a solution. I have
not investigated this too deeply, but multiple quiz solvers reported the same
issue.
My thanks to all the clever solvers who found all those pangrams so quickly. I
learned a lot from reading the solutions, including how to use NArray from Simon
Kroeger (worth a look).
Tomorrow's quiz has us inventing time travel for Ruby, so the summary for that
could show up at any moment now...
quiz. The actual term for sentences that contain all the letters of the
alphabet is pangram (or Swallowsgram) as discussed in the linked article. The
quiz name is in error.
Now that we know what to call them, the question becomes how do we generate self
documenting pangrams? The linked article described a technique called
"Robbinsoning," which is a simple process. The idea is that you start with some
random distribution of letter counts, build the sentence, adjust the counts to
reflect the actual sentence counts, rebuild, adjust, etc. You can zero in on a
solution in this fashion and most of the submitted solutions used something
along these lines.
I want to have a look at Danial Martin's code below, but before we get into that
you need to know how Daniel's code tracks letter frequencies. Here's Daniel's
own description of the technique:
# I represented the letter frequencies of letters in a sentence
# as one huge bignum, such that if "freq" was a variable containing
# the number, then "freq & 0xFF" would be the number of "a"s in the
# sentence, "(freq>>8) & 0xFF" would be the number of "b"s, etc.
#
# This means that when I adjust a guess, changing the actual frequency
# is as simple as adding and subtracting from a single variable.
Now that we know what to expect, here's the first bit of solution code
(reformatted slighty):
def find_sentence(prefix, suffix, initial = {})
letterre = Regexp.new('(?i:[a-z])');
letters = ('a'..'z').to_a
letterpower = Hash.new {|h,k| h[k] = 1 << ((k[0]-?a)*8)}
lettershifts = letters.map {|x| ((x[0]-?a)*8)}
basesentence = prefix + letters.map {|x|
(x == 'z'? 'and ' : '') + "_ '#{x}'"
}.join(', ') + suffix
basefreq = 0
basesentence.scan(letterre) {|x| basefreq += letterpower[x.downcase]}
# ...
Obviously, we have a lot of setup work here. Let's take it line by line,
because there's a lot going on. This method is invoked with a sentence prefix
and suffix, and optionally the initial letter counts to try. When invoked, the
first line defines a letter Regexp that will match individual letters in upper
or lower case. The next line generates an Array of letters the code can iterate
over.
The next two variables are helpers for working with the Bignum frequencies.
letterpower will give you the number needed to add one count for the keyed
letter to the frequencies and letter shifts is the offset a given letter is
shifted into the Bigum. (Note that letterpower calculates its own shift, in the
same way lettershifts does.)
The next three lines are easier to swallow. First, the sentence is constructed
using the prefix, placeholders for counts, the word "and" as needed, and the
sentence suffix. The next two lines then calculate the letter frequencies of
this baseline using the Regexp to iterate over the letters and letterpower to
adjust the count.
Let's tackle the next chunk of code:
# ...
# enfreq holds the letter counts that spelling out that number adds to
# the sentence.
# E.g. enfreq[1] == letterpower['o'] + letterpower['n'] + letterpower['e']
enfreq = Hash.new {|h,k|
if k > 255 then
h[k] = h[k >> 8]
else
h[k] = 0
k.to_en.scan(letterre) {|x| h[k] += letterpower[x.downcase]}
h[k] += letterpower['s'] if k != 1
end
h[k]
}
# ...
This Hash is a typical memoization idiom in Ruby. Give the Hash the code to
calculate values from keys which it will invoke on the first call, then all
future calls use a simple Hash lookup. The lookup is much faster of course,
since it doesn't need to rerun the code to build it.
In this case, the code builds letter counts, to add to the overall frequency
counts, for the English word equivalents to the passed number. The else
statement is where that happens. The process is basically what we saw for
counting the base sentence frequency before. Note that the code accounts for
the s needed, should the count be plural.
The to_en() call in this code is provided by Glenn Parker's solution to Ruby
Quiz #25. I've discussed that code multiple times now, so I left it out of this
summary for brevity.
Time for a little more code (minus a not needed variable removed by me):
# ...
guessfreq = 0
letters.each{|x|
guessfreq += (initial[x]||0) * letterpower[x]
}
guessfreq = basefreq if guessfreq == 0
actualfreq = 0
# ...
Here we have the last bit of all this setup work. First, an initial guessfreq
is built for the letters based on the passed Hash. If no starting point was
given, the sentence uses the basefreq calculated earlier. Finally, a variable
is allocated to hold the actualfreq of the generated sentence.
OK, grab a deep breath and let's finally tackle the actual guessing loop that
zeros in on solutions (debugging code and comments removed by me):
# ...
until guessfreq == actualfreq do
if actualfreq > 0 then
lettershifts.each{ |y|
g = 0xFF & (guessfreq >> y)
a = 0xFF & (actualfreq >> y)
if (g != a)
d = (g-a).abs
r1 = rand(d+1)
r2 = rand(d+1)
r1=r2 if r1 < r2
r1=-r1 if a<g
if (r1 != 0) then
guessfreq += r1 << y
actualfreq += enfreq[g+r1] - enfreq[g]
end
end
}
else
actualfreq = basefreq
lettershifts.each {|y|
g = 0xFF & (guessfreq >> y)
actualfreq += enfreq[g]
}
end
end
# ...
This code cycles until our latest guess matches the actual count for the
sentence. On the first pass actualfreq will be zero, so the else clause is
executed. This sets actualfreq to the baseline and adds in our guess. After
that, each iteration should hit the if branch.
Each time through the if branch, every letter is compared for a distance from
its guess value and its actual value. A random number is selected (well two
actually with the high one favored) and added to our guess. Then the actual is
updated to reflect the change. The net effect is that they close in on each
other until our guess matches reality.
When they match, the final sentence can be constructed:
prefix + ('a'..'z').map {|x|
g = (guessfreq >> ((x[0]-?a)*8))%256
(x == 'z'? 'and ' : '') + "#{g.to_en} '#{x}'" +
(g==1 ? '' : 's')}.join(', ') + suffix
end
That works like the original sentence construction, but real numbers instead of
actual placeholders this time. That's returned as our final result.
Here's the code that starts the process, passing the initial prefix and suffix:
puts find_sentence(
"Daniel Martin's sallowsgram program produced a sentence with ", "."
)
Interestingly, it seems that some inputs never resolve to a solution. I have
not investigated this too deeply, but multiple quiz solvers reported the same
issue.
My thanks to all the clever solvers who found all those pangrams so quickly. I
learned a lot from reading the solutions, including how to use NArray from Simon
Kroeger (worth a look).
Tomorrow's quiz has us inventing time travel for Ruby, so the summary for that
could show up at any moment now...