R
Ruby Quiz
Here's an interesting quote from Brian Schroeder's solution page:
To make it more difficult JEGII introduced the possibility to skip letters,
but in my opinion this leads to bad results (Though it made me think, so it
was a good idea).
I'm not sure exactly what qualifies as "bad results," but here's a thought I had
while reading this:
CAR-4-YOU
That's not the reason I added it though. As usual, the real truth is far less
interesting. It was pointed out to me, by Tobias Peters, that this problem
originated as a means to compare various programming languages. I examined the
original description of the problem, forwarded to me by Tobias, and tried to
stay close to the original challenge. Allowing single numbers comes from there.
One last interesting quote from Brian's page:
-e, --encoding ENCODING
How the alphabet is encoded to phonenumbers. james or logic are supported.
Does this suggest that "james" and "logic" are opposites? Food for thought...
On to the solutions.
I'll show Brian's solution below, because I ran into some minor issues with the
other two. However, do take a look at Jannis Harder's WEBrick servlet. It's
very little work to offer this service to the whole world through a Web
interface and that program shows it off nicely. Lee Marlow's solution was also
short and very straight forward, though the running time was a little high.
Let's inspect Brian's code:
# Nodes in the Dictionary.
class DictionaryNode < Array
# Terminal info
attr_reader :words
def initialize
super(10)
@words = []
end
end
This first piece of the puzzle is a node to be used by a tree class we'll meet
shortly. DictionaryNode is an Array that contains exactly 10 members. Why 10?
Because that's how many digits our encoding has. DictionaryNode also contains
an Array of words, that will be filled from the dictionary file.
Here's the start of Brian's tree class:
# A tree-indexed version of the dictionary that allows
# efficent searching by number 2 alphabet mapping.
class Dictionary
def initialize(encoding)
super()
@encoding = {}
@inverse_encoding = {}
encoding.each do | k, v |
@encoding[k] = v.split(/\s+/).map{|c| c[0]}
end
# Create map from characters to numbers
@inverse_encoding = @encoding.inject({}) { | r, (k, v) |
v.each do | l | r[l] = k end
r
}
@root = DictionaryNode.new
end
# ...
That's pretty easy to follow. This setup work creates maps for the encoding of
this Dictionary object. The maps go both ways, numbers to letters and the
inverse letters to numbers. Finally, the root of the tree is created from a new
DictionaryNode.
The following methods add words to the tree from a dictionary file:
# ...
# Helper method for rekursive adding of words to the dictionary
private
def add_recursive(node, word, position)
if word.length == position
node.words << word
return node
end
add_recursive(
node[@inverse_encoding[word[position]]] ||= DictionaryNode.new,
word,
position + 1 )
end
# Add words to the dictionary
public
def add(word)
add_recursive(@root, word, 0)
self
end
# Load a wordlist from a file, which contains one word per line.
# Ignores punctuation and whitespace.
def load_wordlist(file, options)
$stderr.print "Loading dictionary... " if options.verbose
start = Time.new
file.read.gsub(/[^A-Za-z\n]/, '').upcase!.split($/).uniq!.each do |w|
w.chomp!
next if w.empty? or w.length <= options.min_length
self.add(w)
end
if options.verbose
$stderr.puts "built dictionary in %f seconds" %
(Time.new-start).to_f
end
self
end
# ...
The dictionary reading process starts in the last method, load_wordlist(). This
method slurps the file, discards illegal characters, normalizes case, breaks up
the list on line boundaries, and eliminates repetition all on the third line.
Each member of the list of words that creates is then sent on to the add()
method. Note the nice use of options.verbose to show a build time.
Before I move on to add(), let me point out that the above method does have a
few minor issues. When I was playing with this code to figure out how it works,
I fed it a five word dictionary and was surprised when it crashed. The cause?
No duplicates. That causes uniq!() to return nil (a hot topic on Ruby Talk
lately) and since nil doesn't support an each() call, the code blew up.
upcase!() has similar problems.
One more minor issue. Here's a tip: When you normalize case, it's generally
better to go down than up. The reason is international support. Some languages
distinguish between things like uppercase and titlecase. That means that a
bunch of uppercase conversions might not be consistent, based on certain local
settings. The best way to avoid such problems is to lowercase content instead.
This isn't much of a problem, but it's a good habit to build.
Back to the code. add(), as you can see, is just a shell over add_recursive().
It passes the word on with the root node and a starting position of 0.
add_recursive() is pretty clever. It digs down into the tree until finding the
right spot to place the word in the Dictionary. This digging happens at the end
of the method with recursive calls. The current letter in the word is examined
and a branch of the tree is created to handle that encoded letter, if it didn't
already exist. The algorithm then moves to that node, examining the next letter
in line. When all the letters have been branched off, we're at the right place
to insert the word. The if at the beginning of the method handles that end
condition.
The last thing a Dictionary object requires is a way to hunt for words. Here
are those methods:
# ...
private
# Search words and return (in the block) words and the unmatched rest
# of the number
def sub_find(node, number, &block)
# Return words found so far
block[node.words.map{|w|w.dup}, number] unless node.words.empty?
# No more digits, so stop searching here
return node if number.empty?
# Search for longer words
sub_find(node[number[0]], number[1..-1], &block) if node[number[0]]
end
private
# Calculate all allowed skip patterns for a number of a given length
def skips(s, length)
return if length == 0
result = skips(s + [false], length-1)
result.concat(skips(s + [true], length-1)) unless s[-1]
result
end
public
def find_noskip(number)
result = []
sub_find(@root, number) do | words, rest_number |
if rest_number.empty?
result.concat(words)
else
find_noskip(rest_number).each do | sentence |
words.each do | w |
result << w + '-' + sentence
end
end
end
end
result
end
# Skipping makes this a bit ugly
def find(number)
result = []
skips([], number.length).each do | skipped |
# Create the injector that can inject the skipped numbers
# back into the word
injector = []
skipped.zip(number).each_with_index do |(s,n), i|
injector << [n.to_s, i] if s
end
# We search for words built from the unskipped digits
unskipped_digits =
number.zip(skipped).select{|(d, s)| !s}.map{|(d,s)|d}
sentences = find_noskip(unskipped_digits)
# Inject the skipped digits back into the found sentences
sentences.each do | s |
injector.each do | (n, i) | s.insert(i, n) end
end
result.concat(sentences)
end
result
end
end
Start with the sub_find() method. It's the key to the search and easy enough to
digest. sub_find() takes a node to search, the number to use in that search,
and a block to pass results to. The first line passes all matching words from
this node to the block, if there are any. The root node, where the algorithm
begins, won't have any since most dictionaries don't include 0 length words.
The second line finishes the process, if we've examined all the numbers. The
third line recurses, moving to the node for the next digit at the head of the
number variable. That's half of the picture.
The find_noskip() method is the public face for that. It calls sub_find(),
passing a block of code that fills the local results Array as matches are found.
When a word matches in the number, find_noskip() recurses looking other words to
finish off the number. Of course, as the name implies, this version of the
process does not skip digits.
For skipping, you need the find() method. find() first calls skip() to
calculate all possible skip patterns for this number. Then, one skip at a time,
find() removes the skipped digits and calls find_noskip() on the remainder.
After results are generated, the skips are reinserted back into their original
locations. That's pretty tricky.
To be clear, this does not function as I intended the quiz to work (and I now
understand why Brian thinks I allowed "bad results"). Numbers were only to be
allowed at word boundaries, while Brian's algorithm will reinsert them into the
middle of words. Looking back, I did not make this very clear in the quiz and
it's certainly my error. Brian's code is still a very nice implementation of
his interpretation of the rules.
Finally, there's an interface that puts all that code to use. Let's look at
that:
encodings = {
:james => {
2 => 'A B C',
3 => 'D E F',
4 => 'G H I',
5 => 'J K L',
6 => 'M N O',
7 => 'P Q R S',
8 => 'T U V',
9 => 'W X Y Z'},
:logic => {
0 => 'A B',
1 => 'C D',
2 => 'E F',
3 => 'G H',
4 => 'I J K',
5 => 'L M N',
6 => 'O P Q',
7 => 'R S T',
8 => 'U V W',
9 => 'X Y Z'
}
}
require 'optparse'
class PhonewordOptions < OptionParser
attr_reader :dictionary, :encoding, :format, :allow_skips, :help,
:encoding_help, :verbose, :min_length
def initialize
super()
@dictionary = '/usr/share/dict/words'
@encoding = :james
@format = lain
@allow_skips = true
@HELP = false
@encoding_help = false
@verbose = false
@ignore_non_alpha = false
@min_length = 1
self.on("-d", "--dictionary DICTIONARY", String) { | v |
@dictionary = v
}
self.on("-e", "--encoding ENCODING", String,
"How the alphabet is encoded to phonenumbers. " +
"james or logic are supported.") { | v |
@encoding = v.downcase.to_sym
}
self.on("-p", "--plain",
'One result per found number, ' +
'no other information. (Default)') { @format = lain }
self.on("-f", "--full", 'Prefix the result with the number') {
@format = :full
}
self.on("-v", "--verbose", 'Make more noise') { @verbose = true }
self.on("-s", "--skips", "--allow_skips", "--allow-skips",
'Allow to skip one adjacent number while matching. (Default)',
'Gives lots of ugly results, but james asked for it.') {
@allow_skips = true
}
self.on("-c", "--no-skips",
"Don't leave numbers in the detected words") {
@allow_skips = false
}
self.on("-m" "--min-length", "Minimum length of accepted words.",
"Use this to ignore one-letter words that make " +
"the output quite uninteresting.", Integer) { | v |
@min_length = v
}
self.on("-?", "--help") { @HELP = true }
self.on("--supported-encodings", "--encoding-help",
"List the supported encodings") { @encoding_help = true }
end
end
options = PhonewordOptions.new
options.parse!(ARGV)
if options.help
puts options
exit
end
if options.encoding_help or !encodings[options.encoding]
puts "Possible encodings:"
puts encodings.to_a.sort_by{|(k,v)|k.to_s}.map{|(k,v)|
"#{k}:\n"+v.map{|(n,e)|" #{n}: #{e}"}.sort.join("\n")
}
exit
end
dictionary = Dictionary.new(encodings[options.encoding]).load_wordlist(
File.open(options.dictionary), options
)
output = {
lain => lambda do | number, sentence | sentence end,
:full => lambda do | number, sentence |
"#{number.ljust(15)}: #{sentence}"
end
}
method = {true => :find, false => :find_noskip }
ARGF.each do | number |
number.strip!
number = number.gsub(/[^0-9]/, '').unpack('C*').map{|n|n - ?0}
$stderr.puts "Searching for #{number}" if options.verbose
dictionary.send(
method[options.allow_skips], number
).each do | sentence |
puts output[options.format][number, sentence]
end
end
Most of that code is option handling. Brian creates his own PhonewordOptions
object, which inherits from OptionParser. In the setup for that object,
defaults are established and option parsing in defined with several calls to
on(). From there, reader methods are provided for all the defined options.
This makes for a pretty self-contained bundle of option parsing and reading.
You can see the options object put to good use, after the class.
That final block is what actually kicks off the program. Each number is read
from ARGF, cleaned up, and passed to the find methods of the dictionary object.
Results from that find are printed, creating a complete solution.
My thanks go out to all three quiz workers. It was nice to have a few people
playing along again.
Tomorrows quiz will stay with the topic of phones and how we use them...
To make it more difficult JEGII introduced the possibility to skip letters,
but in my opinion this leads to bad results (Though it made me think, so it
was a good idea).
I'm not sure exactly what qualifies as "bad results," but here's a thought I had
while reading this:
CAR-4-YOU
That's not the reason I added it though. As usual, the real truth is far less
interesting. It was pointed out to me, by Tobias Peters, that this problem
originated as a means to compare various programming languages. I examined the
original description of the problem, forwarded to me by Tobias, and tried to
stay close to the original challenge. Allowing single numbers comes from there.
One last interesting quote from Brian's page:
-e, --encoding ENCODING
How the alphabet is encoded to phonenumbers. james or logic are supported.
Does this suggest that "james" and "logic" are opposites? Food for thought...
On to the solutions.
I'll show Brian's solution below, because I ran into some minor issues with the
other two. However, do take a look at Jannis Harder's WEBrick servlet. It's
very little work to offer this service to the whole world through a Web
interface and that program shows it off nicely. Lee Marlow's solution was also
short and very straight forward, though the running time was a little high.
Let's inspect Brian's code:
# Nodes in the Dictionary.
class DictionaryNode < Array
# Terminal info
attr_reader :words
def initialize
super(10)
@words = []
end
end
This first piece of the puzzle is a node to be used by a tree class we'll meet
shortly. DictionaryNode is an Array that contains exactly 10 members. Why 10?
Because that's how many digits our encoding has. DictionaryNode also contains
an Array of words, that will be filled from the dictionary file.
Here's the start of Brian's tree class:
# A tree-indexed version of the dictionary that allows
# efficent searching by number 2 alphabet mapping.
class Dictionary
def initialize(encoding)
super()
@encoding = {}
@inverse_encoding = {}
encoding.each do | k, v |
@encoding[k] = v.split(/\s+/).map{|c| c[0]}
end
# Create map from characters to numbers
@inverse_encoding = @encoding.inject({}) { | r, (k, v) |
v.each do | l | r[l] = k end
r
}
@root = DictionaryNode.new
end
# ...
That's pretty easy to follow. This setup work creates maps for the encoding of
this Dictionary object. The maps go both ways, numbers to letters and the
inverse letters to numbers. Finally, the root of the tree is created from a new
DictionaryNode.
The following methods add words to the tree from a dictionary file:
# ...
# Helper method for rekursive adding of words to the dictionary
private
def add_recursive(node, word, position)
if word.length == position
node.words << word
return node
end
add_recursive(
node[@inverse_encoding[word[position]]] ||= DictionaryNode.new,
word,
position + 1 )
end
# Add words to the dictionary
public
def add(word)
add_recursive(@root, word, 0)
self
end
# Load a wordlist from a file, which contains one word per line.
# Ignores punctuation and whitespace.
def load_wordlist(file, options)
$stderr.print "Loading dictionary... " if options.verbose
start = Time.new
file.read.gsub(/[^A-Za-z\n]/, '').upcase!.split($/).uniq!.each do |w|
w.chomp!
next if w.empty? or w.length <= options.min_length
self.add(w)
end
if options.verbose
$stderr.puts "built dictionary in %f seconds" %
(Time.new-start).to_f
end
self
end
# ...
The dictionary reading process starts in the last method, load_wordlist(). This
method slurps the file, discards illegal characters, normalizes case, breaks up
the list on line boundaries, and eliminates repetition all on the third line.
Each member of the list of words that creates is then sent on to the add()
method. Note the nice use of options.verbose to show a build time.
Before I move on to add(), let me point out that the above method does have a
few minor issues. When I was playing with this code to figure out how it works,
I fed it a five word dictionary and was surprised when it crashed. The cause?
No duplicates. That causes uniq!() to return nil (a hot topic on Ruby Talk
lately) and since nil doesn't support an each() call, the code blew up.
upcase!() has similar problems.
One more minor issue. Here's a tip: When you normalize case, it's generally
better to go down than up. The reason is international support. Some languages
distinguish between things like uppercase and titlecase. That means that a
bunch of uppercase conversions might not be consistent, based on certain local
settings. The best way to avoid such problems is to lowercase content instead.
This isn't much of a problem, but it's a good habit to build.
Back to the code. add(), as you can see, is just a shell over add_recursive().
It passes the word on with the root node and a starting position of 0.
add_recursive() is pretty clever. It digs down into the tree until finding the
right spot to place the word in the Dictionary. This digging happens at the end
of the method with recursive calls. The current letter in the word is examined
and a branch of the tree is created to handle that encoded letter, if it didn't
already exist. The algorithm then moves to that node, examining the next letter
in line. When all the letters have been branched off, we're at the right place
to insert the word. The if at the beginning of the method handles that end
condition.
The last thing a Dictionary object requires is a way to hunt for words. Here
are those methods:
# ...
private
# Search words and return (in the block) words and the unmatched rest
# of the number
def sub_find(node, number, &block)
# Return words found so far
block[node.words.map{|w|w.dup}, number] unless node.words.empty?
# No more digits, so stop searching here
return node if number.empty?
# Search for longer words
sub_find(node[number[0]], number[1..-1], &block) if node[number[0]]
end
private
# Calculate all allowed skip patterns for a number of a given length
def skips(s, length)
return
result = skips(s + [false], length-1)
result.concat(skips(s + [true], length-1)) unless s[-1]
result
end
public
def find_noskip(number)
result = []
sub_find(@root, number) do | words, rest_number |
if rest_number.empty?
result.concat(words)
else
find_noskip(rest_number).each do | sentence |
words.each do | w |
result << w + '-' + sentence
end
end
end
end
result
end
# Skipping makes this a bit ugly
def find(number)
result = []
skips([], number.length).each do | skipped |
# Create the injector that can inject the skipped numbers
# back into the word
injector = []
skipped.zip(number).each_with_index do |(s,n), i|
injector << [n.to_s, i] if s
end
# We search for words built from the unskipped digits
unskipped_digits =
number.zip(skipped).select{|(d, s)| !s}.map{|(d,s)|d}
sentences = find_noskip(unskipped_digits)
# Inject the skipped digits back into the found sentences
sentences.each do | s |
injector.each do | (n, i) | s.insert(i, n) end
end
result.concat(sentences)
end
result
end
end
Start with the sub_find() method. It's the key to the search and easy enough to
digest. sub_find() takes a node to search, the number to use in that search,
and a block to pass results to. The first line passes all matching words from
this node to the block, if there are any. The root node, where the algorithm
begins, won't have any since most dictionaries don't include 0 length words.
The second line finishes the process, if we've examined all the numbers. The
third line recurses, moving to the node for the next digit at the head of the
number variable. That's half of the picture.
The find_noskip() method is the public face for that. It calls sub_find(),
passing a block of code that fills the local results Array as matches are found.
When a word matches in the number, find_noskip() recurses looking other words to
finish off the number. Of course, as the name implies, this version of the
process does not skip digits.
For skipping, you need the find() method. find() first calls skip() to
calculate all possible skip patterns for this number. Then, one skip at a time,
find() removes the skipped digits and calls find_noskip() on the remainder.
After results are generated, the skips are reinserted back into their original
locations. That's pretty tricky.
To be clear, this does not function as I intended the quiz to work (and I now
understand why Brian thinks I allowed "bad results"). Numbers were only to be
allowed at word boundaries, while Brian's algorithm will reinsert them into the
middle of words. Looking back, I did not make this very clear in the quiz and
it's certainly my error. Brian's code is still a very nice implementation of
his interpretation of the rules.
Finally, there's an interface that puts all that code to use. Let's look at
that:
encodings = {
:james => {
2 => 'A B C',
3 => 'D E F',
4 => 'G H I',
5 => 'J K L',
6 => 'M N O',
7 => 'P Q R S',
8 => 'T U V',
9 => 'W X Y Z'},
:logic => {
0 => 'A B',
1 => 'C D',
2 => 'E F',
3 => 'G H',
4 => 'I J K',
5 => 'L M N',
6 => 'O P Q',
7 => 'R S T',
8 => 'U V W',
9 => 'X Y Z'
}
}
require 'optparse'
class PhonewordOptions < OptionParser
attr_reader :dictionary, :encoding, :format, :allow_skips, :help,
:encoding_help, :verbose, :min_length
def initialize
super()
@dictionary = '/usr/share/dict/words'
@encoding = :james
@format = lain
@allow_skips = true
@HELP = false
@encoding_help = false
@verbose = false
@ignore_non_alpha = false
@min_length = 1
self.on("-d", "--dictionary DICTIONARY", String) { | v |
@dictionary = v
}
self.on("-e", "--encoding ENCODING", String,
"How the alphabet is encoded to phonenumbers. " +
"james or logic are supported.") { | v |
@encoding = v.downcase.to_sym
}
self.on("-p", "--plain",
'One result per found number, ' +
'no other information. (Default)') { @format = lain }
self.on("-f", "--full", 'Prefix the result with the number') {
@format = :full
}
self.on("-v", "--verbose", 'Make more noise') { @verbose = true }
self.on("-s", "--skips", "--allow_skips", "--allow-skips",
'Allow to skip one adjacent number while matching. (Default)',
'Gives lots of ugly results, but james asked for it.') {
@allow_skips = true
}
self.on("-c", "--no-skips",
"Don't leave numbers in the detected words") {
@allow_skips = false
}
self.on("-m" "--min-length", "Minimum length of accepted words.",
"Use this to ignore one-letter words that make " +
"the output quite uninteresting.", Integer) { | v |
@min_length = v
}
self.on("-?", "--help") { @HELP = true }
self.on("--supported-encodings", "--encoding-help",
"List the supported encodings") { @encoding_help = true }
end
end
options = PhonewordOptions.new
options.parse!(ARGV)
if options.help
puts options
exit
end
if options.encoding_help or !encodings[options.encoding]
puts "Possible encodings:"
puts encodings.to_a.sort_by{|(k,v)|k.to_s}.map{|(k,v)|
"#{k}:\n"+v.map{|(n,e)|" #{n}: #{e}"}.sort.join("\n")
}
exit
end
dictionary = Dictionary.new(encodings[options.encoding]).load_wordlist(
File.open(options.dictionary), options
)
output = {
lain => lambda do | number, sentence | sentence end,
:full => lambda do | number, sentence |
"#{number.ljust(15)}: #{sentence}"
end
}
method = {true => :find, false => :find_noskip }
ARGF.each do | number |
number.strip!
number = number.gsub(/[^0-9]/, '').unpack('C*').map{|n|n - ?0}
$stderr.puts "Searching for #{number}" if options.verbose
dictionary.send(
method[options.allow_skips], number
).each do | sentence |
puts output[options.format][number, sentence]
end
end
Most of that code is option handling. Brian creates his own PhonewordOptions
object, which inherits from OptionParser. In the setup for that object,
defaults are established and option parsing in defined with several calls to
on(). From there, reader methods are provided for all the defined options.
This makes for a pretty self-contained bundle of option parsing and reading.
You can see the options object put to good use, after the class.
That final block is what actually kicks off the program. Each number is read
from ARGF, cleaned up, and passed to the find methods of the dictionary object.
Results from that find are printed, creating a complete solution.
My thanks go out to all three quiz workers. It was nice to have a few people
playing along again.
Tomorrows quiz will stay with the topic of phones and how we use them...