[QUIZ] Shirt Reader (#140)

R

Ruby Quiz

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone
on Ruby Talk follow the discussion. Please reply to the original quiz message,
if you can.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Sun Microsystems handed out some fun shirts at the LSRC. The graphic on the
front looks something like:

sun image

star image + T + up arrow

E + perfume bottle image + sea shells image

This is a clever advertisement for their "Sun Startup Essentials" program.

This week's Ruby Quiz is to write a program that can read such shirts. At the
basic level, we will assume the user will feed us the right words:

$ ruby read_shirt.rb e scent shells
essentially

Hopefully your solution will be smarter than mine, which, as you can see, only
gets close. It may be better to give a few possible choices instead of just the
best match.

Of course, the problem comes when you guess the wrong word. My first thought
was of a perfume bottle, as my description shows, but the correct word was
scent. For bonus points, see what you can do about this problem. One idea
might be to allow alternatives for a segment of the word. Another might be to
hit the thesaurus, which does list scent as a synonym for perfume.
 
E

Eugene Kalenkovich

Here is my solution. Obviously it is not as "well educated" as Steve's one
because it does not use pronunciation dictionary, but it behaves pretty well
with what it has :)

I've tried different hashing strategies, one of them to use soundex, but
that one had one less match, though it was able to understand 're late shun
ship' - metaphone believes that relationship is pronounced differently (and
I cannot blame it :)

Soundex has a big disadvantage of requiring correct first letter, that's why
I think the metaphone-based solution is closer to what we want.

BTW, I was surprised that text gem includes only symmetrical Levenshtein
distance, asymmetrical one (with different costs of operations) could
probably give better results


require 'rubygems'
require 'text'

include Text::Metaphone
include Text::Levenshtein


expectations = {
%w[e scent shells] => 'essentials',
%w[q all if i] => 'qualify',
%w[fan task tick] => 'fantastic',
%w[b you tea full] => 'beautiful',
%w[fun duh mint all] => 'fundamental',
%w[s cape] => 'escape',
%w[pan z] => 'pansy',
%w[n gauge] => 'engage',
%w[cap tin] => 'captain',
%w[g rate full] => 'grateful',
%w[re late shun ship] => 'relationship',
%w[con grad yeul 8] => 'congratulate',
%w[2 burr q low sis] => 'tuberculosis',
%w[my crows cope] => 'microscope',
%w[add minus ray shun] => 'administration',
%w[accent you ate it] => 'accentuated',
%w[add van sing] => 'advancing',
%w[car knee for us] => 'carnivorous',
%w[soup or seed] => 'supercede',
%w[poor 2 bell o] => 'portobello',
%w[d pen dance] => 'dependence',
%w[s o tear rick] => 'esoteric',
%w[4 2 it us] => 'fortuitous',
%w[4 2 n 8] => 'fortunate',
%w[4 in R] => 'foreigner',
%w[naan disk clothes your] => 'nondisclosure',
%w[Granmda Atika Lee] => 'grammatically',
%w[a brie vie a shun] => 'abbreviation',
%w[pheemeeneeneetee] => 'femininity',
%w[me c c p] => 'mississippi',
%w[art fork] => 'aardvark',
%w[liberty giblet] => 'flibbertigibbet',
%w[zoo key knee] => 'zucchini',
%w[you'll tight] => 'yuletide',
%w[Luke I like] => 'lookalike',
%w[mah deux mah zeal] => 'mademoiselle',
%w[may gel omen yak] => 'megalomaniac',
%w[half tell mall eau gist] => 'ophthalmologist',
%w[whore tea cull your wrist] => 'horticulturist',
%w[pant oh my m] => 'pantomime',
%w[tear a ball] => 'terrible',
%w[a bowl i shun] => 'abolition',
%w[pre chair] => 'preacher',
%w[10 s] => 'tennis',
%w[e z] => 'easy',
%w[1 door full] => 'wonderful',
%w[a door] => 'adore',
%w[hole e] => 'holy',
%w[grand your] => 'grandeur',
%w[4 2 5] => 'fortify',
%w[age, it ate her] => 'agitator',
%w[tear it or eel] => 'territorial',
%w[s 1] => 'swan'
}

subs={'1'=>'won','2'=>'to','3'=>'tre','4'=>'for','5'=>'five','6'=>'six','7'=>'seven','8'=>'ate','9'=>'nine','10'=>'ten',
'h'=>'eich','j'=>'jey','k'=>'key','q'=>'que','r'=>'ar'}
subsy={}
%w[b c d g p t v z].each {|l| subsy[l]=l+'y'}
%w[b c d g p t v z].each {|l| subs[l]=l+'e'}
%w[f l m n s x].each{|l| subs[l]='e'+l}

def metadist(str1,str2)
2*distance(metaphone(str1),metaphone(str2))+distance(str1,str2)
end

hash=Hash.new{|h,k|h[k]=[]}

File.open("/usr/share/dict/words") {|f| f.readlines}.each do |w|
word=w.downcase.delete("^a-z")
m1,m2=double_metaphone(word)
hash[m1]<<word
hash[m2]<<word if m2
end
#make sure that expectations are in the word list
expectations.values.each { |word|
m1,m2=double_metaphone(word)
hash[m1]<<word
hash[m2]<<word if m2
}

inputs=[]
if (ARGV.empty?)
inputs=expectations.keys
else
inputs << ARGV
end
inputs.each { |rebus|
y_ed=rebus[0..-2]<<(subsy[rebus[-1]] || rebus[-1])
word=y_ed.map{|w| subs[w] || w }.join.downcase.gsub(/[^a-z0-9]/,'')
m1,m2=double_metaphone(word)
results=hash[m1]
results+=hash[m2] if m2
res=results.uniq.sort{|a,b|
(metadist(word,a) <=> metadist(word,b)).nonzero? || a.length<=>b.length
}[0] || 'no clue'
print "'#{rebus.join(' ')}' => #{res}"
expected=expectations[rebus]
print ", expected: #{expected}" if expected
puts
}
 
E

Eugene Kalenkovich

Eugene Kalenkovich said:
Here is my solution. Obviously it is not as "well educated" as Steve's one
because it does not use pronunciation dictionary, but it behaves pretty
well with what it has :)

One more limit of a good education - sometimes it is limited to an education
source. Unfortunately I did not have enough patience to test it on all new
expectations, even single example requires too much time (without ruby
inline), but I know that it will not get at least some part of a new set:

# expectations.rb
$expectations = {
%w[pro 1 sal] => 'provencal',
%w[2 thing] => 'toothing',
%w[r b trash] => 'arbitrage',
%w[mon 3 l] => 'montreal',
%w[3 men dose] => 'tremendous',
%w[mid wind r] => 'midwinter',
%w[yes tier knight] => 'yesternight',
%w[mar well s]=>'marvelous',
%w[vert x]=>'vertex',
%w[ban l eyes] => 'banalize',
%w[harm n eyes] => 'harmonize',
%w[harm o niece east] => 'harmonicist',
%w[knight in gale] => 'nightingale',
%w[knee hill east] => 'nihilist',
%w[mass car pone a] => 'mascarpone',
%w[**** knee] => 'cockney',
}
################################################

And two more variations for my solution.
First - compromise with coarsened hash, balancing decent matching with
decent performance. This one is probably the best I coud get.

Second one - full scan. Allows more concise code and slightly better
matching in cost of performance (but still several time faster than Steve's
without inline).

# Variant 1
require 'rubygems'
require 'text'
include Text::Metaphone
include Text::Levenshtein
load 'expectations.rb'

subs={'1'=>'wan','2'=>'to','3'=>'tre','4'=>'for','5'=>'five','6'=>'six','7'=>'seven','8'=>'ate','9'=>'nine','10'=>'ten',
'c'=>'see','h'=>'eich','j'=>'jey','k'=>'key','q'=>'que','r'=>'ar'}
subsy={}
%w[b c d g p t v z].each {|l| subsy[l]=l+'y'}
%w[b c d g p t v z].each {|l| subs[l]=l+'ee'}
%w[f l m n s x].each{|l| subs[l]='e'+l}

def metadist(str1,str2)
2*distance(metaphone(str1),metaphone(str2))+
distance(str1,str2)
end

def short_double_metaphone(word)
m1,m2=double_metaphone(word)
[m1[0,2],m2 ? m2[0,2] : nil]
end

hash=Hash.new{|h,k|h[k]=[]}

File.open("/usr/share/dict/words") {|f| f.readlines}.each do |w|
word=w.downcase.delete("^a-z")
m1,m2=short_double_metaphone(word)
hash[m1]<<word
hash[m2]<<word if m2
end
$expectations.values.each { |word|
m1,m2=short_double_metaphone(word)
hash[m1]<<word
hash[m2]<<word if m2
}

hash.each_key{|k| hash[k].uniq!}

inputs=[]
if (ARGV.empty?)
inputs=$expectations.keys
else
inputs << ARGV
end

inputs.each { |rebus|
y_ed=rebus[0..-2]<<(subsy[rebus[-1]] || rebus[-1])
word=y_ed.map{|w| subs[w] || w }.join.downcase.gsub(/[^a-z0-9]/,'')
m1,m2=short_double_metaphone(word)
results=hash[m1]
results+=hash[m2] if m2 && m2!=m1
res=results.uniq.sort_by{|a| [metadist(word,a),a.length]}.first(5)
print "'#{rebus.join(' ')}' => #{res[0]}"
expected=$expectations[rebus]
print ", expected '#{expected}' is at position #{res.index(expected)}" if
expected
puts
}
################################################


# Variant 2
require 'rubygems'
require 'text'
include Text::Metaphone
include Text::Levenshtein
load 'expectations.rb'

subs={'1'=>'won','2'=>'to','3'=>'tre','4'=>'for','5'=>'five','6'=>'six','7'=>'seven','8'=>'ate','9'=>'nine','10'=>'ten',
'c'=>'see','h'=>'eich','j'=>'jey','k'=>'key','q'=>'que','r'=>'ar'}
subsy={}
%w[b c d g p t v z].each {|l| subsy[l]=l+'y'}
%w[b c d g p t v z].each {|l| subs[l]=l+'ee'}
%w[f l m n s x].each{|l| subs[l]='e'+l}

def metadist(str1,str2)
2*distance(metaphone(str1),metaphone(str2))+
distance(str1,str2)
end

words = (File.open("/usr/share/dict/words") {|f| f.readlines}.map{|word|
word.downcase.delete("^a-z")}+$expectations.values).uniq

inputs=[]
if (ARGV.empty?)
inputs=$expectations.keys
else
inputs << ARGV
end

inputs.each { |rebus|
y_ed=rebus[0..-2]<<(subsy[rebus[-1]] || rebus[-1])
word=y_ed.map{|w| subs[w] || w }.join.downcase.gsub(/[^a-z0-9]/,'')
res=words.sort_by{ |a| [metadist(word,a),a.length] }.first(5)
print "'#{rebus.join(' ')}' => #{res[0]}"
expected=$expectations[rebus]
print ", expected '#{expected}' is at position #{res.index(expected)}" if
expected
puts
}

################################################
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,961
Messages
2,570,131
Members
46,689
Latest member
liammiller

Latest Threads

Top