R
Ruby Quiz
The three rules of Ruby Quiz:
1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.
2. Support Ruby Quiz by submitting ideas as often as you can:
http://www.grayproductions.net/ruby_quiz/
3. Enjoy!
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
by Fredrik Jagenheim
At work we discovered that they installed a spam filter that throws away e-mail
that it considers spam. It doesn't use any Bayes Filter, it simply checks for
certain words that it considers 'banned'. One word we discovered was 'sex',
which is a Swedish word for the number six. So the phrase "I'll be home at six
o'clock." will be classified as spam, thrown away and never seen.
The Ruby Quiz I propose is to figure out which words are banned. Since the
filter is a black box, we can only find out which words they are by sending
email through it. The real problem is to find out how to do it with as *few*
emails as possible.
Of course I don't want the ruby community to do a Denial-of-Service on my
employer's mailserver, so do it as a local filter; perhaps something like this:
# by JEG2
#
# A simple class for managing a filter that prevents to use
# of a given _banned_words_ list.
#
class LanguageFilter
#
# Create a new LanguageFilter object that will
# disallow _banned_words_.
# Accepts a list of words, arrays of words,
# or a combination of the two.
#
def initialize( *banned_words )
@banned_words = banned_words.flatten.sort
@clean_calls = 0
end
# A count of the calls to <i>clean?</i>.
attr_reader :clean_calls
#
# Test if provided _text_ is allowable by this filter.
# Returns *false* if _text_ contains _banned_words_,
# *true* if it does not.
#
def clean?( text )
@clean_calls += 1
@banned_words.each do |word|
return false if text =~ /\b#{word}\b/
end
true
end
#
# Verify a _suspect_words_ list against the actual
# _banned_words_ list.
# Returns *false* if the two lists are not identical or
# *true* if the lists do match.
# Accepts a list of words, arrays of words,
# or a combination of the two.
#
def verify( *suspect_words )
suspect_words.flatten.sort == @banned_words
end
end
filter = LanguageFilter.new "six"
filter.clean?("I'll be home at six.") # => false
filter.clean?("Do not taunt the happy fun ball!") # => true
filter.verify("ball") # => false
filter.verify("six") # => true
filter.clean_calls # => 2
So, figure out how to find the hidden words, using as few calls to
LanguageFilter#clean? as possible.
Which algorithms do you find are effective when many words are blocked (like
10%) and which are effective when very few are blocked(1 in 20000)?
All solutions should do better than this:
dict = ["foo", "bar", "six", "baz"]
filter = LanguageFilter.new "six"
p dict - (dict.dup.delete_if { |word| not filter.clean?(word) })
1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.
2. Support Ruby Quiz by submitting ideas as often as you can:
http://www.grayproductions.net/ruby_quiz/
3. Enjoy!
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
by Fredrik Jagenheim
At work we discovered that they installed a spam filter that throws away e-mail
that it considers spam. It doesn't use any Bayes Filter, it simply checks for
certain words that it considers 'banned'. One word we discovered was 'sex',
which is a Swedish word for the number six. So the phrase "I'll be home at six
o'clock." will be classified as spam, thrown away and never seen.
The Ruby Quiz I propose is to figure out which words are banned. Since the
filter is a black box, we can only find out which words they are by sending
email through it. The real problem is to find out how to do it with as *few*
emails as possible.
Of course I don't want the ruby community to do a Denial-of-Service on my
employer's mailserver, so do it as a local filter; perhaps something like this:
# by JEG2
#
# A simple class for managing a filter that prevents to use
# of a given _banned_words_ list.
#
class LanguageFilter
#
# Create a new LanguageFilter object that will
# disallow _banned_words_.
# Accepts a list of words, arrays of words,
# or a combination of the two.
#
def initialize( *banned_words )
@banned_words = banned_words.flatten.sort
@clean_calls = 0
end
# A count of the calls to <i>clean?</i>.
attr_reader :clean_calls
#
# Test if provided _text_ is allowable by this filter.
# Returns *false* if _text_ contains _banned_words_,
# *true* if it does not.
#
def clean?( text )
@clean_calls += 1
@banned_words.each do |word|
return false if text =~ /\b#{word}\b/
end
true
end
#
# Verify a _suspect_words_ list against the actual
# _banned_words_ list.
# Returns *false* if the two lists are not identical or
# *true* if the lists do match.
# Accepts a list of words, arrays of words,
# or a combination of the two.
#
def verify( *suspect_words )
suspect_words.flatten.sort == @banned_words
end
end
filter = LanguageFilter.new "six"
filter.clean?("I'll be home at six.") # => false
filter.clean?("Do not taunt the happy fun ball!") # => true
filter.verify("ball") # => false
filter.verify("six") # => true
filter.clean_calls # => 2
So, figure out how to find the hidden words, using as few calls to
LanguageFilter#clean? as possible.
Which algorithms do you find are effective when many words are blocked (like
10%) and which are effective when very few are blocked(1 in 20000)?
All solutions should do better than this:
dict = ["foo", "bar", "six", "baz"]
filter = LanguageFilter.new "six"
p dict - (dict.dup.delete_if { |word| not filter.clean?(word) })