[QUIZ] Statistician I (#167)

Matthew Moss

[Note: parts of this message were removed to make it a legal post.]

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

The three rules of Ruby Quiz 2:

1. Please do not post any solutions or spoiler discussion for this
quiz until 48 hours have passed from the time on this message.

2. Support Ruby Quiz 2 by submitting ideas as often as you can! (A
permanent, new website is in the works for Ruby Quiz 2. Until then,
please visit the temporary website at <http://splatbang.com/rubyquiz/>.)

3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem
helps everyone on Ruby Talk follow the discussion. Please reply to
the original quiz message, if you can.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
## Statistician I (#167)

This week begins a three-part quiz whose final goal is to provide a little
library for parsing and analyzing line-based data. Hopefully, each portion
of the larger problem is interesting enough on its own without being too
difficult to attempt. The first part (this week's quiz) will focus on
the pattern matching.

Let's look at a bit of example input:

You wound Perl for 15 points of Readability damage.
You wound Perl with Metaprogramming for 23 points of Usability damage.
Your mighty blow defeated Perl.
C++ walks into the arena.
C++ wounds you with Compiled Code for 37 points of Speed damage.
You wound C++ for 52 points of Usability damage.

Okay, it's silly, but it is similar to a much larger data file I'll provide
at the end for testing.

You should definitely note the repetitiveness: just the sort of thing that
we can automate. In fact, I've examined the input above and created three
rules (a.k.a. patterns) that match (most of) the data:

[The ]<name> wounds you[ with <attack>] for <amount> point[s] of <kind>[ damage].
You wound[ the] <name>[ with <attack>] for <amount> point[s] of <kind>[ damage].
Your mighty blow defeated[ the] <name>.

There are a few guidelines about these rules:

1. Text contained within square brackets is optional.
2. A word contained in angle brackets represents a field; not a literal
match, but data to be remembered.
3. Fields are valid within optional portions.
4. You may assume that both the rules and the input lines are stripped of
excess whitespace on both ends.
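
To make the guidelines concrete, here is a minimal sketch of the second example rule written out by hand as a Ruby regular expression. The constant `WOUND` and the helper `fields` are names invented for this illustration, and `points?` is an assumption that lets both "point" and "points" match, as the sample input requires.

```ruby
# Hand-written equivalent of the rule
#   You wound[ the] <name>[ with <attack>] for <amount> point(s) of <kind>[ damage].
# Optional text becomes a non-capturing group followed by `?`;
# each field becomes a lazy capture group.
WOUND = /\AYou wound(?: the)? (.+?)(?: with (.+?))? for (.+?) points? of (.+?)(?: damage)?\.\z/

# Returns the captured field values, or nil when the line does not match.
def fields(line)
  md = WOUND.match(line)
  md && md.captures.compact  # a skipped optional field is captured as nil
end

p fields("You wound Perl for 15 points of Readability damage.")
# => ["Perl", "15", "Readability"]
p fields("You wound Perl with Metaprogramming for 23 points of Usability damage.")
# => ["Perl", "Metaprogramming", "23", "Usability"]
p fields("C++ walks into the arena.")
# => nil
```

A field inside an optional section that is not present simply yields `nil`, which is why the values are compacted before printing.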

Assuming the rules are in `rules.txt` and the input is in `data.txt`,
running your Ruby script like so:

ruby reporter.rb rules.txt data.txt

should generate the following output:

Rule 1: Perl, 15, Readability
Rule 1: Perl, Metaprogramming, 23, Usability
Rule 2: Perl
# No Match
Rule 0: C++, Compiled Code, 37, Speed
Rule 1: C++, 52, Usability

Unmatched input:
C++ walks into the arena.

Each line of the output corresponds to a line of the input; it indicates
which rule was matched (zero-based index), and outputs the matched fields'
values. Any line of the input that could not be matched to one of the rules
should output a "No Match" comment, with all the unmatched input records
printed in the "Unmatched input" section at the end (so the author of the
rules can extend them appropriately).

One thing you should keep in mind while working on this week's quiz is that
you want to be flexible; followup quizzes will require that you modify
things a bit.

For testing, I am providing two larger datasets: combat logs taken from Lord
of the Rings Online gameplay. There is data for a [Guardian][1] and a
[Hunter][2]; unzip before use. Both use the same ruleset:

[The ]<name> wounds you[ with <attack>] for <amount> point[s] of <kind>[ damage].
You are wounded for <amount> point[s] of <kind> damage.
You wound[ the] <name>[ with <attack>] for <amount> point[s] of <kind>[ damage].
You reflect <amount> point[s] of <kind> damage to[ the] <name>.
You succumb to your wounds.
Your mighty blow defeated[ the] <name>.



[1]: http://www.splatbang.com/rubyquiz/files/guardian.zip
[2]: http://www.splatbang.com/rubyquiz/files/hunter.zip
 
Matthew Moss

Here's my own submission for this problem. Once you wrap your head
around a few bits of the regular expression, it's pretty simple to
understand.



class Rule
  attr_reader :fields

  def initialize(str)
    patt = str.gsub(/\[(.+?)\]/, '(?:\1)?').gsub(/<(.+?)>/, '(.+?)')
    @pattern = Regexp.new('^' + patt + '$')
    @fields = nil
  end

  def match(str)
    if md = @pattern.match(str)
      @fields = md.captures
    else
      @fields = nil
    end
  end
end


rules = []
File.open(ARGV[0]).each do |line|
  line.strip!
  next if line.empty?
  rules << Rule.new(line)
end


unknown = []
File.open(ARGV[1]).each do |line|
  line.strip!
  if line.empty?
    puts
    next
  end

  if rule = rules.find { |rule| rule.match(line) }
    indx, data = rules.index(rule), rule.fields.reject { |f| f.nil? }
    puts "Rule #{indx}: #{data.join(', ')}"
  else
    unknown << line
    puts "# No match"
  end
end


puts "\nUnmatched input:"
puts unknown.join("\n")
 
Matthew Rudy Jacobs

Matthew said:
  def initialize(str)
    patt = str.gsub(/\[(.+?)\]/, '(?:\1)?').gsub(/<(.+?)>/, '(.+?)')
    @pattern = Regexp.new('^' + patt + '$')
    @fields = nil
  end

does the rule string not need to be regexp escaped somehow if it's
gonna be directly Regexp.new'ed?

I fear a rule with something like "You run away[ from <name>] (you
coward)" would break this approach.
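
A quick check of that concern, as a sketch: `naive_pattern` is a name made up here, and it only repeats the two substitutions from the quoted `initialize`, with no escaping of the literal text.

```ruby
# Same substitutions as the quoted initialize; literal text is NOT escaped.
def naive_pattern(rule)
  patt = rule.gsub(/\[(.+?)\]/, '(?:\1)?').gsub(/<(.+?)>/, '(.+?)')
  Regexp.new('^' + patt + '$')
end

re = naive_pattern("You run away[ from <name>] (you coward)")

p re.source
# => "^You run away(?: from (.+?))? (you coward)$"

# The parentheses in the rule became a capture group, so a line that
# really contains them no longer matches...
p re.match("You run away from Perl (you coward)")
# => nil

# ...while a line without them matches and yields a spurious extra field.
p re.match("You run away from Perl you coward").captures
# => ["Perl", "you coward"]
```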

Matthew Rudy
 
krusty.ar

This is my first quiz; it's very rough, but it works most of the time.

I'm probably (re)implementing a very limited form of regular
expressions, and in the process of making this I discovered several
ways it could fail. In the test cases, though, the only failure is the
case noted in the comments.

Here is the code: http://pastie.org/224463
And the rules that catch most of the samples: http://pastie.org/224464

Lucas.
 
Matthew Moss

Matthew said:
  def initialize(str)
    patt = str.gsub(/\[(.+?)\]/, '(?:\1)?').gsub(/<(.+?)>/, '(.+?)')
    @pattern = Regexp.new('^' + patt + '$')
    @fields = nil
  end

does the rule string not need to be regexp escaped somehow if it's
gonna be directly Regexp.new'ed?

I fear a rule with something like "You run away[ from <name>] (you
coward)" would break this approach.


Perhaps... My solution is likely not safe from all input sets. While I
hadn't considered literal parentheses as part of the rule set, I
should have at the least considered the period (match any char).

For the current purposes, it is sufficient if your solution supports
the provided example ruleset, though any additional work towards
escaping parts/preventing breakage is certainly acceptable.
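
One possible way to do that escaping, as a rough sketch rather than anything taken from a submission: split the rule into markers and literal text, and run only the literal text through `Regexp.escape`. The helper name `compile_rule` is hypothetical.

```ruby
# Hypothetical helper: turn a rule into a Regexp, escaping everything
# except the [ ] and <field> markers.
def compile_rule(rule)
  # Tokens: "[", "]", "<field>", a run of literal text, or a stray "<".
  tokens = rule.scan(/\[|\]|<[^<>]+>|[^\[\]<]+|</)
  source = tokens.map do |token|
    case token
    when '['        then '(?:'                 # open an optional section
    when ']'        then ')?'                  # close an optional section
    when /\A<.+>\z/ then '(.+?)'               # field: lazy capture
    else                 Regexp.escape(token)  # literal text, metacharacters neutralised
    end
  end.join
  Regexp.new('\A' + source + '\z')
end

coward = compile_rule("You run away[ from <name>] (you coward)")
p coward.match("You run away from Perl (you coward)").captures
# => ["Perl"]

defeated = compile_rule("Your mighty blow defeated[ the] <name>.")
p defeated.match("Your mighty blow defeated Perl!")
# => nil, because the final period is now a literal period
```

This sketch does not handle nested brackets or rules that need a literal `[` or `]`; those would need an escape convention in the rule syntax itself.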
 
Matthias Reitinger

Here is my submission. I hope it's flexible enough for the followup
quizzes. I sensed there might be a need to access the fields of a match
by name, which is why I added the RuleMatch#fields method. It returns a
hash that allows code like

puts Rule.match(line).fields['amount'] # prints the value of the <amount> field

This method isn't used in the current code however. But who knows, it
might come in handy later on.
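
For readers who cannot reach the pastie, the idea of looking fields up by name can be sketched roughly as follows. This is not the submitted code: the class `NamedRule` and its methods are invented for illustration, and the `point[s]` spelling in the sample rule is an assumption so that the plural in the sample line matches.

```ruby
# Hypothetical sketch: keep the field names next to the compiled pattern
# so that a match can be read back as a name => value hash.
class NamedRule
  attr_reader :names

  def initialize(rule)
    @names = rule.scan(/<([^<>]+)>/).flatten       # field names, in rule order
    source = Regexp.escape(rule)                   # escapes "[" and "]" too
    source = source.gsub('\[', '(?:').gsub('\]', ')?')  # restore optional sections
    source = source.gsub(/<[^<>]+>/, '(.+?)')      # each field becomes a lazy capture
    @pattern = Regexp.new('\A' + source + '\z')
  end

  # Returns a hash of field name => captured value, or nil if no match.
  def fields(line)
    md = @pattern.match(line)
    return nil unless md
    @names.zip(md.captures).to_h
  end
end

rule = NamedRule.new("You wound[ the] <name>[ with <attack>] for <amount> point[s] of <kind>[ damage].")
result = rule.fields("You wound Perl for 15 points of Readability damage.")
puts result['amount']
# prints 15
```

If a rule uses the same field name twice, later values overwrite earlier ones in the hash, so a real implementation would need to decide how to treat duplicates.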

You can find my submission at http://www.pastie.org/224480

- Matthias
 
Jesús Gabriel y Galán

Hi,

This is my try at this quiz. I thought it would be cool to store the
field "names" too, for each match. I also added a verbose output to
show the field name and the value. As the goal was to be flexible too,
I made some classes to encapsulate everything, to prepare for the future:

class Match
  attr_accessor :captures, :mappings, :rule

  def initialize captures, mappings, rule
    @captures = captures
    @mappings = mappings
    @rule = rule
  end

  def to_s verbose=false
    s = "Rule #{@rule.id}: "
    if verbose
      @rule.names.each_with_index {|n,i| s << "[#{n} => #{@mappings[n]}]" if @captures}
      s
    else
      s + "#{@captures.compact.join(",")}"
    end
  end
end

class Rule
  attr_accessor :names, :id

  # Translate rules to regexps, specifying if the first captured group
  # has to be remembered
  RULE_MAPPINGS = {
    "[" => ["(?:", false],
    "]" => [")?", false],
    /<(.*?)>/ => ["(.*?)", true],
  }

  def initialize id, line
    @id = id
    @names = []
    escaped = escape(line)
    reg = RULE_MAPPINGS.inject(escaped) do |line, (tag, value)|
      replace, remember = *value
      line.gsub(tag) do |m|
        @names << $1 if remember
        replace
      end
    end
    @reg = Regexp.new(reg)
  end

  def escape line
    # From the mappings, change the regexp sensitive chars with non-sensitive ones
    # so that we can Regexp.escape the line, then sub them back
    escaped = line.gsub("[", "____").gsub("]", "_____")
    escaped = Regexp.escape(escaped)
    escaped.gsub("_____", "]").gsub("____", "[")
  end

  def match data
    m = @reg.match data
    return nil unless m
    map = Hash[*@names.zip(m.captures).flatten]
    Match.new m.captures, map, self
  end
end

class RuleSet
  def initialize file
    @rules = []
    File.open(file) do |f|
      f.each_with_index {|line, i| @rules << Rule.new(i, line.chomp)}
    end
    p @rules
  end

  def apply data
    match = nil
    @rules.find {|r| match = r.match data}
    match
  end
end

rules_file = ARGV[0] || "rules.txt"
data_file = ARGV[1] || "data.txt"

rule_set = RuleSet.new rules_file

matches = nil
unmatched = []
File.open(data_file) do |f|
  matches = f.map do |line|
    m = rule_set.apply line.chomp
    unmatched << line unless m
    m
  end
end

matches.each do |m|
  if m
    puts m
  else
    puts "#No match"
  end
end

unless unmatched.empty?
  puts "Unmatched input: "
  puts unmatched
end

#~ puts "Verbose output:"
#~ matches.each do |m|
#~   if m
#~     puts (m.to_s(true))
#~   else
#~     puts "#No match"
#~   end
#~ end
 
Jesús Gabriel y Galán

I had to escape the string in order to make my solution work due to
the final dot...

Jesus.
 
Sandro Paganotti

Here's my solution: (http://pastie.org/226949)

class Parse
  def initialize(rules)
    @rules = create_rules(rules)
  end

  # Read the rules and transform them into regexps
  def create_rules(rules)
    rules.collect do |r|
      vars = []
      r = Regexp.escape(r.chomp).gsub("\\\[", "[").gsub("\\\]", "]").gsub(/\[([^\]]+)\]/, '(?:\1)?')
      r.gsub!(/<([^>]+)>/) do vars << $1; '(.*?)' end
      [Regexp.new(r), vars]
    end
  end

  # Parse the given lines against the rules created
  def parse(data)
    @match = []; @exceptions = []
    data.each do |l|
      mdata = nil
      @rules.each_with_index { |(r, d), i| break if !((mdata = [i, r.match(l)]) == [i, nil]) }
      if !mdata[1].nil?
        @match << ["Rule #{mdata[0]+1}:", *mdata[1].to_a[1..-1]]
      else
        @match << ["# No Match"]; @exceptions << l
      end
    end
    self
  end

  # Print results
  def to_s
    "#{@match.collect{|m| m.join(" ")}.join("\n")}" +
      (@exceptions.empty? ? "" : "\n\nUnmatched input:\n#{@exceptions.join("")}")
  end
end

# Example of usage.
# File.readlines returns an array of lines; File.read returns a String,
# which responds to collect/each only on Ruby 1.8.
puts "#{Parse.new(File.readlines("rules.txt")).parse(File.readlines("guardian.txt"))}"



 
