R
Ruby Quiz
In an interesting contrast, this quiz generated a lot of good discussion, but
only two solutions. I believe that may be because the problem turned out to be
more complicated than I intended. I know I personally ran into a few
complications and didn't have a chance to finish my own solution.
Again, the discussion was excellent and you should probably skim the thread, if
you weren't following it this weekend. Multiple Tic-Tac-Toe servers (written in
Ruby, of course) were posted, so programs could play against each other
remotely.
I posted a message about how Tic-Tac-Toe positions can be transformed by
rotation and "mirroring" to other seemly different layouts that can be handled
the same.
I also posted a "tictactoe.rb" library that makes most interaction with the game
trivially simple.
Finally, a few of us posted some notes about the problems we ran into. I agree
with Hans Fugal, who said that you can learn almost as much from those.
The two solutions posted are similar. Basically, they learn to avoid their
mistakes over time. They accomplish this by "scoring" the moves they made at
each position in a game, based on whether they won or lost. Eventually, this
knowledge allows them to select mainly strong moves, simply by remembering how
they've done in the past, in the same position.
I'll show Brian Schroeder's code here, but both were interesting to examine.
Brian's solution contained eight files of Ruby code, the embedded documentation
for said files, charts, a write-up of the process, and was all wrapped up in a
handy web page, so you can dig as deep as you like into what he's done. (And he
once asked where *I* find the time!) For the purposes of this summary, I'll
stick to his learning code.
Here it is:
class Learning < BasicInterface
attr_accessor :random_prob
attr_reader layer
def initialize
@state_values = Hash.new(0)
@state_transitions = {}
@random_prob = 0.05
end
def new_game(player)
@player = player
@states_visited = []
end
def choose_move(game)
moves = game.moves
if !@state_transitions[game.state_id] or rand < random_prob
move = moves[rand(moves.length)]
else
move_id = @state_transitions[game.state_id].max{ |(ma,sa),(mb,sb)|
@state_values[sa] <=> @state_values[sb]
}[0]
move = moves.select{|m| m.move_id == move_id}[0]
end
move
end
def inform_of_move(before, after, move)
@states_visited << before.state_id << after.state_id
(@state_transitions[before.state_id] ||= {})[move.move_id] =
after.state_id
if after.final?
winner = after.winner
if winner
value = winner == self.player ? 100.0 : -1000.0
else
value = 0.0
end
factor = 1.0
while state = @states_visited.pop
@state_values[state] = (1.0 - factor) * @state_values[state] +
factor * value
factor *= 0.5
end
end
end
end
The initialize() method sets up Brian's @state_values and @state_transitions,
which constitute the AI's brain.
@state_values will hold scores for the positions the AI has won or lost with
before. @state_transitions holds a "map" of how to get from position to
position. When these are filled in, the AI will have "learned" what positions
are desirable and how it can reach them.
Knowing this, choose_move() is easy to breakdown. It checks to see if it knows
anything about the moves from the current position. If it does, it selects the
highest score it can find for itself (else branch). If it doesn't, it goes with
a random choice from all available moves (if branch).
The final piece of the puzzle is inform_of_move(). This method remembers all
moves made in the game, mainly. When it sees a final position, it scores all
those moves based on whether it won or lost. The scoring scale is slanted, to
encourage the AI to avoid losses.
That's the heart of Brian's solution. The rest of the code is interface,
client/server, a perfect minimax player, and Tic-Tac-Toe details.
For the curious, this quiz was inspired by the research of Donald Michie. In
1961 he built a "machine" that learned to play perfect Tic-Tac-Toe against
humans, using matchboxes and beads. He called the machine MENACE (Matchbox
Educable Naughts And Crosses Engine).
304 matchboxes where labeled with images of Tic-Tac-Toe positions and filled
with colored beads representing possible moves. At each move, a bead would be
rattled out of the proper box to determine a move. When MENACE would win, more
beads of the colors played would be added to each position box. When it would
lose, the beads were left out to discourage these moves.
Michie claimed that he trained MENACE in 220 games, being beaten by it eight out
of the final ten games.
Thanks to those who tried this one, successful or not. Sorry it turned out to
be a bit involved.
Tomorrow, "How Ruby Can Help You Beat Your Grandmother In Scrabble" is the
topic. If your grandmother is anything like mine, you'll appreciate all the
help you can get!
only two solutions. I believe that may be because the problem turned out to be
more complicated than I intended. I know I personally ran into a few
complications and didn't have a chance to finish my own solution.
Again, the discussion was excellent and you should probably skim the thread, if
you weren't following it this weekend. Multiple Tic-Tac-Toe servers (written in
Ruby, of course) were posted, so programs could play against each other
remotely.
I posted a message about how Tic-Tac-Toe positions can be transformed by
rotation and "mirroring" to other seemly different layouts that can be handled
the same.
I also posted a "tictactoe.rb" library that makes most interaction with the game
trivially simple.
Finally, a few of us posted some notes about the problems we ran into. I agree
with Hans Fugal, who said that you can learn almost as much from those.
The two solutions posted are similar. Basically, they learn to avoid their
mistakes over time. They accomplish this by "scoring" the moves they made at
each position in a game, based on whether they won or lost. Eventually, this
knowledge allows them to select mainly strong moves, simply by remembering how
they've done in the past, in the same position.
I'll show Brian Schroeder's code here, but both were interesting to examine.
Brian's solution contained eight files of Ruby code, the embedded documentation
for said files, charts, a write-up of the process, and was all wrapped up in a
handy web page, so you can dig as deep as you like into what he's done. (And he
once asked where *I* find the time!) For the purposes of this summary, I'll
stick to his learning code.
Here it is:
class Learning < BasicInterface
attr_accessor :random_prob
attr_reader layer
def initialize
@state_values = Hash.new(0)
@state_transitions = {}
@random_prob = 0.05
end
def new_game(player)
@player = player
@states_visited = []
end
def choose_move(game)
moves = game.moves
if !@state_transitions[game.state_id] or rand < random_prob
move = moves[rand(moves.length)]
else
move_id = @state_transitions[game.state_id].max{ |(ma,sa),(mb,sb)|
@state_values[sa] <=> @state_values[sb]
}[0]
move = moves.select{|m| m.move_id == move_id}[0]
end
move
end
def inform_of_move(before, after, move)
@states_visited << before.state_id << after.state_id
(@state_transitions[before.state_id] ||= {})[move.move_id] =
after.state_id
if after.final?
winner = after.winner
if winner
value = winner == self.player ? 100.0 : -1000.0
else
value = 0.0
end
factor = 1.0
while state = @states_visited.pop
@state_values[state] = (1.0 - factor) * @state_values[state] +
factor * value
factor *= 0.5
end
end
end
end
The initialize() method sets up Brian's @state_values and @state_transitions,
which constitute the AI's brain.
@state_values will hold scores for the positions the AI has won or lost with
before. @state_transitions holds a "map" of how to get from position to
position. When these are filled in, the AI will have "learned" what positions
are desirable and how it can reach them.
Knowing this, choose_move() is easy to breakdown. It checks to see if it knows
anything about the moves from the current position. If it does, it selects the
highest score it can find for itself (else branch). If it doesn't, it goes with
a random choice from all available moves (if branch).
The final piece of the puzzle is inform_of_move(). This method remembers all
moves made in the game, mainly. When it sees a final position, it scores all
those moves based on whether it won or lost. The scoring scale is slanted, to
encourage the AI to avoid losses.
That's the heart of Brian's solution. The rest of the code is interface,
client/server, a perfect minimax player, and Tic-Tac-Toe details.
For the curious, this quiz was inspired by the research of Donald Michie. In
1961 he built a "machine" that learned to play perfect Tic-Tac-Toe against
humans, using matchboxes and beads. He called the machine MENACE (Matchbox
Educable Naughts And Crosses Engine).
304 matchboxes where labeled with images of Tic-Tac-Toe positions and filled
with colored beads representing possible moves. At each move, a bead would be
rattled out of the proper box to determine a move. When MENACE would win, more
beads of the colors played would be added to each position box. When it would
lose, the beads were left out to discourage these moves.
Michie claimed that he trained MENACE in 220 games, being beaten by it eight out
of the final ten games.
Thanks to those who tried this one, successful or not. Sorry it turned out to
be a bit involved.
Tomorrow, "How Ruby Can Help You Beat Your Grandmother In Scrabble" is the
topic. If your grandmother is anything like mine, you'll appreciate all the
help you can get!