This week's Ruby Quiz is to implement an AI for playing Tic-Tac-Toe, with a
catch: You're not allowed to embed any knowledge of the game into your
creation beyond how to make legal moves and recognizing that it has won or
lost.
Your program is expected to "learn" from the games it plays, until it masters
the game and can play flawlessly.
Submissions can have any interface, but should be able to play against humans
interactively. However, I also suggest making it easy to play against
another AI, so you can "teach" the program faster.
Being able to monitor the learning progression and know when a program has
mastered the game would be very interesting, if you can manage it.
Here is my solution. As always, I've set up a website where you can browse
everything online; that will be the most comfortable way:
http://ruby.brian-schroeder/quiz/tictactoe/
I implemented minimax with alpha-beta search as a baseline; it is impossible to
play better. This also shows that Tic-Tac-Toe is quite uninteresting, in that
it is impossible to win against a perfect player. The best one can achieve is a
draw.
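For reference, the pruning idea can be sketched in a few lines. This is only a
generic, hypothetical sketch, not the implementation from my framework (which
you can browse online); it assumes a game object with `final?`, `score` (from
the viewpoint of the player to move), `moves`, and an `apply(move)` method
returning the successor position.

  # Minimal negamax with alpha-beta pruning (hypothetical game API).
  def alphabeta(game, alpha = -Float::INFINITY, beta = Float::INFINITY)
    return game.score if game.final?
    best = -Float::INFINITY
    game.moves.each do |move|
      # Our value is the negation of the opponent's best reply.
      value = -alphabeta(game.apply(move), -beta, -alpha)
      best  = value if value > best
      alpha = value if value > alpha
      return best if alpha >= beta   # prune: the opponent avoids this line
    end
    best
  end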
As discussed elsewhere, minimax is not a valid solution to the quiz. So I wrote
an AI that gets to know only the ID of the state the world is in, the IDs of
all moves allowed from there, and the state and move IDs that the opponent
chose. It is also allowed to know when it has reached a final state, and
whether it has won, drawn or lost.
Using this information the AI learns about the possible moves and states and
assigns a value to each state. It then always greedily moves to the reachable
state with the highest value.
The amount of exploration vs. exploitation the AI does can be adjusted, and is
set higher for training than for actual play. Note that you can train the AI
to make errors by repeatedly passing up a winning chance: the AI then learns to
keep running into the danger that you exploit this chance one day.
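Concretely, the exploration rate is the random_prob attribute of the learner
attached below. The values here are only illustrative, not the ones I actually
used:

  # Illustrative only: explore a lot while training, play greedily afterwards.
  learner = Learning.new
  learner.random_prob = 0.3   # during training games
  # ... many training games ...
  learner.random_prob = 0.0   # when playing for real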
The values of all states encountered in one game are updated according to the
outcome of the game. The later states are updated more strongly than earlier
states.
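As a worked example of the update rule in the code below: if the AI wins a game
(outcome value 100.0), the last state visited is set to 100.0 outright (factor
1.0), the state before that is moved halfway towards 100.0 (factor 0.5), the
one before that a quarter of the way (factor 0.25), and so on, halving the
influence of the outcome with each step back in the game.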
I will attach only the learning algorithm. The rest of the framework can be
browsed online if someone is interested:
- tictactoe-interface
- tictactoe-server
- tictactoe-client
- general minimax-alphabeta implementation
- the board
- the moves
class Learning < BasicInterface
  attr_accessor :random_prob
  attr_reader :player

  def initialize
    @state_values = Hash.new(0)
    @state_transitions = {}
    @random_prob = 0.05
  end

  def new_game(player)
    @player = player
    @states_visited = []
  end

  def choose_move(game)
    moves = game.moves
    if !@state_transitions[game.state_id] or rand < random_prob
      # Unknown state or exploration step: pick a random move
      move = moves[rand(moves.length)]
    else
      # Pick the move leading to the known state with the highest value
      move_id = @state_transitions[game.state_id].max { |(ma, sa), (mb, sb)|
        @state_values[sa] <=> @state_values[sb] }[0]
      move = moves.select { |m| m.move_id == move_id }[0]
    end
    move
  end

  def inform_of_move(before, after, move)
    @states_visited << before.state_id << after.state_id
    (@state_transitions[before.state_id] ||= {})[move.move_id] = after.state_id
    if after.final?
      winner = after.winner
      if winner
        value = winner == self.player ? 100.0 : -1000.0
      else
        value = 0.0
      end
      # Propagate the outcome back through the visited states,
      # updating later states more strongly than earlier ones.
      factor = 1.0
      while state = @states_visited.pop
        @state_values[state] = (1.0 - factor) * @state_values[state] + factor * value
        factor *= 0.5
      end
    end
  end
end
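For completeness, a self-play training loop could look roughly like the sketch
below. The TicTacToe board class and its turn/make_move/final? methods are
hypothetical stand-ins for the real board and server in the framework, which I
have not attached here:

  # Rough self-play training sketch; board API is assumed, not the real one.
  def train(games = 10_000)
    players = [Learning.new, Learning.new]
    players.each { |p| p.random_prob = 0.3 }   # explore a lot while learning
    games.times do
      board = TicTacToe.new
      players.each_with_index { |p, i| p.new_game(i) }
      until board.final?
        mover  = players[board.turn]
        before = board
        move   = mover.choose_move(board)
        board  = board.make_move(move)
        # Both players observe every transition, so both learn from each game.
        players.each { |p| p.inform_of_move(before, board, move) }
      end
    end
    players
  end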
Best regards,
Brian