[QUIZ] Learning Tic-Tac-Toe (#11)

D

Demetrius Nunes

I would like to know if any of you had this problem:

Since today, when I use rails thru Webrick or Apache (not using FCGI or
modruby), the browser shows the contents of the headers and the body all
as plain text. Rails is not separating the headers from the body.

It does not happen when I run the application using the -c option (that
is, to cache classes).

Looking at the source, using -c the request is processed in the same
process as Webrick, as for using without the -c option, the server does
a "popen" to the ruby interpreter and captures the output.

I can't figure out why is this happening... It used to work just fine!
This is really annoying!

Thanks a lot,
Demetrius
 
J

James Edward Gray II

Can I assume, it knows only the followings at the beginning:

1. How to make a move.
2. Board has come to winning position or not.

Sure. Assume away.

James Edward Gray II
 
D

Dick Davies

* Mohammad Khan said:
Can I assume, it knows only the followings at the beginning:

1. How to make a move.
2. Board has come to winning position or not.

When I did this a while ago, I used libneural and the ai learnt what a good move
was from what the human player did in a given position.

The Board itself knows when someone has one, so all the AI does is make moves
and train based on the board state.
 
D

Demetrius Nunes

Demetrius said:
I would like to know if any of you had this problem:

Since today, when I use rails thru Webrick or Apache (not using FCGI
or modruby), the browser shows the contents of the headers and the
body all as plain text. Rails is not separating the headers from the
body.

It does not happen when I run the application using the -c option
(that is, to cache classes).

Looking at the source, using -c the request is processed in the same
process as Webrick, as for using without the -c option, the server
does a "popen" to the ruby interpreter and captures the output.

I can't figure out why is this happening... It used to work just fine!
This is really annoying!

Thanks a lot,
Demetrius
Dont bother answering this, I reinstalled ruby and rails and it went
back to normal functioning.
 
T

trans. (T. Onoma)

I'm up for this if I we can a get a couple day extension on the quiz.

T.

On Saturday 11 December 2004 02:12 pm, Hans Fugal wrote:
| It would be good to be able to play against eachother when this is all
| over, so I propose the following 'protocol'. It's not complicated and
| probably not ready for tournament tic-tac-toe ;-), but it should do nicely.
|
| 3 entities - a game server and two players. The game waits for players
| to connect. Upon connection the game sends an arbitrary hello line, then
| the character representing this player, and the board size:
|
| Ruby Quiz Tic-Tac-Toe server version 1.0
| X 3x3
|
| And the player responds with an arbitrary hello:
|
| hello my name is Juan
|
| When two players have joined, the game sends the current state to both
| players:
|
| ___ ___ ___
|
|
| Then the game sends a message to the current player:
|
| move
|
| The player responds with an action:
|
| 1,1
|
| The game then sends the state, or "move" again if it was an invalid move:
|
| ___ _X_ ___
|
| game to player 2:
|
| move
|
| Player 2:
|
| 0,2
|
| game to both:
|
| __O _X_ ___
|
| ad nauseum. When the game is over, the game sends to the winner and
| loser, respectively:
|
| win
| lose
|
| The game then disconnects.
|
| White space is ignored, except for newlines (each message is on one
| line). The game state is in row-major order.
|
| Attached is such a server, and you are welcome to play a running version
| at fugal.net:1276. If you notice bugs please let me know - it has been
| through some rudimentary testing but not exhaustive testing.

--
( o _ カラãƒ
// trans.
/ \ (e-mail address removed)

ruby -rdrb -e
'DRb.start_service;duck=DRbObject.new(nil,"druby://whytheluckystiff.net:6503");puts
duck.toms'

I don't give a damn for a man that can only spell a word one way.
-Mark Twain

[8,16,20,29,78,65,2,14,26,12,12,28,71,114,12,13,12,82,72,21,17,4,10,2,95].
each_with_index{|x,i| $><<(x^'Begin landing your troops').chr}
-Tadayoshi Funaba
 
G

Giovanni Intini

Attached is such a server, and you are welcome to play a running version
at fugal.net:1276. If you notice bugs please let me know - it has been
through some rudimentary testing but not exhaustive testing.

I tried testing it by connecting twice to a server running locally and
it didn't notice a victory by player X on the right column (0,2 - 1,2
- 2,2). I had to continue playing until it eventually ended in a draw
 
B

Brian Schröder

It would be good to be able to play against eachother when this is all
over, so I propose the following 'protocol'. It's not complicated and
probably not ready for tournament tic-tac-toe ;-), but it should do nicely.

3 entities - a game server and two players. The game waits for players
to connect. Upon connection the game sends an arbitrary hello line, then
the character representing this player, and the board size:

This is a good idea. Let me propose one enhancement to the protocol. If we
would submit the active player each time together with the board it would make
the client actions simpler. (Though now I've got the Idea that the active
player is always Player |non-nil-positions| mod 2...

But it would be nicer to have this explicitly.

Regards,

Brian
 
B

Brian Schröder

I think, in retrospect, that what you thought were inconsistencies were
just the result of my server using row-major order, whereas yours uses
an inverted cartesian coordinate system. Either one is of course valid.
It would be good to choose one or the other. Your server is much better
code than mine and includes things I only dreamed of like threads. ;-)
So I'll run both - Brians on fugal.net:1277 and mine on fugal.net:1276

As a bit of trivia, 1276 is (1024 + ?T*3) (3 T's for tic-tac-toe)

Thank you! Call it a dumb question, but how are row-major, row-minor and
(inverted)-cartesion coordinate system defined.

I thought, row major would mean, that you have each row after another.

I could then simply adapt the adressing, to change to your protocol. This would
break no code of mine, and we could have only one server left.

Regards,

Brian
 
J

James Edward Gray II

I'm up for this if I we can a get a couple day extension on the quiz.

The quiz will stick to it's normal schedule, but as this doesn't
involve any deadlines, it shouldn't affect you.

James Edward Gray II

P.S. On a related note, Ruby Quiz will observe the Christmas holiday.
There will be no quiz the weekend of December 25th.
 
J

James Edward Gray II

I thought, row major would mean, that you have each row after another.

Row major means the row comes first. So you would handle addressing
with something like:

board[y][x] ...or... board[row][column]

Hope that helps.

James Edward Gray II
 
J

James Edward Gray II

It would be good to be able to play against eachother when this is all
over, so I propose the following 'protocol'. It's not complicated and
probably not ready for tournament tic-tac-toe ;-), but it should do
nicely.

Now that's just cool! Thanks Hans.

James Edward Gray II
 
B

Brian Schröder

I thought, row major would mean, that you have each row after another.

Row major means the row comes first. So you would handle addressing
with something like:

board[y][x] ...or... board[row][column]

Hope that helps.

James Edward Gray II


Then I'm implementing row major, while the original server that claimed to
support row major in reality used column major.

This should be no flame or anything, I just want to get the specification
right, and hope that all can interface with the same server.

Regards,

Brian
 
B

Brian Schröder

This week's Ruby Quiz is to implement an AI for playing Tic-Tac-Toe, with a
catch: You're not allowed to embed any knowledge of the game into your
creation beyond how to make legal moves and recognizing that it has won or
lost.

Your program is expected to "learn" from the games it plays, until it masters
the game and can play flawlessly.

Submissions can have any interface, but should be able to play against humans
interactively. However, I also suggest making it easy to play against
another AI, so you can "teach" the program faster.

Being able to monitor the learning progression and know when a program has
mastered the game would be very interesting, if you can manage it.


Here comes my solution. As always I've set up a website where you can browse
everything online, that will be the most comfortable way.

http://ruby.brian-schroeder/quiz/tictactoe/

I implemented minimax with alpa-beta search as a standard. It is impossible to
play better. And here we see, that tic-tac-toe is quite uninteresting, in that
it is impossible to win against a perfect player. The best one can achieve is a
draw.

As discussed elsewhere, minimax is not a valid solution. So I wrote a ai that
gets to know only the id of the state the world is in, the id's of all allowed
moves from here, and the state and move id's that the opponent choose. Also it
is allowed to know, when a it has reached a final state, and if it has won,
drawn or lost.

Using this information the ai learns about the possible moves and states, and
assigns each state with a value. It then greedily goes always for the state
with the highest value.

The amount of exploration vs. exploitation that the AI makes can be adjusted,
and is set higher for training than for the actual usage. Note that you can
train the AI to make errors, by repeatedly giving away a chance. This will
train the AI to always run into danger that you exploit this chance one day. ;)

The values of all states encountered in one game are updated according to the
outcome of the game. The later states are updated more strongly than earlier
states.

I will attach only the learning algorithm. The rest of the framework
- tictactoe-interface
- tictactoe-server
- tictactoe-client
- general minimax-alphabeta implementation
- the board
- the moves
can be browsed online if someone is interested.


class Learning < BasicInterface
attr_accessor :random_prob
attr_reader :player

def initialize
@state_values = Hash.new(0)
@state_transitions = {}
@random_prob = 0.05
end

def new_game(player)
@player = player
@states_visited = []
end

def choose_move(game)
moves = game.moves
if !@state_transitions[game.state_id] or rand < random_prob # Pick a
random move
move = moves[rand(moves.length)]
else # Pick the best move
move_id = @state_transitions[game.state_id].max{ | (ma, sa), (mb, sb) |
@state_values[sa] <=> @state_values[sb] }[0]
move = moves.select{|m| m.move_id == move_id}[0]
end
move
end

def inform_of_move(before, after, move)
@states_visited << before.state_id << after.state_id
(@state_transitions[before.state_id] ||= {})[move.move_id] =
after.state_id

if after.final?
winner = after.winner
if winner
value = winner == self.player ? 100.0 : -1000.0
else
value = 0.0
end

factor = 1.0
while state = @states_visited.pop
@state_values[state] = (1.0 - factor) * @state_values[state] + factor
* value
factor *= 0.5
end
end
end
end

Best regards,

Brian
 
J

James Edward Gray II

This week's Ruby Quiz is to implement an AI for playing Tic-Tac-Toe,
with a
catch: You're not allowed to embed any knowledge of the game into
your creation beyond how to make legal moves and recognizing that it
has won or lost.

Okay, I think I have a working solution. My problem is in another
area. My thinking about Tic-Tac-Toe itself seems flawed.

First, I decided to generate all the positions and came up with:

#!/usr/bin/env ruby

$seen = [ ]

def generate( board, piece, &pos_handler )
(0..8).each do |i|
next unless board[i, 1] == " "
pos = board.dup
pos = piece

next if $seen.include?(pos)
$seen << pos
pos_handler.call(pos)

next if pos.split("").values_at(0..2) == [piece] * 3 or
pos.split("").values_at(3..5) == [piece] * 3 or
pos.split("").values_at(6..8) == [piece] * 3 or
pos.split("").values_at(0, 3, 6) == [piece] * 3 or
pos.split("").values_at(1, 4, 7) == [piece] * 3 or
pos.split("").values_at(2, 5, 8) == [piece] * 3 or
pos.split("").values_at(0, 4, 8) == [piece] * 3 or
pos.split("").values_at(2, 4, 6) == [piece] * 3

generate(pos, (["X", "O"] - [piece])[0], &pos_handler)
end
end

puts "1\n \n \n \n\n"
count = 1
generate(" " * 9, "X") do |pos|
count += 1
puts "#{count}\n#{pos[0..2]}\n#{pos[3..5]}\n#{pos[6..8]}\n\n"
end

__END__

Does the 5478 positions I'm getting from that sound right?

Then I tried to brainstorm transforms which yield equivalent positions.
Obviously rotating 90, 180, or 270 are some. I also use "mirror"
(swap first and third columns), and mirror with 90, 180, and 270
rotates.

That gets me down to:

#!/usr/bin/env ruby

class Position
def self.rotate( pos )
pos.split("").values_at(6, 3, 0, 7, 4, 1, 8, 5, 2).join
end

def self.mirror( pos )
pos.split("").values_at(2, 1, 0, 5, 4, 3, 8, 7, 6).join
end

def initialize( pos )
@pos = pos
end

def ==( pos )
return true if @pos == pos

transform = pos
3.times do
transform = Position.rotate(transform)
return true if @pos == transform
end

transform = Position.mirror(pos)
return true if @pos == transform
3.times do
transform = Position.rotate(transform)
return true if @pos == transform
end

false
end

def to_s; "#{@pos[0..2]}\n#{@pos[3..5]}\n#{@pos[6..8]}\n" end
def inspect; @pos end
end

$seen = [ ]

def generate( board, piece, &pos_handler )
(0..8).each do |i|
next unless board[i, 1] == " "
pos = board.dup
pos = piece

next if $seen.include?(pos)
$seen << Position.new(pos)
pos_handler.call($seen[-1])

next if pos.split("").values_at(0..2) == [piece] * 3 or
pos.split("").values_at(3..5) == [piece] * 3 or
pos.split("").values_at(6..8) == [piece] * 3 or
pos.split("").values_at(0, 3, 6) == [piece] * 3 or
pos.split("").values_at(1, 4, 7) == [piece] * 3 or
pos.split("").values_at(2, 5, 8) == [piece] * 3 or
pos.split("").values_at(0, 4, 8) == [piece] * 3 or
pos.split("").values_at(2, 4, 6) == [piece] * 3

generate(pos, (["X", "O"] - [piece])[0], &pos_handler)
end
end

puts "1\n \n \n \n\n"
count = 1
generate(" " * 9, "X") do |pos|
count += 1
puts "#{count}\n#{pos}\n"
end

__END__

That gives me 765 positions, which I know is wrong, but I can't find
the error in my logic...

James Edward Gray II
 
M

martinus

Has anyone tried a genetic programming approach? I have tried it, but
it does not work as expected. My problem is how to represent the logic
of the individuals so that mutation/crossover makes sense.

martinus
 
T

trans. (T. Onoma)

On Monday 13 December 2004 03:17 am, martinus wrote:
| Has anyone tried a genetic programming approach? I have tried it, but
| it does not work as expected. My problem is how to represent the logic
| of the individuals so that mutation/crossover makes sense.
|
| martinus

I did. It's interesting b/c genetic agents are absolutely stupid and take a
long time to learn. Nonetheless I ran a population of 100 through a 5000
generations and they did improve, but still not perfect. I worry that it is
difficult to escape local maxima (which tells you something about Darwinian
evolution itself.)

As to the logic. I simply gave them a "tiny" abstract computer which gets
programmed with a simple AST. To do this I divide the board into three parts:
slots, and X's and O's postions. The slots are numbered as follows:

256 | 128 | 64
-----+-----+----
32 | 16 | 8
-----+-----+----
4 | 2 | 1

X's and O's representation of the board are simply a binary number
cooresponding to the above board for the slots they have. Quick example:

OXO
__O
X_X

X: 0b010000101
O: 0b101001000

I also represent whose turn it is with 0b111111111 and 0b000000000.

So with that data I create an AST of logical operations:

AST = [ "|m", "&m", "^m",
"|e", "&e", "^e",
"|s", "&s", "^s",
"|t", "&t", "^t" ]

m = current players board
e = enemy players board
s = a slot (per above chart)
t = turn

During mutation one of the three operators (|,&,^) and an arbitrary number can
also be added. So you might end up with something like this (completely made
up example):

"|m&m|s^e^234|t|t"

I process this using #eval (first character gets removed). For each turn I
loop through each available slot on the board (s) and pick the one that
returns the highest value. Perhaps not the most elegent design, but I'm
pretty certain it is a sufficiant formalism.

Of course that doesn't help with heredity b/c a slight change to these little
programs has drastic effects. We need something with smoother variance. So I
gave each agent any number of these little programs and average out the
results.

I'm not quite finished with my program. And unfortunately I won't be able to
get to it til tommorrow --but I'll post it then if you would like to look at
it.

What approach have you been working on?

T.
 
T

Thomas Leitner

| This week's Ruby Quiz is to implement an AI for playing Tic-Tac-Toe,
| with a catch: You're not allowed to embed any knowledge of the game
| into your creation beyond how to make legal moves and recognizing that
| it has won or lost.
|
| Your program is expected to "learn" from the games it plays, until it
| masters the game and can play flawlessly.

So, I have also tried to program a learning AI player. However, it still
does not do what it should and I do not know why. Maybe someone with
more brains can help???

Here is the code for the AI player:

class AIPlayer < Player

def initialize( game, sign )
super( game, sign )
@stats = {}
@cur_stats = []
end

def move
unless @stats.has_key? @game.board
@stats[@game.board.dup] = {}
@game.board.valid_moves.each {|m| @stats[@game.board][m] = 0}
end
moves = @stats[@game.board].sort {|a,b| a[1] <=> b[1]}
result = moves.shift
result = moves.shift while (moves.length > 0) && rand < 0.02
@cur_stats << [@game.board.dup, result[0]]
return result[0]
end

def game_finished( result )
mult = case result
when :won then 1000
when :lost then -100
else 0
end
@cur_stats.each_with_index do |o,i|
@stats[o[0]][o[1]] += mult * 2**i
end
@cur_stats = []
end

end

A short description for the code:

The @game object is the current game and knows the current board. When
it is the AI players move, the #move method is called and the AI player
has to return a number between 0 and 8. Thats because my tictactoe board
is a simple array and the array inidices correspond to these fields:

0 1 2
3 4 5
6 7 8

In the #move method, I create a new entry in the @stats Hash (key is the
board) if it does not have the current board as key and initialize its
value with a Hash (keys are the valid fields and values are set to 0).
After that the Hash with the valid moves for this board is taken from
the @stats Hash and sorted so that the moves with the highest
probability are in front. Then the first, or if rand < 0.02 the second
or if rand < 0.0.2 the third, ... value is returned.

If the game has finished, the #game_finished method is called and the
moves that were chosen in the game are assigned new values depending on
the result of the game.
 
J

James Edward Gray II

Okay, I think I have a working solution. My problem is in another
area. My thinking about Tic-Tac-Toe itself seems flawed.

Following up on this with my findings. I believe my results may have
been correct after all. A bug in my player code was making it look
otherwise, I think.

James Edward Gray II
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,995
Messages
2,570,230
Members
46,819
Latest member
masterdaster

Latest Threads

Top