6. Search: Games, Minimax, and Alpha-Beta

By: MIT OpenCourseWare


Uploaded on 01/10/2014

MIT 6.034 Artificial Intelligence, Fall 2010
View the complete course: http://ocw.mit.edu/6-034F10
Instructor: Patrick Winston

In this lecture, we consider strategies for adversarial games such as chess. We discuss the minimax algorithm, and how alpha-beta pruning improves its efficiency. We then examine progressive deepening, which ensures that some answer is always available.

License: Creative Commons BY-NC-SA
More information at http://ocw.mit.edu/terms
More courses at http://ocw.mit.edu

Comments (4):

By anonymous    2017-09-20

Question

How can I use TensorFlow (and possibly Convolution) to play a board game?

Specifically I have a small board ( 4x4 or 16x1 ) with possible values of [0,1,2,3,4] and I want to know the next best move.

High Level Answer

What you are asking for isn't much different than what DeepMind did with AlphaGo, but perhaps you don't need all the bells and whistles. Reviewing how DeepMind arranged their architecture will give you a great insight into how you could apply a similar strategy to your game.

At a high level, they used one neural net to grade the current board configuration (the value network) and another neural net to suggest the next best move (the policy network), combined with a tree search.
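As a toy sketch only (not AlphaGo's actual architecture, and with made-up layer sizes), the two networks could share a small convolutional body with a value head and a policy head, written here in the same TensorFlow 1.x style as the code further down for a 4x4 board with cell values [0,1,2,3,4]:

import tensorflow as tf

# Toy sketch: shared convolutional body, one value head, one policy head.
board = tf.placeholder( tf.int32, shape=[None, 16] )                     # batch of flattened 4x4 boards
x = tf.reshape( tf.one_hot(board, 5), [-1, 4, 4, 5] )                    # one-hot plane per possible cell value
body = tf.layers.conv2d( x, 32, 3, padding='same', activation=tf.nn.relu )
flat = tf.reshape( body, [-1, 4*4*32] )
value = tf.layers.dense( flat, 1, activation=tf.nn.tanh )                # value head: how good is this position?
policy_logits = tf.layers.dense( flat, 16 )                              # policy head: a score per board cell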

Yes, you could go to town with this configuration and your system would probably play an awesome game.

Practical Next Steps

I'm an Engineer and as such I like to see some results immediately, so that's what this section is really about.

I've programmed an AI for checkers using simple AlphaBeta (built by following this lecture from MIT). I'd start with that: apply AlphaBeta to your board game and hardcode a value function. For example, a value function in checkers could be "have more pieces than your opponent".

You will know you've implemented it well if an N+1 lookahead usually beats an N lookahead.
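As a rough sketch of that starting point (not the checkers code itself), alpha-beta with a hardcoded value function looks something like this; count_pieces, opponent, legal_moves, apply and is_terminal are hypothetical game helpers you would supply for your own board game:

def value(board, player):
    # hardcoded value function: "have more pieces than your opponent"
    return count_pieces(board, player) - count_pieces(board, opponent(player))

def alphabeta(board, player, depth, alpha=float('-inf'), beta=float('inf')):
    # returns (score, best_move) from the point of view of `player`
    if depth == 0 or is_terminal(board):
        return value(board, player), None
    best_move = None
    for move in legal_moves(board, player):
        score, _ = alphabeta(apply(board, move), opponent(player), depth - 1, -beta, -alpha)
        score = -score                    # negamax convention: good for the opponent is bad for us
        if score > alpha:
            alpha, best_move = score, move
        if alpha >= beta:
            break                         # prune: the opponent would never allow this branch
    return alpha, best_move

Pitting alphabeta(..., depth=N+1) against alphabeta(..., depth=N) over many games is the sanity check above.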

Then it's time for TensorFlow! Use your hardcoded value function and several lookaheads as your training set of good moves. Train a CNN in TensorFlow to grade the moves found by deeper lookaheads more highly.

Specifically, for move A from lookahead 1 vs move B from lookahead 2 vs move C from lookahead 3, your CNN should grade move C with the highest value, followed by move B, then move A, then all other possible moves.

If you can get a CNN to value the move your hardcoded policy finds at 3 lookaheads, then effectively your CNN in one lookahead has the power of your hardcoded policy at 3 lookaheads.

Now switch out your hardcoded policy and put in your CNN version 1.

Iterate and train your CNN version 2 just like you did version 1.
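A rough sketch of how those lookahead-graded training examples might be generated, reusing the hypothetical alphabeta() helper from the sketch above (random_board() is also hypothetical):

def lookahead_grades(board, player, max_depth=3):
    # the deeper the lookahead that picks a move, the higher its grade
    grades = {}
    for depth in range(1, max_depth + 1):
        _, move = alphabeta(board, player, depth)
        grades[move] = depth / float(max_depth)   # depth-3 move -> 1.0, depth-1 move -> 0.33
    return grades

# training set: (board, grades) pairs the CNN should learn to reproduce in a single forward pass
training_set = [ (b, lookahead_grades(b, player=1)) for b in (random_board() for _ in range(1000)) ]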

Further next steps

The above is just to get something up and going. It won't recreate AlphaGo. Investigate Monte Carlo tree search for playing very open games like Go. Also investigate the "policy network" idea further, to generate likely next moves instead of just grading them after generation.

Good luck!

Final Tip

Convert all those numbers [0,1,2,3,4] into a one-hot encoding using something like

board = tf.constant( [4, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 1, 1, 3] , dtype=tf.int32 )   # flattened 4x4 board; int32 indices for tf.one_hot
board_as_mask = tf.one_hot( board , 5 )   # one column per possible cell value [0..4]
sess = tf.Session()
sess.run(board_as_mask)
array([[ 0.,  0.,  0.,  0.,  1.],
    [ 0.,  1.,  0.,  0.,  0.],
    [ 0.,  1.,  0.,  0.,  0.],
    [ 0.,  1.,  0.,  0.,  0.],
    [ 0.,  1.,  0.,  0.,  0.],
    [ 0.,  0.,  1.,  0.,  0.],
    [ 0.,  1.,  0.,  0.,  0.],
    [ 0.,  0.,  1.,  0.,  0.],
    [ 0.,  1.,  0.,  0.,  0.],
    [ 0.,  1.,  0.,  0.,  0.],
    [ 0.,  1.,  0.,  0.,  0.],
    [ 0.,  0.,  1.,  0.,  0.],
    [ 0.,  0.,  1.,  0.,  0.],
    [ 0.,  1.,  0.,  0.,  0.],
    [ 0.,  1.,  0.,  0.,  0.],
    [ 0.,  0.,  0.,  1.,  0.]], dtype=float32)

How would I define/compute the error in these cases?

The error is usually the sum of squares, (y_ - y)^2, or the cross entropy, -y_ * log(y). No need to do anything fancier than that. y_ could be "is this the move that an N lookahead would choose that an N-1 wouldn't", or if you are using Monte Carlo, it could be "does this move lead to a win?"
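In TensorFlow 1.x terms, both losses are one-liners. This is only a sketch; the placeholder names and the 16-move shape are made up to match the 4x4 board:

import tensorflow as tf

y_ = tf.placeholder( tf.float32, shape=[None, 16] )        # target grading over the 16 candidate moves
y_logits = tf.placeholder( tf.float32, shape=[None, 16] )  # raw network output for the same moves

squared_error = tf.reduce_mean( tf.square( y_ - tf.nn.softmax(y_logits) ) )

# numerically stable cross entropy computed directly from logits
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits( labels=y_, logits=y_logits ) )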

Do I need to know the number of potential values for y?

Instead of a one-hot encoding you could use an embedding lookup, like tf.nn.embedding_lookup, and populate it with thousands of randomly initialized items. If during training it encounters a new item, it will just start to update the respective embedding.

Example Embedding Lookup

If you wanted to go with an embedding lookup (which could have serious advantages), you'd kick start it with something like this:

import tensorflow as tf

board = tf.constant( [4, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 1, 1, 3] , dtype=tf.int32 )

index_max = 10   # rows in the embedding table (headroom beyond the 5 known values)
features = 3     # dimensions of each embedding
embeddings = tf.Variable( tf.truncated_normal( [index_max, features] , dtype=tf.float32 ) )
board_as_embeddings = tf.nn.embedding_lookup(embeddings, board)   # one feature vector per board cell

sess = tf.Session()
sess.run(tf.global_variables_initializer())

print( sess.run(board_as_embeddings) )

and the output

[[-0.29756528  1.25859058 -1.68821394]
 [ 0.19683863  1.09903252 -1.12252223]
 [ 0.19683863  1.09903252 -1.12252223]
 [ 0.19683863  1.09903252 -1.12252223]
 [ 0.19683863  1.09903252 -1.12252223]
 [ 1.07770884 -1.47092581 -1.85934114]
 [ 0.19683863  1.09903252 -1.12252223]
 [ 1.07770884 -1.47092581 -1.85934114]
 [ 0.19683863  1.09903252 -1.12252223]
 [ 0.19683863  1.09903252 -1.12252223]
 [ 0.19683863  1.09903252 -1.12252223]
 [ 1.07770884 -1.47092581 -1.85934114]
 [ 1.07770884 -1.47092581 -1.85934114]
 [ 0.19683863  1.09903252 -1.12252223]
 [ 0.19683863  1.09903252 -1.12252223]
 [-0.34236383  1.67817557 -1.54652882]]

This converts each game location into a semantic field of features, in this case 3.

Actual implementation with FrozenLake

Here is some code that does the original ask with a FrozenLake-style board:

imports

import tensorflow as tf

index_max = 10   # rows in the embedding table (headroom beyond the 5 known cell values)
features = 5     # embedding dimension
width = 4        # the board is width x width
size = width*width

# x_data: input boards; y_data: the corresponding target boards after the desired next move
x_data = [[4, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 1, 1, 3],
    [0, 1, 1, 1, 1, 2, 1, 4, 1, 1, 1, 2, 2, 1, 1, 3]]

y_data = [[0, 4, 1, 1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 1, 1, 3],
    [0, 1, 1, 1, 1, 2, 1, 0, 1, 1, 1, 2, 2, 4, 1, 3]]

x_input = tf.placeholder( shape=[None,size] , dtype=tf.int32 )    # batch of flattened input boards
y_ground = tf.placeholder( shape=[None,size] , dtype=tf.int32 )   # batch of flattened target boards


embedding_vector = tf.Variable( tf.truncated_normal( [index_max, features] , dtype=tf.float32 ) )
embedding_vector_3_ranks = tf.reshape(embedding_vector, [1,index_max, features])

x_embedding = tf.nn.embedding_lookup(embedding_vector, x_input)

def conv_layer( x , layers_in=features , layers_out=features ):
  # 3x3 convolution that keeps the board's spatial size ('SAME' padding)
  strides = [1, 1, 1, 1]
  w = tf.Variable( tf.truncated_normal([3,3,layers_in, layers_out], stddev=0.1, dtype=tf.float32) )
  b = tf.Variable( tf.constant(0.1, shape=[layers_out], dtype=tf.float32) )
  h = tf.nn.conv2d( x, w, strides=strides, padding='SAME' ) + b
  return h  # the ReLU is applied by the caller

hidden = tf.reshape( x_embedding, [-1,width,width,features] )
hidden = tf.nn.relu( conv_layer( hidden ) )
hidden = tf.nn.relu( conv_layer( hidden ) )
hidden = tf.nn.relu( conv_layer( hidden ) )
y_output = tf.reshape( conv_layer( hidden ) , [-1,features] )


# replicate each predicted cell's feature vector once per row of the embedding table
item_as_embedding = tf.tile( y_output , tf.constant([1,index_max]) )
item_as_embedding = tf.reshape( item_as_embedding , [-1,index_max,features] )

# squared distance from each predicted cell to every embedding in the table
item_distance_to_embedding = tf.square( embedding_vector_3_ranks - item_as_embedding )
item_distance_to_embedding = tf.reduce_mean( item_distance_to_embedding , -1 )
item_distance_to_embedding = tf.reshape( item_distance_to_embedding, [-1,index_max] )

# predicted cell value = index of the nearest embedding (tf.argmax replaces the deprecated tf.arg_max)
y_estimate = tf.argmax( -1.0 * tf.reduce_mean( tf.square( tf.reshape( embedding_vector, [1,index_max, features]) - item_as_embedding ) , -1 ) , 1 )
# the error only penalizes distance to the ground-truth cell value
mask = tf.reshape( tf.one_hot(y_ground,index_max, dtype=tf.float32) , [-1,index_max] )
error = tf.reduce_sum( mask * item_distance_to_embedding , -1 )
learn = tf.train.AdamOptimizer(0.001).minimize(error)


sess = tf.Session()
sess.run(tf.global_variables_initializer() )

for _ in range(20) :
    feed_dict = { x_input : x_data, y_ground : y_data }
    # show the current predictions and total error, then train for 20 steps
    print( sess.run(y_estimate,feed_dict).reshape([-1,size]) , sess.run(tf.reduce_sum(error,-1),feed_dict) )
    for _ in range(20) :
        sess.run(learn,feed_dict)

Original Thread

By anonymous    2017-09-20

I think if you want to learn how a chess AI usually works, you should care a lot more about the theory and algorithms that can be used to solve the problem. If you understand these, you will understand both the knowledge (i.e. the theory) and the steps (i.e. the algorithms) the code was built with.

For the below you'll likely need knowledge of tree data structures and at least some college algebra. If you want to feel more comfortable with tree data structures you could check out sites like hackerrank and hackerearth. I am not affiliated with either of the two sites but have found them helpful in understanding certain data structures.

Although I think theory is important, it is also nice to see actual code. Here is a repository (repo) that is MIT licensed (which is open source by most standards); its readme covers some of the basic theory and algorithms it uses. The repo is written in Python, JavaScript, HTML, and CSS, so it is web based as you asked. You can even play the AI from the repo on the web.

So as not to just leave a link in my answer I will leave a brief synopsis of the repo's algorithms here.

The repo states that its AI searches 3 levels deep in its minimax algorithm and that it "makes pretty good moves but also makes a lot of ill-advised ones as well. The AI's chess intelligence is estimated to be at a level 3 out of 9." So basically I suggest using this repo as a learning experience; don't expect this AI to beat anyone reasonably good at chess.

The algorithms used are minimax and alpha-beta pruning.

For a detailed explanation of these see the repo's explanation in its readme and the MIT OpenCourseWare video lecture it links there.

I won't cover minimax's complete algorithm, but I'll say a bit about the theory behind it. Minimax represents possible game states in a tree, where each node has a score computed from the whole game state at that point. Each edge between a node and one of its children corresponds to a move on the chess board from the node's state to the child's state. The score is defined so that a high value is favorable to one player and a low value is favorable to the other. Minimax has two players: one trying to maximize the score (the maximizing player) and one trying to minimize it (the minimizing player).

The root node in minimax represents the current game state and its score. Each depth of the tree alternates between the maximizing player's turn and the minimizing player's turn, starting from whoever moves next at the root. Minimax carves a path down the tree so that the maximizing player obtains the highest score it can guarantee against best play, and the minimizing player obtains the lowest score it can guarantee against best play.

I won't go much into alpha-beta pruning since I didn't go into the details of the minimax algorithm, but basically it is an extension of minimax that lets it skip computing certain sections of the tree, because the minimizing and maximizing players would never choose them. Since it eliminates search space, alpha-beta pruning is more efficient than plain minimax.
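For concreteness, here is a hedged sketch of the recursion described above (not the repo's actual code); score(), children(), and is_leaf() are hypothetical stand-ins for the repo's board evaluation and move generation:

def minimax(node, depth, alpha, beta, maximizing):
    if depth == 0 or is_leaf(node):
        return score(node)
    if maximizing:
        best = float('-inf')
        for child in children(node):
            best = max(best, minimax(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:
                break   # alpha-beta pruning: the minimizing player would avoid this branch
        return best
    else:
        best = float('inf')
        for child in children(node):
            best = min(best, minimax(child, depth - 1, alpha, beta, True))
            beta = min(beta, best)
            if alpha >= beta:
                break   # the maximizing player would avoid this branch
        return best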

Original Thread
