%%
function [NewBoardPosition,GameOver] = PlayNim(CurrentBoardPosition,JStartEnd,Qas)
%
% This function is called by NimScript.m
%
% It uses a softmax policy to decide on the next board position.
%
% Input:
%   CurrentBoardPosition: an integer index identifying the current state
%       of the game. The actual configuration of the board can be found
%       in NimScript.m, e.g. NimColumn{CurrentBoardPosition}.
%
%   JStartEnd: a matrix whose rows each hold two elements describing one
%       possible transition in the game. A typical row such as [1 4]
%       means that one can move from board position 1 to board position 4.
%       Together the rows represent the topology of the game.
%
%   Qas: a matrix with one row per row of JStartEnd. The first element of
%       each row is the value of the corresponding transition in
%       JStartEnd; the second element is the number of times that
%       transition has been taken.
%
% Output:
%   NewBoardPosition: index of the board position chosen by PlayNim.
%
%   GameOver: flag indicating that the game is over (the last step has
%       been reached).
%
%%
% The algorithm is a softmax (Boltzmann) policy.
%
tau = 3;   % initial temperature (recomputed below)
%indsPlayed= find(JStartEnd(:,1)==1);
%TotalPlayed= sum(Qas(indsPlayed,2 ));
sEnd = max(JStartEnd(:,2));   % index of the empty board; reaching it ends the game
GameOver = 0;

indQas = find(CurrentBoardPosition==JStartEnd(:,1));  % rows of JStartEnd holding the moves available from here
TotalPlayedQas = sum(Qas(indQas,2));                  % total number of times these moves have been played
% The temperature falls as these moves are played more often. If none has
% been played yet, 500/0 = Inf in MATLAB, so min(1,Inf) = 1 and tau = 3.
tau = 3*min(1,500/TotalPlayedQas);
tau = max(tau,.3);                  % ... but it never drops below 0.3
relevantQs = Qas(indQas,1);         % values of the possible moves
% Boltzmann weighting of the moves. Subtracting the maximum value first
% avoids overflow in exp() and leaves the resulting probabilities unchanged.
expQs = exp((relevantQs - max(relevantQs))/tau);
sexpQs = length(expQs);             % number of possible moves
choiceVec = expQs/sum(expQs);       % probabilities of the possible moves (they sum to 1)

%% Given the probabilities, pick a random number and choose a move
% out of the possible moves. Do this by forming the cumulative
% distribution (sumChoiceVec), then finding in which interval a random
% number falls.
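% Worked example (hypothetical values, not taken from NimScript.m):
% suppose three moves are available with values relevantQs = [1; 0; -1]
% and TotalPlayedQas = 1000. Then tau = 3*min(1,500/1000) = 1.5, and the
% Boltzmann weights exp(Q/tau) are about [1.948; 1.000; 0.513], giving
% choiceVec of about [0.56; 0.29; 0.15] and a cumulative distribution
% sumChoiceVec of about [0.56; 0.85; 1.00]. A draw rrand = 0.70 lies
% between 0.56 and 0.85, so the second move is chosen. For comparison,
% TotalPlayedQas = 100 gives tau = 3, while TotalPlayedQas = 10000 gives
% 3*0.05 = 0.15, which is clamped to the floor of 0.3.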
sumChoiceVec = cumsum(choiceVec);   % cumulative distribution: running sum of choiceVec
rrand = rand();                     % uniform random number in (0,1)
inds = find(rrand