function NewQas=UpdateQFinger(Play,Payouts,Qas)
% This program updates the value table (as it depends on possible plays)
%
% PlayNim also includes definitions of JStartEnd and Qas
%
% Input:
%
% Play is a vector. The jth entry is the choice made by the jth player.
%
% Payouts is a vector. The jth entry contains the payout
% to the jth player of the current game.
%
% Qas: This is a cell. The jth entry corresponds to the jth player
% and is a matrix with two columns. The kth row contains the current
% estimate of the value of playing k and the number of times that k has
% already been chosen by the jth player.
%
% Output:
%
% NewQas is the updated version of the values, after accounting for
% the last round of rewards given by Payouts
%
%% Algorithm is of the TD type
%
% Q_{k+1} = Q_k + alpha*(r - Q_k)
%
% r is the reward that is eventually delivered
% at present: there is no discounting
%%
% alpha is the learning rate; in the code below it is
% 1/(number of times the move has been chosen, counting this round),
% so each value estimate is the running average of the payouts received
% for that move.
%%
% Add a weighted version of Payouts (average it in) to the old state values
% to get the new state values
%
% The larger the magnitude of the TDError, the more learning is
% taking place
%
NumPlayers = length(Play);
NewQas = Qas; % The new values
for jSubj = 1:NumPlayers
    MoveInd = Play(jSubj); % move chosen by this player
    TDError = Payouts(jSubj) - Qas{jSubj}(MoveInd,1); % CHANGE ME
    NumAlreadyp1 = Qas{jSubj}(MoveInd,2) + 1; % times this move has been made, counting this round
    NewQas{jSubj}(MoveInd,1) = NewQas{jSubj}(MoveInd,1) + (1/NumAlreadyp1)*TDError; % CHANGE ME
    NewQas{jSubj}(MoveInd,2) = NumAlreadyp1; % update the number of times this move was made
end
% NewQas= Qas + alpha*(ToAdd-Qas);
%%
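%
% Worked example (an illustrative sketch only, not part of the original
% program; the two-player, two-move setup and the payout values are
% assumed purely for demonstration). It shows that the 1/N step size
% turns each value estimate into the running average of the payouts:
%
%   Qas    = {zeros(2,2), zeros(2,2)};  % assumed start: no move played yet
%   NewQas = UpdateQFinger([1 2], [1 -1], Qas);
%   % NewQas{1} is [1 1; 0 0] and NewQas{2} is [0 0; -1 1]
%   NewQas = UpdateQFinger([1 2], [0 -1], NewQas);
%   % NewQas{1} is [0.5 2; 0 0] and NewQas{2} is [0 0; -1 2]
%
% In the second call player 1's estimate for move 1 moves from 1 to 0.5,
% the mean of the payouts 1 and 0, while player 2's estimate stays at -1
% because the TDError is zero.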