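function [v,h]=studmax(beta,PT,PS,R)
%STUDMAX Solve the student problem by value function iteration.
% PT is the TV transition matrix, PS is the study transition matrix, and
% R holds the possible returns for each initial state under the two
% actions, TV and study. Beta is the discount factor.
%
% R enters the update as R+beta*[...], where the bracketed term is 2-by-3
% (row 1 = TV, row 2 = study, one column per initial state), so R must be
% supplied in that same 2-by-3 orientation. If R is stored with initial
% states in rows and actions in columns, pass R' instead.
%
% The output, v, is the value function at convergence (a row with one
% entry per state). The output, h, is a row consisting of 1's where the
% optimal action is TV and 2's where the optimal action is study.

v0=[0 0 0];   % initial guess for the value function, one entry per state
m=1;          % distance between successive iterates
while m>0.0001
    % Row 1 is the value of choosing TV in each state, row 2 of study.
    A=R+beta*[(PT*v0')';(PS*v0')'];
    [v1,index]=max(A);   % column-wise max: best action in each state
    m=norm(v1-v0,2);
    v0=v1;
end
v=v0       % no semicolon, so the result is also displayed
h=index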
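%
% Example call (a sketch only: every number below is hypothetical and is
% not taken from the original problem). R is entered as 2-by-3, row 1 for
% TV and row 2 for study, which is the shape the update R+beta*[...] needs.
% PT and PS are assumed to be row-stochastic, with rows indexing the
% current state.
%
%   beta = 0.9;                                    % assumed discount factor
%   PT = [0.8 0.2 0.0; 0.5 0.5 0.0; 0.0 0.5 0.5];  % assumed TV transition
%   PS = [0.2 0.8 0.0; 0.0 0.2 0.8; 0.0 0.0 1.0];  % assumed study transition
%   R  = [1 2 3; 0 1 2];                           % assumed returns
%   [v,h] = studmax(beta,PT,PS,R);
%
% Here v and h are both 1-by-3: v(s) is the value of state s, and h(s) is
% 1 (TV) or 2 (study). With beta below 1 the iteration is a contraction,
% so the loop should terminate.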