Hi there,
I'm developing an MDP (Markov decision process) value-iteration model in GAMS and can't understand the errors I'm getting:
1. First, I keep getting "Error 203: Too few arguments for function". The expression involved is the standard Bellman value update, V(s) = max_a (R(s,a) + gamma * sum_s' P(s'|s,a) * V(s')), sketched in Python right after this list.
2. In the final display of the optimal policy, I get "Error 767: Unexpected symbol will terminate the loop - symbol replaced by )" around the smax, plus "Error 409: Unrecognisable item". The expression there is the Bellman action value, Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) * V(s'); what I want to compute with it is sketched after the full code listing below.
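To clarify what I am trying to compute (leaving GAMS aside, and ignoring the regions index for brevity), here is the value-iteration update as a small Python sketch. All names and the toy transition table here are illustrative only, not my real data:

states  = ["low", "medium", "high"]
actions = ["A", "B", "C"]
gamma, epsilon, max_iter = 0.95, 0.01, 1000

# R(s,a): reward for taking action a in state s (same values as in my GAMS table)
R = {(s, a): {"A": -5, "B": 1, "C": 5}[a] for s in states for a in actions}
# P(s'|s,a): toy transition table, everything goes to "low" here
P = {(s, a, s2): (1.0 if s2 == "low" else 0.0)
     for s in states for a in actions for s2 in states}

V = {s: 0.0 for s in states}
for _ in range(max_iter):
    # V(s) = max_a ( R(s,a) + gamma * sum_s' P(s'|s,a) * V(s') )
    V_new = {s: max(R[s, a] + gamma * sum(P[s, a, s2] * V[s2] for s2 in states)
                    for a in actions)
             for s in states}
    converged = max(abs(V_new[s] - V[s]) for s in states) < epsilon
    V = V_new
    if converged:
        break

That is the computation behind the first error. My GAMS code is below: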
sets
regions /r1*r11/
actions /A, B, C/
states /low, medium, high/
states_next /low, medium, high/;
parameters
reward(states,actions) /low.A -5,    low.B 1,    low.C 5,
                        medium.A -5, medium.B 1, medium.C 5,
                        high.A -5,   high.B 1,   high.C 5/
discount_rate /0.95/
value(states, regions)
value_new(states, regions)
epsilon /0.01/
max_iter /1000/
iter;
value(states, regions)=0;
value_new(states, regions)=0;
iter=0;
* Transition probability
* Probability of transitioning from one state to another when an action is taken
* Format: (region, current state, action, next state)
* Actions A=415V, B=33/11kV, C=330/132kV
Set transition_prob(regions, states, actions, states_next) /r1.high.A.low 0.54, r1.high.B.low 0.54, r1.high.C.low 0.54,
r2.medium.A.low 0.54, r2.medium.B.low 0.54, r2.medium.C.low 0.54,
r3.medium.A.low 0.54, r3.medium.B.low 0.54, r3.medium.C.low 0.54,
r4.medium.A.low 0.54, r4.medium.B.low 0.54, r4.medium.C.low 0.54,
r5.low.A.low 0.54, r5.low.B.low 0.54, r5.low.C.low 0.54,
r6.low.A.low 0.54, r6.low.B.low 0.54, r6.low.C.low 0.54,
r7.low.A.low 0.54, r7.low.B.low 0.54, r7.low.C.low 0.5,
r8.low.A.low 0.54, r8.low.B.low 0.54, r8.low.C.low 0.54,
r9.low.A.low 0.54, r9.low.B.low 0.54, r9.low.C.low 0.54,
r10.low.A.low 0.54, r10.low.B.low 0.54, r10.low.C.low 0.54,
r11.low.A.low 0.54, r11.low.B.low 0.54, r11.low.C.low 0.54/;
* Value iteration to convergence
while((iter = max_iter) or ((value_new - value) < epsilon),
*while(iter = max_iter,
iter = iter + 1;
value = value_new;
loop(regions,
loop(states,
loop(actions,
value_new(states, regions) = max(reward(states, actions) +
discount_rate * sum(transition_prob(regions, states, actions, states_next) * value_new(states, regions)),
value_new(states, regions))
);
);
);
);
* Print the optimal policy
display "Optimal policy for each region:";
loop(regions,
display regions;
loop(states,
display states, " action: ", actions(smax(reward(states, actions) +
discount_rate *
sum(transition_prob(regions, states, actions, states_next) * value(states_next, regions))
, actions))
);
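For the final display, what I want for each state is just the action that maximises Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) * V(s'). Again as an illustrative Python sketch (reusing the toy R, P, gamma, states, actions and converged V from the sketch above):

def greedy_action(s):
    # argmax_a ( R(s,a) + gamma * sum_s' P(s'|s,a) * V(s') )
    return max(actions,
               key=lambda a: R[s, a] + gamma * sum(P[s, a, s2] * V[s2] for s2 in states))

policy = {s: greedy_action(s) for s in states}
print(policy)

That is what I am trying to express with smax over actions in the display loop, but I cannot get the GAMS syntax right.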
Please kindly assist.