Optimal Controls


1 Markov Process

2 Markov Decision Processes


  • Probability space:
  • State space:
  • Action space:
  • Reward:
  • Admissible controls:
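
For concreteness, one standard set of symbols for these ingredients (assumed here for the sketch, not fixed by the list above) is

\[
(\Omega, \mathcal F, \mathbb P), \qquad \mathcal S, \qquad \mathcal A, \qquad
P(s' \mid s, a), \qquad r : \mathcal S \times \mathcal A \to \mathbb R, \qquad
\gamma \in (0, 1),
\]

with admissible controls given by policies \(\pi(a \mid s)\).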

Pathwise uniqueness:

Discrete-time Control Problem


Suppose the reward is a deterministic function of state and action. Define

This implies that the quantity above depends only on the current state and action, and thus we can define

Note that:

Similarly,
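
In the notation sketched above, the usual definitions in this setting read as follows (a hedged sketch; the symbols \(V^{\pi}\) and \(Q^{\pi}\) below are assumed notation):

\[
r(s,a) := \mathbb E\big[R_t \mid S_t = s,\, A_t = a\big], \qquad
V^{\pi}(s) := \mathbb E^{\pi}\Big[\textstyle\sum_{t \ge 0} \gamma^{t} r(S_t, A_t) \,\Big|\, S_0 = s\Big], \qquad
Q^{\pi}(s,a) := \mathbb E^{\pi}\Big[\textstyle\sum_{t \ge 0} \gamma^{t} r(S_t, A_t) \,\Big|\, S_0 = s,\, A_0 = a\Big],
\]

and the two value functions are related by \(V^{\pi}(s) = \sum_a \pi(a \mid s)\, Q^{\pi}(s,a)\) and \(Q^{\pi}(s,a) = r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V^{\pi}(s')\).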

Bellman Expectation Equation


In this section, we will derive the dynamic programming principle, also known as the Bellman Equation.
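
In the notation sketched above, a standard statement of the Bellman expectation equation (assuming a stationary policy \(\pi\) and discount \(\gamma \in (0,1)\)) is

\[
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \Big[ r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^{\pi}(s') \Big],
\qquad
Q^{\pi}(s,a) = r(s,a) + \gamma \sum_{s'} P(s' \mid s, a) \sum_{a'} \pi(a' \mid s')\, Q^{\pi}(s', a').
\]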

3 Controls


Admissible controls:
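
The proof below follows the usual argument for the dynamic programming principle; in the notation above, the statement it establishes is presumably the Bellman optimality equation for \(V^{*}(s) := \sup_{\pi} V^{\pi}(s)\):

\[
V^{*}(s) = \sup_{a \in \mathcal A} \Big\{ r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^{*}(s') \Big\}.
\]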

Proof. For one direction, the Bellman Expectation Equation implies

Taking the supremum on the left-hand side then gives the desired inequality.

On the other hand, for every ε > 0 there is an admissible control such that

Now the result follows since ε and the action were arbitrary.

Now we can define

If the underlying space is finite-dimensional, then this operator is a contraction.
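
A standard way to make this precise (assuming bounded rewards and \(\gamma < 1\); this is the usual sup-norm argument, offered as a sketch): define the Bellman optimality operator and check that it contracts,

\[
(\mathcal T V)(s) := \sup_{a} \Big\{ r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V(s') \Big\},
\qquad
\|\mathcal T V - \mathcal T W\|_{\infty} \le \gamma\, \|V - W\|_{\infty},
\]

so by the Banach fixed-point theorem \(\mathcal T\) has a unique fixed point.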

Policy iteration for solving the Bellman equations:
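
As a sketch of how policy iteration could be implemented for a finite MDP (the array layout and names below are assumptions, not the original's notation): P[s, a, s'] holds transition probabilities, R[s, a] expected rewards, and gamma the discount factor.

import numpy as np

def policy_iteration(P, R, gamma):
    """Policy iteration for a finite MDP with P[s, a, s'], R[s, a], gamma in (0, 1)."""
    n_states, n_actions, _ = P.shape
    policy = np.zeros(n_states, dtype=int)        # start from an arbitrary deterministic policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[np.arange(n_states), policy]      # shape (|S|, |S|)
        R_pi = R[np.arange(n_states), policy]      # shape (|S|,)
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to V.
        Q = R + gamma * P.dot(V)                   # shape (|S|, |A|)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):     # greedy policy unchanged: stop
            return policy, V
        policy = new_policy

The evaluation step solves the linear system \((I - \gamma P^{\pi}) V = r^{\pi}\), presumably the same system that appears under "Matrix Form" below.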

Special Cases


Matrix Form:

Define

This implies
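
In the finite-state setting the relation presumably intended here is (hedged, using the assumed notation \(P^{\pi}\), \(r^{\pi}\)):

\[
V^{\pi} = r^{\pi} + \gamma P^{\pi} V^{\pi}
\quad\Longleftrightarrow\quad
(I - \gamma P^{\pi})\, V^{\pi} = r^{\pi},
\qquad
(P^{\pi})_{s s'} = \sum_a \pi(a \mid s)\, P(s' \mid s, a), \quad
r^{\pi}(s) = \sum_a \pi(a \mid s)\, r(s, a).
\]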

Remark: since each row of \(\gamma P^{\pi}\) sums to \(\gamma < 1\), the matrix \(I - \gamma P^{\pi}\) is strictly diagonally dominant and thus invertible.

Suppose otherwise; then the null space of the matrix is nontrivial. Take a nonzero vector in the null space. We have

Considering the entry of largest absolute value, we then get

which contradicts the strict diagonal dominance.
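
The standard form of this argument, written for a generic strictly diagonally dominant matrix \(A = (a_{ij})\) (a sketch, not necessarily the original's exact wording): suppose \(A x = 0\) for some \(x \ne 0\), and let \(i\) be an index with \(|x_i| = \max_j |x_j| > 0\). Then

\[
|a_{ii}|\,|x_i| = \Big| \sum_{j \ne i} a_{ij} x_j \Big|
\le \sum_{j \ne i} |a_{ij}|\,|x_j|
\le |x_i| \sum_{j \ne i} |a_{ij}|
< |a_{ii}|\,|x_i|,
\]

a contradiction; hence the null space is trivial and \(A\) is invertible.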

Iterative algorithms for solving linear equations:

import numpy as np

def fixed_point_iteration(A, b, x0, tol=1e-8, max_iter=1000):
    """Solve x = A x + b by the fixed-point iteration x_{k+1} = A x_k + b.

    The iteration converges whenever the spectral radius of A is below 1.
    """
    x = x0.copy()
    for k in range(max_iter):
        x_new = A.dot(x) + b
        # Stop once successive iterates are close in the Euclidean norm.
        if np.linalg.norm(x_new - x, ord=2) < tol:
            print(f"Converged after {k + 1} iterations")
            return x_new
        x = x_new
    print("Warning: reached max_iter without meeting the tolerance")
    return x

if __name__ == "__main__":
    A = np.array([[0.5, 0.1],
                  [0.2, 0.3]])
    b = np.array([1.0, 2.0])
    x0 = np.zeros(2)

    solution = fixed_point_iteration(A, b, x0)
    print("Solution from fixed-point iteration:", solution)
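
A natural use of this routine in the present context (an observation offered as a sketch, not taken from the text above): with \(A = \gamma P^{\pi}\) and \(b = r^{\pi}\), the iteration \(x_{k+1} = A x_k + b\) is exactly iterative policy evaluation, and it converges because every eigenvalue of \(\gamma P^{\pi}\) has modulus at most \(\gamma < 1\).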