1) MDP, Value Function, Bellman Equation
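Sources: https://subsay.tistory.com/14, https://www.youtube.com/playlist?list=PLpRS2w0xWHTcTZyyX8LMmtbcMXpd3s4TU
Previous related post: 2020/07/29 - [Study/Reinforcement Learning] - Reinforcement Learning, Open AI GYM

# Markov Process

A Markov process (MP) is fully described by its state transitions. It is expressed as a state set S together with a state transition probability matrix P:

$$MP = \langle S, P \rangle$$

$$P = \begin{bmatrix} P_{11} & \cdots & P_{1n} \\ \vdots & \ddots & \vdots \\ P_{n1} & \cdots & P_{nn} \end{bmatrix}$$

Here $P_{ij}$ is the probability of moving from state $i$ to state $j$, so every row of P sums to 1.

# Markov Reward Process

An MRP adds a reward and a discount factor to an MP..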
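As a minimal sketch of the MP definition above, the Python snippet below stores P as a list of rows (row i holds the probabilities of leaving state i), checks that every row sums to 1, propagates a state distribution one step at a time, and samples a trajectory. The three states, their names, and all probabilities are made-up assumptions for illustration; they are not taken from the original post or its sources.

```python
import random

# Hypothetical 3-state example: state names and probabilities are
# illustrative assumptions, not values from the original post.
STATES = ["Sleep", "Study", "Game"]
P = [
    [0.6, 0.3, 0.1],  # P_11, P_12, P_13: transitions out of "Sleep"
    [0.2, 0.5, 0.3],  # transitions out of "Study"
    [0.4, 0.4, 0.2],  # transitions out of "Game"
]


def is_row_stochastic(matrix, tol=1e-9):
    """True if every entry is non-negative and every row sums to 1."""
    return all(
        all(p >= 0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )


def step_distribution(dist, matrix):
    """Propagate a state distribution one step: d'[j] = sum_i d[i] * P_ij."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def sample_trajectory(start, matrix, length, rng):
    """Sample `length` transitions; each next state depends only on the current one."""
    path = [start]
    for _ in range(length):
        row = matrix[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path


# Usage: start in "Sleep" with certainty and look three steps ahead.
assert is_row_stochastic(P)
dist = [1.0, 0.0, 0.0]
for _ in range(3):
    dist = step_distribution(dist, P)
print([round(x, 4) for x in dist])
print([STATES[s] for s in sample_trajectory(0, P, 5, random.Random(0))])
```

Plain lists keep the sketch dependency-free; with a NumPy array the distribution step would simply be `dist @ P`. The sampler only ever looks at `path[-1]`, which is exactly the Markov property: the next state depends on the current state alone.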