Agent, Environment, Policy, Reward (short-term reward), Value (long-term reward)
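As a rough illustration of how these five terms relate, here is a minimal sketch. Everything in it is a hypothetical toy and not taken from the video: a one-dimensional environment with positions 0 to 3, a fixed policy that always moves right, and a discount factor `gamma` used to turn a sequence of short-term rewards into a single long-term value.

```python
def policy(state):
    """Hypothetical policy: the agent always chooses to move right (+1)."""
    return 1


def step(state, action):
    """Hypothetical environment: reaching position 3 pays reward 1.0 and ends the episode."""
    next_state = state + action
    reward = 1.0 if next_state == 3 else 0.0
    done = next_state == 3
    return next_state, reward, done


def run_episode(start_state=0):
    """Let the agent interact with the environment and collect the per-step rewards."""
    state, rewards, done = start_state, [], False
    while not done:
        action = policy(state)                      # agent decides via its policy
        state, reward, done = step(state, action)   # environment responds
        rewards.append(reward)                      # short-term reward for this step
    return rewards


def discounted_return(rewards, gamma):
    """Fold the rewards from the last step backwards: G = r + gamma * G_next."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


rewards = run_episode()
start_value = discounted_return(rewards, gamma=0.5)
print(rewards, start_value)
```

Because this toy policy and environment are deterministic, the discounted return from the start state coincides with that state's value under the policy; in a stochastic setting the value would be the expected return instead.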
Understand Reinforcement Learning in 5 Minutes (bilibili)