Difference between revisions of "Actor Critic"
m (BPeat moved page Asynchronous Advantage Actor Critic (A3C) to Actor Critic without leaving a redirect)
Revision as of 16:35, 1 September 2019
- Reinforcement Learning (RL):
- Monte Carlo (MC) Method - Model Free Reinforcement Learning
- Markov Decision Process (MDP)
- Q Learning
- State-Action-Reward-State-Action (SARSA)
- Deep Reinforcement Learning (DRL) DeepRL
- Distributed Deep Reinforcement Learning (DDRL)
- Deep Q Network (DQN)
- Evolutionary Computation / Genetic Algorithms
- Hierarchical Reinforcement Learning (HRL)
- MERLIN
- Beyond DQN/A3C: A Survey in Advanced Reinforcement Learning | Joyce Xu - Towards Data Science
- Policy Gradient (PG)
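Policy gradients and Deep Q Network (DQN) can only get us so far, but what if we used two networks to help train an AI instead of one? That's the idea behind actor-critic algorithms: one network (the actor) chooses actions, while a second (the critic) estimates how good the resulting states are and uses that estimate to guide the actor's updates.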
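To make the actor/critic split concrete, here is a minimal sketch of one-step actor-critic in plain Python. It is an illustration under stated assumptions, not the method from any of the linked material: the corridor task, the function names (`softmax`, `train_actor_critic`) and all hyperparameters are hypothetical, and small lookup tables stand in for the two neural networks so the example runs without any libraries.

```python
import math
import random


def softmax(prefs):
    """Turn a list of action preferences into action probabilities."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_actor_critic(episodes=2000, n_states=5, gamma=0.9,
                       actor_lr=0.1, critic_lr=0.2, seed=0):
    """One-step actor-critic on a hypothetical corridor task.

    States are 0..n_states-1. Action 0 moves left (blocked at state 0),
    action 1 moves right. Reaching the rightmost state ends the episode
    with reward 1; every other step gives reward 0.
    """
    rng = random.Random(seed)
    # Actor: one preference per (state, action); a table stands in for a network.
    prefs = [[0.0, 0.0] for _ in range(n_states)]
    # Critic: one value estimate per state; again a table instead of a network.
    values = [0.0] * n_states

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # step cap so early random episodes always end
            probs = softmax(prefs[state])
            action = 0 if rng.random() < probs[0] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            done = next_state == n_states - 1
            reward = 1.0 if done else 0.0

            # The critic's TD error says whether the step went better or
            # worse than the critic expected.
            target = reward + (0.0 if done else gamma * values[next_state])
            td_error = target - values[state]

            # Critic update: move the value estimate toward the TD target.
            values[state] += critic_lr * td_error

            # Actor update: policy-gradient step scaled by the TD error.
            # For a softmax policy, d log pi(a|s) / d pref[b] = 1[a == b] - pi(b|s).
            for b in range(2):
                grad = (1.0 if b == action else 0.0) - probs[b]
                prefs[state][b] += actor_lr * td_error * grad

            if done:
                break
            state = next_state

    return prefs, values


if __name__ == "__main__":
    prefs, values = train_actor_critic()
    for s in range(len(values) - 1):
        print(s, "P(right) =", round(softmax(prefs[s])[1], 3),
              "V =", round(values[s], 3))
```

The critic's TD error plays the role that the sampled return plays in a plain policy gradient: the actor is pushed toward actions that turned out better than the critic predicted and away from those that turned out worse. In a deep variant, both tables would be replaced by function approximators trained with the same two update signals.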
Asynchronous Advantage Actor Critic (A3C)