Actor Critic

 
*** [[Lifelong Latent Actor-Critic (LILAC)]]
 
 
** [[Hierarchical Reinforcement Learning (HRL)]]
 
 
 
 
* [https://towardsdatascience.com/advanced-reinforcement-learning-6d769f529eb3 Beyond DQN/A3C: A Survey in Advanced Reinforcement Learning | Joyce Xu - Towards Data Science]
 
* [[Policy Gradient (PG)]]
* [[Policy]]  ... [[Policy vs Plan]] ... [[Constitutional AI]] ... [[Trust Region Policy Optimization (TRPO)]] ... [[Policy Gradient (PG)]] ... [[Proximal Policy Optimization (PPO)]]
  
 
Policy gradients and [[Deep Q Network (DQN)]] can only get us so far, but what if we used two networks to help train an AI instead of one? That's the idea behind actor-critic algorithms: an actor network chooses actions, while a critic network estimates how good those actions are and steers the actor's updates.
 
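The two-network idea can be sketched with a minimal one-step actor-critic on a hypothetical toy task (this example is illustrative and not from the article): a single state with two actions, where the actor is a softmax over action preferences and the critic is a single value estimate used as a baseline.

```python
import math
import random

random.seed(0)

# Toy one-state task: action 1 yields reward 1, action 0 yields reward 0.
prefs = [0.0, 0.0]   # actor parameters (action preferences)
V = 0.0              # critic's value estimate for the single state
alpha_actor, alpha_critic = 0.1, 0.1

def softmax(ps):
    m = max(ps)
    exps = [math.exp(p - m) for p in ps]
    total = sum(exps)
    return [e / total for e in exps]

for step in range(2000):
    probs = softmax(prefs)
    a = 0 if random.random() < probs[0] else 1
    r = 1.0 if a == 1 else 0.0       # environment reward
    delta = r - V                    # TD error: the critic's feedback signal
    V += alpha_critic * delta        # critic update toward observed reward
    # Actor update: policy-gradient step scaled by the critic's TD error.
    for i in range(2):
        grad = (1.0 if i == a else 0.0) - probs[i]
        prefs[i] += alpha_actor * delta * grad

# After training, the actor's probability of the better action approaches 1.
print(softmax(prefs)[1])
```

The critic here plays the role of the second network: instead of updating the actor on raw returns (as plain policy gradients do), each update is scaled by the TD error, which reduces variance and speeds learning.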
 
<youtube>2vJtbAha3To</youtube>
 
 
<youtube>CLZkpo8rEG</youtube>
 
 
 
<youtube>aODdNpihRwM</youtube>
 
 
<youtube>w_3mmm0P0j8</youtube>
 

Revision as of 15:38, 16 April 2023