Difference between revisions of "Actor Critic"

m (BPeat moved page Asynchronous Advantage Actor Critic (A3C) to Actor Critic without leaving a redirect)
* [[Policy Gradient (PG)]]

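Policy gradients and [[Deep Q Network (DQN)]] can only get us so far, but what if we used two networks to help train an AI instead of one? That's the idea behind actor-critic algorithms: one network (the actor) chooses actions, while the other (the critic) estimates how good those choices are.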
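
The actor and critic roles can be illustrated with a minimal sketch. This is not taken from any of the videos on this page, and it makes several simplifying assumptions: plain tables stand in for the two networks, the task is a hypothetical single-state problem that ends after one step, and the function name <code>train_actor_critic</code>, the learning rates, and the reward noise are chosen only for illustration.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_actor_critic(reward_means, episodes=5000, actor_lr=0.1,
                       critic_lr=0.1, seed=0):
    """One-step actor-critic on a hypothetical single-state, one-step task.

    reward_means[a] is the expected reward of action a (an assumption made
    for this illustration; rewards are sampled with Gaussian noise).
    """
    rng = random.Random(seed)
    prefs = [0.0] * len(reward_means)  # actor: action preferences
    value = 0.0                        # critic: estimated value of the state

    for _ in range(episodes):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = rng.gauss(reward_means[action], 0.1)

        # The episode ends after one step, so the TD target is just the reward.
        td_error = reward - value

        # Critic update: move the value estimate toward the observed reward.
        value += critic_lr * td_error

        # Actor update: the gradient of log softmax with respect to
        # preference b is (1 if b == action else 0) - probs[b].
        for b in range(len(prefs)):
            grad = (1.0 if b == action else 0.0) - probs[b]
            prefs[b] += actor_lr * td_error * grad

    return softmax(prefs), value
```

Calling <code>train_actor_critic([0.0, 1.0])</code> returns the learned action probabilities and the critic's value estimate; under these assumptions the actor should come to prefer the second action. The critic's error signal (<code>td_error</code>) is what scales the actor's update, which is the core of the two-component idea.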
  
<youtube>aODdNpihRwM</youtube>
<youtube>w_3mmm0P0j8</youtube>
<youtube>O5BlozCJBSE</youtube>
<youtube>GCfUdkCL7FQ</youtube>
<youtube>bRfUxQs6xIM</youtube>
<youtube>sTZ4GyJ4FZU</youtube>
<youtube>5Ke-d1Itk3k</youtube>
<youtube>Vz5l886eptw</youtube>
<youtube>e3Jy2vShroE</youtube>

== Asynchronous Advantage Actor Critic (A3C) ==

<youtube>KJt1X-tRCbw</youtube>

Revision as of 16:35, 1 September 2019