Difference between revisions of "Hierarchical Reinforcement Learning (HRL)"

From
Jump to: navigation, search
(HIerarchical Reinforcement learning with Off-policy correction (HIRO))
Line 35: Line 35:
  
 
== HIerarchical Reinforcement learning with Off-policy correction (HIRO) ==
* [http://towardsdatascience.com/advanced-reinforcement-learning-6d769f529eb3 Beyond DQN/A3C: A Survey in Advanced Reinforcement Learning | Joyce Xu - Towards Data Science]
* [http://arxiv.org/pdf/1805.08296.pdf Data-Efficient Hierarchical Reinforcement Learning | O. Nachum, S. Gu, H. Lee, and S. Levine - Google Brain]

HIRO can be used to learn highly complex behaviors for simulated robots, such as pushing objects and using them to reach target locations, learning from only a few million samples, equivalent to a few days of real-time interaction. In comparisons with a number of prior HRL methods, HIRO substantially outperforms these previous state-of-the-art techniques.
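
The two ingredients named in the heading can be made concrete. The low-level policy is rewarded for moving the state toward a goal expressed as a relative offset, which is re-expressed after every environment step, and high-level transitions stored in the replay buffer are relabeled with the goal that best explains the low-level actions that were actually taken. The sketch below illustrates both, assuming goals are desired changes in the state vector; <code>lo_policy</code> is a hypothetical stand-in for a trained deterministic low-level policy, and the Gaussian sampling scale is an illustrative choice rather than the paper's exact setting.

<syntaxhighlight lang="python">
import numpy as np

def intrinsic_reward(s_t, g_t, s_next):
    # Low-level reward: negative distance between the goal offset and the
    # state change actually achieved, r = -||s_t + g_t - s_next||_2.
    return -np.linalg.norm(s_t + g_t - s_next)

def goal_transition(s_t, g_t, s_next):
    # Between high-level decisions the goal is re-expressed relative to the
    # new state: g_{t+1} = s_t + g_t - s_{t+1}.
    return s_t + g_t - s_next

def relabel_goal(states, actions, orig_goal, lo_policy, n_samples=8, rng=None):
    # Off-policy correction: pick the candidate goal that best explains the
    # low-level actions recorded in replay, i.e. the one minimizing
    # sum_i ||a_i - lo_policy(s_i, g_i)||^2 along the stored sub-trajectory.
    rng = np.random.default_rng() if rng is None else rng
    achieved = states[-1] - states[0]          # state change actually achieved
    candidates = [np.asarray(orig_goal), achieved]
    candidates += [rng.normal(loc=achieved, scale=0.5 * np.abs(achieved) + 1e-6)
                   for _ in range(n_samples)]  # samples centered on `achieved`

    def action_error(goal):
        g, total = goal, 0.0
        for s, a, s_next in zip(states[:-1], actions, states[1:]):
            total += np.sum((np.asarray(a) - lo_policy(s, g)) ** 2)
            g = goal_transition(s, g, s_next)  # roll the goal forward in time
        return total

    return min(candidates, key=action_error)
</syntaxhighlight>

When a stored high-level transition is replayed, the relabeled goal replaces the goal that was originally issued, which keeps the high-level experience consistent with the current low-level policy and makes off-policy training of the higher level feasible.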
  
 
http://miro.medium.com/max/678/1*Fq-TQ7Mu2XDOIZ6R7dkRjw.png
 
