Hierarchical Reinforcement Learning (HRL)

 
|description=Helpful resources for your journey with artificial intelligence; videos, articles, techniques, courses, profiles, and tools  
 
}}
[https://www.youtube.com/results?search_query=Hierarchical+Reinforcement+Learning Youtube search...]
[https://www.google.com/search?q=Hierarchical+Reinforcement+machine+learning+ML+artificial+intelligence ...Google search]
  
* [[HIerarchical Reinforcement learning with Off-policy correction (HIRO)]]
* [https://thegradient.pub/the-promise-of-hierarchical-reinforcement-learning The Promise of Hierarchical Reinforcement Learning | Yannis Flet-Berliac - The Gradient]
* [https://www.slideshare.net/DavidJardim/hierarchical-reinforcement-learning Hierarchical Reinforcement Learning | David Jardim]
 
* [[Reinforcement Learning (RL)]]
** [[Monte Carlo]] (MC) Method - Model Free Reinforcement Learning
** [[Markov Decision Process (MDP)]]
** [[State-Action-Reward-State-Action (SARSA)]]
** [[Q Learning]]
*** [[Deep Q Network (DQN)]]
** [[Deep Reinforcement Learning (DRL)]] DeepRL
*** [[IMPALA (Importance Weighted Actor-Learner Architecture)]]
** [[Distributed Deep Reinforcement Learning (DDRL)]]
** [[Evolutionary Computation / Genetic Algorithms]]
** [[Actor Critic]]
*** [[Asynchronous Advantage Actor Critic (A3C)]]
*** [[Advanced Actor Critic (A2C)]]
*** [[Lifelong Latent Actor-Critic (LILAC)]]
** [[MERLIN]]
** Hierarchical Reinforcement Learning (HRL)
* [[Policy]]  ... [[Policy vs Plan]] ... [[Constitutional AI]] ... [[Trust Region Policy Optimization (TRPO)]] ... [[Policy Gradient (PG)]] ... [[Proximal Policy Optimization (PPO)]]
  
HRL is a promising approach for extending traditional [[Reinforcement Learning (RL)]] methods to more complex tasks: a long-horizon problem is decomposed so that a high-level policy sets subgoals (or selects options/skills) while one or more low-level policies learn to achieve them.
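
At its simplest, this is a two-level control loop. The sketch below is a hypothetical toy illustration (not any specific published algorithm): every K steps a high-level policy emits a subgoal, and a low-level policy is rewarded intrinsically for approaching it. All function names and the toy dynamics here are assumptions made for illustration.

<syntaxhighlight lang="python">
import numpy as np

K = 10          # subgoal horizon: the high level re-plans every K steps
STATE_DIM = 4
rng = np.random.default_rng(0)

def high_level_policy(state):
    """Pick a subgoal near the current state (random here; learned in practice)."""
    return state + rng.normal(scale=0.5, size=STATE_DIM)

def low_level_policy(state, goal):
    """Step greedily toward the subgoal (a learned policy in practice)."""
    return np.clip(goal - state, -0.1, 0.1)

def env_step(state, action):
    """Toy dynamics: the action directly perturbs the state."""
    return state + action, -float(np.linalg.norm(state))  # external task reward

state = np.zeros(STATE_DIM)
for t in range(100):
    if t % K == 0:
        goal = high_level_policy(state)          # high level sets a subgoal
    action = low_level_policy(state, goal)       # low level acts toward it
    state, task_reward = env_step(state, action)
    r_lo = -float(np.linalg.norm(goal - state))  # intrinsic reward for the low level
    # In a full agent, (state, goal, action, r_lo) would train the low level,
    # while the high level trains on task reward summed over each K-step segment.
</syntaxhighlight>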
  
 
<youtube>x_QjJry0hTc</youtube>
<youtube>QEmuhofpFIU</youtube>

<youtube>zQy02LsARo0</youtube>

<youtube>K5MlmO0UJtI</youtube>

<youtube>ARfpQzRCWT4</youtube>
  
https://thegradient.pub/content/images/2019/03/image44.png
  
 
== HIerarchical Reinforcement learning with Off-policy correction (HIRO) ==
* [https://towardsdatascience.com/advanced-reinforcement-learning-6d769f529eb3 Beyond DQN/A3C: A Survey in Advanced Reinforcement Learning | Joyce Xu - Towards Data Science]
* [https://arxiv.org/pdf/1805.08296.pdf Data-Efficient Hierarchical Reinforcement Learning | O. Nachum, S. Gu, H. Lee, and S. Levine - Google Brain]
  
 
HIRO can be used to learn highly complex behaviors for simulated robots, such as pushing objects and utilizing them to reach target locations, learning from only a few million samples, equivalent to a few days of real-time interaction. In comparisons with a number of prior HRL methods, HIRO substantially outperforms previous state-of-the-art techniques.
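
The key mechanism behind this data efficiency is HIRO's off-policy correction: when an old high-level transition is replayed, the stored goal may no longer induce the same behavior (the low-level policy has changed since), so the goal is relabeled with the candidate that maximizes the likelihood of the low-level actions actually taken. Below is a minimal NumPy sketch of that relabeling step, assuming a deterministic goal-conditioned low-level policy <code>lo_policy</code> (a stand-in name) and the unnormalized Gaussian action log-likelihood used in the paper; the candidate sampling scale is an arbitrary choice here.

<syntaxhighlight lang="python">
import numpy as np

def relabel_goal(states, actions, orig_goal, lo_policy, n_candidates=10, rng=None):
    """HIRO-style off-policy correction (sketch).

    states:  (K+1, state_dim) states visited in the old segment
    actions: (K, action_dim) low-level actions actually taken
    Returns the candidate goal that best explains those actions."""
    rng = rng or np.random.default_rng()
    achieved = states[-1] - states[0]      # displacement actually achieved
    # Candidate set as in the paper: the original goal, the achieved
    # displacement, and random goals sampled around that displacement.
    candidates = [orig_goal, achieved]
    candidates += list(rng.normal(loc=achieved, scale=0.5,
                                  size=(n_candidates - 2, orig_goal.shape[0])))

    def log_likelihood(goal):
        g, ll = goal.copy(), 0.0
        for s, s_next, a in zip(states[:-1], states[1:], actions):
            ll += -0.5 * np.sum((a - lo_policy(s, g)) ** 2)
            g = s + g - s_next             # goal transition: remaining displacement
        return ll

    return max(candidates, key=log_likelihood)
</syntaxhighlight>

Each relabeled high-level transition can then be trained on with a standard off-policy algorithm (the paper uses TD3), which is what lets HIRO reuse experience gathered under older low-level policies.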
  
<youtube>yLHzDky2ApI</youtube>

https://miro.medium.com/max/678/1*Fq-TQ7Mu2XDOIZ6R7dkRjw.png
