Reinforcement Learning (RL)
Revision as of 11:50, 1 September 2019
- Markov Decision Process (MDP)
- Monte Carlo (MC) Method - Model Free Reinforcement Learning
- Deep Reinforcement Learning (DRL) - DeepRL
- Neural Architecture Search (NAS) with Reinforcement Learning | Barret Zoph & Quoc V. Le ...Wikipedia
- Distributed Deep Reinforcement Learning (DeepRL)
- Deep Q Learning (DQN)
- Neural Coreference
- State-Action-Reward-State-Action (SARSA)
- Deep Deterministic Policy Gradient (DDPG)
- Trust Region Policy Optimization (TRPO)
- Proximal Policy Optimization (PPO)
- AdaNet
___________________________________________________________
- Apprenticeship Learning - Inverse Reinforcement Learning (IRL)
- Lifelong Learning
- Dopamine Google DeepMind
- Inside Out - Curious Optimistic Reasoning
- World Models
- Google DeepMind AlphaGo Zero
- Google’s AI picks which machine learning models will produce the best results | Kyle Wiggers - VentureBeat ... “off-policy classification,” or OPC, which evaluates the performance of AI-driven agents by treating evaluation as a classification problem
- Deep Reinforcement Learning Hands-On: Apply modern RL methods, with deep Q-networks, value iteration, policy gradients, TRPO, AlphaGo Zero and more | Maxim Lapan
- Reinforcement-Learning-Notebooks - A collection of Reinforcement Learning algorithms from Sutton and Barto's book and other research papers implemented in Python
Reinforcement learning is somewhat similar to traditional data analysis: the algorithm discovers through trial and error which actions yield the greatest rewards. Three major components can be identified in reinforcement learning: the agent, the environment, and the actions. The agent is the learner or decision-maker, the environment includes everything the agent interacts with, and the actions are what the agent can do. The agent learns by choosing actions that maximize the expected reward over a given period of time, which is best achieved when it follows a good policy. Machine Learning: What it is and Why it Matters | Priyadharshini @ simplilearn
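The agent, environment, and action roles described above can be illustrated with a minimal sketch. It assumes a hypothetical five-cell corridor environment (not taken from any of the linked resources) and uses tabular Q-learning with epsilon-greedy exploration as one possible way for the agent to learn a policy by trial and error. The names `step`, `train`, and `greedy_policy` and all parameter values are illustrative assumptions.

```python
import random

# Hypothetical toy environment: a corridor of 5 cells, reward only at the right end.
N_STATES = 5
ACTIONS = (-1, +1)          # what the agent can do: move left or move right
GOAL = N_STATES - 1

def step(state, action):
    """Environment: apply one move and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Agent: learn action values by trial and error (tabular Q-learning)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon (or on ties), otherwise exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

def greedy_policy(q):
    """Policy: the highest-valued action in each non-terminal state."""
    return [ACTIONS[0 if row[0] > row[1] else 1] for row in q[:GOAL]]

if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this sketch the learned policy is simply a lookup of the best-valued action per state; with the reward placed at the right end, the agent should end up preferring the move-right action in every non-terminal cell.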
Q Learning Algorithm and Agent - Reinforcement Learning w/ Python Tutorial | Sentdex - Harrison
Reinforcement Learning | Phil Tabor
⌨️ (00:00:00) Introduction
⌨️ (00:01:30) Intro to Deep Q Learning
⌨️ (00:08:56) How to Code Deep Q Learning in Tensorflow
⌨️ (00:52:03) Deep Q Learning with Pytorch Part 1: The Q Network
⌨️ (01:06:21) Deep Q Learning with Pytorch Part 2: Coding the Agent
⌨️ (01:28:54) Deep Q Learning with Pytorch Part 3: Coding the main loop
⌨️ (01:46:39) Intro to Policy Gradients
⌨️ (01:55:01) How to Beat Lunar Lander with Policy Gradients
⌨️ (02:21:32) How to Beat Space Invaders with Policy Gradients
⌨️ (02:34:41) How to Create Your Own Reinforcement Learning Environment Part 1
⌨️ (02:55:39) How to Create Your Own Reinforcement Learning Environment Part 2
⌨️ (03:08:20) Fundamentals of Reinforcement Learning
⌨️ (03:17:09) Markov Decision Processes
⌨️ (03:23:02) The Explore Exploit Dilemma
⌨️ (03:29:19) Reinforcement Learning in the OpenAI Gym: SARSA
⌨️ (03:39:56) Reinforcement Learning in the OpenAI Gym: Double Q Learning
⌨️ (03:54:07) Conclusion
Jump Start
Lunar Lander: Deep Q learning is Easy in PyTorch
Lunar Lander: How to Beat Lunar Lander with Policy Gradients | Tensorflow Tutorial
Breakout: How to Code Deep Q Learning in Tensorflow (Tutorial)
Gridworld: How To Create Your Own Reinforcement Learning Environments