Difference between revisions of "Reinforcement Learning (RL)"
Revision as of 06:05, 4 August 2018
- Markov Decision Process (MDP)
- Deep Reinforcement Learning (DRL)
- Deep Q Learning (DQN)
- Neural Coreference
- State-Action-Reward-State-Action (SARSA)
- Deep Deterministic Policy Gradient (DDPG)
- Trust Region Policy Optimization (TRPO)
- Proximal Policy Optimization (PPO)
___________________________________________________________
- Inside Out - Curious Optimistic Reasoning
- Google DeepMind AlphaGo Zero
- Reinforcement-Learning-Notebooks - A collection of reinforcement learning algorithms from Sutton and Barto's book and other research papers, implemented in Python
Reinforcement learning is somewhat similar to traditional data analysis: the algorithm discovers through trial and error which actions yield the greatest rewards. Three major components make up reinforcement learning: the agent, the environment, and the actions. The agent is the learner or decision-maker, the environment includes everything the agent interacts with, and the actions are what the agent can do. Reinforcement learning occurs when the agent chooses actions that maximize the expected reward over a given time horizon. This is best achieved when the agent follows a good policy.

Machine Learning: What it is and Why it Matters | Priyadharshini @ simplilearn
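The agent/environment/action loop above can be sketched with tabular Q-learning on a toy problem. The corridor environment, its parameters, and the function names here are illustrative assumptions, not anything from the article; the sketch only shows how an agent learns, by trial and error, a policy that maximizes expected reward.

```python
import random

# Hypothetical toy environment (not from the article): a corridor of states
# 0..4, actions 0 = left, 1 = right; reaching state 4 gives reward +1 and
# ends the episode.
N_STATES, GOAL = 5, 4

def step(state, action):
    """Environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Q-table: the agent's estimate of expected reward for each (state, action).
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy policy: mostly exploit the best-known action,
            # occasionally explore at random (the trial-and-error part).
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Q-learning update: nudge the estimate toward the observed
            # reward plus the discounted value of the next state.
            q[state][action] += alpha * (
                reward + gamma * max(q[next_state]) - q[state][action]
            )
            state = next_state
    return q

q = train()
# The learned greedy policy moves right toward the goal in every state.
policy = [0 if q[s][0] > q[s][1] else 1 for s in range(GOAL)]
print(policy)  # → [1, 1, 1, 1]
```

Because moving left only delays the reward and future reward is discounted by gamma, the learned value of "right" exceeds "left" in every state, so the greedy policy heads straight for the goal.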