Deep Reinforcement Learning (DRL)
Youtube search... ...Google search
- OpenAI Gym
- Reinforcement Learning (RL)
- Monte Carlo (MC) Method - Model Free Reinforcement Learning
- Markov Decision Process (MDP)
- Q Learning
- State-Action-Reward-State-Action (SARSA)
- Distributed Deep Reinforcement Learning (DDRL)
- Deep Q Network (DQN)
- Evolutionary Computation / Genetic Algorithms
- Actor Critic
- Hierarchical Reinforcement Learning (HRL)
- MERLIN
OTHER: Policy Gradient Methods
_______________________________________________________________________________________
- Introduction to Various Reinforcement Learning Algorithms. Part I (Q-Learning, SARSA, DQN, DDPG) | Steeve Huang
- Introduction to Various Reinforcement Learning Algorithms. Part II (TRPO, PPO) | Steeve Huang
- Guide
Goal-oriented algorithms learn how to attain a complex objective (goal) or to maximize along a particular dimension over many steps; for example, they can maximize the points won in a game over many moves. Reinforcement learning solves the difficult problem of correlating immediate actions with the delayed returns they produce. Like humans, reinforcement learning algorithms sometimes have to wait a while to see the fruit of their decisions. They operate in a delayed-return environment, where it can be difficult to understand which action leads to which outcome over many time steps.
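As a toy illustration of the delayed-return problem (a sketch of my own, not taken from the guide above), the discounted return assigns each action credit for all rewards that follow it, so a single win at the end of an episode still propagates value back to early moves:

```python
# Toy sketch: discounted returns for a sparse-reward episode.
# Assumptions: numpy is available; gamma=0.99 is an arbitrary illustrative choice.
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma*r_{t+1} + gamma^2*r_{t+2} + ... for every step t."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):  # accumulate from the end of the episode
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# An episode with no reward for 9 steps, then +1 for winning the game:
# early steps still receive discounted credit for the eventual win.
print(discounted_returns([0] * 9 + [1]))
```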
MERLIN
In deep learning, the idea of retaining and recalling relevant experience is the driving thesis behind external, key-value-based memory stores. The idea is not new; Neural Turing Machines, one of the first papers I ever read and a personal favorite, augmented neural nets with a differentiable, external memory store accessible via vector-valued "read" and "write" heads that address specific locations. It is easy to imagine extending this to RL, where at any given time step an agent is given both its environment observation and the memories relevant to its current state. That is exactly what the recent MERLIN architecture does. MERLIN has two components: a memory-based predictor (MBP) and a policy network. The MBP is responsible for compressing observations into useful, low-dimensional "state variables" and storing them directly in a key-value memory matrix. It is also responsible for passing relevant memories to the policy, which uses those memories and the current state to output actions. MERLIN is not the only Deep Reinforcement Learning (DRL) architecture to use external memory stores; as far back as 2016, researchers were already applying this idea in the MQN, or Memory Q-Network. Beyond DQN/A3C: A Survey in Advanced Reinforcement Learning | Joyce Xu - Towards Data Science

http://miro.medium.com/max/932/1*CVjt2C_IUe7Mdcee4TQIhw.png
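The following is a minimal sketch of the key-value memory pattern described above, assuming a dot-product-attention read and a simple round-robin write policy. The class and method names are illustrative assumptions; this is not the actual MERLIN (or Neural Turing Machines) implementation, only the core read/write idea.

```python
# Sketch of an external key-value memory: state vectors are written into a
# memory matrix and later read back by soft attention over the stored keys.
import numpy as np

class KeyValueMemory:
    def __init__(self, slots, key_dim, value_dim):
        self.keys = np.zeros((slots, key_dim))      # what each memory is "about"
        self.values = np.zeros((slots, value_dim))  # the stored state variables
        self.next_slot = 0

    def write(self, key, value):
        # Simplest possible write policy: fill slots in order, then overwrite the oldest.
        i = self.next_slot % len(self.keys)
        self.keys[i] = key
        self.values[i] = value
        self.next_slot += 1

    def read(self, query):
        # Soft (differentiable-style) read: attention weights from key similarity.
        scores = self.keys @ query                  # dot-product similarity per slot
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ self.values                # weighted sum of stored values

# Usage: a memory-based predictor would write a compressed state each time step,
# and the policy would read back the memories most relevant to the current state.
memory = KeyValueMemory(slots=8, key_dim=4, value_dim=4)
state = np.random.randn(4)
memory.write(key=state, value=state)
retrieved = memory.read(query=state)
```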
Importance Weighted Actor-Learner Architecture (IMPALA)
YouTube search... ...Google search
- Artificial General Intelligence Is Here, and Impala Is Its Name | Aaron Krumins
- DeepMind Lab
- IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
- Importance Weighted Actor-Learner Architectures: Scalable Distributed DeepRL in DMLab-30
IMPALA not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation. It achieves stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction method called V-trace. IMPALA achieves better performance than previous agents with less data and, crucially, exhibits positive transfer between tasks as a result of its multi-task approach.
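Because acting and learning are decoupled, the actors' behaviour policy lags slightly behind the learner's target policy, and V-trace corrects for that lag with truncated importance weights when computing value targets. Below is a minimal, illustrative numpy sketch of the V-trace target recursion; the function name, argument names, and default truncation levels (rho_bar = c_bar = 1.0) are my own choices, not DeepMind's code.

```python
# Sketch of V-trace value targets for a single trajectory.
import numpy as np

def vtrace_targets(values, next_value, rewards, behaviour_logp, target_logp,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Compute v_s targets: values are V(x_s) along the trajectory,
    next_value is the bootstrap V(x_T), and the log-probs are of the
    actions taken under the behaviour (actor) and target (learner) policies."""
    T = len(rewards)
    is_weights = np.exp(target_logp - behaviour_logp)       # pi / mu per step
    rhos = np.minimum(is_weights, rho_bar)                   # truncated weights for deltas
    cs = np.minimum(is_weights, c_bar)                       # truncated "trace" weights
    values_tp1 = np.append(values[1:], next_value)
    deltas = rhos * (rewards + gamma * values_tp1 - values)  # rho_t * TD errors
    vs_minus_v = np.zeros(T)
    acc = 0.0
    for t in reversed(range(T)):                             # backward recursion
        acc = deltas[t] + gamma * cs[t] * acc
        vs_minus_v[t] = acc
    return values + vs_minus_v                               # v_s targets

# Toy call: 5-step trajectory where behaviour and target policies agree (weights = 1).
T = 5
targets = vtrace_targets(values=np.zeros(T), next_value=0.0, rewards=np.ones(T),
                         behaviour_logp=np.zeros(T), target_logp=np.zeros(T))
print(targets)
```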