Lifelong Latent Actor-Critic (LILAC)
 
|description=Helpful resources for your journey with artificial intelligence; videos, articles, techniques, courses, profiles, and tools
}}
[https://www.youtube.com/results?search_query=Lifelong+Latent+Actor+Critic+LILAC+Reinforcement+Machine+Learning YouTube search...]
[https://www.google.com/search?q=Lifelong+Latent+Actor+Critic+LILAC+Reinforcement+Machine+Learning ...Google search]
  
* [[Lifelong Learning]]
* [[Reinforcement Learning (RL)]]
 
** [[Monte Carlo]] (MC) Method - Model Free Reinforcement Learning
** [[Markov Decision Process (MDP)]]
** [[State-Action-Reward-State-Action (SARSA)]]
** [[Q Learning]]
*** [[Deep Q Network (DQN)]]
 
** [[Deep Reinforcement Learning (DRL)]] DeepRL
** [[Distributed Deep Reinforcement Learning (DDRL)]]
 
 
** [[Evolutionary Computation / Genetic Algorithms]]
** [[Actor Critic]]
*** [[Asynchronous Advantage Actor Critic (A3C)]]
*** [[Advanced Actor Critic (A2C)]]
*** Lifelong Latent Actor-Critic (LILAC)
 
** [[Hierarchical Reinforcement Learning (HRL)]]
* [[Policy]]  ... [[Policy vs Plan]] ... [[Constitutional AI]] ... [[Trust Region Policy Optimization (TRPO)]] ... [[Policy Gradient (PG)]] ... [[Proximal Policy Optimization (PPO)]]
Lifelong Latent Actor-Critic (LILAC) is an off-policy reinforcement learning algorithm that can reason about and tackle lifelong non-stationarity. It leverages [[latent]] variable models to learn a representation of the environment from current and past experiences, and performs off-[[policy]] [[Reinforcement Learning (RL)]] with this representation. LILAC is based on the following three key ideas:
* <b>Lifelong learning</b>: LILAC keeps learning from experience gathered over its lifetime, even as the environment changes, because it explicitly models how the environment drifts from episode to episode instead of assuming a single fixed task.
* <b>Latent variable models</b>: LILAC uses [[latent]] variable models to infer a compact representation of the current environment from recent experience. This representation captures the underlying structure of the environment (for example, the current dynamics or goal), and conditioning the actor and critic on it makes it easier for LILAC to act well as conditions shift.
* <b>Maximum entropy policy</b>: LILAC uses a maximum entropy [[policy]], one trained to maximize expected return plus an entropy bonus, so its action distribution stays as random as the task allows. This makes LILAC more robust to changes in the environment and encourages it to keep exploring new, potentially better [[Policy|policies]]. A minimal code sketch of these pieces follows this list.
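
The sketch below is an illustrative, simplified rendering of these ideas rather than the authors' released implementation: an encoder infers a latent variable <code>z</code> from a recent slice of experience, and a latent-conditioned actor and critic are trained off-policy with a soft (entropy-regularized) objective in the style of soft actor-critic. All dimensions, the GRU encoder, and the <code>update</code> function are hypothetical placeholders.

<syntaxhighlight lang="python">
# Illustrative sketch only (not the authors' released code): a latent-conditioned,
# maximum-entropy actor-critic update in the spirit of LILAC. Dimensions, the GRU
# encoder, and the toy data at the bottom are hypothetical placeholders.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, LATENT_DIM = 8, 2, 4
ALPHA, GAMMA = 0.2, 0.99          # entropy temperature and discount factor

class Encoder(nn.Module):
    """Summarizes a recent episode (state, action, reward sequence) into a latent z."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(STATE_DIM + ACTION_DIM + 1, 32, batch_first=True)
        self.to_z = nn.Linear(32, LATENT_DIM)

    def forward(self, episode):                      # episode: (batch, time, s+a+r)
        _, h = self.rnn(episode)
        return self.to_z(h[-1])                      # point estimate of z

class Actor(nn.Module):
    """Gaussian policy conditioned on the state and the inferred latent z."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(STATE_DIM + LATENT_DIM, 64), nn.ReLU())
        self.mu = nn.Linear(64, ACTION_DIM)
        self.log_std = nn.Linear(64, ACTION_DIM)

    def forward(self, s, z):
        h = self.body(torch.cat([s, z], dim=-1))
        std = self.log_std(h).clamp(-5, 2).exp()
        return torch.distributions.Normal(self.mu(h), std)

class Critic(nn.Module):
    """Q(s, a, z): the value estimate also depends on the latent environment variable."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM + LATENT_DIM, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, s, a, z):
        return self.net(torch.cat([s, a, z], dim=-1)).squeeze(-1)

encoder, actor, critic = Encoder(), Actor(), Critic()
model_opt = torch.optim.Adam([*encoder.parameters(), *critic.parameters()], lr=3e-4)
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)

def update(episode, s, a, r, s_next):
    """One off-policy update on a replay batch (s, a, r, s') plus a recent episode."""
    z = encoder(episode)                             # latent for this slice of the lifetime

    # Critic (and encoder) step with a soft, entropy-regularized bootstrap target.
    with torch.no_grad():
        next_dist = actor(s_next, z)
        a_next = next_dist.sample()
        logp_next = next_dist.log_prob(a_next).sum(-1)
        target = r + GAMMA * (critic(s_next, a_next, z) - ALPHA * logp_next)
    critic_loss = (critic(s, a, z) - target).pow(2).mean()
    model_opt.zero_grad(); critic_loss.backward(); model_opt.step()

    # Maximum-entropy actor step: maximize Q minus the (scaled) log-probability.
    z = z.detach()                                   # treat the latent as fixed here
    dist = actor(s, z)
    a_new = dist.rsample()                           # reparameterized sample
    logp = dist.log_prob(a_new).sum(-1)
    actor_loss = (ALPHA * logp - critic(s, a_new, z)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

# Toy call with random tensors standing in for samples from a lifelong replay buffer.
B, T = 16, 10
update(torch.randn(B, T, STATE_DIM + ACTION_DIM + 1),
       torch.randn(B, STATE_DIM), torch.randn(B, ACTION_DIM),
       torch.randn(B), torch.randn(B, STATE_DIM))
</syntaxhighlight>

In a full implementation the encoder would be trained as part of a sequential latent variable model over whole episodes (with a reconstruction/ELBO objective) and the replay buffer would span the agent's entire lifetime; the sketch above keeps only the actor-critic portion.
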
LILAC has been shown to outperform state-of-the-art [[Reinforcement Learning|RL algorithms]] on a variety of simulated environments that exhibit lifelong non-stationarity. For example, LILAC was able to learn to navigate a maze that changes over time, and to learn to play a game against an opponent who is constantly adapting. LILAC is a promising [[Reinforcement Learning|RL algorithm]] that has the potential to be used in a wide range of applications, such as robotics, autonomous driving, and finance. Here are some examples of how LILAC could be used in the real world:
* A robot that uses LILAC could learn to navigate a changing environment, such as a warehouse where the shelves are constantly being moved around.
* A self-driving car that uses LILAC could learn to drive in a variety of different conditions, such as traffic, road construction, and weather changes.
* A financial trading agent that uses LILAC could learn to trade in a market that is constantly changing.
= LILAC @ SAIL =
Researchers from [https://ai.stanford.edu/ Stanford AI Lab (SAIL)] have devised a method to deal with data and environments that change over time in a way that outperforms some leading approaches to reinforcement learning. Lifelong Latent Actor-Critic, aka LILAC, uses latent variable models and a maximum entropy policy to leverage past experience for better sample efficiency and performance in dynamic environments. [https://venturebeat.com/2020/07/01/stanford-ai-researchers-introduce-lilac-reinforcement-learning-for-dynamic-environments/ Stanford AI researchers introduce LILAC, reinforcement learning for dynamic environments | Khari Johnson - VentureBeat]
== Continuous Action ==
  
<youtube>kWHSH2HgbNQ</youtube>
<youtube>G0L8SN02clA</youtube>
