Lifelong Latent Actor-Critic (LILAC)
- Lifelong Learning
- Reinforcement Learning (RL)
- Monte Carlo (MC) Method - Model Free Reinforcement Learning
- Markov Decision Process (MDP)
- State-Action-Reward-State-Action (SARSA)
- Q Learning
- Deep Reinforcement Learning (DRL / DeepRL)
- Distributed Deep Reinforcement Learning (DDRL)
- Evolutionary Computation / Genetic Algorithms
- Actor Critic
- Hierarchical Reinforcement Learning (HRL)
- Policy
- Policy vs Plan
- Constitutional AI
- Trust Region Policy Optimization (TRPO)
- Policy Gradient (PG)
- Proximal Policy Optimization (PPO)
Lifelong Latent Actor-Critic (LILAC) is an off-policy reinforcement learning algorithm that can reason about and tackle lifelong non-stationarity. It leverages latent variable models to learn a representation of the environment from current and past experiences, and performs off-policy Reinforcement Learning (RL) with this representation. LILAC is based on the following three key ideas:
- Lifelong learning: LILAC keeps learning from experience gathered over time, even as the environment changes. Rather than assuming a fixed world, it treats the change itself as something to be modeled, so past experience remains useful for off-policy training instead of becoming stale.
- Latent variable models: LILAC uses a latent variable model to learn a compact representation of the environment. The latent captures the aspects of the environment that are shifting, such as changing dynamics or rewards, so the policy can condition on it and adapt as those factors drift.
- Maximum entropy policy: LILAC trains a maximum entropy policy, one that maximizes reward while keeping its action distribution as random as the task allows. This makes LILAC more robust to changes in the environment and encourages exploration of new, potentially better behaviors.
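The interplay of the last two ideas can be sketched in a few lines: infer a latent summary of recent experience, condition the policy on it, and add an entropy bonus to the objective. This is an illustrative toy, not the authors' code; the function names, dimensions, and the averaging "encoder" are assumptions (a real implementation would use an amortized variational encoder and neural networks throughout):

```python
import numpy as np

def infer_latent(transitions, dim=4):
    """Toy 'encoder': summarize recent (s, a, r, s') tuples into a latent z.
    Here we just average features so the sketch stays self-contained."""
    feats = np.array([np.concatenate([s, a, [r], s2]) for s, a, r, s2 in transitions])
    return feats.mean(axis=0)[:dim]  # crude summary of the current dynamics

def policy_probs(state, z, W):
    """Softmax policy over discrete actions, conditioned on state AND latent z."""
    logits = W @ np.concatenate([state, z])
    e = np.exp(logits - logits.max())  # stabilized softmax
    return e / e.sum()

def maxent_objective(probs, q_values, alpha=0.1):
    """Maximum-entropy objective: expected Q plus alpha times policy entropy."""
    entropy = -np.sum(probs * np.log(probs + 1e-12))
    return float(probs @ q_values + alpha * entropy)
```

When Q-values are near-equal, the entropy term makes a uniform (diverse) policy score higher than a near-deterministic one, which is what keeps exploration alive as the environment shifts.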
LILAC has been shown to outperform state-of-the-art RL algorithms on a variety of simulated environments that exhibit lifelong non-stationarity, for example learning to navigate a maze that changes over time, or to play a game against a constantly adapting opponent. It is a promising algorithm with potential applications in robotics, autonomous driving, and finance. Here are some examples of how LILAC could be used in the real world:
- A robot that uses LILAC could learn to navigate a changing environment, such as a warehouse where the shelves are constantly being moved around.
- A self-driving car that uses LILAC could learn to drive in a variety of different conditions, such as traffic, road construction, and weather changes.
- A financial trading agent that uses LILAC could learn to trade in a market that is constantly changing.
LILAC @ SAIL
Researchers from Stanford AI Lab (SAIL) have devised a method to deal with data and environments that change over time in a way that outperforms some leading approaches to reinforcement learning. Lifelong Latent Actor-Critic, aka LILAC, uses latent variable models and a maximum entropy policy to leverage past experience for better sample efficiency and performance in dynamic environments.
Source: "Stanford AI researchers introduce LILAC, reinforcement learning for dynamic environments" | Khari Johnson, VentureBeat