Apprenticeship Learning - Inverse Reinforcement Learning (IRL)

 
* [[Generative Adversarial Network (GAN)]]
 
* [[Connecting Brains]]
 
* [[Policy]]  ... [[Policy vs Plan]] ... [[Constitutional AI]] ... [[Trust Region Policy Optimization (TRPO)]] ... [[Policy Gradient (PG)]] ... [[Proximal Policy Optimization (PPO)]]
 
* [[Generative AI]]  ... [[Conversational AI]] ... [[OpenAI]]'s [[ChatGPT]] ... [[Perplexity]]  ... [[Microsoft]]'s [[Bing]] ... [[You]] ...[[Google]]'s [[Bard]] ... [[Baidu]]'s [[Ernie]]
 
 
* [https://arxiv.org/pdf/1806.06877.pdf A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress | Saurabh Arora, Prashant Doshi] 18 Jun 2018
 

Revision as of 16:34, 16 April 2023


Inverse reinforcement learning (IRL) infers a reward function from observed behavior or demonstrations, which can then be used to improve a policy and generalize to new situations. Ordinary reinforcement learning uses rewards and punishments to learn a behavior; IRL reverses the direction: a robot observes a person's behavior and infers the goal that behavior appears to be pursuing.
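The observe-then-infer loop can be sketched in code. The following is a minimal, illustrative example (in the spirit of feature-matching apprenticeship learning, e.g. Abbeel & Ng's projection idea, not a faithful implementation of any one algorithm): on a toy chain-world, the learner watches an expert who always moves right, and recovers a reward that makes "move right" optimal. All names and the toy MDP are assumptions made for the sketch.

```python
import numpy as np

# Toy chain MDP: states 0..4, actions 0 = left, 1 = right; moves past the
# ends stay put. The expert's (hidden) goal is the rightmost state.
N_STATES, N_ACTIONS, GAMMA, HORIZON = 5, 2, 0.9, 20

def step(s, a):
    return max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)

def feature_expectations(policy):
    """Discounted one-hot state-visitation features of a deterministic policy."""
    mu, s = np.zeros(N_STATES), 0
    for t in range(HORIZON):
        mu[s] += GAMMA ** t
        s = step(s, policy[s])
    return mu

def optimal_policy(reward):
    """Value iteration under a state-based reward; returns the greedy policy."""
    V = np.zeros(N_STATES)
    for _ in range(100):
        V = np.array([max(reward[step(s, a)] + GAMMA * V[step(s, a)]
                          for a in range(N_ACTIONS)) for s in range(N_STATES)])
    return np.array([max(range(N_ACTIONS),
                         key=lambda a: reward[step(s, a)] + GAMMA * V[step(s, a)])
                     for s in range(N_STATES)])

# The "demonstration": the expert always moves right, toward state 4.
expert = np.ones(N_STATES, dtype=int)
mu_expert = feature_expectations(expert)

# IRL loop: set the reward weights to point from the learner's feature
# expectations toward the expert's, then re-solve for the optimal policy.
policy = np.zeros(N_STATES, dtype=int)   # start with "always move left"
for _ in range(10):
    w = mu_expert - feature_expectations(policy)
    if np.linalg.norm(w) < 1e-6:         # learner now matches the expert
        break
    policy = optimal_policy(w)

print(policy)   # [1 1 1 1 1] -- the learner recovers "always move right"
```

Note the direction reversal the text describes: the reward vector `w` is never given; it is derived from the gap between the expert's and the learner's visitation features, and the policy is then re-optimized against it.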