Difference between revisions of "Apprenticeship Learning - Inverse Reinforcement Learning (IRL)"
Revision as of 10:01, 29 October 2018
- Reinforcement Learning
- Inside Out - Curious Optimistic Reasoning
- Generative Adversarial Network (GAN)
- A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress | Saurabh Arora, Prashant Doshi 18 Jun 2018
- Machine Teaching for Inverse Reinforcement Learning: Algorithms and Applications | Daniel S. Brown, Scott Niekum 23 Jun 2018
Inverse reinforcement learning (IRL) infers a reward function from observed behavior or demonstrations, which in turn allows for policy improvement and generalization. Whereas ordinary reinforcement learning uses rewards and punishments to learn behavior, IRL reverses the direction: a robot observes a person's behavior and works out what goal that behavior appears to be pursuing.
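To make the reversed direction concrete, here is a minimal Python sketch, not a full IRL algorithm. It assumes a hypothetical 5-state chain, one-hot state features, a linear reward, and a discount factor of 0.9; the trajectories and the name `feature_expectations` are invented for illustration. In the spirit of feature-matching approaches, it takes the difference between the expert's and a baseline's discounted state-visitation features as a crude reward direction.

```python
import numpy as np

def feature_expectations(trajectories, n_states, gamma=0.9):
    """Average discounted one-hot state-visit counts over trajectories."""
    mu = np.zeros(n_states)
    for traj in trajectories:
        for t, state in enumerate(traj):
            mu[state] += gamma ** t
    return mu / len(trajectories)

# Hypothetical demonstrations: the expert walks right and stays at state 4.
expert = [[0, 1, 2, 3, 4, 4, 4, 4],
          [1, 2, 3, 4, 4, 4, 4, 4]]
# Hypothetical baseline behavior: wandering near the start of the chain.
baseline = [[0, 1, 0, 1, 0, 1, 0, 1],
            [1, 0, 1, 2, 1, 0, 1, 0]]

mu_expert = feature_expectations(expert, n_states=5)
mu_baseline = feature_expectations(baseline, n_states=5)

# Assumed linear reward R(s) = w[s]: states the expert visits more often
# than the baseline receive a higher estimated reward.
w = mu_expert - mu_baseline
w /= np.linalg.norm(w)

inferred_goal = int(np.argmax(w))
print("estimated reward weights:", np.round(w, 2))
print("state with highest inferred reward:", inferred_goal)
```

Practical IRL methods iterate this idea: they re-solve for a policy under the current reward estimate and update the reward until the learner's behavior matches the demonstrations.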
Imitation Learning
The ongoing explosion of spatiotemporal tracking data has now made it possible to analyze and model fine-grained behaviors in a wide range of domains. For instance, tracking data is now being collected for every NBA basketball game with players, referees, and the ball tracked at 25 Hz, along with annotated game events such as passes, shots, and fouls. Other settings include laboratory animals, people in public spaces, professionals in settings such as operating rooms, actors speaking and performing, digital avatars in virtual environments, and even the behavior of other computational systems.