Reinforcement Learning (RL) from Human Feedback (RLHF)
Revision as of 20:44, 26 June 2023
- Reinforcement Learning (RL)
- Human-in-the-Loop (HITL) Learning
- Assistants ... Personal Companions ... Agents ... Negotiation ... LangChain
- Generative AI ... Conversational AI ... OpenAI's ChatGPT ... Perplexity ... Microsoft's Bing ... You ... Google's Bard ... Baidu's Ernie
- Policy ... Policy vs Plan ... Constitutional AI ... Trust Region Policy Optimization (TRPO) ... Policy Gradient (PG) ... Proximal Policy Optimization (PPO)
- Introduction to Reinforcement Learning with Human Feedback | Edwin Chen - Surge
- What is Reinforcement Learning with Human Feedback (RLHF)? | Michael Spencer
- Compendium of problems with RLHF | Raphael S - LessWrong
- Reinforcement Learning from Human Feedback (RLHF) - ChatGPT | Sthanikam Santhosh - Medium
- Learning through human feedback | Google DeepMind
- Paper Review: Summarization using Reinforcement Learning From Human Feedback | Towards AI ... AI Alignment, Reinforcement Learning from Human Feedback, Proximal Policy Optimization (PPO)
- Reinforcement Learning from Human Feedback (RLHF) - a simplified explanation | Joao Lages
- Illustrating Reinforcement Learning from Human Feedback (RLHF) | N. Lambert, L. Castricato, L. von Werra, and A. Havrilla - [[Hugging Face]]