Reinforcement Learning (RL) from Human Feedback (RLHF)
Revision as of 21:31, 30 March 2023
YouTube ... Quora ... Google search ... Google News ... Bing News
- Reinforcement Learning (RL)
- Assistants ... Hybrid Assistants ... Agents ... Negotiation ... LangChain
- Generative AI ... OpenAI's ChatGPT ... Perplexity ... Microsoft's Bing ... You ... Google's Bard ... Baidu's Ernie
- Introduction to Reinforcement Learning with Human Feedback | Edwin Chen - Surge
- What is Reinforcement Learning with Human Feedback (RLHF)? | Michael Spencer
- Compendium of problems with RLHF | Raphael S - LessWrong
- Reinforcement Learning from Human Feedback (RLHF) - ChatGPT | Sthanikam Santhosh - Medium
- Learning through human feedback | Google DeepMind
- Paper Review: Summarization using Reinforcement Learning From Human Feedback | Towards AI ... AI Alignment, Reinforcement Learning from Human Feedback, Proximal Policy Optimization (PPO)
- Reinforcement Learning from Human Feedback (RLHF) - a simplified explanation | Joao Lages
- Illustrating Reinforcement Learning from Human Feedback (RLHF) | N. Lambert, L. Castricato, L. von Werra, and A. Havrilla - Hugging Face