Difference between revisions of "Reinforcement Learning (RL) from Human Feedback (RLHF)"
| (25 intermediate revisions by the same user not shown) | |||
| Line 5: | Line 5: | ||
|description=Helpful resources for your journey with artificial intelligence; videos, articles, techniques, courses, profiles, and tools | |description=Helpful resources for your journey with artificial intelligence; videos, articles, techniques, courses, profiles, and tools | ||
}} | }} | ||
| − | [ | + | [https://www.youtube.com/results?search_query=ai+Reinforcement+Human+Feedback+RLHF YouTube] |
| − | [ | + | [https://www.quora.com/search?q=ai%20Reinforcement%20Human%20Feedback%20XRLHF ... Quora] |
| + | [https://www.google.com/search?q=ai+Reinforcement+Human+Feedback+RLHF ...Google search] | ||
| + | [https://news.google.com/search?q=ai+Reinforcement+Human+Feedback+RLHF ...Google News] | ||
| + | [https://www.bing.com/news/search?q=ai+Reinforcement+Human+Feedback+RLHF&qft=interval%3d%228%22 ...Bing News] | ||
| − | |||
* [[Reinforcement Learning (RL)]] | * [[Reinforcement Learning (RL)]] | ||
| − | * [[Generative | + | * [[Human-in-the-Loop (HITL) Learning]] |
| − | * [[ | + | * [[Agents]] ... [[Robotic Process Automation (RPA)|Robotic Process Automation]] ... [[Assistants]] ... [[Personal Companions]] ... [[Personal Productivity|Productivity]] ... [[Email]] ... [[Negotiation]] ... [[LangChain]] |
| + | * [[What is Artificial Intelligence (AI)? | Artificial Intelligence (AI)]] ... [[Generative AI]] ... [[Machine Learning (ML)]] ... [[Deep Learning]] ... [[Neural Network]] ... [[Reinforcement Learning (RL)|Reinforcement]] ... [[Learning Techniques]] | ||
| + | * [[Conversational AI]] ... [[ChatGPT]] | [[OpenAI]] ... [[Bing/Copilot]] | [[Microsoft]] ... [[Gemini]] | [[Google]] ... [[Claude]] | [[Anthropic]] ... [[Perplexity]] ... [[You]] ... [[phind]] ... [[Grok]] | [https://x.ai/ xAI] ... [[Groq]] ... [[Ernie]] | [[Baidu]] | ||
| + | * [[Policy]] ... [[Policy vs Plan]] ... [[Constitutional AI]] ... [[Trust Region Policy Optimization (TRPO)]] ... [[Policy Gradient (PG)]] ... [[Proximal Policy Optimization (PPO)]] | ||
* [https://www.surgehq.ai/blog/introduction-to-reinforcement-learning-with-human-feedback-rlhf-series-part-1 Introduction to Reinforcement Learning with Human Feedback | Edwin Chen - Surge] | * [https://www.surgehq.ai/blog/introduction-to-reinforcement-learning-with-human-feedback-rlhf-series-part-1 Introduction to Reinforcement Learning with Human Feedback | Edwin Chen - Surge] | ||
* [https://aisupremacy.substack.com/p/what-is-reinforcement-learning-with What is Reinforcement Learning with Human Feedback (RLHF)? | Michael Spencer] | * [https://aisupremacy.substack.com/p/what-is-reinforcement-learning-with What is Reinforcement Learning with Human Feedback (RLHF)? | Michael Spencer] | ||
| Line 77: | Line 82: | ||
* 0:00 Introduction | * 0:00 Introduction | ||
| − | * 3:09 Embedding Space | + | * 3:09 [[Embedding]] Space |
* 15:35 Overall Transformer Architecture | * 15:35 Overall Transformer Architecture | ||
* 36:06 Transformer (Details) | * 36:06 Transformer (Details) | ||
| Line 96: | Line 101: | ||
|} | |} | ||
|}<!-- B --> | |}<!-- B --> | ||
| + | <youtube>bSvTVREwSNw</youtube> | ||
Latest revision as of 20:18, 9 April 2024
YouTube ... Quora ... Google search ... Google News ... Bing News
- Reinforcement Learning (RL)
- Human-in-the-Loop (HITL) Learning
- Agents ... Robotic Process Automation ... Assistants ... Personal Companions ... Productivity ... Email ... Negotiation ... LangChain
- Artificial Intelligence (AI) ... Generative AI ... Machine Learning (ML) ... Deep Learning ... Neural Network ... Reinforcement ... Learning Techniques
- Conversational AI ... ChatGPT | OpenAI ... Bing/Copilot | Microsoft ... Gemini | Google ... Claude | Anthropic ... Perplexity ... You ... phind ... Grok | xAI ... Groq ... Ernie | Baidu
- Policy ... Policy vs Plan ... Constitutional AI ... Trust Region Policy Optimization (TRPO) ... Policy Gradient (PG) ... Proximal Policy Optimization (PPO)
- Introduction to Reinforcement Learning with Human Feedback | Edwin Chen - Surge
- What is Reinforcement Learning with Human Feedback (RLHF)? | Michael Spencer
- Compendium of problems with RLHF | Raphael S - LessWrong
- Reinforcement Learning from Human Feedback(RLHF)-ChatGPT | Sthanikam Santhosh - Medium
- Learning through human feedback | Google DeepMind
- Paper Review: Summarization using Reinforcement Learning From Human Feedback | Towards AI ... AI Alignment, Reinforcement Learning from Human Feedback, Proximal Policy Optimization (PPO)
- Reinforcement Learning from Human Feedback (RLHF) - a simplified explanation | Joao Lages
- Illustrating Reinforcement Learning from Human Feedback (RLHF) | N. Lambert, L. Castricato, L. von Werra, and A. Havrilla - Hugging Face