Reinforcement Learning (RL) from Human Feedback (RLHF)
Revision as of 14:15, 19 March 2023
[https://www.youtube.com/results?search_query=ai+Reinforcement+Human+Feedback+RLHF YouTube] ... [https://www.quora.com/search?q=ai%20Reinforcement%20Human%20Feedback%20RLHF Quora] ... [https://www.google.com/search?q=ai+Reinforcement+Human+Feedback+RLHF Google search] ... [https://news.google.com/search?q=ai+Reinforcement+Human+Feedback+RLHF Google News] ... [https://www.bing.com/news/search?q=ai+Reinforcement+Human+Feedback+RLHF&qft=interval%3d%228%22 Bing News]
- Reinforcement Learning (RL)
- Assistants ... Hybrid Assistants ... Agents ... Negotiation
- ChatGPT is everywhere. Here’s where it came from | Will Douglas Heaven - MIT Technology Review
- Sequence to Sequence (Seq2Seq)
- Recurrent Neural Network (RNN)
- Long Short-Term Memory (LSTM)
- Bidirectional Encoder Representations from Transformers (BERT) ... a strong model, but with less investment behind it than the larger OpenAI organization
- ChatGPT | OpenAI
- Introduction to Reinforcement Learning with Human Feedback | Edwin Chen - Surge
- What is Reinforcement Learning with Human Feedback (RLHF)? | Michael Spencer
- Compendium of problems with RLHF | Raphael S - LessWrong
- Reinforcement Learning from Human Feedback (RLHF) - ChatGPT | Sthanikam Santhosh - Medium
- Learning through human feedback | Google DeepMind
- Paper Review: Summarization using Reinforcement Learning From Human Feedback | Towards AI ... AI Alignment, Reinforcement Learning from Human Feedback, Proximal Policy Optimization (PPO)
- Reinforcement Learning from Human Feedback (RLHF) - a simplified explanation | Joao Lages
- Illustrating Reinforcement Learning from Human Feedback (RLHF) | N. Lambert, L. Castricato, L. von Werra, and A. Havrilla - Hugging Face
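As a rough, self-contained illustration of the reward-modeling step these resources discuss, the sketch below computes the pairwise preference loss commonly used in RLHF (a Bradley-Terry style objective): given scalar reward scores for a human-preferred response and a rejected one, the loss is the negative log-probability that the preferred response outranks the other. The reward values here are made-up placeholders, not outputs of a trained model.

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Negative log-likelihood (Bradley-Terry) that the human-preferred
    response scores above the rejected one: -log(sigmoid(r_chosen - r_rejected))."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# The loss shrinks as the reward gap widens in favor of the chosen response,
# pushing a learned reward model to rank human-preferred outputs higher.
print(preference_loss(2.0, 1.0))  # preferred response scored higher: small loss
print(preference_loss(1.0, 2.0))  # ranking inverted: larger loss
```

In a full RLHF pipeline this loss trains the reward model on human comparison data; the resulting reward then drives policy optimization, typically with PPO.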