Reinforcement Learning (RL) from Human Feedback (RLHF)
Revision as of 01:43, 12 February 2023
- ChatGPT
- Reinforcement Learning (RL)
- ChatGPT is everywhere. Here’s where it came from | Will Douglas Heaven - MIT Technology Review
  - Sequence to Sequence (Seq2Seq)
  - Recurrent Neural Network (RNN)
  - Long Short-Term Memory (LSTM)
  - Bidirectional Encoder Representations from Transformers (BERT) ... a better model, but less investment than the larger OpenAI organization
  - ChatGPT | OpenAI:
    - Transformer / Attention Mechanism
    - Generative Pre-trained Transformer (GPT)
    - Reinforcement Learning (RL) from Human Feedback (RLHF)
    - Supervised Learning
    - Proximal Policy Optimization (PPO)
- Introduction to Reinforcement Learning with Human Feedback | Edwin Chen - Surge
- What is Reinforcement Learning with Human Feedback (RLHF)? | Michael Spencer
- Compendium of problems with RLHF | Raphael S - LessWrong
- Reinforcement Learning from Human Feedback (RLHF) - ChatGPT | Sthanikam Santhosh - Medium
- Learning through human feedback | Google DeepMind
- Paper Review: Summarization using Reinforcement Learning From Human Feedback | Towards AI ... AI Alignment, Reinforcement Learning from Human Feedback, Proximal Policy Optimization (PPO)
- Reinforcement Learning from Human Feedback (RLHF) - a simplified explanation | Joao Lages
- Illustrating Reinforcement Learning from Human Feedback (RLHF) | N. Lambert, L. Castricato, L. von Werra, and A. Havrilla - Hugging Face
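Several of the resources above cover the reward-model stage of RLHF, where human rankings of model outputs are turned into a training signal before PPO fine-tuning. As a minimal sketch of that idea (not taken from any of the cited articles; the function name and example values are illustrative), the reward model is typically trained with a pairwise Bradley-Terry loss: it should assign a higher score to the response the human labeler preferred.

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss for a reward model:
    -log(sigmoid(r_chosen - r_rejected)).

    The loss is small when the model scores the human-preferred
    response higher, and grows when the ranking is reversed.
    """
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Model agrees with the human label (preferred response scored higher):
print(round(preference_loss(2.0, 0.0), 4))  # → 0.1269

# Model disagrees (rejected response scored higher): much larger loss.
print(round(preference_loss(0.0, 2.0), 4))  # → 2.1269
```

In a full RLHF pipeline the rewards would come from a learned network scoring whole text responses, and the trained reward model would then supply the reward signal that PPO maximizes, usually with a KL penalty keeping the policy close to the supervised baseline.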