Difference between revisions of "Reinforcement Learning (RL) from Human Feedback (RLHF)"
| Line 38: | Line 38: | ||
<youtube>wA8rjKueB3Q</youtube> | <youtube>wA8rjKueB3Q</youtube> | ||
<b>How ChatGPT works - From Transformers to Reinforcement Learning with Human Feedback (RLHF) | <b>How ChatGPT works - From Transformers to Reinforcement Learning with Human Feedback (RLHF) | ||
| − | </b><br>ChatGPT has recently been released by OpenAI, and it is fundamentally a next token/word prediction model: given a prompt, it predicts the next token/word(s). Trained on a massive internet corpus, it becomes very powerful and can perform many tasks zero-shot, such as summarization, code completion, and question answering. | + | </b><br>ChatGPT has recently been released by [[OpenAI]], and it is fundamentally a next token/word prediction model: given a prompt, it predicts the next token/word(s). Trained on a massive internet corpus, it becomes very powerful and can perform many tasks zero-shot, such as summarization, code completion, and question answering. |
Amid the hype around ChatGPT, it is easy to assume that the model can reason and think for itself. Here, we try to demystify how the model works, starting with a basic introduction to Transformers and then showing how the model's output can be improved using Reinforcement Learning with Human Feedback (RLHF). | Amid the hype around ChatGPT, it is easy to assume that the model can reason and think for itself. Here, we try to demystify how the model works, starting with a basic introduction to Transformers and then showing how the model's output can be improved using Reinforcement Learning with Human Feedback (RLHF). | ||
| Line 49: | Line 49: | ||
* [https://arxiv.org/pdf/1706.03762.pdf Original Transformer Paper (Attention is all you need)] | * [https://arxiv.org/pdf/1706.03762.pdf Original Transformer Paper (Attention is all you need)] | ||
* [https://arxiv.org/pdf/2005.14165.pdf GPT-3 Paper] | * [https://arxiv.org/pdf/2005.14165.pdf GPT-3 Paper] | ||
| − | * [https://arxiv.org/pdf/1911.00536.pdf DialoGPT Paper (conversational AI by Microsoft)] | + | * [https://arxiv.org/pdf/1911.00536.pdf DialoGPT Paper (conversational AI by [[Microsoft]])] |
* [https://arxiv.org/pdf/2203.02155.pdf InstructGPT Paper (with RLHF)] | * [https://arxiv.org/pdf/2203.02155.pdf InstructGPT Paper (with RLHF)] | ||
Revision as of 10:48, 29 January 2023
- ChatGPT
- Reinforcement Learning (RL)
- Generative Pre-trained Transformer (GPT)
- Illustrating Reinforcement Learning from Human Feedback (RLHF) | N. Lambert, L. Castricato, L. von Werra, and A. Havrilla - Hugging Face
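
The video description above calls ChatGPT a next token/word prediction model. As a rough illustration of that idea only, here is a toy bigram counter in Python. It is not how ChatGPT or any Transformer is implemented: it just counts which word follows which in a tiny made-up corpus and greedily picks the most frequent follower. The corpus and the function names (`train_bigram`, `predict_next`, `generate`) are assumptions invented for this sketch.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count, for each token, which tokens follow it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if unseen."""
    followers = counts.get(token.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts, prompt, max_new_tokens=5):
    """Greedily extend the prompt one token at a time."""
    tokens = prompt.lower().split()
    if not tokens:
        return ""
    for _ in range(max_new_tokens):
        nxt = predict_next(counts, tokens[-1])
        if nxt is None:
            break  # no known continuation for the last token
        tokens.append(nxt)
    return " ".join(tokens)


# Hypothetical toy corpus, standing in for the "massive internet corpus".
corpus = [
    "the model predicts the next token",
    "the next token follows the prompt",
    "the next word is predicted",
]
counts = train_bigram(corpus)
print(predict_next(counts, "next"))               # token
print(generate(counts, "the", max_new_tokens=3))  # the next token follows
```

A real language model replaces the lookup table with a neural network that scores every token in its vocabulary given the whole preceding context, but the loop in `generate` (predict one token, append it, repeat) has the same shape.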