Reinforcement Learning (RL) from Human Feedback (RLHF)
* [https://www.deepmind.com/blog/learning-through-human-feedback Learning through human feedback] | [[Google]] DeepMind
* [https://pub.towardsai.net/paper-review-summarization-using-reinforcement-learning-from-human-feedback-e000a66404ff Paper Review: Summarization using Reinforcement Learning From Human Feedback | Towards AI] ... AI Alignment, Reinforcement Learning from Human Feedback, [https://huggingface.co/blog/deep-rl-ppo Proximal Policy Optimization (PPO)]
| + | |||
<hr>
[https://arxiv.org/abs/1706.03741 Deep reinforcement learning from human preferences | P. Christiano, J. Leike, T. B. Brown, M. Martic, S. Legg, and D. Amodei]
<hr>
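The Christiano et al. paper above fits a reward model to pairwise human preferences. A minimal sketch of that preference loss, assuming the standard Bradley-Terry formulation P(preferred &gt; rejected) = exp(r_p) / (exp(r_p) + exp(r_r)) (function name is illustrative):

```python
import math

def preference_loss(r_preferred, r_rejected):
    """Negative log-likelihood of the Bradley-Terry preference model
    used to fit reward models from pairwise human comparisons."""
    # log-sum-exp trick for numerical stability
    m = max(r_preferred, r_rejected)
    log_z = m + math.log(math.exp(r_preferred - m) + math.exp(r_rejected - m))
    return -(r_preferred - log_z)
```

A reward model that scores the human-preferred completion higher incurs a lower loss; when both completions score equally the loss is log 2, i.e. the model is indifferent.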
| + | |||
| + | |||
| + | <img src="https://preview.redd.it/fp5mh1sdayca1.png?width=2324&format=png&auto=webp&v=enabled&s=30fce8e48088730461253f0b94ac1f01673475b0" width="800"> | ||
| + | |||
| + | [https://gist.github.com/JoaoLages/c6f2dfd13d2484aa8bb0b2d567fbf093 Reinforcement Learning from Human Feedback (RLHF) - a simplified explanation | Joao Lages] | ||
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/rlhf/rlhf.png" width="800"> | <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/rlhf/rlhf.png" width="800"> | ||
| + | |||
[https://huggingface.co/blog/rlhf Illustrating Reinforcement Learning from Human Feedback (RLHF) | N. Lambert, L. Castricato, L. von Werra, and A. Havrilla - Hugging Face] | [https://huggingface.co/blog/rlhf Illustrating Reinforcement Learning from Human Feedback (RLHF) | N. Lambert, L. Castricato, L. von Werra, and A. Havrilla - Hugging Face] | ||
Revision as of 13:45, 29 January 2023
YouTube search... ...Google search
* [[ChatGPT]]
* [[Reinforcement Learning (RL)]]
* [[Generative Pre-trained Transformer (GPT)]]
* Introduction to Reinforcement Learning with Human Feedback | Edwin Chen - Surge
* What is Reinforcement Learning with Human Feedback (RLHF)? | Michael Spencer
* Compendium of problems with RLHF | Raphael S - LessWrong
* Reinforcement Learning from Human Feedback (RLHF) - ChatGPT | Sthanikam Santhosh - Medium