Difference between revisions of "Reinforcement Learning (RL) from Human Feedback (RLHF)"
Line 11:
  * [[ChatGPT]]
  * [https://huggingface.co/blog/rlhf Illustrating Reinforcement Learning from Human Feedback (RLHF) | N. Lambert, L. Castricato, L. von Werra, and A. Havrilla - Hugging Face]
+
+ https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/rlhf/rlhf.png