Difference between revisions of "Reinforcement Learning (RL) from Human Feedback (RLHF)"

Line 11: Line 11:
 
* [[ChatGPT]]
 
* [https://huggingface.co/blog/rlhf Illustrating Reinforcement Learning from Human Feedback (RLHF) | N. Lambert, L. Castricato, L. von Werra, and A. Havrilla - Hugging Face]
 
+ https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/rlhf/rlhf.png

Revision as of 23:08, 28 January 2023