Difference between revisions of "Reinforcement Learning (RL) from Human Feedback (RLHF)"
Line 73:
* 1:19:00 Reinforcement Learning from Human Feedback (RLHF)
* 1:45:15 Discussion
AI and ML enthusiast. Likes to think about the essence behind AI breakthroughs and to explain them in a simple and relatable way. Also an avid game creator.
+ * [https://delvingintotech.wordpress.com/ Online AI blog]
+ * [https://www.linkedin.com/in/chong-min-tan-94652288/ LinkedIn]
+ * [https://www.twitch.tv/johncm99 Twitch]
+ * [https://twitter.com/johntanchongmin Twitter]
+ * [https://simmer.io/@chongmin Try out my games here]
|}
|}<!-- B -->
Revision as of 11:12, 29 January 2023
- ChatGPT
- Reinforcement Learning (RL)
- Generative Pre-trained Transformer (GPT)
- Illustrating Reinforcement Learning from Human Feedback (RLHF) | N. Lambert, L. Castricato, L. von Werra, and A. Havrilla - Hugging Face
- Introduction to Reinforcement Learning with Human Feedback | Edwin Chen - Surge
- What is Reinforcement Learning with Human Feedback (RLHF)? | Michael Spencer
- Compendium of problems with RLHF | Raphael S - LessWrong
- Reinforcement Learning from Human Feedback(RLHF)-ChatGPT | Sthanikam Santhosh - Medium
- Learning through human feedback | Google DeepMind
- Paper Review: Summarization using Reinforcement Learning From Human Feedback | - Towards AI ... AI Alignment, Reinforcement Learning from Human Feedback, Proximal Policy Optimization (PPO)