Difference between revisions of "Reinforcement Learning (RL) from Human Feedback (RLHF)"

(Created page with "{{#seo: |title=PRIMO.ai |titlemode=append |keywords=artificial, intelligence, machine, learning, models, algorithms, data, singularity, moonshot, Tensorflow, Google, Nvidia, M...")
 
  
 
* [[Reinforcement Learning (RL)]]
 
** [[Monte Carlo]] (MC) Method - Model-Free Reinforcement Learning
 
** [[Markov Decision Process (MDP)]]
 
** [[State-Action-Reward-State-Action (SARSA)]]
 
** [[Q Learning]]
 
*** [[Deep Q Network (DQN)]]
 
** [[Deep Reinforcement Learning (DRL)]] DeepRL
 
** [[Distributed Deep Reinforcement Learning (DDRL)]]
 
** [[Evolutionary Computation / Genetic Algorithms]]
 
** [[Actor Critic]]
 
*** [[Asynchronous Advantage Actor Critic (A3C)]]
 
*** [[Advanced Actor Critic (A2C)|Advantage Actor Critic (A2C)]]
 
*** [[Lifelong Latent Actor-Critic (LILAC)]]
 
** [[Hierarchical Reinforcement Learning (HRL)]]
 
* [[Game Theory]]
 
* [[Policy Gradient (PG)]]
 
* [[Trust Region Policy Optimization (TRPO)]]
 
* [[Proximal Policy Optimization (PPO)]]
 
* [[Robotics]]
 
* [http://arxiv.org/abs/1611.01578 Neural Architecture Search (NAS) with Reinforcement Learning | Barret Zoph & Quoc V. Le] ... [http://en.wikipedia.org/wiki/Neural_architecture_search#NAS_with_Reinforcement_Learning Wikipedia]
 
* [http://towardsdatascience.com/advanced-reinforcement-learning-6d769f529eb3 Beyond DQN/A3C: A Survey in Advanced Reinforcement Learning | Joyce Xu - Towards Data Science]
 
* [[AdaNet]]
 
* [[Loop#Feedback Loop - The AI Economist|Feedback Loop - The AI Economist]]
 
* [[Learning Techniques]]
 
** [[Apprenticeship Learning - Inverse Reinforcement Learning (IRL)]]
 
 
 
* [[ChatGPT]]
 
 
* [https://huggingface.co/blog/rlhf Illustrating Reinforcement Learning from Human Feedback (RLHF) | N. Lambert, L. Castricato, L. von Werra, and A. Havrilla - Hugging Face]
 

Revision as of 23:02, 28 January 2023