Difference between revisions of "Reinforcement Learning (RL) from Human Feedback (RLHF)"

Revision as of 23:02, 28 January 2023

@@ Line 9: / Line 9: @@
 * [[Reinforcement Learning (RL)]]
-** [[Monte Carlo]] (MC) Method - Model Free Reinforcement Learning
-** [[Markov Decision Process (MDP)]]
-** [[State-Action-Reward-State-Action (SARSA)]]
-** [[Q Learning]]
-*** [[Deep Q Network (DQN)]]
-** [[Deep Reinforcement Learning (DRL)]] DeepRL
-** [[Distributed Deep Reinforcement Learning (DDRL)]]
-** [[Evolutionary Computation / Genetic Algorithms]]
-** [[Actor Critic]]
-*** [[Asynchronous Advantage Actor Critic (A3C)]]
-*** [[Advanced Actor Critic (A2C)]]
-*** [[Lifelong Latent Actor-Critic (LILAC)]]
-** [[Hierarchical Reinforcement Learning (HRL)]]
-* [[Game Theory]]
-* [[Policy Gradient (PG)]]
-* [[Trust Region Policy Optimization (TRPO)]]
-* [[Proximal Policy Optimization (PPO)]]
-* [[Robotics]]
-* [http://arxiv.org/abs/1611.01578 Neural Architecture Search (NAS) with Reinforcement Learning | Barret Zoph & Quoc V. Le]  ...[http://en.wikipedia.org/wiki/Neural_architecture_search#NAS_with_Reinforcement_Learning  Wikipedia]
-* [http://towardsdatascience.com/advanced-reinforcement-learning-6d769f529eb3 Beyond DQN/A3C: A Survey in Advanced Reinforcement Learning | Joyce Xu - Towards Data Science]
-* [[AdaNet]]
-* [[Loop#Feedback Loop - The AI Economist|Feedback Loop - The AI Economist]]
-* [[Learning Techniques]]
-** [[Apprenticeship Learning - Inverse Reinforcement Learning (IRL)]]
 * [[ChatGPT]]
 * [https://huggingface.co/blog/rlhf Illustrating Reinforcement Learning from Human Feedback (RLHF) | N. Lambert, L. Castricato, L. von Werra, and A. Havrilla - Hugging Face]
