| Line 9: | Line 9: |
| | * [[Reinforcement Learning (RL)]] | | * [[Reinforcement Learning (RL)]] |
| − | ** [[Monte Carlo]] (MC) Method - Model Free Reinforcement Learning
| − | ** [[Markov Decision Process (MDP)]]
| − | ** [[State-Action-Reward-State-Action (SARSA)]]
| − | ** [[Q Learning]]
| − | *** [[Deep Q Network (DQN)]]
| − | ** [[Deep Reinforcement Learning (DRL)]] DeepRL
| − | ** [[Distributed Deep Reinforcement Learning (DDRL)]]
| − | ** [[Evolutionary Computation / Genetic Algorithms]]
| − | ** [[Actor Critic]]
| − | *** [[Asynchronous Advantage Actor Critic (A3C)]]
| − | *** [[Advanced Actor Critic (A2C)]]
| − | *** [[Lifelong Latent Actor-Critic (LILAC)]]
| − | ** [[Hierarchical Reinforcement Learning (HRL)]]
| − | * [[Game Theory]]
| − | * [[Policy Gradient (PG)]]
| − | * [[Trust Region Policy Optimization (TRPO)]]
| − | * [[Proximal Policy Optimization (PPO)]]
| − | * [[Robotics]]
| − | * [http://arxiv.org/abs/1611.01578 Neural Architecture Search (NAS) with Reinforcement Learning | Barret Zoph & Quoc V. Le] ...[http://en.wikipedia.org/wiki/Neural_architecture_search#NAS_with_Reinforcement_Learning Wikipedia]
| − | * [http://towardsdatascience.com/advanced-reinforcement-learning-6d769f529eb3 Beyond DQN/A3C: A Survey in Advanced Reinforcement Learning | Joyce Xu - Towards Data Science]
| − | * [[AdaNet]]
| − | * [[Loop#Feedback Loop - The AI Economist|Feedback Loop - The AI Economist]]
| − | * [[Learning Techniques]]
| − | ** [[Apprenticeship Learning - Inverse Reinforcement Learning (IRL)]]
| − |
| | * [[ChatGPT]] | | * [[ChatGPT]] |
| | * [https://huggingface.co/blog/rlhf Illustrating Reinforcement Learning from Human Feedback (RLHF) | N. Lambert, L. Castricato, L. von Werra, and A. Havrilla - Hugging Face] | | * [https://huggingface.co/blog/rlhf Illustrating Reinforcement Learning from Human Feedback (RLHF) | N. Lambert, L. Castricato, L. von Werra, and A. Havrilla - Hugging Face] |