Difference between revisions of "Proximal Policy Optimization (PPO)"

Line 37:
*** [[Reinforcement Learning (RL) from Human Feedback (RLHF)]]
*** [[Supervised]] Learning
- *** [[Proximal Policy Optimization (PPO)]]]
+ *** [[Proximal Policy Optimization (PPO)]]
  
  

Revision as of 09:17, 26 February 2023