Proximal Policy Optimization (PPO)

 
* [https://arxiv.org/abs/1707.06347 Proximal policy optimization algorithms | J. Schulman, F. Wolski, P. Dhariwal, A. Radford & O. Klimov  2017]  
 
* [[Generative AI]] ... [[OpenAI]]'s [[ChatGPT]] ... [[Perplexity]] ... [[Microsoft]]'s [[BingAI]] ... [[You]] ... [[Google]]'s [[Bard]] ... [[Baidu]]'s [[Ernie]]
* [[Policy]]
 
* [[Deep Reinforcement Learning (DRL)]]
 
* [[Policy Gradient (PG)]]

Revision as of 11:53, 26 March 2023