Difference between revisions of "Proximal Policy Optimization (PPO)"

Line 14:
 
* [https://arxiv.org/abs/1707.06347 Proximal policy optimization algorithms | J. Schulman, F. Wolski, P. Dhariwal, A. Radford & O. Klimov  2017]  
 
 
* [[Generative AI]]  ... [[OpenAI]]'s [[ChatGPT]] ... [[Perplexity]]  ... [[Microsoft]]'s [[BingAI]] ... [[You]] ...[[Google]]'s [[Bard]] ... [[Baidu]]'s [[Ernie]]
 
 +
* [[Policy]]
 
* [[Deep Reinforcement Learning (DRL)]]
 
 
* [[Policy Gradient (PG)]]
 

Revision as of 11:52, 26 March 2023