Difference between revisions of "Proximal Policy Optimization (PPO)"

From
Jump to: navigation, search
m
m (Text replacement - "BingAI" to "Bing")
Line 13: Line 13:
  
 
* [https://arxiv.org/abs/1707.06347 Proximal policy optimization algorithms | J. Schulman, F. Wolski, P. Dhariwal, A. Radford & O. Klimov  2017]  
 
* [https://arxiv.org/abs/1707.06347 Proximal policy optimization algorithms | J. Schulman, F. Wolski, P. Dhariwal, A. Radford & O. Klimov  2017]  
* [[Generative AI]]  ... [[OpenAI]]'s [[ChatGPT]] ... [[Perplexity]]  ... [[Microsoft]]'s [[BingAI]] ... [[You]] ...[[Google]]'s [[Bard]] ... [[Baidu]]'s [[Ernie]]
+
* [[Generative AI]]  ... [[OpenAI]]'s [[ChatGPT]] ... [[Perplexity]]  ... [[Microsoft]]'s [[Bing]] ... [[You]] ...[[Google]]'s [[Bard]] ... [[Baidu]]'s [[Ernie]]
 
* [[Policy]]
 
* [[Policy]]
 
* [[Deep Reinforcement Learning (DRL)]]
 
* [[Deep Reinforcement Learning (DRL)]]

Revision as of 21:29, 30 March 2023