Difference between revisions of "Proximal Policy Optimization (PPO)"

Revision as of 14:30, 19 March 2023

@@ Line 13: / Line 13: @@
 * [https://arxiv.org/abs/1707.06347 Proximal policy optimization algorithms | J. Schulman, F. Wolski, P. Dhariwal, A. Radford & O. Klimov  2017]
+* [[Generative AI]]  ... [[OpenAI]]'s [[ChatGPT]] ... [[Perplexity]]  ... [[Microsoft]]'s [[BingAI]] ... [[You]] ...[[Google]]'s [[Bard]]
 * [[Deep Reinforcement Learning (DRL)]]
 * [[Policy Gradient (PG)]]
@@ Line 31: / Line 32: @@
 * [[Assistants]] ... [[Hybrid Assistants]]  ... [[Agents]]  ... [[Negotiation]]
 * [[Natural Language Processing (NLP)]]  ...[[Natural Language Generation (NLG)|Generation]]  ...[[Large Language Model (LLM)|LLM]]  ...[[Natural Language Tools & Services|Tools & Services]]
-* [[Generative AI]]  ... [[OpenAI]]'s [[ChatGPT]] ... [[Perplexity]]  ... [[Microsoft]]'s [[BingAI]] ... [[You]] ...[[Google]]'s [[Bard]]