Difference between revisions of "Proximal Policy Optimization (PPO)"

From
Jump to: navigation, search
m
m
Line 32: Line 32:
 
* [[Natural Language Processing (NLP)]]  ...[[Natural Language Generation (NLG)|Generation]]  ...[[Large Language Model (LLM)|LLM]]  ...[[Natural Language Tools & Services|Tools & Services]]
 
* [[Natural Language Processing (NLP)]]  ...[[Natural Language Generation (NLG)|Generation]]  ...[[Large Language Model (LLM)|LLM]]  ...[[Natural Language Tools & Services|Tools & Services]]
 
* [[Generative AI]]  ... [[OpenAI]]'s [[ChatGPT]] ... [[Perplexity]]  ... [[Microsoft]]'s [[BingAI]] ... [[You]] ...[[Google]]'s [[Bard]]
 
* [[Generative AI]]  ... [[OpenAI]]'s [[ChatGPT]] ... [[Perplexity]]  ... [[Microsoft]]'s [[BingAI]] ... [[You]] ...[[Google]]'s [[Bard]]
* [https://www.technologyreview.com/2023/02/08/1068068/chatgpt-is-everywhere-heres-where-it-came-from/ ChatGPT is everywhere. Here’s where it came from | Will Douglas Heaven - MIT Technology Review]
+
 
** [[Sequence to Sequence (Seq2Seq)]]
+
 
** [[Recurrent Neural Network (RNN)]] 
 
** [[Long Short-Term Memory (LSTM)]]
 
** [[Bidirectional Encoder Representations from Transformers (BERT)]]  ... a better model, but less investment than the larger [[OpenAI]] organization
 
** [[ChatGPT]] | [[OpenAI]]:
 
*** [[Attention]] Mechanism  ...[[Transformer]] Model  ...[[Generative Pre-trained Transformer (GPT)]]
 
*** [[Reinforcement Learning (RL) from Human Feedback (RLHF)]]
 
*** [[Supervised]] Learning
 
*** [[Proximal Policy Optimization (PPO)]]
 
  
 
<youtube>hlv79rcHws0</youtube>
 
<youtube>hlv79rcHws0</youtube>

Revision as of 14:29, 19 March 2023