Difference between revisions of "Proximal Policy Optimization (PPO)"

Line 34:
  ** [[Bidirectional Encoder Representations from Transformers (BERT)]]  ... a better model, but less investment than the larger [[OpenAI]] organization
  ** [[ChatGPT]] | [[OpenAI]]:
- *** [[Transformer]] / [[Attention]] Mechanism
- *** [[Generative Pre-trained Transformer (GPT)]]
+ *** [[Attention]] Mechanism  ...[[Transformer]] Model  ...[[Generative Pre-trained Transformer (GPT)]]
  *** [[Reinforcement Learning (RL) from Human Feedback (RLHF)]]
  *** [[Supervised]] Learning

Revision as of 11:13, 19 March 2023