Difference between revisions of "Proximal Policy Optimization (PPO)"

Line 24:
 
*** [[Lifelong Latent Actor-Critic (LILAC)]]
 
** [[Hierarchical Reinforcement Learning (HRL)]]
* [[ChatGPT]]
+
* [https://www.technologyreview.com/2023/02/08/1068068/chatgpt-is-everywhere-heres-where-it-came-from/ ChatGPT is everywhere. Here’s where it came from | Will Douglas Heaven - MIT Technology Review]
 +
** [[Sequence to Sequence (Seq2Seq)]]
 +
** [[Recurrent Neural Network (RNN)]] 
 +
** [[Long Short-Term Memory (LSTM)]]
 +
** [[Bidirectional Encoder Representations from Transformers (BERT)]]  ... a better model, but less investment than the larger [[OpenAI]] organization
 +
** [[ChatGPT]] | [[OpenAI]]:
 +
*** [[Transformer]] / [[Attention]] Mechanism
 +
*** [[Generative Pre-trained Transformer (GPT)]]
 +
*** [[Reinforcement Learning (RL) from Human Feedback (RLHF)]]
 +
*** [[Supervised]] Learning
 +
*** [[Proximal Policy Optimization (PPO)]]
  
  

Revision as of 01:46, 12 February 2023