Transformer-XL
 
[http://www.google.com/search?q=Transformer+XL+attention+model+deep+machine+learning+ML ...Google search]
 
* [http://medium.com/dair-ai/a-light-introduction-to-transformer-xl-be5737feb13 A Light Introduction to Transformer-XL | Elvis - Medium]
 
* [[Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Recurrent Neural Network (RNN)]]
 
* [[Natural Language Processing (NLP)]]
 
* [[Memory Networks]]
 
* [[Attention Mechanism/Model - Transformer Model]]
 
Transformer-XL combines the two leading architectures for language modeling: [1] [[Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Recurrent Neural Network (RNN)]], which handle the input tokens (words or characters) one by one to learn the relationships between them, and [2] the [[Attention Mechanism/Model - Transformer Model]], which receives a whole segment of tokens and learns the dependencies between them at once using an attention mechanism. [http://towardsdatascience.com/transformer-xl-explained-combining-transformers-and-rnns-into-a-state-of-the-art-language-model-c0cfe9e5a924 Transformer-XL Explained: Combining Transformers and RNNs into a State-of-the-art Language Model; Summary of “Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context” | Rani Horev - Towards Data Science]
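
The sketch below illustrates the segment-level recurrence idea in plain NumPy: hidden states from the previous segment are cached as a "memory" and concatenated to the current segment's keys and values, so attention can look back past the fixed segment boundary. This is a minimal sketch, not the paper's implementation; it uses illustrative names (<code>attend_segment</code>, <code>mem_len</code>, random weights) and omits relative positional encodings, multiple heads, and stacked layers.

<syntaxhighlight lang="python">
# Minimal sketch of Transformer-XL's segment-level recurrence (illustrative only).
import numpy as np

d_model = 16      # hidden size (illustrative)
seg_len = 4       # tokens processed per segment
mem_len = 4       # cached hidden states kept from the previous segment

rng = np.random.default_rng(0)
W_q = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
W_k = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
W_v = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)

def attend_segment(h_segment, memory):
    """Single-head attention over one segment, extended with cached memory.

    h_segment: (seg_len, d_model) hidden states of the current segment
    memory:    (mem_len, d_model) hidden states cached from the previous segment
    """
    # Queries come only from the current segment; keys/values also see the memory.
    context = np.concatenate([memory, h_segment], axis=0)    # (mem_len + seg_len, d_model)
    q = h_segment @ W_q
    k = context @ W_k
    v = context @ W_v
    scores = q @ k.T / np.sqrt(d_model)                      # (seg_len, mem_len + seg_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)           # softmax over the extended context
    return weights @ v                                       # (seg_len, d_model)

# Process a token stream segment by segment, carrying the memory forward.
memory = np.zeros((mem_len, d_model))                        # no history before the first segment
stream = rng.standard_normal((3 * seg_len, d_model))         # toy "embedded" token stream
for start in range(0, len(stream), seg_len):
    segment = stream[start:start + seg_len]
    out = attend_segment(segment, memory)
    # Cache the newest hidden states for the next segment. In training, gradients
    # are not propagated through the cached memory; it is reused as fixed context.
    memory = np.concatenate([memory, out], axis=0)[-mem_len:]
</syntaxhighlight>

Because each layer can attend to the memory produced by the layer below it on the previous segment, the effective context length grows with depth, while only one segment is processed at a time.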
 
http://skymind.ai/images/wiki/attention_mechanism.png
http://cdn-images-1.medium.com/max/2000/0*mrV1VMF_G2mhQ9Jj.png
  
  
 
<youtube>W2rWgXJBZhU</youtube>
 