Transformer-XL

 
Transformer-XL combines the two leading architectures for language modeling:

# [[Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Recurrent Neural Network (RNN)]] to handle the input tokens — words or characters — one by one and learn the relationships between them
# [[Attention]] Mechanism/[[Transformer]] Model to receive a segment of tokens and learn the dependencies between them all at once using an attention mechanism (see the sketch below).
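
A minimal sketch of how the two ideas combine, assuming NumPy and illustrative names (attend_with_memory, d_model, seg_len, W_q, W_k, W_v are not from the paper): the hidden states of the previous segment are cached as a memory that the current segment attends over, giving an RNN-like recurrence between segments while keeping Transformer-style attention within each segment. The actual Transformer-XL additionally uses relative positional encodings and stops gradients through the cached memory, both omitted here.

<syntaxhighlight lang="python">
# Illustrative sketch of segment-level recurrence with attention over a cached memory.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend_with_memory(segment, memory, W_q, W_k, W_v):
    """Single-head self-attention whose keys/values also cover the cached memory.

    segment: (seg_len, d_model)  hidden states of the current segment
    memory:  (mem_len, d_model)  cached hidden states of the previous segment
    """
    context = np.concatenate([memory, segment], axis=0)  # (mem_len + seg_len, d_model)
    q = segment @ W_q                                    # queries: current tokens only
    k = context @ W_k                                    # keys: memory + current tokens
    v = context @ W_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v                           # (seg_len, d_model)

# Toy usage: process two consecutive segments, carrying the first one as memory.
rng = np.random.default_rng(0)
d_model, seg_len = 8, 4
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

segment_1 = rng.normal(size=(seg_len, d_model))
segment_2 = rng.normal(size=(seg_len, d_model))

memory = np.zeros((0, d_model))     # no memory yet for the first segment
out_1 = attend_with_memory(segment_1, memory, W_q, W_k, W_v)
memory = segment_1                  # cache the previous segment's states
out_2 = attend_with_memory(segment_2, memory, W_q, W_k, W_v)
print(out_2.shape)                  # (4, 8)
</syntaxhighlight>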
  
 
[http://towardsdatascience.com/transformer-xl-explained-combining-transformers-and-rnns-into-a-state-of-the-art-language-model-c0cfe9e5a924 Transformer-XL Explained: Combining Transformers and RNNs into a State-of-the-art Language Model; Summary of “Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context” | Rani Horev - Towards Data Science]
