Difference between revisions of "Transformer-XL"
Revision as of 16:03, 19 January 2019
- Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Recurrent Neural Network (RNN)
- Natural Language Processing (NLP)
- Memory Networks
- Attention Mechanism/Model - Transformer Model
Transformer-XL combines the two leading architectures for language modeling: [1] Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Recurrent Neural Network (RNN), which handle the input tokens (words or characters) one by one to learn the relationships between them, and [2] Attention Mechanism/Model - Transformer Model, which receives a segment of tokens and learns the dependencies between them all at once using an attention mechanism. Transformer-XL Explained: Combining Transformers and RNNs into a State-of-the-art Language Model; Summary of “Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context” | Rani Horev - Towards Data Science (http://towardsdatascience.com/transformer-xl-explained-combining-transformers-and-rnns-into-a-state-of-the-art-language-model-c0cfe9e5a924)
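The combination described above can be illustrated with a minimal, pure-Python sketch. It is not the reference implementation: it assumes a single attention head with no learned projections and no relative positional encoding, and the names `attend`, `run_segments`, `segment_len`, and `mem_len` are hypothetical, chosen only for this example. The Transformer part is the attention computed over a whole segment at once; the recurrent part is the cache of earlier states (`memory`) carried from one segment to the next.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(segment, memory):
    """Causal dot-product attention for one segment.

    Queries come from `segment`; keys and values come from the cached
    `memory` followed by the segment itself (assumption: identity
    projections, i.e. no learned weight matrices).
    """
    context = memory + segment
    offset = len(memory)
    dim = len(segment[0])
    outputs = []
    for i, query in enumerate(segment):
        # A token may look at all cached states and at tokens up to itself.
        visible = context[: offset + i + 1]
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in visible
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, visible)) for j in range(dim)]
        )
    return outputs


def run_segments(tokens, segment_len, mem_len):
    """Process `tokens` segment by segment, reusing cached states."""
    memory = []
    outputs = []
    for start in range(0, len(tokens), segment_len):
        segment = tokens[start : start + segment_len]
        outputs.extend(attend(segment, memory))
        # Recurrence step: keep the most recent `mem_len` states for the
        # next segment (an empty cache when mem_len is 0).
        memory = (memory + segment)[-mem_len:] if mem_len > 0 else []
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
with_memory = run_segments(tokens, segment_len=2, mem_len=2)
without_memory = run_segments(tokens, segment_len=2, mem_len=0)
```

With `mem_len=0` the first token of each segment can only attend to itself, so context is cut at every segment boundary. With a non-empty cache, the second segment also sees the states from the first, which is how dependencies can extend beyond a fixed-length segment.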