Transformer-XL
YouTube search... ...Google search
- BERT
- A Light Introduction to Transformer-XL | Elvis - Medium
- Transformer-XL Explained: Combining Transformers and RNNs into a State-of-the-art Language Model | Rani Horev - Towards Data Science
- Transformer-XL: Language Modeling with Longer-Term Dependency | Z. Dai, Z. Yang, Y. Yang, W.W. Cohen, J. Carbonell, Quoc V. Le, and R. Salakhutdinov
- Natural Language Processing (NLP)
- Memory Networks
- Autoencoder (AE) / Encoder-Decoder
Combines the two leading architectures for language modeling:
- Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Recurrent Neural Network (RNN), which handle the input tokens (words or characters) one by one to learn the relationships between them
- Attention Mechanism/Model - Transformer Model, which receives a whole segment of tokens and learns the dependencies between all of them at once using an attention mechanism (a minimal sketch of how the two are combined follows this list)
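Transformer-XL joins the two with segment-level recurrence: the hidden states computed for one segment are cached and reused as extra, read-only context when the attention mechanism processes the next segment, giving the model a much longer effective context than a fixed-length Transformer. Below is a minimal PyTorch sketch of that caching scheme, assuming a single attention layer with made-up dimensions; names like attend_with_memory are illustrative, and the relative positional encodings and causal masking of the real model are omitted for brevity.

<pre>
# Minimal sketch of Transformer-XL segment-level recurrence (illustrative,
# not the authors' code): hidden states from the previous segment are cached
# and attended to as read-only context by the next segment.
import torch
import torch.nn.functional as F

d_model, seg_len, mem_len = 16, 4, 4   # hypothetical sizes

# Learned query/key/value projections for one attention layer.
W_q = torch.randn(d_model, d_model)
W_k = torch.randn(d_model, d_model)
W_v = torch.randn(d_model, d_model)

def attend_with_memory(segment, mem):
    """Attention over the current segment plus cached memory.

    segment: (seg_len, d_model) hidden states of the current segment
    mem:     (mem_len, d_model) hidden states cached from the previous
             segment; detached so no gradient flows into old segments
    """
    # Keys and values see the cached memory as extra context; this is
    # what extends the effective context length beyond one segment.
    context = torch.cat([mem.detach(), segment], dim=0)
    q = segment @ W_q                      # queries only for new tokens
    k = context @ W_k
    v = context @ W_v
    scores = q @ k.T / d_model ** 0.5      # (seg_len, seg_len + mem_len)
    return F.softmax(scores, dim=-1) @ v   # new hidden states

# Process a long sequence segment by segment, carrying memory forward.
tokens = torch.randn(3 * seg_len, d_model)   # stand-in for embedded input
mem = torch.zeros(mem_len, d_model)
for i in range(0, tokens.size(0), seg_len):
    segment = tokens[i:i + seg_len]
    out = attend_with_memory(segment, mem)
    mem = out                                # cache for the next segment
    print(out.shape)                         # torch.Size([4, 16])
</pre>

Because the cached states are detached, gradients never flow back into earlier segments, so each training step stays as cheap as a fixed-length Transformer while the model still reads a longer history.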