Transformer-XL
- A Light Introduction to Transformer-XL | Elvis - Medium
- Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Recurrent Neural Network (RNN)
- Natural Language Processing (NLP)
- Memory Networks
- Attention Mechanism/Model - Transformer Model
- Autoencoder (AE) / Encoder-Decoder
Transformer-XL combines the two leading architectures for language modeling:
- 1 Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Recurrent Neural Network (RNN), which handle the input tokens (words or characters) one by one to learn the relationships between them
- 2 Attention Mechanism/Model - Transformer Model, which receives a segment of tokens and learns the dependencies between them all at once using an attention mechanism.
Transformer-XL Explained: Combining Transformers and RNNs into a State-of-the-art Language Model; Summary of “Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context” | Rani Horev - Towards Data Science
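One way to picture how the two ideas fit together is to process the input segment by segment with attention (the Transformer side) while carrying a cache of earlier vectors forward from one segment to the next (the recurrent side). The sketch below is a toy illustration under heavy assumptions, not the published model: it uses a single layer, the raw input vectors serve as queries, keys, and values (no learned projections, no relative positional encoding), and the names `attend`, `process_segments`, `segment_len`, and `mem_len` are chosen here for illustration.

```python
import math


def attend(segment, memory):
    """Causal scaled dot-product attention over cached memory plus the current segment.

    Simplification (assumption): queries, keys, and values are the raw vectors
    themselves, with no learned projections and no positional encoding.
    """
    dim = len(segment[0])
    outputs = []
    for i, query in enumerate(segment):
        # Position i may look at all of memory and at segment positions 0..i.
        context = memory + segment[: i + 1]
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in context
        ]
        # Numerically stable softmax over the scores.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the attention-weighted average of the context vectors.
        outputs.append(
            [sum(w * vec[j] for w, vec in zip(weights, context)) for j in range(dim)]
        )
    return outputs


def process_segments(vectors, segment_len, mem_len):
    """Process a sequence segment by segment, carrying a memory between segments."""
    memory = []
    outputs = []
    for start in range(0, len(vectors), segment_len):
        segment = vectors[start : start + segment_len]
        outputs.extend(attend(segment, memory))
        # Recurrent step: keep the most recent mem_len vectors for the next segment.
        memory = (memory + segment)[-mem_len:] if mem_len > 0 else []
    return outputs


# Hypothetical 2-dimensional "token embeddings", split into segments of two.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
with_memory = process_segments(tokens, segment_len=2, mem_len=2)
without_memory = process_segments(tokens, segment_len=2, mem_len=0)
```

In this sketch, setting `mem_len=0` makes every segment independent, as in a fixed-context Transformer, while a positive `mem_len` lets tokens in the second segment attend to vectors cached from the first. The two runs therefore agree on the first segment and differ afterwards.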