Difference between revisions of "Transformer-XL"

Revision as of 15:39, 28 April 2023

Combines the two leading architectures for language modeling:

Recurrent Neural Network (RNN): handles the input tokens (words or characters) one by one to learn the relationships between them.
Attention Mechanism/Transformer Model: receives a segment of tokens at once and learns the dependencies between them using an attention mechanism.
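
How the two ideas fit together can be sketched roughly as follows. This is a minimal illustration, not the published implementation. It assumes plain dot-product attention with no learned projections, no feed-forward sublayers and no relative positional encoding. The names attend, process_segments, segment_len and memory_len are hypothetical. Tokens inside a segment are handled at once by attention (the Transformer part), while each layer's inputs from earlier segments are cached and reused as extra context for the next segment (the recurrent part).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, context):
    """Scaled dot-product attention of one query vector over context vectors.

    Assumption: the context vectors serve as both keys and values
    (no learned projection matrices).
    """
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
              for key in context]
    weights = softmax(scores)
    return [sum(w * vec[i] for w, vec in zip(weights, context))
            for i in range(dim)]


def process_segments(embeddings, segment_len, memory_len, n_layers=2):
    """Process a token sequence segment by segment with a per-layer memory."""
    memories = [[] for _ in range(n_layers)]  # one cache per layer
    outputs = []
    for start in range(0, len(embeddings), segment_len):
        hidden = embeddings[start:start + segment_len]
        for layer in range(n_layers):
            memory = memories[layer]
            # Causal attention: position i sees the cached memory plus
            # positions 0..i of the current segment.
            new_hidden = [attend(hidden[i], memory + hidden[:i + 1])
                          for i in range(len(hidden))]
            # Cache this layer's inputs so the next segment can attend to them.
            if memory_len > 0:
                memories[layer] = (memory + hidden)[-memory_len:]
            else:
                memories[layer] = []
            hidden = new_hidden
        outputs.extend(hidden)
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
with_memory = process_segments(tokens, segment_len=2, memory_len=2)
without_memory = process_segments(tokens, segment_len=2, memory_len=0)
```

With memory_len=0 each segment is processed in isolation, so the first token of the second segment can only attend to itself. With memory_len=2 the same token also attends to cached states from the first segment, which is how context can reach beyond a single segment.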


@@ Line 12: / Line 12: @@
 * [http://towardsdatascience.com/transformer-xl-explained-combining-transformers-and-rnns-into-a-state-of-the-art-language-model-c0cfe9e5a924 Transformer-XL Explained: Combining Transformers and RNNs into a State-of-the-art Language Model | Rani Horev - Towards Data Science]
 * [http://openreview.net/forum?id=HJePno0cYm Transformer-XL: Language Modeling with Longer-Term Dependency | Z. Dai, Z. Yang, Y. Yang, W.W. Cohen, J. Carbonell, Quoc V. Le, and R. Salakhutdinov]
-* [[Natural Language Processing (NLP)]]
+* [[Large Language Model (LLM)]] ... [[Natural Language Processing (NLP)]] ... [[Natural Language Generation (NLG)|Generation]] ... [[Natural Language Classification (NLC)|Classification]] ... [[Natural Language Processing (NLP)#Natural Language Understanding (NLU)|Understanding]] ... [[Language Translation|Translation]] ... [[Natural Language Tools & Services|Tools & Services]]
 * [[Memory Networks]]
 * [[Autoencoder (AE) / Encoder-Decoder]]