Difference between revisions of "Transformer-XL"

Revision as of 07:14, 30 June 2019

Combines the two leading architectures for language modeling:

Recurrent Neural Network (RNN), which processes the input tokens (words or characters) one by one to learn the relationships between them.
Attention Mechanism/Transformer Model, which receives a segment of tokens and learns the dependencies between them all at once using an attention mechanism.
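One way to picture the combination is segment-level recurrence. The sequence is processed one fixed-length segment at a time, in order (the RNN-like part). Within each segment, attention runs over the current tokens plus hidden states cached from earlier segments (the Transformer-like part). The following is a minimal pure-Python sketch under stated assumptions, not the reference implementation. It assumes single-head attention with random weights and uses hypothetical names (`TransformerXLSketch`, `forward_segment`, `mem_len`). It also omits the relative positional encoding, feed-forward sublayers, and layer normalization a full model would use.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Hypothetical random initialization, used for weights and the toy embedding table."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


class AttentionLayer:
    """Single-head causal self-attention over [cached memory; current segment]."""

    def __init__(self, d_model, rng):
        self.wq = random_matrix(d_model, d_model, rng)
        self.wk = random_matrix(d_model, d_model, rng)
        self.wv = random_matrix(d_model, d_model, rng)
        self.scale = 1.0 / math.sqrt(d_model)

    def __call__(self, segment, memory):
        context = memory + segment           # keys/values see cached states plus new tokens
        q = matmul(segment, self.wq)         # queries come only from the current segment
        k = matmul(context, self.wk)
        v = matmul(context, self.wv)
        offset = len(memory)
        out = []
        for i, qi in enumerate(q):
            visible = offset + i + 1         # all memory plus segment positions <= i
            scores = [self.scale * sum(a * b for a, b in zip(qi, k[j])) for j in range(visible)]
            weights = softmax(scores)
            mixed = [sum(w * v[j][c] for j, w in enumerate(weights)) for c in range(len(v[0]))]
            # Residual connection: add the attention output to the layer input.
            out.append([x + y for x, y in zip(segment[i], mixed)])
        return out


class TransformerXLSketch:
    """Stack of attention layers that carries a per-layer memory across segments."""

    def __init__(self, d_model=4, n_layers=2, mem_len=3, seed=0):
        rng = random.Random(seed)
        self.layers = [AttentionLayer(d_model, rng) for _ in range(n_layers)]
        self.mem_len = mem_len
        self.mems = [[] for _ in range(n_layers)]  # one cache per layer, empty at start

    def forward_segment(self, segment):
        hidden = segment
        new_mems = []
        for layer, mem in zip(self.layers, self.mems):
            # Cache this layer's input for the next segment, keeping only the last mem_len states.
            cached = (mem + hidden)[-self.mem_len:] if self.mem_len > 0 else []
            new_mems.append(cached)
            hidden = layer(hidden, mem)
        self.mems = new_mems
        return hidden


if __name__ == "__main__":
    model = TransformerXLSketch(d_model=4, n_layers=2, mem_len=3)
    table = random_matrix(10, 4, random.Random(42))  # hypothetical embedding table
    token_ids = [1, 5, 2, 7, 3, 9, 4]
    # Feed the sequence segment by segment, like an RNN stepping through time.
    for start in range(0, len(token_ids), 3):
        segment = [table[t] for t in token_ids[start:start + 3]]
        out = model.forward_segment(segment)
        print(start, len(out), [len(m) for m in model.mems])
```

Because each segment's queries can attend to the cached states, information from earlier segments reaches later ones without recomputing them, while attention inside a segment still looks at all visible tokens at once.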


@@ Line 19: / Line 19: @@
 Combines the two leading architectures for language modeling:
-# [[Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Recurrent Neural Network (RNN)]], which process the input tokens (words or characters) one by one to learn the relationships between them.
+# [[Recurrent Neural Network (RNN)]], which processes the input tokens (words or characters) one by one to learn the relationships between them.
 # [[Attention]] Mechanism/[[Transformer]] Model, which receives a segment of tokens and learns the dependencies between them all at once using an attention mechanism.