Transformer-XL

Revision as of 16:03, 19 January 2019 by BPeat

Transformer-XL combines the two leading architectures for language modeling: [1] recurrent models, namely the Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU), which handle the input tokens (words or characters) one by one to learn the relationships between them, and [2] the Attention Mechanism/Model - Transformer Model, which receives a segment of tokens and learns the dependencies between them all at once using an attention mechanism.

Transformer-XL Explained: Combining Transformers and RNNs into a State-of-the-art Language Model; Summary of “Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context” | Rani Horev - Towards Data Science
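One way to picture this combination is attention applied within each segment, with a cache of the previous segment's vectors carried forward the way a recurrent model carries state. The sketch below is a minimal illustration under stated assumptions, not the model's actual implementation: the function names (`attend`, `process_segments`) and the `mem_len` parameter are hypothetical, it uses a single attention layer on raw token vectors, and it leaves out the learned projections, multiple heads, relative positional encodings, and gradient stopping that the full model relies on.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, context, offset):
    """Causal scaled dot-product attention.

    queries[i] sits at position offset + i inside context, so it may look
    at everything cached before the segment plus tokens up to itself.
    """
    outputs = []
    for i, q in enumerate(queries):
        visible = context[: offset + i + 1]
        scale = math.sqrt(len(q))
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in visible]
        weights = softmax(scores)
        # Weighted average of the visible vectors (keys double as values here).
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, visible)) for d in range(len(q))]
        )
    return outputs


def process_segments(segments, mem_len):
    """Process segments in order, carrying a cache between them.

    The cache plays the role of the recurrent state: each new segment
    attends over [cached vectors + its own tokens] all at once.
    """
    memory = []
    outputs = []
    for segment in segments:
        context = memory + segment
        outputs.append(attend(segment, context, offset=len(memory)))
        # Keep only the most recent mem_len vectors for the next segment.
        memory = context[-mem_len:] if mem_len > 0 else []
    return outputs


# Two segments of two 2-dimensional token vectors each (made-up values).
segments = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.5, 0.5]]]
with_memory = process_segments(segments, mem_len=2)
without_memory = process_segments(segments, mem_len=0)
```

With `mem_len=0` the first token of the second segment can only see itself, as in a plain fixed-length Transformer. With `mem_len=2` the same token also draws on the previous segment, which is the recurrent side of the design.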

Retrieved from "https://primo.ai/index.php?title=Transformer-XL&oldid=5851"