Transformer-XL
{{#seo:
|title=PRIMO.ai
|titlemode=append
|keywords=artificial, intelligence, machine, learning, models, algorithms, data, singularity, moonshot, Tensorflow, Google, Nvidia, Microsoft, Azure, Amazon, AWS
|description=Helpful resources for your journey with artificial intelligence; videos, articles, techniques, courses, profiles, and tools
}}
[http://www.youtube.com/results?search_query=Transformer-XL+attention+model+ai+deep+learning+model YouTube search...]
[http://www.google.com/search?q=Transformer+XL+attention+model+deep+machine+learning+ML ...Google search]
- BERT
- A Light Introduction to Transformer-XL | Elvis - Medium
- Transformer-XL Explained: Combining Transformers and RNNs into a State-of-the-art Language Model | Rani Horev - Towards Data Science
- Transformer-XL: Language Modeling with Longer-Term Dependency | Z. Dai, Z. Yang, Y. Yang, W.W. Cohen, J. Carbonell, Quoc V. Le, and R. Salakhutdinov
- Natural Language Processing (NLP)
- Memory Networks
- Autoencoder (AE) / Encoder-Decoder
Transformer-XL combines the two leading architectures for language modeling:
- Recurrent architectures such as Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Recurrent Neural Network (RNN), which handle the input tokens (words or characters) one by one to learn the relationships between them
- Attention Mechanism/Model - Transformer Model, which receives a whole segment of tokens and learns the dependencies between them at once using an attention mechanism (see the sketch below)
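A minimal sketch of how these two ideas combine, assuming PyTorch; the class name, parameter names, and sizes are hypothetical, and relative positional encodings from the paper are omitted. It only illustrates segment-level recurrence: hidden states from the previous segment are cached and reused as extra attention context, so tokens can attend beyond the current segment boundary.

<pre>
import torch
import torch.nn as nn

class SegmentRecurrentAttention(nn.Module):
    """Illustrative sketch of segment-level recurrence with attention.
    Not the paper's exact implementation; names and sizes are hypothetical."""

    def __init__(self, d_model=64, n_heads=4, mem_len=32):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mem_len = mem_len

    def forward(self, segment, memory=None):
        # segment: (batch, seg_len, d_model) token representations for the current segment
        # memory:  (batch, mem_len, d_model) cached hidden states from the previous segment
        if memory is None:
            context = segment
        else:
            # Keys/values span the cached memory plus the current segment,
            # so attention can reach past the segment boundary.
            # detach() stops gradients from flowing into the cached states.
            context = torch.cat([memory.detach(), segment], dim=1)
        out, _ = self.attn(query=segment, key=context, value=context)
        # Cache the most recent hidden states as memory for the next segment.
        new_memory = context[:, -self.mem_len:, :]
        return out, new_memory

# Usage: process a long sequence one segment at a time, carrying memory across segments.
layer = SegmentRecurrentAttention()
memory = None
for seg in torch.randn(3, 2, 16, 64):   # 3 segments of 16 tokens each, batch of 2
    out, memory = layer(seg, memory)
</pre>

The recurrence here is over segments rather than individual tokens: each segment is processed in parallel by attention, while the cached memory plays the role the hidden state plays in an RNN, carrying information forward across segments.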