Difference between revisions of "Large Language Model (LLM)"

** [https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html Pathways Language Model (PaLM)]  540B
 
** [http://research.baidu.com/Blog/index-view?id=163 PLATO-XL | Baidu]  ... 11B  
 
** [[Large Language Model (LLM)#RETRO | RETRO]] | [[Google | DeepMind]]  
 
** [https://arxiv.org/abs/2101.03961 Switch Transformers] | [[Google]] Brain  ... trillion parameters
 
** [https://ai.facebook.com/blog/textless-nlp-generating-expressive-speech-from-raw-audio/  Textless NLP  ... Generating expressive speech from raw audio]
 
[https://lifearchitect.ai/models/ Inside language models (from GPT-3 to PaLM) | Alan-D-Thompson]
 
= <span id="RETRO"></span>RETRO | DeepMind =
[https://www.youtube.com/results?search_query=RETRO+DeepMind+natural+language+agent YouTube search...]
[https://www.google.com/search?q=RETRO+DeepMind+natural+language+agent ...Google search]

* [http://www.technologyreview.com/2021/12/08/1041557/deepmind-language-model-beat-others-25-times-size-gpt-3-megatron/ DeepMind says its new language model can beat others 25 times its size | Will Douglas Heaven - MIT Technology Review]  ... RETRO (for “Retrieval-Enhanced Transformer”) uses an external memory to look up passages of text on the fly, avoiding some of the costs of training a vast neural network
We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a 2 trillion token database, our Retrieval-Enhanced Transformer (Retro) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer parameters. After fine-tuning, Retro performance translates to downstream knowledge-intensive tasks such as question answering. Retro combines a frozen Bert retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train Retro from scratch, yet can also rapidly RETROfit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale. - [https://www.deepmind.com/publications/improving-language-models-by-retrieving-from-trillions-of-tokens RETRO | S. Borgeaud, A. Mensch, J. Hoffmann, & L. Sifre] [[Google | DeepMind]]
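The retrieval step described in the abstract (split text into chunks, then look up corpus chunks that are similar to each one) can be illustrated with a small Python sketch. It is a minimal, hypothetical example, not DeepMind's implementation: the bag-of-words `embed` function stands in for the frozen BERT retriever, a brute-force cosine-similarity scan stands in for nearest-neighbour search over the 2 trillion token database, and the 4-token `CHUNK_SIZE` is a toy assumption. Returning each neighbour together with the chunk that follows it in the corpus is also an assumption, taken from the RETRO paper rather than from this page.

```python
import math
from collections import Counter
from typing import List, Tuple

Chunk = List[str]
Entry = Tuple[Counter, Chunk, Chunk]  # (key embedding, neighbour chunk, continuation)

CHUNK_SIZE = 4  # toy value for readability; an assumption, not RETRO's setting


def chunk(tokens: List[str], size: int = CHUNK_SIZE) -> List[Chunk]:
    """Split a token sequence into consecutive fixed-size chunks (the last may be shorter)."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def embed(tokens: Chunk) -> Counter:
    """Hypothetical stand-in for the frozen BERT retriever: a bag-of-words count vector."""
    return Counter(tokens)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_database(corpus: List[str], size: int = CHUNK_SIZE) -> List[Entry]:
    """Key every corpus chunk by its embedding; keep the chunk and the chunk after it."""
    chunks = chunk(corpus, size)
    database: List[Entry] = []
    for i, c in enumerate(chunks):
        continuation = chunks[i + 1] if i + 1 < len(chunks) else []
        database.append((embed(c), c, continuation))
    return database


def retrieve(query: Chunk, database: List[Entry], k: int = 2) -> List[Chunk]:
    """Brute-force k-nearest-neighbour lookup; each hit is neighbour + continuation."""
    q = embed(query)
    ranked = sorted(database, key=lambda entry: cosine(q, entry[0]), reverse=True)
    return [neighbour + continuation for _, neighbour, continuation in ranked[:k]]


def retrieve_per_chunk(tokens: List[str], database: List[Entry],
                       k: int = 2, size: int = CHUNK_SIZE) -> List[List[Chunk]]:
    """Retrieve neighbours separately for each input chunk, using only that chunk's tokens."""
    return [retrieve(c, database, k) for c in chunk(tokens, size)]


if __name__ == "__main__":
    corpus = "the cat sat on the mat today and the dog ran in the park yesterday".split()
    db = build_database(corpus)
    print(retrieve(["a", "dog", "ran", "fast"], db, k=1))
    # [['the', 'dog', 'ran', 'in', 'the', 'park', 'yesterday']]
```

Retrieving once per input chunk, rather than once per sequence, lets different parts of a long sequence be conditioned on different passages. At the scale quoted above, the linear scan in `retrieve` would have to be replaced by an approximate nearest-neighbour index.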

{|<!-- T -->
| valign="top" |
{| class="wikitable" style="width: 550px;"
||
<youtube>IaltsI1BCro</youtube>
<b>Are Bigger Language Models Better? | DeepMind Gopher and RETRO
</b><br>Retrieval-Enhanced Transformer (RETRO) is the autoregressive language model introduced in DeepMind’s ''Improving Language Models by Retrieving from Trillions of Tokens'' (2021). Jordan Harrod
|}
|<!-- M -->
| valign="top" |
{| class="wikitable" style="width: 550px;"
||
<youtube>-93KBOg77Sg</youtube>
<b>DeepMind's RETRO Transformer Model
</b><br>Retrieval-Enhanced Language Model cross-attends trillions of tokens for SoTA on Wikitext103 and The Pile with 25x fewer parameters. Vaclav Kosar
|}
|}<!-- B -->
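The abstract names a chunked cross-attention mechanism, and the video summaries above describe RETRO as cross-attending to retrieved tokens. The Python sketch below shows only the causality bookkeeping behind that idea, assuming the scheme given in the RETRO paper (it is not stated on this page): with chunk size m, the last token of a chunk and the first m-1 tokens of the next chunk attend to the neighbours retrieved for that chunk, while the first m-1 tokens of the sequence attend to no neighbours. The function names are hypothetical, and the attention computation itself is omitted.

```python
from typing import Dict, List, Optional


def attended_chunk(position: int, chunk_size: int) -> Optional[int]:
    """Index (0-based) of the chunk whose retrieved neighbours token `position` attends to.

    A token may only use neighbours of a chunk it has fully seen, i.e. a chunk j whose
    last position (j + 1) * chunk_size - 1 is <= position. It attends to the most
    recent such chunk; tokens before the end of the first chunk attend to none (None).
    """
    j = (position + 1) // chunk_size - 1
    return j if j >= 0 else None


def attending_groups(seq_len: int, chunk_size: int) -> Dict[Optional[int], List[int]]:
    """Group token positions by the chunk whose neighbours they cross-attend to."""
    groups: Dict[Optional[int], List[int]] = {}
    for position in range(seq_len):
        groups.setdefault(attended_chunk(position, chunk_size), []).append(position)
    return groups


if __name__ == "__main__":
    # With 4-token chunks, positions 3-6 share chunk 0's neighbours, 7-10 chunk 1's, etc.
    print(attending_groups(12, 4))
    # {None: [0, 1, 2], 0: [3, 4, 5, 6], 1: [7, 8, 9, 10], 2: [11]}
```

Shifting the attending window by m-1 tokens means no token ever sees neighbours retrieved using tokens that come after it, which keeps the model autoregressive.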

Revision as of 14:12, 25 February 2023
