Difference between revisions of "Large Language Model (LLM)"
| Line 55: | Line 55: | ||
** [https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html Pathways Language Model (PaLM)] 540B | ** [https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html Pathways Language Model (PaLM)] 540B | ||
** [http://research.baidu.com/Blog/index-view?id=163 PLATO-XL | Baidu] ... 11B | ** [http://research.baidu.com/Blog/index-view?id=163 PLATO-XL | Baidu] ... 11B | ||
| − | ** [[ | + | ** [[Large Language Model (LLM)#RETRO | RETRO]] | [[Google | DeepMind]] |
** [https://arxiv.org/abs/2101.03961 Switch Transformers |] [[Google]] Brain ... trillion parameters | ** [https://arxiv.org/abs/2101.03961 Switch Transformers |] [[Google]] Brain ... trillion parameters | ||
** [https://ai.facebook.com/blog/textless-nlp-generating-expressive-speech-from-raw-audio/ Textless NLP ... Generating expressive speech from raw audio] | ** [https://ai.facebook.com/blog/textless-nlp-generating-expressive-speech-from-raw-audio/ Textless NLP ... Generating expressive speech from raw audio] | ||
| Line 73: | Line 73: | ||
[https://lifearchitect.ai/models/ Inside language models (from GPT-3 to PaLM) | Alan-D-Thompson] | [https://lifearchitect.ai/models/ Inside language models (from GPT-3 to PaLM) | Alan-D-Thompson] | ||
| + | |||
| + | |||
| + | = <span id="RETRO"></span>RETRO | DeepMind = | ||
| + | |||
| + | [https://www.youtube.com/results?search_query=RETRO+DeepMind+natural+language+agent YouTube search...] | ||
| + | [https://www.google.com/search?q=RETRO+DeepMind+natural+language+agent ...Google search] | ||
| + | |||
| + | * [http://www.technologyreview.com/2021/12/08/1041557/deepmind-language-model-beat-others-25-times-size-gpt-3-megatron/ DeepMind says its new language model can beat others 25 times its size | Will Douglas Heaven - MIT Technology Review] ... RETRO (for “Retrieval-Enhanced Transformer”), uses an external memory to look up passages of text on the fly, avoiding some of the costs of training a vast neural network | ||
| + | |||
| + | We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a 2 trillion token database, our Retrieval-Enhanced Transformer (Retro) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer parameters. After fine-tuning, Retro performance translates to downstream knowledge-intensive tasks such as question answering. Retro combines a frozen Bert retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train Retro from scratch, yet can also rapidly RETROfit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale. - [https://www.deepmind.com/publications/improving-language-models-by-retrieving-from-trillions-of-tokens RETRO | S. Borgeaud, A. Mensch, J. Hoffmann, & L. Sifre] [[Google | DeepMind]] | ||
| + | |||
| + | |||
| + | {|<!-- T --> | ||
| + | | valign="top" | | ||
| + | {| class="wikitable" style="width: 550px;" | ||
| + | || | ||
| + | <youtube>IaltsI1BCro</youtube> | ||
| + | <b>Are Bigger Language Models Better? | DeepMind Gopher and RETRO | ||
| + | </b><br>Retrieval-Enhanced Transformer (RETRO) is an autoregressive language model | ||
| + | from DeepMind’s Improving Language Models by Retrieving from Trillions of Tokens (2021) Jordan Harrod | ||
| + | |} | ||
| + | |<!-- M --> | ||
| + | | valign="top" | | ||
| + | {| class="wikitable" style="width: 550px;" | ||
| + | || | ||
| + | <youtube>-93KBOg77Sg</youtube> | ||
| + | <b>DeepMind's RETRO Transformer Model | ||
| + | </b><br>Retrieval-Enhanced Language Model cross-attends trillions of tokens for SoTA on Wikitext103 and The Pile with 25x fewer parameters. Vaclav Kosar | ||
| + | |} | ||
| + | |}<!-- B --> | ||
Revision as of 14:12, 25 February 2023
YouTube search... ...Google search
- Natural Language Processing (NLP) ...Generation ...LLM ...Tools & Services
- Assistants ... Hybrid Assistants ... Agents ... Negotiation
- Attention Mechanism ...Transformer Model ...Generative Pre-trained Transformer (GPT)
- Models:
- AlexaTM | Amazon 20B
- Alpa ... serving large models like GPT-3 simple, affordable, accessible
- Bidirectional Encoder Representations from Transformers (BERT) 340M
- BioGPT ... Microsoft language model trained for biomedical tasks
- BLOOM ... Big Science Language Open-science Open-access Multilingual ... 176B
- Cedille ... open-source French language model 6B
- ChatGPT | OpenAI
- Chinchilla | DeepMind 70B
- ctrl ... a Conditional Transformer Language Model for Controllable Generation | Salesforce
- Codex | OpenAI ... translates natural language into code
- Dataflow-as-a-Service | SambaNova
- DialoGPT ...Microsoft Releases DialogGPT AI Conversation Model | Anthony Alford - InfoQ - trained on over 147M dialogs
- Flamingo | DeepMind ... Flamingo Pytorch 80B
- GLM-130B ... Open Bilingual Pre-Trained Model 130B
- GLaM | Google
- Gopher | DeepMind 280B
- GShard | Google ... Scaling Giant Models with Conditional Computation and Automatic Sharding
- GPT-2 | OpenAI 1.5B
- GPT-3 | OpenAI 175B
- GPT-Neo ... Open-source GPT-3 by EleutherAI 20B
- InstructGPT ... OpenAI 1.3B InstructGPT model over outputs from a 175B GPT-3 model
- Jurassic-1 ... huge 178B language model to rival OpenAI's GPT-3
- LaMDA | Google ... experimental language model 137B
- LLaMA ... Large Language Model Meta AI, 13B and 65B parameter versions
- Luminous ... Europe 200B
- Macaw | AI2 11B
- Med-PaLM ... aligned to the medical domain
- Megatron ... Monolithic Transformer Language NLP Model 11B
- minGPT | Andrej Karpathy - GitHub
- Muse ... VLM-4, a set of natively trained large Language Models in French, Italian, Spanish, German, and English
- MT-NLG 530B
- nanoGPT ... for training/finetuning medium-sized GPTs
- NLLB | Meta 54.5B & 200B parameters; NLLB-200
- OpenGPT-X ... model for Europe
- OPT-175B...Facebook-owner Meta opens access to AI large language model | Elizabeth Culliford - Reuters ... Facebook 175B ... BlenderBot 175B
- Palmyra | Hugging Face ... a privacy-first LLM for enterprises
- Pathways Language Model (PaLM) 540B
- PLATO-XL | Baidu ... 11B
- RETRO | DeepMind
- Switch Transformers | Google Brain ... trillion parameters
- Textless NLP ... Generating expressive speech from raw audio
- T0pp | Hugging Face
- Toolformer | Meta ... models can teach themselves to use tools and APIs
- Turing-NLG | Microsoft
- UnifiedQA ... single QA system
- WebGPT ... GPT-3 version that can search the web
- Wu Dao 1.0 (Enlightenment 1.0) ... China’s first homegrown super-scale intelligent model
- XGLM | Hugging Face 7.5B
- YaLM ... Yandex YaLM 100B
- Yuan 1.0 | Inspur ... 245B
Inside language models (from GPT-3 to PaLM) | Alan-D-Thompson
RETRO | DeepMind
YouTube search... ...Google search
- DeepMind says its new language model can beat others 25 times its size | Will Douglas Heaven - MIT Technology Review ... RETRO (for “Retrieval-Enhanced Transformer”), uses an external memory to look up passages of text on the fly, avoiding some of the costs of training a vast neural network
We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a 2 trillion token database, our Retrieval-Enhanced Transformer (Retro) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer parameters. After fine-tuning, Retro performance translates to downstream knowledge-intensive tasks such as question answering. Retro combines a frozen Bert retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train Retro from scratch, yet can also rapidly RETROfit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale. - RETRO | S. Borgeaud, A. Mensch, J. Hoffmann, & L. Sifre DeepMind
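
The retrieval step described in the abstract (split the input into chunks, then look up the most similar passages in an external database) can be sketched roughly as follows. This is a toy illustration, not DeepMind's implementation: the bag-of-words `embed` function is only a stand-in for the frozen BERT retriever, and the names `split_into_chunks`, `embed`, `cosine`, and `retrieve`, the chunk size, and the three-entry database are all hypothetical choices made for the example. The chunked cross-attention that consumes the retrieved neighbours is not shown.

```python
import math
from collections import Counter


def split_into_chunks(tokens, chunk_size):
    """Split a token list into consecutive chunks; the last one may be shorter."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def embed(chunk):
    """Toy bag-of-words embedding (assumption: stands in for a frozen BERT encoder)."""
    return Counter(chunk)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_chunk, database, k=2):
    """Return the k database chunks most similar to query_chunk."""
    q = embed(query_chunk)
    ranked = sorted(database, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical three-passage "database"; the real one holds trillions of tokens.
database = [
    "the cat sat on the mat".split(),
    "retrieval augments language models".split(),
    "stock prices fell sharply today".split(),
]

prompt = "language models with retrieval look up text".split()
for chunk in split_into_chunks(prompt, chunk_size=4):
    nearest = retrieve(chunk, database, k=1)[0]
    print(chunk, "->", nearest)
```

Because the retriever is frozen, the database embeddings can be computed once and reused. At the scale the abstract describes, an exhaustive scan like the `sorted` call above would have to be replaced by an approximate nearest-neighbour index.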