Revision as of 14:12, 25 February 2023

Natural Language Processing (NLP) ...Generation ...LLM ...Tools & Services
Assistants ... Hybrid Assistants ... Agents ... Negotiation
Attention Mechanism ...Transformer Model ...Generative Pre-trained Transformer (GPT)
Models:
- AlexaTM | Amazon 20B
- Alpa ... serving large models like GPT-3 simple, affordable, accessible
- Bidirectional Encoder Representations from Transformers (BERT) 340M
- BioGPT ... Microsoft language model trained for biomedical tasks
- BLOOM ... BigScience Large Open-science Open-access Multilingual Language Model ... 176B
- Cedille ... open-source French language model 6B
- ChatGPT | OpenAI
  - ChatGPT is everywhere. Here’s where it came from | Will Douglas Heaven - MIT Technology Review
    - Attention Mechanism ...Transformer Model ...Generative Pre-trained Transformer (GPT)
    - Reinforcement Learning (RL) from Human Feedback (RLHF)
    - Supervised Learning
    - Proximal Policy Optimization (PPO)
- Chinchilla | DeepMind 70B
- CTRL ... a Conditional Transformer Language Model for Controllable Generation | Salesforce
- Codex | OpenAI ... translates natural language into code
- Dataflow-as-a-Service | SambaNova
- DialogGPT ... Microsoft Releases DialogGPT AI Conversation Model | Anthony Alford - InfoQ ... trained on over 147M dialogs
- Flamingo | DeepMind ... Flamingo Pytorch 80B
- GLM-130B ... Open Bilingual Pre-Trained Model 130B
- GLaM | Google
- Gopher | DeepMind 280B
- GShard | Google ... Scaling Giant Models with Conditional Computation and Automatic Sharding
- GPT-2 | OpenAI 1.5B
- GPT-3 | OpenAI 175B
- GPT-Neo ... Open-source GPT-3 by EleutherAI 20B
- InstructGPT | OpenAI ... labelers prefer outputs from the 1.3B InstructGPT model over outputs from a 175B GPT-3 model
- Jurassic-1 ... huge 178B language model to rival OpenAI's GPT-3
- LaMDA | Google ... experimental language model 137B
- LLaMA ... Large Language Model Meta AI, 13B and 65B parameter versions
- Luminous ... Europe 200B
- Macaw | AI2 11B
- Med-PaLM ... aligned to the medical domain
- Megatron ... Monolithic Transformer Language NLP Model 11B
- minGPT | Andrej Karpathy - GitHub
- Muse ... VLM-4, a set of natively trained large Language Models in French, Italian, Spanish, German, and English
- MT-NLG 530B
- nanoGPT ... for training/finetuning medium-sized GPTs
- NLLB | Meta 54.5B & 200B parameters; NLLB-200
- OpenGPT-X ... model for Europe
- OPT-175B ... Facebook-owner Meta opens access to AI large language model | Elizabeth Culliford - Reuters ... Facebook 175B ... BlenderBot 175B
- Palmyra | Hugging Face ... a privacy-first LLM for enterprises
- Pathways Language Model (PaLM) 540B
- PLATO-XL | Baidu ... 11B
- RETRO | DeepMind
- Switch Transformers | Google Brain ... trillion parameters
- Textless NLP ... Generating expressive speech from raw audio
- T0pp | Hugging Face
- Toolformer | Meta ... models can teach themselves to use tools and APIs
- Turing-NLG | Microsoft
- UnifiedQA ... single QA system
- WebGPT ... GPT-3 version that can search the web
- Wu Dao 1.0 (Enlightenment 1.0) ... China’s first homegrown super-scale intelligent model
- XGLM | Hugging Face 7.5B
- YaLM ... Yandex YaLM 100B
- Yuan 1.0 | Inspur ... 245B

Inside language models (from GPT-3 to PaLM) | Alan-D-Thompson

RETRO | DeepMind

DeepMind says its new language model can beat others 25 times its size | Will Douglas Heaven - MIT Technology Review ... RETRO (for “Retrieval-Enhanced Transformer”), uses an external memory to look up passages of text on the fly, avoiding some of the costs of training a vast neural network

We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a 2 trillion token database, our Retrieval-Enhanced Transformer (Retro) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer parameters. After fine-tuning, Retro performance translates to downstream knowledge-intensive tasks such as question answering. Retro combines a frozen BERT retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train Retro from scratch, yet can also rapidly RETROfit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale. - RETRO | S. Borgeaud, A. Mensch, J. Hoffmann, & L. Sifre | DeepMind
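The retrieval step described in the abstract can be illustrated with a minimal sketch. This is not DeepMind's implementation. It assumes a bag-of-words count vector as a toy stand-in for the frozen BERT embedding, a brute-force cosine search in place of an approximate nearest-neighbour index, and a chunk size of 4 tokens chosen only to keep the example small. The names `tokenize`, `chunk`, `embed`, `build_index`, and `retrieve` are hypothetical helpers. Storing each chunk together with the chunk that follows it is likewise an assumption of this sketch, made so that a retrieved neighbour carries its continuation.

```python
import math
import re
from collections import Counter

CHUNK_SIZE = 4  # tokens per chunk; a toy value assumed for this sketch


def tokenize(text):
    """Lowercase word tokenizer (stand-in for a real subword tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(tokens, size=CHUNK_SIZE):
    """Split a token list into consecutive fixed-size chunks."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def embed(tokens):
    """Bag-of-words counts; a toy stand-in for a frozen BERT embedding."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(corpus):
    """Index every chunk as (embedding, chunk, continuation chunk)."""
    index = []
    for doc in corpus:
        chunks = chunk(tokenize(doc))
        for i, c in enumerate(chunks):
            continuation = chunks[i + 1] if i + 1 < len(chunks) else []
            index.append((embed(c), c, continuation))
    return index


def retrieve(index, query_chunk, k=2):
    """Return the k indexed chunks most similar to query_chunk."""
    q = embed(query_chunk)
    ranked = sorted(index, key=lambda e: cosine(q, e[0]), reverse=True)
    return [(c, cont) for _, c, cont in ranked[:k]]


corpus = [
    "the eiffel tower is in paris and was completed in 1889",
    "mount everest is the highest mountain above sea level",
]
index = build_index(corpus)

# Each chunk of the preceding tokens issues its own retrieval query.
prompt_chunks = chunk(tokenize("where is the eiffel tower located"))
for query_chunk in prompt_chunks:
    neighbour, continuation = retrieve(index, query_chunk, k=1)[0]
    print(query_chunk, "->", neighbour, "+", continuation)
```

In a full model the retrieved neighbours would be passed through an encoder and attended to by the chunked cross-attention layers; the sketch stops at retrieval and only prints what would be handed on.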

Are Bigger Language Models Better? | DeepMind Gopher and RETRO ... Retrieval-Enhanced Transformer (RETRO) is an autoregressive language model from DeepMind’s "Improving Language Models by Retrieving from Trillions of Tokens" (2021) | Jordan Harrod

DeepMind's RETRO Transformer Model ... Retrieval-Enhanced Language Model cross-attends trillions of tokens for SoTA on Wikitext103 and The Pile with 25x fewer parameters. | Vaclav Kosar

@@ Line 55: / Line 55: @@
 ** [https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html Pathways Language Model (PaLM)]   540B
 ** [http://research.baidu.com/Blog/index-view?id=163 PLATO-XL | Baidu]  ... 11B
-** [[Hybrid Assistants#RETRO | RETRO]] | [[Google | DeepMind]]
+** [[Large Language Model (LLM)#RETRO | RETRO]] | [[Google | DeepMind]]
 ** [https://arxiv.org/abs/2101.03961 Switch Transformers |] [[Google]] Brain  ... trillion parameters
 ** [https://ai.facebook.com/blog/textless-nlp-generating-expressive-speech-from-raw-audio/  Textless NLP  ... Generating expressive speech from raw audio]
@@ Line 73: / Line 73: @@
 [https://lifearchitect.ai/models/ Inside language models (from GPT-3 to PaLM) | Alan-D-Thompson]
+= <span id="RETRO"></span>RETRO | DeepMind =
+[https://www.youtube.com/results?search_query=RETRO+DeepMind+natural+language+agent YouTube search...]
+[https://www.google.com/search?q=RETRO+DeepMind+natural+language+agent ...Google search]
+* [http://www.technologyreview.com/2021/12/08/1041557/deepmind-language-model-beat-others-25-times-size-gpt-3-megatron/ DeepMind says its new language model can beat others 25 times its size | Will Douglas Heaven - MIT Technology Review]  ... RETRO (for “Retrieval-Enhanced Transformer”), uses an external memory to look up passages of text on the fly, avoiding some of the costs of training a vast neural network
+We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a $2$ trillion token database, our Retrieval-Enhanced Transformer (Retro) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer parameters. After fine-tuning, Retro performance translates to downstream knowledge-intensive tasks such as question answering. Retro combines a frozen Bert retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train Retro from scratch, yet can also rapidly RETROfit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale. - [https://www.deepmind.com/publications/improving-language-models-by-retrieving-from-trillions-of-tokens RETRO | S. Borgeaud, A. Mensch, J. Hoffmann, & L. Sifre] [[Google | DeepMind]]
+{|<!-- T -->
+| valign="top" |
+{| class="wikitable" style="width: 550px;"
+||
+<youtube>IaltsI1BCro</youtube>
+<b>Are Bigger Language Models Better? | DeepMind Gopher and RETRO
+</b><br>Retrieval-Enhanced Transformer (RETRO) is autoregressive language model
+from DeepMind’s Improving Language Models by Retrieving from Trillions of Tokens (2021) Jordan Harrod
+|}
+|<!-- M -->
+| valign="top" |
+{| class="wikitable" style="width: 550px;"
+||
+<youtube>-93KBOg77Sg</youtube>
+<b>DeepMind's RETRO Transformer Model
+</b><br>Retrieval-Enhanced Language Model cross-attends trillions of tokens for SoTA on Wikitext103 and The Pile with 25x fewer parameters. Vaclav Kosar
+|}
+|}<!-- B -->

Difference between revisions of "Large Language Model (LLM)"
