Revision as of 21:08, 13 March 2023


Natural Language Processing (NLP) ...Generation ...LLM ...Tools & Services
Assistants ... Hybrid Assistants ... Agents ... Negotiation
Attention Mechanism ...Transformer Model ...Generative Pre-trained Transformer (GPT)
Generative AI ... OpenAI's ChatGPT ... Perplexity ... Microsoft's BingAI ... You ...Google's Bard
Models:
- AlexaTM | Amazon 20B
- Alpa ... makes serving large models like GPT-3 simple, affordable, accessible
- Bidirectional Encoder Representations from Transformers (BERT) 340M
- BioGPT ... Microsoft language model trained for biomedical tasks
- BLOOM ... Big Science Language Open-science Open-access Multilingual ... 176B
- Cedille ... open-source French language model 6B
- ChatGPT | OpenAI
  - ChatGPT is everywhere. Here’s where it came from | Will Douglas Heaven - MIT Technology Review
    - Attention Mechanism ...Transformer Model ...Generative Pre-trained Transformer (GPT)
    - Reinforcement Learning (RL) from Human Feedback (RLHF)
    - Supervised Learning
    - Proximal Policy Optimization (PPO)
- Chinchilla | DeepMind 70B
- ctrl ... a Conditional Transformer Language Model for Controllable Generation | Salesforce
- Codex | OpenAI ... translates natural language into code
- Dataflow-as-a-Service | SambaNova
- DialoGPT ... Microsoft Releases DialogGPT AI Conversation Model | Anthony Alford - InfoQ ... trained on over 147M dialogs
- Flamingo | DeepMind ... Flamingo Pytorch 80B
- GLM-130B ... Open Bilingual Pre-Trained Model 130B
- GLaM | Google
- Gopher | DeepMind 280B
- GShard | Google ... Scaling Giant Models with Conditional Computation and Automatic Sharding
- GPT-2 | OpenAI 1.5B
- GPT-3 | OpenAI 175B
- GPT-Neo ... Open-source GPT-3-style models by EleutherAI; the larger GPT-NeoX has 20B
- InstructGPT | OpenAI ... labelers preferred outputs from the 1.3B InstructGPT model over outputs from a 175B GPT-3 model
- Jurassic-1 ... huge 178B language model to rival OpenAI's GPT-3
- LaMDA | Google ... experimental language model 137B
- LLaMA ... Large Language Model Meta AI, 13B and 65B parameter versions
- Luminous ... Europe 200B
- Macaw | AI2 11B
- Med-PaLM ... aligned to the medical domain
- Megatron ... Monolithic Transformer Language NLP Model 11B
- minGPT | Andrej Karpathy - GitHub
- Muse ... VLM-4, a set of natively trained large Language Models in French, Italian, Spanish, German, and English
- MT-NLG 530B
- nanoGPT ... for training/finetuning medium-sized GPTs
- NLLB | Meta ... NLLB-200, a 54.5B-parameter model covering 200 languages
- OpenChatKit | TogetherCompute ... the first open-source ChatGPT alternative; a 20B chat model released under the Apache-2.0 license and available for free on Hugging Face
- OpenGPT-X ... model for Europe
- OPT-175B ... Facebook-owner Meta opens access to AI large language model | Elizabeth Culliford - Reuters ... Facebook 175B ... BlenderBot 175B
- Palmyra | Hugging Face ... a privacy-first LLM for enterprises
- Pathways Language Model (PaLM) 540B
- PLATO-XL | Baidu ... 11B
- RETRO | DeepMind
- Switch Transformers | Google Brain ... trillion parameters
- Textless NLP ... Generating expressive speech from raw audio
- T0pp | Hugging Face
- Toolformer | Meta ... models can teach themselves to use tools and APIs
- Turing-NLG | Microsoft
- UnifiedQA ... single QA system
- WebGPT ... GPT-3 version that can search the web
- Wu Dao 1.0 (Enlightenment 1.0) ... China’s first homegrown super-scale intelligent model
- XGLM | Hugging Face 7.5B
- YaLM ... Yandex YaLM 100B
- Yuan 1.0 | Inspur ... 245B

Inside language models (from GPT-3 to PaLM) | Alan-D-Thompson

RETRO | DeepMind

DeepMind says its new language model can beat others 25 times its size | Will Douglas Heaven - MIT Technology Review ... RETRO (for “Retrieval-Enhanced Transformer”), uses an external memory to look up passages of text on the fly, avoiding some of the costs of training a vast neural network

We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a 2 trillion token database, our Retrieval-Enhanced Transformer (Retro) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer parameters. After fine-tuning, Retro performance translates to downstream knowledge-intensive tasks such as question answering. Retro combines a frozen Bert retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train Retro from scratch, yet can also rapidly RETROfit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale. - RETRO | S. Borgeaud, A. Mensch, J. Hoffmann, & L. Sifre - DeepMind
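The retrieval step described in the abstract (split the input into chunks, then look up similar chunks in an external database) can be illustrated with a small sketch. This is not DeepMind's implementation: the function names are hypothetical, the bag-of-words `embed` is only a stand-in for the frozen BERT retriever, the three-entry `database` stands in for the 2 trillion token corpus, and the chunked cross-attention that consumes the retrieved neighbours is left out entirely.

```python
import math
from collections import Counter


def split_into_chunks(tokens, chunk_size):
    """Split a token list into consecutive fixed-size chunks (the last may be shorter)."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def embed(chunk):
    """Toy bag-of-words embedding. Assumption: stands in for a frozen BERT encoder."""
    return Counter(chunk)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_neighbours(query_chunk, database, k=2):
    """Return the k (key_chunk, continuation) pairs most similar to query_chunk.

    Exhaustive search is used here for clarity; a corpus of realistic size
    would need an approximate nearest-neighbour index instead.
    """
    query_vec = embed(query_chunk)
    scored = [(cosine(query_vec, embed(key)), key, cont) for key, cont in database]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(key, cont) for _, key, cont in scored[:k]]


# Hypothetical retrieval database: each entry is (chunk, text that followed it).
database = [
    ("the cat sat on".split(), "the warm mat".split()),
    ("stock prices fell sharply".split(), "after the announcement".split()),
    ("a cat sat near".split(), "the open window".split()),
]

prompt = "yesterday the cat sat on my keyboard again".split()
for chunk in split_into_chunks(prompt, chunk_size=4):
    neighbours = retrieve_neighbours(chunk, database, k=1)
    print(chunk, "->", neighbours)
```

In the model described above, the retrieved chunks and their continuations would be encoded and attended to while predicting later tokens; the sketch stops at the lookup because that is the part the abstract specifies in enough detail to reproduce in a few lines.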

Are Bigger Language Models Better? | DeepMind Gopher and RETRO ... Retrieval-Enhanced Transformer (RETRO) is an autoregressive language model from DeepMind’s paper "Improving Language Models by Retrieving from Trillions of Tokens" (2021) | Jordan Harrod

DeepMind's RETRO Transformer Model ... a retrieval-enhanced language model that cross-attends to trillions of tokens for SoTA on Wikitext103 and The Pile with 25x fewer parameters | Vaclav Kosar

@@ Line 44: / Line 44: @@
 ** [https://uploads-ssl.webflow.com/60fd4503684b466578c0d307/61138924626a6981ee09caf6_jurassic_tech_paper.pdf  Jurassic-1] ... huge 178B language model to rival [[OpenAI]]'s GPT-3
 ** [https://www.blog.google/technology/ai/lamda/ LaMDA |] [[Google]]  ... experimental language model  137B
-** [https://www.reuters.com/technology/meta-launch-ai-language-model-llama-2023-02-24/ LLaMA] ... Large Language Model [[Meta]] AI, 13B and 65B parameter versions
+** [[LLaMA]] ... Large Language Model [[Meta]] AI, 13B and 65B parameter versions
 ** [https://www.aleph-alpha.com/luminous-explore-a-model-for-world-class-semantic-representation Luminous] ... Europe  200B
 ** [https://github.com/allenai/macaw Macaw | AI2]  11B

Difference between revisions of "Large Language Model (LLM)"

