Difference between revisions of "Large Language Model (LLM)"
| Line 11: | Line 11: | ||
* [[Assistants]] ... [[Hybrid Assistants]] ... [[Agents]] ... [[Negotiation]] | * [[Assistants]] ... [[Hybrid Assistants]] ... [[Agents]] ... [[Negotiation]] | ||
* Models: | * Models: | ||
| − | ** [https://www.amazon.science/blog/20b-parameter-alexa-model-sets-new-marks-in-few-shot-learning AlexaTM |] [[Amazon]] 20B | + | ** [https://www.amazon.science/blog/20b-parameter-alexa-model-sets-new-marks-in-few-shot-learning AlexaTM |] [[Amazon]] 20B |
** [https://opt.alpa.ai/ Alpa] ... serving large models like GPT-3 simple, affordable, accessible | ** [https://opt.alpa.ai/ Alpa] ... serving large models like GPT-3 simple, affordable, accessible | ||
| − | ** [[Bidirectional Encoder Representations from Transformers (BERT)]] 340M | + | ** [[Bidirectional Encoder Representations from Transformers (BERT)]] 340M |
** [https://github.com/microsoft/BioGPT BioGPT] ... [[Microsoft]] language model trained for biomedical tasks | ** [https://github.com/microsoft/BioGPT BioGPT] ... [[Microsoft]] language model trained for biomedical tasks | ||
** [https://bigscience.notion.site/BLOOM-BigScience-176B-Model-ad073ca07cdf479398d5f95d88e218c4 BLOOM] ... Big Science Language Open-science Open-access Multilingual ... 176B | ** [https://bigscience.notion.site/BLOOM-BigScience-176B-Model-ad073ca07cdf479398d5f95d88e218c4 BLOOM] ... Big Science Language Open-science Open-access Multilingual ... 176B | ||
| Line 24: | Line 24: | ||
**** [[Supervised]] Learning | **** [[Supervised]] Learning | ||
**** [[Proximal Policy Optimization (PPO)]] | **** [[Proximal Policy Optimization (PPO)]] | ||
| − | ** [https://www.deepmind.com/publications/an-empirical-analysis-of-compute-optimal-large-language-model-training Chinchilla |] [[Google | DeepMind]] 70B | + | ** [https://www.deepmind.com/publications/an-empirical-analysis-of-compute-optimal-large-language-model-training Chinchilla |] [[Google | DeepMind]] 70B |
** [https://arxiv.org/abs/2203.15556 ctrl] ... a Conditional Transformer Language Model for Controllable Generation | Salesforce | ** [https://arxiv.org/abs/2203.15556 ctrl] ... a Conditional Transformer Language Model for Controllable Generation | Salesforce | ||
** [https://openai.com/ Codex |] [[OpenAI]] ... translates natural language into code | ** [https://openai.com/ Codex |] [[OpenAI]] ... translates natural language into code | ||
** [https://sambanova.ai/solutions/gpt/ Dataflow-as-a-Service | SambaNova] | ** [https://sambanova.ai/solutions/gpt/ Dataflow-as-a-Service | SambaNova] | ||
** [https://www.infoq.com/news/2019/11/microsoft-ai-conversation/ DialogGPT] ...Microsoft Releases DialogGPT AI Conversation Model | Anthony Alford - InfoQ - trained on over 147M dialogs | ** [https://www.infoq.com/news/2019/11/microsoft-ai-conversation/ DialogGPT] ...Microsoft Releases DialogGPT AI Conversation Model | Anthony Alford - InfoQ - trained on over 147M dialogs | ||
| − | ** [https://medium.com/syncedreview/deepminds-flamingo-visual-language-model-demonstrates-sota-few-shot-multimodal-learning-f795c3034b94 Flamingo |] [[Google|DeepMind]] ... [https://github.com/lucidrains/flamingo-pytorch Flamingo Pytorch] 80B | + | ** [https://medium.com/syncedreview/deepminds-flamingo-visual-language-model-demonstrates-sota-few-shot-multimodal-learning-f795c3034b94 Flamingo |] [[Google|DeepMind]] ... [https://github.com/lucidrains/flamingo-pytorch Flamingo Pytorch] 80B |
** [https://github.com/THUDM/GLM-130B GLM-130B] ... Open Bilingual Pre-Trained Model | ** [https://github.com/THUDM/GLM-130B GLM-130B] ... Open Bilingual Pre-Trained Model | ||
** [https://www.deepmind.com/blog/language-modelling-at-scale-gopher-ethical-considerations-and-retrieval Gopher |] [[Google | DeepMind]] | ** [https://www.deepmind.com/blog/language-modelling-at-scale-gopher-ethical-considerations-and-retrieval Gopher |] [[Google | DeepMind]] | ||
| Line 40: | Line 40: | ||
** [https://www.blog.google/technology/ai/lamda/ LaMDA |] [[Google]] ... experimental language model | ** [https://www.blog.google/technology/ai/lamda/ LaMDA |] [[Google]] ... experimental language model | ||
** [https://www.reuters.com/technology/meta-launch-ai-language-model-llama-2023-02-24/ LLaMA] ... Large Language Model [[Meta]] AI, 13B and 65B parameter versions | ** [https://www.reuters.com/technology/meta-launch-ai-language-model-llama-2023-02-24/ LLaMA] ... Large Language Model [[Meta]] AI, 13B and 65B parameter versions | ||
| − | ** [https://www.aleph-alpha.com/luminous-explore-a-model-for-world-class-semantic-representation Luminous] ... Europe 200B | + | ** [https://www.aleph-alpha.com/luminous-explore-a-model-for-world-class-semantic-representation Luminous] ... Europe 200B |
** [https://github.com/allenai/macaw Macaw | AI2] | ** [https://github.com/allenai/macaw Macaw | AI2] | ||
** [https://arxiv.org/pdf/2212.13138.pdf Med-PaLM] ... aligned to the medical domain | ** [https://arxiv.org/pdf/2212.13138.pdf Med-PaLM] ... aligned to the medical domain | ||
| Line 49: | Line 49: | ||
** [https://ai.facebook.com/blog/nllb-200-high-quality-machine-translation/ NLLB |] [[Meta]] 54.5B & 200B parameters; NLLB-200 | ** [https://ai.facebook.com/blog/nllb-200-high-quality-machine-translation/ NLLB |] [[Meta]] 54.5B & 200B parameters; NLLB-200 | ||
** [https://idw-online.de/en/news786967 OpenGPT-X] ... model for Europe | ** [https://idw-online.de/en/news786967 OpenGPT-X] ... model for Europe | ||
| − | ** [https://www.reuters.com/technology/facebook-owner-meta-opens-access-ai-large-language-model-2022-05-03/ OPT-175B]...[[Meta|Facebook]]-owner Meta opens access to AI large language model | Elizabeth Culliford - Reuters ... [[Meta|Facebook]] | + | ** [https://www.reuters.com/technology/facebook-owner-meta-opens-access-ai-large-language-model-2022-05-03/ OPT-175B]...[[Meta|Facebook]]-owner Meta opens access to AI large language model | Elizabeth Culliford - Reuters ... [[Meta|Facebook]] 175B ... BlenderBot |
** [https://huggingface.co/Writer/palmyra-base Palmyra |] [[Hugging Face]] ... a privacy-first LLM for enterprises | ** [https://huggingface.co/Writer/palmyra-base Palmyra |] [[Hugging Face]] ... a privacy-first LLM for enterprises | ||
| − | ** [https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html Pathways Language Model (PaLM)] 540B | + | ** [https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html Pathways Language Model (PaLM)] 540B |
| − | ** [http://research.baidu.com/Blog/index-view?id=163 PLATO-XL | Baidu] ... 11B | + | ** [http://research.baidu.com/Blog/index-view?id=163 PLATO-XL | Baidu] ... 11B |
** [https://www.deepmind.com/publications/improving-language-models-by-retrieving-from-trillions-of-tokens RETRO |] [[Google | DeepMind]] | ** [https://www.deepmind.com/publications/improving-language-models-by-retrieving-from-trillions-of-tokens RETRO |] [[Google | DeepMind]] | ||
** [https://arxiv.org/abs/2101.03961 Switch Transformers |] [[Google]] Brain ... trillion parameters | ** [https://arxiv.org/abs/2101.03961 Switch Transformers |] [[Google]] Brain ... trillion parameters | ||
| Line 62: | Line 62: | ||
** [https://openai.com/blog/webgpt/ WebGPT] ... GPT-3 version that can search the web | ** [https://openai.com/blog/webgpt/ WebGPT] ... GPT-3 version that can search the web | ||
** [https://syncedreview.com/2021/03/23/chinas-gpt-3-baai-introduces-superscale-intelligence-model-wu-dao-1-0/ Wu Dao 1.0 (Enlightment 1.0)] ... China’s first homegrown super-scale intelligent model | ** [https://syncedreview.com/2021/03/23/chinas-gpt-3-baai-introduces-superscale-intelligence-model-wu-dao-1-0/ Wu Dao 1.0 (Enlightment 1.0)] ... China’s first homegrown super-scale intelligent model | ||
| − | ** [https://github.com/yandex/YaLM-100B YaLM] ... Yandex YaLM 100B | + | ** [https://github.com/yandex/YaLM-100B YaLM] ... Yandex YaLM 100B |
| − | ** [https://arxiv.org/abs/2110.04725 Yuan 1.0 | Inspur] ... 245B | + | ** [https://arxiv.org/abs/2110.04725 Yuan 1.0 | Inspur] ... 245B |
* [https://openai.com/blog/gpt-2-6-month-follow-up/ OpenAI Blog] | [[OpenAI]] | * [https://openai.com/blog/gpt-2-6-month-follow-up/ OpenAI Blog] | [[OpenAI]] | ||
* [[Attention]] Mechanism/[[Transformer]] Model | * [[Attention]] Mechanism/[[Transformer]] Model | ||
Revision as of 11:27, 25 February 2023
- Natural Language Processing (NLP) ... Generation ... LLM ... Tools & Services
- Assistants ... Hybrid Assistants ... Agents ... Negotiation
- Models:
- AlexaTM | Amazon 20B
- Alpa ... makes serving large models like GPT-3 simple, affordable, accessible
- Bidirectional Encoder Representations from Transformers (BERT) 340M
- BioGPT ... Microsoft language model trained for biomedical tasks
- BLOOM ... Big Science Language Open-science Open-access Multilingual ... 176B
- Cedille ... open-source French language model
- ChatGPT | OpenAI
- Chinchilla | DeepMind 70B
- ctrl ... a Conditional Transformer Language Model for Controllable Generation | Salesforce
- Codex | OpenAI ... translates natural language into code
- Dataflow-as-a-Service | SambaNova
- DialogGPT ... Microsoft Releases DialogGPT AI Conversation Model | Anthony Alford - InfoQ - trained on over 147M dialogs
- Flamingo | DeepMind ... Flamingo Pytorch 80B
- GLM-130B ... Open Bilingual Pre-Trained Model
- Gopher | DeepMind
- GLaM | Google
- GShard | Google ... Scaling Giant Models with Conditional Computation and Automatic Sharding
- GPT-2 | OpenAI ... Generative Pre-trained Transformer 2 by OpenAI
- GPT-Neo ... Open-source GPT-3 by EleutherAI
- InstructGPT ... outputs from OpenAI's 1.3B InstructGPT model were preferred by labelers over outputs from a 175B GPT-3 model
- Jurassic-1 ... huge 178B language model to rival OpenAI's GPT-3
- LaMDA | Google ... experimental language model
- LLaMA ... Large Language Model Meta AI, 13B and 65B parameter versions
- Luminous ... Europe 200B
- Macaw | AI2
- Med-PaLM ... aligned to the medical domain
- minGPT | Andrej Karpathy - GitHub
- Megatron NLG ... Monolithic Transformer Language NLP Model Triple the Size of OpenAI’s GPT-3
- Muse ... VLM-4, a set of natively trained large Language Models in French, Italian, Spanish, German, and English
- nanoGPT ... for training/finetuning medium-sized GPTs
- NLLB | Meta 54.5B parameters; NLLB-200 covers 200 languages
- OpenGPT-X ... model for Europe
- OPT-175B ... Facebook-owner Meta opens access to AI large language model | Elizabeth Culliford - Reuters ... Facebook 175B ... BlenderBot
- Palmyra | Hugging Face ... a privacy-first LLM for enterprises
- Pathways Language Model (PaLM) 540B
- PLATO-XL | Baidu ... 11B
- RETRO | DeepMind
- Switch Transformers | Google Brain ... trillion parameters
- Textless NLP ... Generating expressive speech from raw audio
- T0pp | Hugging Face
- Toolformer | Meta ... models can teach themselves to use tools and APIs
- Turing-NLG | Microsoft
- UnifiedQA ... single QA system
- WebGPT ... GPT-3 version that can search the web
- Wu Dao 1.0 (Enlightenment 1.0) ... China’s first homegrown super-scale intelligent model
- YaLM ... Yandex YaLM 100B
- Yuan 1.0 | Inspur ... 245B
- OpenAI Blog | OpenAI
- Attention Mechanism/Transformer Model
- Generative Pre-trained Transformer (GPT)
- SambaNova Systems ... Dataflow-as-a-Service GPT
Inside language models (from GPT-3 to PaLM) | Alan-D-Thompson