Difference between revisions of "Large Language Model (LLM)"

Line 11: Line 11:
 
* [[Assistants]] ... [[Hybrid Assistants]]  ... [[Agents]]  ... [[Negotiation]]
 
* [[Assistants]] ... [[Hybrid Assistants]]  ... [[Agents]]  ... [[Negotiation]]
 
* Models:
 
* Models:
** [https://www.amazon.science/blog/20b-parameter-alexa-model-sets-new-marks-in-few-shot-learning AlexaTM |] [[Amazon]]  20B parametere
+
** [https://www.amazon.science/blog/20b-parameter-alexa-model-sets-new-marks-in-few-shot-learning AlexaTM |] [[Amazon]]  20B
 
** [https://opt.alpa.ai/ Alpa]  ... serving large models like GPT-3 simple, affordable, accessible  
 
** [https://opt.alpa.ai/ Alpa]  ... serving large models like GPT-3 simple, affordable, accessible  
** [[Bidirectional Encoder Representations from Transformers (BERT)]] 340M parameters
+
** [[Bidirectional Encoder Representations from Transformers (BERT)]] 340M
 
** [https://github.com/microsoft/BioGPT BioGPT]  ... [[Microsoft]] language model trained for biomedical tasks
 
** [https://github.com/microsoft/BioGPT BioGPT]  ... [[Microsoft]] language model trained for biomedical tasks
 
** [https://bigscience.notion.site/BLOOM-BigScience-176B-Model-ad073ca07cdf479398d5f95d88e218c4 BLOOM]  ... Big Science Language Open-science Open-access Multilingual  ... 176B
 
** [https://bigscience.notion.site/BLOOM-BigScience-176B-Model-ad073ca07cdf479398d5f95d88e218c4 BLOOM]  ... Big Science Language Open-science Open-access Multilingual  ... 176B
Line 24: Line 24:
 
**** [[Supervised]] Learning
 
**** [[Supervised]] Learning
 
**** [[Proximal Policy Optimization (PPO)]]
 
**** [[Proximal Policy Optimization (PPO)]]
** [https://www.deepmind.com/publications/an-empirical-analysis-of-compute-optimal-large-language-model-training Chinchilla |] [[Google | DeepMind]]  70B parameters
+
** [https://www.deepmind.com/publications/an-empirical-analysis-of-compute-optimal-large-language-model-training Chinchilla |] [[Google | DeepMind]]  70B
 
** [https://arxiv.org/abs/2203.15556 ctrl] ... a Conditional Transformer Language Model for Controllable Generation | Salesforce
 
** [https://arxiv.org/abs/2203.15556 ctrl] ... a Conditional Transformer Language Model for Controllable Generation | Salesforce
 
** [https://openai.com/ Codex |] [[OpenAI]] ... translates natural language into code
 
** [https://openai.com/ Codex |] [[OpenAI]] ... translates natural language into code
 
** [https://sambanova.ai/solutions/gpt/ Dataflow-as-a-Service | SambaNova]
 
** [https://sambanova.ai/solutions/gpt/ Dataflow-as-a-Service | SambaNova]
 
** [https://www.infoq.com/news/2019/11/microsoft-ai-conversation/ DialogGPT]  ...Microsoft Releases DialogGPT AI Conversation Model | Anthony Alford - InfoQ - trained on over 147M dialogs  
 
** [https://www.infoq.com/news/2019/11/microsoft-ai-conversation/ DialogGPT]  ...Microsoft Releases DialogGPT AI Conversation Model | Anthony Alford - InfoQ - trained on over 147M dialogs  
** [https://medium.com/syncedreview/deepminds-flamingo-visual-language-model-demonstrates-sota-few-shot-multimodal-learning-f795c3034b94 Flamingo |] [[Google|DeepMind]] ... [https://github.com/lucidrains/flamingo-pytorch Flamingo Pytorch] 80B parameters
+
** [https://medium.com/syncedreview/deepminds-flamingo-visual-language-model-demonstrates-sota-few-shot-multimodal-learning-f795c3034b94 Flamingo |] [[Google|DeepMind]] ... [https://github.com/lucidrains/flamingo-pytorch Flamingo Pytorch] 80B  
 
** [https://github.com/THUDM/GLM-130B GLM-130B]  ... Open Bilingual Pre-Trained Model
 
** [https://github.com/THUDM/GLM-130B GLM-130B]  ... Open Bilingual Pre-Trained Model
 
** [https://www.deepmind.com/blog/language-modelling-at-scale-gopher-ethical-considerations-and-retrieval Gopher |] [[Google | DeepMind]]
 
** [https://www.deepmind.com/blog/language-modelling-at-scale-gopher-ethical-considerations-and-retrieval Gopher |] [[Google | DeepMind]]
Line 40: Line 40:
 
** [https://www.blog.google/technology/ai/lamda/ LaMDA |] [[Google]]  ... experimental language model
 
** [https://www.blog.google/technology/ai/lamda/ LaMDA |] [[Google]]  ... experimental language model
 
** [https://www.reuters.com/technology/meta-launch-ai-language-model-llama-2023-02-24/ LLaMA] ... Large Language Model [[Meta]] AI, 13B and 65B parameter versions   
 
** [https://www.reuters.com/technology/meta-launch-ai-language-model-llama-2023-02-24/ LLaMA] ... Large Language Model [[Meta]] AI, 13B and 65B parameter versions   
** [https://www.aleph-alpha.com/luminous-explore-a-model-for-world-class-semantic-representation Luminous] ... Europe  200B parameters
+
** [https://www.aleph-alpha.com/luminous-explore-a-model-for-world-class-semantic-representation Luminous] ... Europe  200B
 
** [https://github.com/allenai/macaw Macaw | AI2]
 
** [https://github.com/allenai/macaw Macaw | AI2]
 
** [https://arxiv.org/pdf/2212.13138.pdf Med-PaLM]  ... aligned to the medical domain
 
** [https://arxiv.org/pdf/2212.13138.pdf Med-PaLM]  ... aligned to the medical domain
Line 49: Line 49:
 
** [https://ai.facebook.com/blog/nllb-200-high-quality-machine-translation/ NLLB |] [[Meta]]  54.5B & 200B parameters; NLLB-200
 
** [https://ai.facebook.com/blog/nllb-200-high-quality-machine-translation/ NLLB |] [[Meta]]  54.5B & 200B parameters; NLLB-200
 
** [https://idw-online.de/en/news786967 OpenGPT-X]  ... model for Europe
 
** [https://idw-online.de/en/news786967 OpenGPT-X]  ... model for Europe
** [https://www.reuters.com/technology/facebook-owner-meta-opens-access-ai-large-language-model-2022-05-03/ OPT-175B]...[[Meta|Facebook]]-owner Meta opens access to AI large language model | Elizabeth Culliford - Reuters ... [[Meta|Facebook]] 175-billion-parameter language model - Open Pretrained Transformer ... BlenderBot
+
** [https://www.reuters.com/technology/facebook-owner-meta-opens-access-ai-large-language-model-2022-05-03/ OPT-175B]...[[Meta|Facebook]]-owner Meta opens access to AI large language model | Elizabeth Culliford - Reuters ... [[Meta|Facebook]] 175B ... BlenderBot
 
** [https://huggingface.co/Writer/palmyra-base  Palmyra |] [[Hugging Face]] ... a privacy-first LLM for enterprises
 
** [https://huggingface.co/Writer/palmyra-base  Palmyra |] [[Hugging Face]] ... a privacy-first LLM for enterprises
** [https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html Pathways Language Model (PaLM)]  540B parameters
+
** [https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html Pathways Language Model (PaLM)]  540B
** [http://research.baidu.com/Blog/index-view?id=163 PLATO-XL | Baidu]  ... 11B parameter chatbot
+
** [http://research.baidu.com/Blog/index-view?id=163 PLATO-XL | Baidu]  ... 11B  
 
** [https://www.deepmind.com/publications/improving-language-models-by-retrieving-from-trillions-of-tokens RETRO |] [[Google | DeepMind]]  
 
** [https://www.deepmind.com/publications/improving-language-models-by-retrieving-from-trillions-of-tokens RETRO |] [[Google | DeepMind]]  
 
** [https://arxiv.org/abs/2101.03961 Switch Transformers |] [[Google]] Brain  ... trillion parameters
 
** [https://arxiv.org/abs/2101.03961 Switch Transformers |] [[Google]] Brain  ... trillion parameters
Line 62: Line 62:
 
** [https://openai.com/blog/webgpt/ WebGPT] ... GPT-3 version that can search the web
 
** [https://openai.com/blog/webgpt/ WebGPT] ... GPT-3 version that can search the web
 
** [https://syncedreview.com/2021/03/23/chinas-gpt-3-baai-introduces-superscale-intelligence-model-wu-dao-1-0/  Wu Dao 1.0 (Enlightenment 1.0)]  ... China’s first homegrown super-scale intelligent model  
 
** [https://syncedreview.com/2021/03/23/chinas-gpt-3-baai-introduces-superscale-intelligence-model-wu-dao-1-0/  Wu Dao 1.0 (Enlightenment 1.0)]  ... China’s first homegrown super-scale intelligent model  
** [https://github.com/yandex/YaLM-100B YaLM] ... Yandex YaLM 100B parameters
+
** [https://github.com/yandex/YaLM-100B YaLM] ... Yandex YaLM 100B  
** [https://arxiv.org/abs/2110.04725 Yuan 1.0 | Inspur]  ... 245B parameters
+
** [https://arxiv.org/abs/2110.04725 Yuan 1.0 | Inspur]  ... 245B
 
* [https://openai.com/blog/gpt-2-6-month-follow-up/ OpenAI Blog] | [[OpenAI]]
 
* [https://openai.com/blog/gpt-2-6-month-follow-up/ OpenAI Blog] | [[OpenAI]]
 
* [[Attention]] Mechanism/[[Transformer]] Model
 
* [[Attention]] Mechanism/[[Transformer]] Model

Revision as of 11:27, 25 February 2023


Inside language models (from GPT-3 to PaLM) | Alan-D-Thompson