Large Language Model (LLM)
Revision as of 06:12, 28 June 2023
YouTube ... Quora ... Google search ... Google News ... Bing News
- Large Language Model (LLM) ... Natural Language Processing (NLP) ... Generation ... Classification ... Understanding ... Translation ... Tools & Services
- Multimodal Language Models ... GPT-4 ... GPT-5
- Embedding: Search ... Clustering ... Recommendation ... Anomaly Detection ... Classification ... Dimensional Reduction ... find outliers
- Attention Mechanism ... Transformer ... Generative Pre-trained Transformer (GPT) ... GAN ... BERT
- Artificial Intelligence (AI) ... Machine Learning (ML) ... Deep Learning ... Neural Network ... Reinforcement ... Learning Techniques
- Assistants ... Personal Companions ... Agents ... Negotiation ... LangChain
- Excel ... Documents ... Database ... Graph ... LlamaIndex
- Generative AI ... Conversational AI ... OpenAI's ChatGPT ... Perplexity ... Microsoft's Bing ... You ... Google's Bard ... Baidu's Ernie
- Capabilities
- Video/Image ... Vision ... Colorize ... Image/Video Transfer Learning
- End-to-End Speech ... Synthesize Speech ... Speech Recognition ... Music
- Development ... AI Pair Programming Tools ... Analytics ... Visualization ... Diagrams for Business Analysis
- Prompt Engineering (PE) ... PromptBase ... Prompt Injection Attack
- Foundation Models (FM)
- Singularity ... Sentience ... AGI ... Curious Reasoning ... Emergence ... Moonshots ... Explainable AI ... Automated Learning
- Chain of Thought (CoT) ... Tree of Thoughts (ToT)
- Aviary ... fully free, cloud-based infrastructure designed to help developers choose and deploy the right technologies and approach for their LLM-based applications.
- 8 Potentially Surprising Things To Know About Large Language Models LLMs | Dhanshree Shripad Shenwai - Marketechpost
- This AI Paper Introduces SELF-REFINE: A Framework For Improving Initial Outputs From LLMs Through Iterative Feedback And Refinement | Aneesh Tickoo - MarkTechPost
- Meet LMQL: An Open Source Programming Language and Platform for Large Language Model (LLM) Interaction | Tanya Malhotra - MarkTechPost
- What Are Large Language Models Used For? | NVIDIA
A Large Language Model (LLM) is a Neural Network that learns skills, such as generating language and conducting conversations, by analyzing vast amounts of text from across the internet. It is a neural network with many parameters (typically billions of weights or more), trained on large quantities of unlabeled text using Self-Supervised Learning or Semi-Supervised Learning. LLMs use deep neural networks, such as Transformers, to learn from billions or trillions of words and to produce text on any topic or domain. LLMs are general-purpose models that excel at a wide range of tasks, as opposed to being trained for one specific task (such as Sentiment Analysis, Named Entity Recognition (NER), or Mathematical Reasoning). They are capable of generating human-like text, from poetry to programming code.
One of the more interesting, but seemingly academic, concerns of the new era of AI sucking up everything on the web was that AIs will eventually start to absorb other AI-generated content and regurgitate it in a self-reinforcing loop. Not so academic after all, it appears, because Bing just did it! When asked, it produced verbatim a COVID-19 conspiracy coaxed out of ChatGPT by disinformation researchers just last month. AI is eating itself: Bing’s AI quotes COVID disinfo sourced from ChatGPT | Devin Coldewey, Frederic Lardinois - TechCrunch
Multimodal
- Multimodal Language Models ... GPT-4 ... GPT-5
A Multimodal Language Model (MLM), also called a Multimodal Large Language Model (MLLM), is a type of Large Language Model (LLM) that combines text with other kinds of information, such as images, video, audio, and other sensory data. This allows MLLMs to solve some of the problems of the current generation of LLMs and unlock new applications that were impossible with text-only models. What you need to know about multimodal language models | Ben Dickson - TechTalks
- GPT-4 | OpenAI ... can accept prompts of both text and images. This means that it can take images as well as text as input, giving it the ability to describe the humor in unusual images, summarize text from screenshots, and answer exam questions that contain diagrams. Rumored to be more than 1 trillion parameters.
- Kosmos-1 | Microsoft ... can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot). It can analyze images for content, solve visual puzzles, perform visual text recognition, and pass visual IQ tests. 1.6B
- PaLM-E | Google ... an Embodied Multimodal Language Model that directly incorporates real-world continuous sensor modalities into language models and thereby establishes the link between words and percepts. It was developed by Google to be a model for robotics and can solve a variety of tasks on multiple types of robots and for multiple modalities (images, robot states, and neural scene representations). PaLM-E is also a generally-capable vision-and-language model. It can perform visual tasks, such as describing images, detecting objects, or classifying scenes, and is also proficient at language tasks, like quoting poetry, solving math equations or generating code. 562B
- Multimodal-CoT (Multimodal Chain-of-Thought Reasoning) GitHub ... incorporates language (text) and vision (images) modalities into a two-stage framework that separates rationale generation and answer inference. Under 1B
- BLIP-2 | Salesforce Research ... a generic and efficient pre-training strategy that bootstraps vision-language pre-training from off-the-shelf frozen pre-trained image encoders and frozen large language models. It achieves state-of-the-art performance on various vision-language tasks, despite having significantly fewer trainable parameters than existing methods.
Large Language Models (LLM)
A Large Language Model (LLM) is a type of machine learning model that utilizes deep learning algorithms to process and understand language. LLMs are trained on large amounts of data to learn language patterns so they can perform tasks such as translating texts or responding in chatbot conversations. They are general-purpose models that excel at a wide range of tasks, as opposed to being trained for one specific task, and can be accessed and used through an API or a platform.
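To make "accessed through an API" concrete, here is a minimal sketch of how an application might build a request for a hosted LLM. The payload shape below follows OpenAI's chat completions API; other providers use similar JSON bodies. The model name and message contents are illustrative, and no network request is actually sent.

```python
import json

# Build (but do not send) a chat-completion-style request body for an LLM API.
def build_chat_request(user_message: str, model: str = "gpt-3.5-turbo") -> str:
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        "temperature": 0.7,  # higher values -> more varied completions
    }
    return json.dumps(payload)

body = build_chat_request("Summarize the history of transformers in one line.")
print(body)
```

In a real application this JSON body would be POSTed to the provider's endpoint with an API key; the response contains the model's completion.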
- Everything you should know about AI models | Eray Eliaçık - Dataconomy
- AlexaTM | Amazon 20B
- Alpa ... makes serving large models like GPT-3 simple, affordable, and accessible
- Bidirectional Encoder Representations from Transformers (BERT) 340M
- BioGPT ... Microsoft language model trained for biomedical tasks
- Baize ... 7B, 13B, 30B & 60B
- BLOOM ... BigScience Large Open-science Open-access Multilingual language model ... 176B
- BloombergGPT 50B ... trained on financial data
- Cedille ... open-source French language model 6B
- ChatGPT | OpenAI
- Chinchilla | DeepMind 70B
- ctrl ... a Conditional Transformer Language Model for Controllable Generation | Salesforce
- Codex | OpenAI ... translates natural language into code
- Dataflow-as-a-Service | SambaNova
- DialoGPT ... Microsoft Releases DialoGPT AI Conversation Model | Anthony Alford - InfoQ - trained on over 147M dialogs
- Dolly | Databricks
- Flamingo | DeepMind ... Flamingo Pytorch 80B
- FLAN-T5-XXL | Google ... 11B
- GLM-130B ... Open Bilingual Pre-Trained Model 130B
- GLaM | Google
- Gopher | DeepMind 280B
- GShard | Google ... Scaling Giant Models with Conditional Computation and Automatic Sharding
- GPT-2 | OpenAI 1.5B
- GPT-3 | OpenAI 175B
- GPT-Neo ... Open-source GPT-3 by EleutherAI 20B
- InstructGPT ... OpenAI 1.3B InstructGPT model over outputs from a 175B GPT-3 model
- Jurassic-1 ... huge 178B language model to rival OpenAI's GPT-3
- LaMDA | Google ... Language Model for Dialogue Applications; experimental language model 137B
- LLaMA ... Large Language Model Meta AI, 13B and 65B parameter versions
- Luminous ... Europe 200B
- Macaw | AI2 11B
- Med-PaLM ... aligned to the medical domain
- Megatron | nVidia ... Monolithic Transformer Language NLP Model 11B
- minGPT | Andrej Karpathy - GitHub
- Muse ... VLM-4, a set of natively trained large Language Models in French, Italian, Spanish, German, and English
- nanoGPT ... for training/finetuning medium-sized GPTs
- NLLB | Meta 54.5B & 200B parameters; NLLB-200
- OpenChatKit | TogetherCompute ... The first open-source ChatGPT alternative released; a 20B chat-GPT model under the Apache-2.0 license, which is available for free on Hugging Face.
- OpenGPT-X ... model for Europe
- OPT-175B ... Facebook-owner Meta opens access to AI large language model | Elizabeth Culliford - Reuters ... Facebook 175B ... BlenderBot 175B
- Palmyra | Hugging Face ... a privacy-first LLM for enterprises
- PaLM | Google ... Pathways Language Model ... 540B
- PLATO-XL | Baidu ... 11B
- RETRO | DeepMind
- StackLLaMA | Hugging Face 7B ... trained with Stack Exchange using RLHF
- Switch Transformers | Google Brain ... trillion parameters
- Textless NLP ... Generating expressive speech from raw audio
- T0pp | Hugging Face
- Toolformer | Meta ... models can teach themselves to use tools and APIs
- Turing-NLG | Microsoft
- UnifiedQA ... single QA system
- WebGPT ... GPT-3 version that can search the web
- Wu Dao 1.0 (Enlightenment 1.0) ... China’s first homegrown super-scale intelligent model
- XGLM | Hugging Face 7.5B
- YaLM ... Yandex YaLM 100B
- Yuan 1.0 | Inspur ... 245B
Inside language models (from GPT-3 to PaLM) | Alan-D-Thompson
LLM Token / Parameter / Weight
- Embedding: Search ... Clustering ... Recommendation ... Anomaly Detection ... Classification ... Dimensional Reduction ... find outliers
- NVIDIA A100 HPC (High-Performance Computing) Accelerator for ChatGPT
- Leaked LLaMA Unveils the Power of Open Source for AI | Anirudh VK - AIM
- LLM Parameter Counting | kipply's blob
- Numbers every LLM Developer should know | Waleed Kadous - Anyscale
In the context of a large language model (LLM), a token is a basic unit of meaning, such as a word, a punctuation mark, or a number. Parameters are the numerical values that define the behavior of the model. They are adjusted during training to optimize the model's ability to generate relevant and coherent text. Weights are a type of parameter that defines the strength of connections between neurons across different layers in the model. They are adjusted during training to optimize the model's ability to learn relationships between different tokens.
Here is a more detailed explanation of each term:
- Token: A token is a basic unit of meaning in a language. In natural language processing, tokens are typically words, but they can also be punctuation marks, numbers, or other symbols. For example, the sentence "The quick brown fox jumps over the lazy dog" contains nine tokens under simple word-level tokenization.
- Scaling Transformer to 1M tokens and beyond with Recurrent Memory Transformer (RMT) | A. Bulatov, Y. Kuratov, & M. Burtsev - arXiv - Cornell University ... Researchers are designing ways for ChatGPT to handle 1M+ tokens by letting the model learn the meaning of groups of tokens instead of only individual tokens. ChatGPT only remembers a few thousand tokens (or word chunks) at a time; in other words, it has a small short-term memory.
- Parameter: A parameter is a numerical value that defines the behavior of a model. In the context of LLMs, parameters are adjusted during training to optimize the model's ability to generate relevant and coherent text. For example, a parameter might define the strength of a connection between two neurons in the model's architecture.
- Weight: A weight is a type of parameter that defines the strength of connections between neurons across different layers in the model. Weights are adjusted during training to optimize the model's ability to learn relationships between different tokens. For example, a weight might define the strength of the connection between the neuron that represents the token "the" and the neuron that represents the token "quick".
- Embedding weights: These weights are associated with each token in the vocabulary and are used to represent the meaning of the token.
- Self-attention weights: used to calculate the attention weights between each token in a sequence.
- Feedforward weights: used to calculate the output of the feedforward layer in each block of the LLM.
- Bias weights: added to the outputs of the embedding layer, the self-attention layer, and the feedforward layer.
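The distinction between tokens, parameters, and weights above can be made concrete with a toy model. This sketch uses word-level tokenization and a single embedding layer as stand-ins for a real LLM's subword tokenizer and many layers; the dimensions are illustrative only.

```python
import random

# Toy illustration of tokens vs. parameters vs. weights.
sentence = "The quick brown fox jumps over the lazy dog"
tokens = sentence.lower().split()            # word-level tokens
vocab = sorted(set(tokens))                  # unique token types ("the" repeats)
token_ids = [vocab.index(t) for t in tokens] # tokens mapped to integer ids

embedding_dim = 4
# Embedding weights: one learned vector per vocabulary entry.
embedding_weights = [[random.random() for _ in range(embedding_dim)]
                     for _ in vocab]

# Every individual number in the weight matrices is one parameter;
# a real LLM has billions of these across all its layers.
num_parameters = len(vocab) * embedding_dim

print(len(tokens), len(vocab), num_parameters)  # → 9 8 32
```

Training adjusts each of those numbers so that the model's predictions improve; self-attention, feedforward, and bias weights would add further parameters on top of the embedding weights shown here.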
Evaluating Large Language Models (LLM)
- In-Context Learning (ICL) ... Context
- Holistic Evaluation of Language Models (HELM) | Stanford ... a living benchmark that aims to improve the transparency of language models.
- Blazingly Fast LLM Evaluation for In-Context Learning | Jeremy Dohmann - Mosaic
- Evals - GitHub ... a framework for evaluating LLMs (large language models) or systems built using LLMs as components.
- Evaluating Large Language Models (LLMs) with Eleuther AI | Bharat Ramanathan - Weights & Biases ... With a flexible and tokenization-agnostic interface, the lm-eval library provides a single framework for evaluating and reporting auto-regressive language models on various Natural Language Understanding (NLU) tasks. There are currently over 200 evaluation tasks that support the evaluation of models such as GPT-2, T5, GPT-J, GPT-Neo, GPT-NeoX, and Flan-T5.
There are several factors that should be considered while evaluating Large Language Models (LLMs). These include:
- authenticity
- speed
- grammar
- readability
- unbiasedness
- backtracking
- safety
- responsibility
- understanding the context
- text operations
Backtracking
Backtracking is a general algorithmic technique that searches every possible combination in order to solve a computational problem. It incrementally builds candidates for the solution and abandons a candidate ("backtracks") as soon as it determines that the candidate cannot be completed to a valid solution. In machine learning, backtracking can be used to solve constraint satisfaction problems, such as crosswords, verbal arithmetic, Sudoku, and many other puzzles.
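A classic constraint-satisfaction example of this technique is the N-queens puzzle. The sketch below builds placements row by row and abandons a partial candidate the moment a queen would be attacked, rather than enumerating all full placements.

```python
# Backtracking demo: count solutions to the N-queens puzzle.
def count_n_queens(n: int) -> int:
    """Count the ways to place n non-attacking queens on an n x n board."""
    solutions = 0

    def place(row: int, cols: set, diag1: set, diag2: set) -> None:
        nonlocal solutions
        if row == n:              # every row filled: one complete solution
            solutions += 1
            return
        for col in range(n):
            # Abandon this candidate immediately if the square is attacked.
            if col in cols or (row - col) in diag1 or (row + col) in diag2:
                continue
            place(row + 1, cols | {col},
                  diag1 | {row - col}, diag2 | {row + col})

    place(0, set(), set(), set())
    return solutions

print(count_n_queens(8))  # → 92, the classic result for an 8x8 board
```

The key property is the early abandonment: conflicting branches are pruned before they are fully built, which is what distinguishes backtracking from brute-force enumeration.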
Large Language Model (LLM) Ecosystem Explained
The Large Language Model (LLM) ecosystem refers to the various commercial and open-source LLM providers, their offerings, and the tooling that helps accelerate their wide adoption. The functionality of LLMs can be segmented into five areas: Knowledge Answering, Translation, Text Generation, Response Generation, and Classification. There are many options to choose from for all types of language tasks.
LLM Ecosystem explained: Your ultimate Guide to AI | code_your_own_AI
A comprehensive LLM/AI ecosystem is essential for the creation and implementation of sophisticated AI applications. It facilitates the efficient processing of large-scale data, the development of complex machine learning models, and the deployment of intelligent systems capable of performing complex tasks. As the field of AI continues to evolve and expand, the importance of a well-integrated and cohesive AI ecosystem cannot be overstated. A complete overview of today's LLMs and how you can train them for your needs.
Large Language Model (LLM) Stack
Excerpt from... Emerging Architectures for LLM Applications | Matt Bornstein & Rajko Radovanovic - andreessen horowitz ... Large language models are a powerful new primitive for building software. But since they are so new—and behave so differently from normal computing resources—it’s not always obvious how to use them. At a very high level, the workflow can be divided into three stages:
Data preprocessing / embedding: This stage involves storing private data (legal documents, in our example) to be retrieved later. Typically, the documents are broken into chunks, passed through an embedding model, then stored in a specialized database called a vector database.
- Contextual data ... contextual data for LLM apps can come in a variety of formats, including text documents, PDFs, CSVs, and SQL tables.
- Data pipelines ... a way to use large language models (LLMs) to process and analyze large amounts of data; LLMs can be used to extract information from text, translate languages, and answer questions.
- Databricks ... direct file access and direct native support for Python, data science and AI frameworks
- Airflow ... an open-source project that lets you programmatically author, schedule, and monitor your data pipelines using Python
- Unstructured data ... data that does not have a predefined schema, such as text documents, images, and audio files.
- Embedding model
- OpenAI
- Cohere ... provides large language models (LLMs) trained on massive datasets of text and code; its embeddings represent the meaning of text as a list of numbers, which is useful for comparing text for similarity, clustering text, and classifying text.
- Hugging Face
- Vector database
- Pinecone ... provides long-term memory for storing and querying vector embeddings, a type of data that represents semantic information
- Weaviate ... open source vector database that allows storing and retrieving data objects based on their semantic properties by indexing them with vectors
- ChromaDB ... the open-source embedding database, build Python or JavaScript LLM apps with memory
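The preprocessing/embedding stage above can be sketched end to end in a few lines. Here a bag-of-words vector stands in for a real embedding model (such as one served by OpenAI, Cohere, or Hugging Face), and a plain Python list stands in for a vector database like Pinecone, Weaviate, or ChromaDB; the documents and chunk size are illustrative.

```python
import math
from collections import Counter

# Ingest: split documents into fixed-size chunks, "embed" each chunk,
# and keep (chunk, vector) pairs for later retrieval.
def chunk(text: str, size: int = 40) -> list:
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

documents = [
    "The lease terminates after twelve months unless renewed.",
    "Payment is due on the first day of each month.",
]
vector_store = [(c, embed(c)) for doc in documents for c in chunk(doc)]

# Retrieve: embed the query and return the most similar stored chunk.
def retrieve(query: str) -> str:
    return max(vector_store, key=lambda item: cosine(embed(query), item[1]))[0]

print(retrieve("when is payment due"))
```

A production pipeline differs mainly in scale and components: learned embeddings instead of word counts, approximate nearest-neighbor search instead of a linear scan, and smarter chunking that respects sentence boundaries.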
Prompt construction / retrieval: When a user submits a query (a legal question, in this case), the application constructs a series of prompts to submit to the language model. A compiled prompt typically combines a prompt template hard-coded by the developer; examples of valid outputs called few-shot examples; any necessary information retrieved from external APIs; and a set of relevant documents retrieved from the vector database.
- Prompts / few-shot examples ... most developers start new projects by experimenting with simple prompts, consisting of direct instructions (zero-shot prompting) or possibly some example outputs (few-shot prompting).
- Playground
- OpenAI ... a web-based tool that makes it easy to test prompts and get familiar with how the API works.
- nat.dev ... An LLM playground you can run on your laptop; Use any model from OpenAI, Anthropic, Cohere, Forefront, HuggingFace, Aleph Alpha, Replicate, Banana and llama.cpp. Open Playground - GitHub
- Humanloop ... use the playground to experiment with new prompts, collect model generated data and user feedback, and finetune models
- Orchestration
- LangChain ... chain together different components to create more advanced use cases around LLMs
- LlamaIndex ... data framework that allows users to connect custom data sources to Large Language Model (LLM)s. It provides tools to structure data, offers data connectors to ingest existing data sources and data formats (APIs, PDFs, docs, SQL, etc.), and provides an advanced retrieval/query interface over the data.
- ChatGPT ... creates chat-based applications.
- APIs/plugins
- Playground
Prompt execution / inference: Once the prompts have been compiled, they are submitted to a pre-trained LLM for inference—including both proprietary model APIs and open-source or self-trained models. Some developers also add operational systems like logging, caching, and validation at this stage.
- LLM cache
- Redis
- SQLite
- GPTCache
- Logging/LLMops
- Weights & Biases
- MLflow
- PromptLayer
- Helicone
- Validation
- Guardrails
- Rebuff
- Microsoft Guidance
- LMQL
- App hosting
- Vercel
- Steamship
- Streamlit
- Modal
- LM APIs (proprietary)
- LLM APIs (open)
- Cloud providers
- AWS
- GCP
- Azure
- CoreWeave
- Opinionated clouds
- Databricks
- Anyscale
- Mosaic
- Modal
- RunPod
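One of the operational systems listed above, the LLM cache, can be sketched in a few lines. Here a plain dict stands in for Redis, SQLite, or GPTCache, and `fake_llm` is a hypothetical stand-in for a real (and typically metered) model API call.

```python
# Sketch of the execution stage with a cache in front of the model:
# identical prompts are served from the cache instead of re-calling the LLM.
calls = {"count": 0}
cache = {}

def fake_llm(prompt: str) -> str:
    calls["count"] += 1              # pretend this is a paid API call
    return f"completion for: {prompt}"

def cached_completion(prompt: str) -> str:
    if prompt not in cache:          # only call the model on a cache miss
        cache[prompt] = fake_llm(prompt)
    return cache[prompt]

cached_completion("What is an LLM?")
cached_completion("What is an LLM?")  # second call is served from cache
print(calls["count"])  # → 1
```

Real caches add eviction policies, persistence, and (in GPTCache's case) semantic matching so that paraphrased prompts can also hit the cache; logging and validation layers wrap the same call site in a similar way.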