Large Language Model (LLM)



A Large Language Model (LLM) is a Neural Network that learns skills, such as generating language and conducting conversations, by analyzing vast amounts of text from across the internet. An LLM is a Neural Network with many parameters (typically billions of weights or more), trained on large quantities of unlabeled text using Self-Supervised Learning or Semi-Supervised Learning. LLMs use deep neural networks, such as Transformers, to learn from billions or trillions of words and to produce text on any topic or domain. LLMs are general-purpose models that excel at a wide range of tasks, as opposed to being trained for one specific task (such as Sentiment Analysis, Named Entity Recognition (NER), or Mathematical Reasoning). They are capable of generating human-like text, from poetry to programming code.



One of the more interesting, but seemingly academic, concerns of the new era of AI sucking up everything on the web was that AIs will eventually start to absorb other AI-generated content and regurgitate it in a self-reinforcing loop. Not so academic after all, it appears, because Bing just did it! When asked, it produced verbatim a COVID-19 conspiracy coaxed out of ChatGPT by disinformation researchers just last month. AI is eating itself: Bing’s AI quotes COVID disinfo sourced from ChatGPT | Devin Coldewey, Frederic Lardinois - TechCrunch



Multimodal

A Multimodal Language Model (MLM), also called a Multimodal Large Language Model (MLLM), is a type of Large Language Model (LLM) that combines text with other kinds of information, such as images, videos, audio, and other sensory data. This allows MLLMs to solve some of the problems of the current generation of LLMs and unlock new applications that were impossible with text-only models. What you need to know about multimodal language models | Ben Dickson - TechTalks

  • GPT-4 | OpenAI ... can accept prompts of both text and images. This means that it can take images as well as text as input, giving it the ability to describe the humor in unusual images, summarize text from screenshots, and answer exam questions that contain diagrams. 1 trillion parameters.
  • Kosmos-1 | Microsoft ... can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot). It can analyze images for content, solve visual puzzles, perform visual text recognition, and pass visual IQ tests. 1.6B
  • PaLM-E | Google ... an Embodied Multimodal Language Model that directly incorporates real-world continuous sensor modalities into language models and thereby establishes the link between words and percepts. It was developed by Google to be a model for robotics and can solve a variety of tasks on multiple types of robots and for multiple modalities (images, robot states, and neural scene representations). PaLM-E is also a generally-capable vision-and-language model. It can perform visual tasks, such as describing images, detecting objects, or classifying scenes, and is also proficient at language tasks, like quoting poetry, solving math equations or generating code. 562B
  • Gemini | Google ... (Generalized Multimodal Intelligence Network) a synergistic network of multiple separate AI models that work in unison to handle an astonishingly wide variety of tasks. >100PB, 1 trillion tokens
  • Multimodal-CoT (Multimodal Chain-of-Thought Reasoning) GitHub ... incorporates language (text) and vision (images) modalities into a two-stage framework that separates rationale generation and answer inference. Under 1B
  • BLIP-2 | Salesforce Research ... a generic and efficient pre-training strategy that bootstraps vision-language pre-training from off-the-shelf frozen pre-trained image encoders and frozen large language models. It achieves state-of-the-art performance on various vision-language tasks, despite having significantly fewer trainable parameters than existing methods.

Large Language Models (LLM)

A Large Language Model (LLM) is a type of machine learning model that utilizes deep learning algorithms to process and understand language. LLMs are trained on large amounts of data to learn language patterns so they can perform tasks such as translating texts or responding in chatbot conversations. LLMs are general-purpose models that excel at a wide range of tasks, as opposed to being trained for one specific task. They can be accessed and used through an API or a platform.


Inside language models (from GPT-3 to PaLM) | Alan-D-Thompson


LLM Token / Parameter / Weight


In the context of a large language model (LLM), a token is a basic unit of meaning, such as a word, a punctuation mark, or a number. Parameters are the numerical values that define the behavior of the model. They are adjusted during training to optimize the model's ability to generate relevant and coherent text. Weights are a type of parameter that defines the strength of connections between neurons across different layers in the model. They are adjusted during training to optimize the model's ability to learn relationships between different tokens.

Here is a more detailed explanation of each term, with a short code sketch after the list:

  • Parameter: A parameter is a numerical value that defines the behavior of a model. In the context of LLMs, parameters are adjusted during training to optimize the model's ability to generate relevant and coherent text. For example, a parameter might define the strength of a connection between two neurons in the model's architecture.
  • Weight: A weight is a type of parameter that defines the strength of connections between neurons across different layers in the model. Weights are adjusted during training to optimize the model's ability to learn relationships between different tokens. For example, a weight might define the strength of the connection between the neuron that represents the token "the" and the neuron that represents the token "quick".
    • Embedding weights: These weights are associated with each token in the vocabulary and are used to represent the meaning of the token.
    • Self-attention weights: used to calculate the attention weights between each token in a sequence.
    • Feedforward weights: used to calculate the output of the feedforward layer in each block of the LLM.
    • Bias weights: added to the outputs of the embedding layer, the self-attention layer, and the feedforward layer.
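
As a rough, hedged illustration of these terms, the sketch below tokenizes a short string and counts the weights of a tiny embedding-plus-feedforward network. It assumes the tiktoken and PyTorch packages are installed; the toy model is purely illustrative and orders of magnitude smaller than a real LLM.

```python
# Minimal sketch: inspecting tokens and counting parameters/weights.
import tiktoken
import torch

# Tokens: the tokenizer splits text into integer token IDs.
enc = tiktoken.get_encoding("cl100k_base")
token_ids = enc.encode("The quick brown fox")
print(token_ids)                      # a short list of integer token IDs
print(len(token_ids), "tokens")

# Parameters/weights: every learnable tensor in the network.
toy_model = torch.nn.Sequential(
    torch.nn.Embedding(num_embeddings=50_000, embedding_dim=64),  # embedding weights
    torch.nn.Linear(64, 64),                       # feedforward weights + bias weights
)
n_params = sum(p.numel() for p in toy_model.parameters())
print(f"{n_params:,} parameters")     # about 3.2 million for this toy model
```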

Large Language Model (LLM) Stack


Excerpt from... Emerging Architectures for LLM Applications | Matt Bornstein & Rajko Radovanovic - andreessen horowitz ... Large language models are a powerful new primitive for building software. But since they are so new—and behave so differently from normal computing resources—it’s not always obvious how to use them. At a very high level, the workflow can be divided into three stages:



Data preprocessing / embedding: This stage involves storing private data (legal documents, in our example) to be retrieved later. Typically, the documents are broken into chunks, passed through an embedding model, then stored in a specialized database called a vector database.
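
By way of illustration, here is a minimal, hedged sketch of this stage, assuming the pre-1.0 openai Python client and chromadb are installed, an OpenAI API key is configured, and a naive fixed-size chunking strategy; a production pipeline would typically use the data-pipeline, embedding, and vector-database tools listed below.

```python
# Sketch of the preprocessing/embedding stage: chunk documents, embed each
# chunk, and store the vectors in a vector database for later retrieval.
import openai
import chromadb

def chunk(text: str, size: int = 1000) -> list[str]:
    """Naive fixed-size chunking; real pipelines often split on sentences or tokens."""
    return [text[i:i + size] for i in range(0, len(text), size)]

documents = ["... full text of a legal document ...", "... another document ..."]
chunks = [c for doc in documents for c in chunk(doc)]

# Embed every chunk with OpenAI's 2nd-generation embedding model.
resp = openai.Embedding.create(model="text-embedding-ada-002", input=chunks)
vectors = [item["embedding"] for item in resp["data"]]

# Store the chunks and their vectors in a local Chroma collection.
client = chromadb.Client()
collection = client.create_collection("legal_docs")
collection.add(
    ids=[f"chunk-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=vectors,
)
```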


  • Contextual data ... contextual data for LLM apps can come in a variety of formats, including text documents, PDFs, CSVs, and SQL tables.
    • Data pipelines ... data pipelines with LLMs are a way to process and analyze large amounts of data; LLMs can be used to extract information from text, translate languages, and answer questions.
      • Databricks ... direct file access and direct native support for Python, data science and AI frameworks
      • Airflow ... an open-source project that lets you programmatically author, schedule, and monitor your data pipelines using Python
      • Unstructured ... open-source tools for extracting and preprocessing data that has no predefined schema, such as text documents, images, and audio files, so it can be used in LLM pipelines.
    • Embedding model
      • OpenAI ... called text-embedding-ada-002, 2nd generation embedding model
      • Cohere ... a large language model (LLM) that is trained on a massive dataset of text and code. It can be used to represent the meaning of text as a list of numbers, which is useful for comparing text for similarity, clustering text, and classifying text.
      • Hugging Face
    • Vector database
      • Pinecone ... provides long-term memory for storing and querying vector embeddings, a type of data that represents semantic information
      • Weaviate ... open source vector database that allows storing and retrieving data objects based on their semantic properties by indexing them with vectors
      • ChromaDB ... an open-source embedding database for building Python or JavaScript LLM apps with memory



Prompt construction / retrieval: When a user submits a query (a legal question, in this case), the application constructs a series of prompts to submit to the language model. A compiled prompt typically combines a prompt template hard-coded by the developer; examples of valid outputs called few-shot examples; any necessary information retrieved from external APIs; and a set of relevant documents retrieved from the vector database.
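
As a rough sketch (plain Python string templating; the template text and helper names are hypothetical, not a recommended format), compiling such a prompt can look like the following before it is handed to an orchestration framework such as those listed below.

```python
# Sketch of prompt construction: combine a hard-coded template, few-shot
# examples, external API context, and retrieved documents into one prompt.
PROMPT_TEMPLATE = """You are a legal research assistant.

Examples of good answers:
{few_shot_examples}

Relevant excerpts retrieved from the vector database:
{retrieved_chunks}

Additional context from external APIs:
{api_context}

Question: {question}
Answer:"""

def build_prompt(question: str, retrieved_chunks: list[str],
                 few_shot_examples: list[str], api_context: str) -> str:
    """Fill the template; retrieval itself would query the vector database."""
    return PROMPT_TEMPLATE.format(
        few_shot_examples="\n---\n".join(few_shot_examples),
        retrieved_chunks="\n---\n".join(retrieved_chunks),
        api_context=api_context,
        question=question,
    )

prompt = build_prompt(
    question="Is this non-compete clause enforceable?",
    retrieved_chunks=["Section 4.2 of the contract states ..."],
    few_shot_examples=["Q: ... A: ..."],
    api_context="Jurisdiction details from an external lookup ...",
)
print(prompt)
```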


  • Prompt / few-shot examples ... Most developers start new projects by experimenting with simple prompts, consisting of direct instructions (zero-shot prompting) or possibly some example outputs (few-shot prompting).
    • Playground
      • OpenAI ... a web-based tool that makes it easy to test prompts and get familiar with how the API works.
      • nat.dev ... An LLM playground you can run on your laptop; Use any model from OpenAI, Anthropic, Cohere, Forefront, HuggingFace, Aleph Alpha, Replicate, Banana and llama.cpp. Open Playground - GitHub
      • Humanloop ... use the playground to experiment with new prompts, collect model generated data and user feedback, and finetune models
    • Orchestration
      • LangChain ... chain together different components to create more advanced use cases around LLMs
      • LlamaIndex ... data framework that allows users to connect custom data sources to Large Language Model (LLM)s. It provides tools to structure data, offers data connectors to ingest existing data sources and data formats (APIs, PDFs, docs, SQL, etc.), and provides an advanced retrieval/query interface over the data.
      • ChatGPT ... can be used as a lightweight alternative to an orchestration framework for chat-based applications.
    • APIs/plugins
      • Serp ... a search engine results API, giving LLM apps the ability to retrieve live web search results
      • Wolfram... Simple API, Short Answers API, Spoken Results API, Full Results API, & Conversational API
      • Zapier ... automates 5,000+ app integrations



Prompt execution / inference: Once the prompts have been compiled, they are submitted to a pre-trained LLM for inference—including both proprietary model APIs and open-source or self-trained models. Some developers also add operational systems like logging, caching, and validation at this stage.
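
A minimal, hedged sketch of exact-match prompt caching and logging using only the Python standard library follows; call_llm is a hypothetical placeholder for whichever proprietary or open model API is used, and tools such as GPTCache or Redis (below) provide richer, semantic caching.

```python
# Sketch of the execution stage: check a cache before calling the model,
# and log every call. `call_llm` is a hypothetical stand-in for a model API.
import hashlib
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
db = sqlite3.connect("llm_cache.db")
db.execute("CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, response TEXT)")

def call_llm(prompt: str) -> str:
    raise NotImplementedError("replace with a real model API call")

def cached_completion(prompt: str) -> str:
    key = hashlib.sha256(prompt.encode()).hexdigest()
    row = db.execute("SELECT response FROM cache WHERE key = ?", (key,)).fetchone()
    if row:
        logging.info("cache hit")
        return row[0]
    logging.info("cache miss; calling model")
    response = call_llm(prompt)
    db.execute("INSERT INTO cache (key, response) VALUES (?, ?)", (key, response))
    db.commit()
    return response
```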


  • LLM cache
    • Redis ... (Remote Dictionary Server) an in-memory data store commonly used to cache LLM prompts and responses, reducing latency and cost for repeated queries
    • SQLite ... disk-based storage to cache LLM prompts and responses for LangChain
    • GPTCache ... uses different storage backends, such as Redis, SQLite, or MinIO, to cache LLM prompts
  • Logging/LLMops
    • Weights & Biases (W&B) ... improve prompt engineering with visually interactive evaluation loops.
    • MLflow
    • PromptLayer
    • Helicone
  • Validation
    • Guardrails
    • Rebuff
    • Microsoft Guidance
    • LMQL
  • App hosting
    • Vercel
    • Steamship
    • Streamlit
    • Modal
  • LLM APIs (proprietary)
  • LLM APIs (open)
  • Cloud providers
    • AWS
    • GCP
    • Azure
    • CoreWeave
  • Opinionated clouds


Risks

Open-source LLMs (Large Language Models) are trained on massive amounts of data from the Internet, which makes them accessible and versatile, but also poses some risks. Some of the risk factors associated with open-source LLMs are:

  • Bias and toxicity: LLMs can reflect and amplify the social biases and harmful language that exist in their training data, such as discrimination, exclusion, stereotypes, hate speech, etc. This can cause unfairness, offense, and harm to certain groups or individuals.
  • Privacy and security: LLMs can leak or infer private or sensitive information from their training data or from user inputs, such as personal details, passwords, intellectual property, etc. This can compromise the confidentiality and integrity of the data and expose it to malicious actors.
  • Misinformation and manipulation: LLMs can produce false or misleading information that can confuse or deceive users, such as inaccurate facts, bad advice, fake news, etc. This can affect the quality and trustworthiness of the information and influence users' decisions and actions.
  • Malicious uses: LLMs can be used by adversaries to cause harm or disruption, such as spreading disinformation, creating scams or frauds, generating malicious code or weapons, etc. This can threaten the security and stability of individuals, organizations, and society.
  • Human-computer interaction harms: LLMs can affect the psychological and social well-being of users who interact with them, such as creating unrealistic expectations, reducing critical thinking, diminishing human agency, etc. This can impact users' identity, autonomy, and relationships.

Life or death isn’t an issue at Morgan Stanley, but producing highly accurate responses to financial and investing questions is important to the firm, its clients, and its regulators. The answers provided by the system were carefully evaluated by human reviewers before it was released to any users. Then it was piloted for several months by 300 financial advisors. As its primary approach to ongoing evaluation, Morgan Stanley has a set of 400 “golden questions” to which the correct answers are known. Every time any change is made to the system, employees test it with the golden questions to see if there has been any “regression,” or less accurate answers. - How to Train Generative AI Using Your Company’s Data | Tom Davenport & Maryam Alavi - Harvard Business Review
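
In the spirit of the example above, a "golden questions" regression check can be sketched as below; ask_system and is_correct are hypothetical placeholders for the deployed Q&A system and an answer-grading step (human review, string matching, or an LLM-based grader), and the sample question is invented.

```python
# Hedged sketch of a golden-question regression check run after every change.
def ask_system(question: str) -> str: ...
def is_correct(question: str, answer: str, expected: str) -> bool: ...

golden_questions = [
    {"question": "What is the expense ratio of fund X?", "expected": "0.45%"},
    # ... roughly 400 question/answer pairs with known-correct answers
]

def regression_check(threshold: float = 0.95) -> bool:
    """Fail the check if accuracy on the golden set regresses below the threshold."""
    correct = sum(
        is_correct(q["question"], ask_system(q["question"]), q["expected"])
        for q in golden_questions
    )
    accuracy = correct / len(golden_questions)
    print(f"golden-set accuracy: {accuracy:.1%}")
    return accuracy >= threshold
```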

Multi-step Multi-model Approach

The Multi-step Multi-model Approach with Large Language Models (LLMs) refers to the utilization of multiple LLMs in a sequential manner to tackle complex language processing tasks. As with any multi-model approach, there are considerations related to computational resources, deployment complexity, and potential challenges in combining the outputs effectively. However, when properly implemented, the Multi-step Multi-model Approach with LLMs can lead to significant improvements in various language-related applications.

In the context of the Multi-step Multi-model Approach with LLMs, the following steps are generally involved (a minimal code sketch follows the steps):

  • Problem Formulation: Clearly define the language processing task you want to address. It could be natural language understanding (NLU), natural language generation (NLG), question-answering, sentiment analysis, language translation, or any other language-related challenge.
  • Model Selection: Choose a set of diverse LLMs that are well-suited for the specific language processing tasks you want to address. Different LLMs might excel in different aspects of language understanding, so having a variety of models can be beneficial.
  • Model Pre-training: Before fine-tuning the LLMs for your specific task, you need to pre-train them on a large corpus of text data. Pre-training helps the models learn the underlying patterns and structures in the language.
  • Fine-tuning: After pre-training, the LLMs are fine-tuned on a task-specific dataset. Fine-tuning involves training the models on labeled data relevant to your target task. This process allows the models to adapt to the specific problem and make better predictions.
  • Ensemble Construction: Once you have multiple LLMs that are fine-tuned for the target tasks, you can create an ensemble of these models. Ensemble methods combine the predictions of individual models to make the final decision. Techniques like averaging, voting, or more sophisticated approaches can be used to combine the outputs of the LLMs.
  • Sequential Application: The ensemble of LLMs can be applied sequentially, where the output of one model becomes the input to the next model in the pipeline. Each LLM can focus on different aspects of the language processing task, contributing to the overall understanding and generation process.
  • Model Evaluation and Refinement: Evaluate the performance of the multi-step multi-model approach using appropriate metrics and validation techniques. If necessary, fine-tuning or retraining of individual LLMs can be performed to optimize the overall system.
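
A minimal, hedged code sketch of such a pipeline follows; every model function is a hypothetical placeholder for a call to a fine-tuned LLM, and the voting step is the simplest possible ensemble.

```python
# Sketch of the sequential, multi-model pipeline: the output of one LLM
# becomes the input of the next, and a majority vote combines candidates.
from collections import Counter

def extract_facts(document: str) -> str: ...               # LLM 1: understanding/extraction
def draft_answer(facts: str, question: str) -> str: ...    # LLM 2: answer generation
def answer_variant(facts: str, question: str) -> str: ...  # LLM 3: alternative generator

def pipeline(document: str, question: str) -> str:
    # Sequential application: each model focuses on one aspect of the task.
    facts = extract_facts(document)
    # Ensemble construction: combine candidate answers by simple voting.
    candidates = [draft_answer(facts, question), answer_variant(facts, question)]
    return Counter(candidates).most_common(1)[0][0]
```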

Benefits of the Multi-step Multi-model Approach with LLMs:

  • Enhanced Language Understanding: By leveraging multiple LLMs, you can benefit from their diverse capabilities and obtain a more comprehensive understanding of the language context.
  • Improved Generation Quality: When generating responses or text, the ensemble of LLMs can produce more coherent and contextually appropriate outputs.
  • Flexibility and Adaptability: The approach can be adapted to a wide range of language processing tasks, making it a versatile solution.