Large Language Model (LLM)

A Large Language Model (LLM) is a Neural Network that learns skills, such as generating language and conducting conversations, by analyzing vast amounts of text from across the internet. It has many parameters (typically billions of weights or more) and is trained on large quantities of unlabeled text using Self-Supervised Learning or Semi-Supervised Learning. LLMs use deep neural network architectures, such as Transformers, to learn from billions or trillions of words and to produce text on a wide range of topics and domains. LLMs are general-purpose models that excel at a wide range of tasks, as opposed to being trained for one specific task (such as Sentiment Analysis, Named Entity Recognition (NER), or Mathematical Reasoning). They are capable of generating human-like text, from poetry to programming code.
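
To illustrate how unlabeled text can supply its own training signal, here is a minimal sketch of building next-token prediction examples, one common form of Self-Supervised Learning. The function name `next_token_pairs` and the `context_size` parameter are assumptions made for this example, and real systems operate on subword token IDs rather than whole words.

```python
from typing import List, Tuple


def next_token_pairs(tokens: List[str], context_size: int) -> List[Tuple[List[str], str]]:
    """Turn a token sequence into (context, target) training pairs.

    No human labels are needed: the "label" for each context is simply
    the token that follows it in the original text.
    """
    pairs = []
    for i in range(1, len(tokens)):
        # Keep at most `context_size` tokens immediately before position i.
        context = tokens[max(0, i - context_size):i]
        pairs.append((context, tokens[i]))
    return pairs


# Hypothetical toy input; a real corpus would contain billions of tokens.
for context, target in next_token_pairs(["the", "cat", "sat", "down"], context_size=2):
    print(context, "->", target)
```

Each pair asks the model to predict the target token from the context, which is why large quantities of raw text are enough to train on.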

One of the more interesting, but seemingly academic, concerns of the new era of AI sucking up everything on the web was that AIs will eventually start to absorb other AI-generated content and regurgitate it in a self-reinforcing loop. Not so academic after all, it appears, because Bing just did it! When asked, it produced verbatim a COVID-19 conspiracy coaxed out of ChatGPT by disinformation researchers just last month. - AI is eating itself: Bing’s AI quotes COVID disinfo sourced from ChatGPT | Devin Coldewey, Frederic Lardinois - TechCrunch


Multimodal Language Models

A Multimodal Language Model (MLM), also called a Multimodal Large Language Model (MLLM), is a type of Large Language Model (LLM) that combines text with other kinds of information, such as images, videos, audio, and other sensory data. This allows MLLMs to solve some of the problems of the current generation of LLMs and unlock new applications that were impossible with text-only models. - What you need to know about multimodal language models | Ben Dickson - TechTalks

  • GPT-4 | OpenAI ... can accept prompts of both text and images. This gives it the ability to describe the humor in unusual images, summarize text from screenshots, and answer exam questions that contain diagrams. Reported size: 1 trillion parameters (not officially confirmed by OpenAI).
  • Kosmos-1 | Microsoft ... can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot). It can analyze images for content, solve visual puzzles, perform visual text recognition, and pass visual IQ tests. 1.6B parameters.
  • PaLM-E | Google ... an Embodied Multimodal Language Model that directly incorporates real-world continuous sensor modalities into language models and thereby establishes the link between words and percepts. It was developed by Google to be a model for robotics and can solve a variety of tasks on multiple types of robots and for multiple modalities (images, robot states, and neural scene representations). PaLM-E is also a generally-capable vision-and-language model. It can perform visual tasks, such as describing images, detecting objects, or classifying scenes, and is also proficient at language tasks, like quoting poetry, solving math equations or generating code. 562B parameters.
  • Gemini | Google ... (Generalized Multimodal Intelligence Network) a synergistic network of multiple separate AI models that work in unison to handle a wide variety of tasks. >100PB, 1 trillion tokens
  • Multimodal-CoT (Multimodal Chain-of-Thought Reasoning) GitHub ... incorporates language (text) and vision (images) modalities into a two-stage framework that separates rationale generation and answer inference. Under 1B parameters.
  • BLIP-2 | Salesforce Research ... a generic and efficient pre-training strategy that bootstraps vision-language pre-training from off-the-shelf frozen pre-trained image encoders and frozen large language models. It achieves state-of-the-art performance on various vision-language tasks, despite having significantly fewer trainable parameters than existing methods.

Large Language Models (LLM)

A Large Language Model (LLM) is a type of machine learning model that uses deep learning to process and generate language. LLMs are trained on large amounts of data to learn language patterns so they can perform tasks such as translating text or responding in chatbot conversations. They are general-purpose models that excel at a wide range of tasks, as opposed to being trained for one specific task, and they can be accessed and used through an API or a platform.

Inside language models (from GPT-3 to PaLM) | Alan-D-Thompson ... PaLM

LLM Token / Parameter / Weight

In the context of a large language model (LLM), a token is the basic unit of text that the model processes, such as a word, a piece of a word, a punctuation mark, or a number. Parameters are the numerical values that define the behavior of the model. They are adjusted during training to optimize the model's ability to generate relevant and coherent text. Weights are a type of parameter that defines the strength of connections between neurons across different layers in the model. They are adjusted during training to optimize the model's ability to learn relationships between different tokens.

Here is a more detailed explanation of each term:


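A token is the basic unit of text that a model reads and writes. In natural language processing, tokens are often words, but they can also be pieces of words (subwords), punctuation marks, numbers, or other symbols. For example, split on whitespace, the sentence "The quick brown fox jumps over the lazy dog" contains 9 word-level tokens; a subword tokenizer may split the same sentence into a different number of tokens.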
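
As a rough illustration of splitting text into tokens, here is a minimal word-level tokenizer sketch. It is an assumption for demonstration only: the function name `simple_tokenize` is hypothetical, and production LLMs typically use learned subword tokenizers (such as byte-pair encoding) that would split text differently.

```python
import re
from typing import List


def simple_tokenize(text: str) -> List[str]:
    """Split text into word-level tokens.

    Runs of letters, digits, or underscores become one token, and each
    punctuation mark becomes its own token. Whitespace is discarded.
    """
    return re.findall(r"\w+|[^\w\s]", text)


print(simple_tokenize("Hello, world!"))  # ['Hello', ',', 'world', '!']

sentence = "The quick brown fox jumps over the lazy dog"
tokens = simple_tokenize(sentence)
print(tokens)
print(len(tokens), "tokens")
```

The token count depends on the tokenizer chosen, so counts from this sketch should not be compared directly with the token counts reported by a specific LLM.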


A parameter is a numerical value that defines the behavior of a model. In the context of LLMs, parameters are adjusted during training to optimize the model's ability to generate relevant and coherent text. For example, a parameter might define the strength of a connection between two neurons in the model's architecture.


A weight is a type of parameter that defines the strength of connections between neurons across different layers in the model. Weights are adjusted during training to optimize the model's ability to learn relationships between different tokens. For example, a weight might define how strongly the output of one neuron contributes to the input of a neuron in the next layer. Tokens themselves are not single neurons; each token is represented by a vector of embedding weights. Common types of weights in a Transformer-based LLM include:

  • Embedding weights: associated with each token in the vocabulary and used to represent the meaning of the token as a vector.
  • Self-attention weights: the learned query, key, and value projections used to calculate how much attention each token pays to every other token in a sequence.
  • Feedforward weights: used to calculate the output of the feedforward layer in each block of the LLM.
  • Bias weights: learned offsets added to the outputs of layers such as the self-attention and feedforward layers.
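
To make the relationship between these weight types and a model's parameter count concrete, here is a back-of-the-envelope sketch. The architecture is a simplifying assumption (one embedding table, plus per-block attention and feedforward layers with biases, ignoring layer normalization, positional embeddings, and any separate output layer), and the function and argument names are hypothetical.

```python
from typing import Dict


def count_parameters(vocab_size: int, d_model: int, d_ff: int, n_layers: int) -> Dict[str, int]:
    """Estimate parameter counts for a simplified Transformer.

    Assumptions (not taken from any specific model):
      - embedding: one d_model-sized vector per vocabulary token
      - attention: query, key, value, and output projections,
        each a d_model x d_model matrix plus a d_model bias
      - feedforward: d_model -> d_ff -> d_model, each with a bias
    """
    embedding = vocab_size * d_model
    attention = 4 * (d_model * d_model + d_model)
    feedforward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    total = embedding + n_layers * (attention + feedforward)
    return {
        "embedding": embedding,
        "attention_per_layer": attention,
        "feedforward_per_layer": feedforward,
        "total": total,
    }


# Tiny hypothetical configuration, chosen so the arithmetic is easy to check by hand.
print(count_parameters(vocab_size=10, d_model=4, d_ff=8, n_layers=2))
# {'embedding': 40, 'attention_per_layer': 80, 'feedforward_per_layer': 76, 'total': 352}
```

Scaling `d_model` and `n_layers` up is what pushes real models into the billions of parameters, since the attention and feedforward terms grow with the square of the model width.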


Open-source LLMs (Large Language Models) are trained on massive amounts of data from the Internet. Their open availability makes them accessible and versatile, but it also poses some risks. Some of the risk factors associated with open-source LLMs are:

  • Bias and toxicity: LLMs can reflect and amplify the social biases and harmful language that exist in their training data, such as discrimination, exclusion, stereotypes, hate speech, etc. This can cause unfairness, offense, and harm to certain groups or individuals¹².
  • Privacy and security: LLMs can leak or infer private or sensitive information from their training data or from user inputs, such as personal details, passwords, intellectual property, etc. This can compromise the confidentiality and integrity of the data and expose it to malicious actors¹².
  • Misinformation and manipulation: LLMs can produce false or misleading information that can confuse or deceive users, such as inaccurate facts, bad advice, fake news, etc. This can affect the quality and trustworthiness of the information and influence the users' decisions and actions¹²³.
  • Malicious uses: LLMs can be used by adversaries to cause harm or disruption, such as spreading disinformation, creating scams or frauds, generating malicious code or weapons, etc. This can threaten the security and stability of individuals, organizations, and society²⁴.
  • Human-computer interaction harms: LLMs can affect the psychological and social well-being of users who interact with them, such as creating unrealistic expectations, reducing critical thinking, diminishing human agency, etc. This can impact the users' identity, autonomy, and relationships².

Life or death isn’t an issue at Morgan Stanley, but producing highly accurate responses to financial and investing questions is important to the firm, its clients, and its regulators. The answers provided by the system were carefully evaluated by human reviewers before it was released to any users. Then it was piloted for several months by 300 financial advisors. As its primary approach to ongoing evaluation, Morgan Stanley has a set of 400 “golden questions” to which the correct answers are known. Every time any change is made to the system, employees test it with the golden questions to see if there has been any “regression,” or less accurate answers. - How to Train Generative AI Using Your Company’s Data | Tom Davenport & Maryam Alavi - Harvard Business Review
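
The "golden questions" check described above can be sketched as a small regression test. This is an illustrative assumption, not Morgan Stanley's actual process: the function names, the toy question set, and the exact-match scoring are all hypothetical, and a real evaluation would likely need human or semantic grading of answers.

```python
from typing import Callable, Dict, List

AnswerFn = Callable[[str], str]  # stand-in for a call to the deployed system


def is_correct(answer: str, expected: str) -> bool:
    """Simplified scoring: case-insensitive exact match."""
    return answer.strip().lower() == expected.strip().lower()


def golden_accuracy(answer_fn: AnswerFn, golden: Dict[str, str]) -> float:
    """Fraction of golden questions answered correctly."""
    correct = sum(is_correct(answer_fn(q), a) for q, a in golden.items())
    return correct / len(golden)


def find_regressions(old_fn: AnswerFn, new_fn: AnswerFn, golden: Dict[str, str]) -> List[str]:
    """Questions the old system got right but the changed system gets wrong."""
    return [
        q for q, a in golden.items()
        if is_correct(old_fn(q), a) and not is_correct(new_fn(q), a)
    ]


# Hypothetical golden set and two toy "system versions" backed by lookup tables.
golden_set = {"What is 2 + 2?": "4", "Capital of France?": "Paris"}
old_answers = {"What is 2 + 2?": "4", "Capital of France?": "Paris"}
new_answers = {"What is 2 + 2?": "4", "Capital of France?": "Lyon"}

print(golden_accuracy(new_answers.get, golden_set))  # 0.5
print(find_regressions(old_answers.get, new_answers.get, golden_set))  # ['Capital of France?']
```

Running such a check after every change gives a quick signal of whether accuracy on known questions has slipped before the update reaches users.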

Multi-step Multi-model Approach

The Multi-step Multi-model Approach with Large Language Models (LLMs) uses multiple LLMs in sequence to tackle complex language processing tasks. As with any multi-model approach, it requires more computational resources, adds deployment complexity, and makes it harder to combine outputs effectively. However, when properly implemented, the Multi-step Multi-model Approach with LLMs can lead to significant improvements in various language-related applications.

In the context of the Multi-step Multi-model Approach with LLMs, the following steps are generally involved:

  • Problem Formulation: Clearly define the language processing task you want to address. It could be natural language understanding (NLU), natural language generation (NLG), question-answering, sentiment analysis, language translation, or any other language-related challenge.
  • Model Selection: Choose a set of diverse LLMs that are well-suited for the specific language processing tasks you want to address. Different LLMs might excel in different aspects of language understanding, so having a variety of models can be beneficial.
  • Model Pre-training: Before an LLM can be fine-tuned for your specific task, it must be pre-trained on a large corpus of text data. Pre-training helps the model learn the underlying patterns and structures in the language. In practice, this step is usually covered by starting from an existing pre-trained model.
  • Fine-tuning: After pre-training, the LLMs are fine-tuned on a task-specific dataset. Fine-tuning involves training the models on labeled data relevant to your target task. This process allows the models to adapt to the specific problem and make better predictions.
  • Ensemble Construction: Once you have multiple LLMs that are fine-tuned for the target tasks, you can create an ensemble of these models. Ensemble methods combine the predictions of individual models to make the final decision. Techniques like averaging, voting, or more sophisticated approaches can be used to combine the outputs of the LLMs.
  • Sequential Application: The ensemble of LLMs can be applied sequentially, where the output of one model becomes the input to the next model in the pipeline. Each LLM can focus on different aspects of the language processing task, contributing to the overall understanding and generation process.
  • Model Evaluation and Refinement: Evaluate the performance of the multi-step multi-model approach using appropriate metrics and validation techniques. If necessary, fine-tuning or retraining of individual LLMs can be performed to optimize the overall system.
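
The Ensemble Construction and Sequential Application steps above can be sketched as follows. This is a minimal illustration under stated assumptions: each "model" is a plain Python callable standing in for a real LLM call, and the names `majority_vote` and `run_pipeline` are hypothetical.

```python
from collections import Counter
from typing import Callable, List

Model = Callable[[str], str]  # stand-in for a call to a fine-tuned LLM


def majority_vote(models: List[Model], text: str) -> str:
    """Ensemble by voting: return the output produced by the most models."""
    votes = Counter(model(text) for model in models)
    return votes.most_common(1)[0][0]


def run_pipeline(stages: List[Model], text: str) -> str:
    """Sequential application: each stage's output is the next stage's input."""
    for stage in stages:
        text = stage(text)
    return text


# Toy stand-ins for three sentiment classifiers that disagree on one input.
classifiers = [
    lambda t: "positive",
    lambda t: "positive" if "good" in t else "negative",
    lambda t: "negative",
]
print(majority_vote(classifiers, "a good day"))  # positive

# Toy stand-ins for pipeline stages (e.g. clean up, then normalize).
print(run_pipeline([str.strip, str.lower], "  Hello World  "))  # hello world
```

Voting suits tasks with a small set of discrete outputs, such as classification; for free-form generation, combining outputs usually needs a different strategy, such as having one model rank or revise the others' drafts.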

Benefits of the Multi-step Multi-model Approach with LLMs:

  • Enhanced Language Understanding: By leveraging multiple LLMs, you can benefit from their diverse capabilities and obtain a more comprehensive understanding of the language context.
  • Improved Generation Quality: When generating responses or text, the ensemble of LLMs can produce more coherent and contextually appropriate outputs.
  • Flexibility and Adaptability: The approach can be adapted to a wide range of language processing tasks, making it a versatile solution.