Large Language Model (LLM)
YouTube ... Quora ... Google search ... Google News ... Bing News
- Large Language Model (LLM) ... Multimodal ... Foundation Models (FM) ... Generative Pre-trained ... Transformer ... GPT-4 ... GPT-5 ... Attention ... GAN ... BERT
- Natural Language Processing (NLP) ... Generation (NLG) ... Classification (NLC) ... Understanding (NLU) ... Translation ... Summarization ... Sentiment ... Tools
- Embedding ... Fine-tuning ... RAG ... Search ... Clustering ... Recommendation ... Anomaly Detection ... Classification ... Dimensionality Reduction ... find outliers
- Artificial Intelligence (AI) ... Generative AI ... Machine Learning (ML) ... Deep Learning ... Neural Network ... Reinforcement ... Learning Techniques
- Conversational AI ... ChatGPT | OpenAI ... Bing/Copilot | Microsoft ... Gemini | Google ... Claude | Anthropic ... Perplexity ... You ... phind ... Ernie | Baidu
- Cohere
- Agents ... Robotic Process Automation ... Assistants ... Personal Companions ... Productivity ... Email ... Negotiation ... LangChain
- Excel ... Documents ... Database; Vector & Relational ... Graph ... LlamaIndex
- Video/Image ... Vision ... Colorize ... Image/Video Transfer Learning
- End-to-End Speech ... Synthesize Speech ... Speech Recognition ... Music
- Analytics ... Visualization ... Graphical Tools ... Diagrams & Business Analysis ... Requirements ... Loop ... Bayes ... Network Pattern
- Development ... Notebooks ... AI Pair Programming ... Codeless ... Hugging Face ... AIOps/MLOps ... AIaaS/MLaaS
- Prompt Engineering (PE) ... PromptBase ... Prompt Injection Attack
- Artificial General Intelligence (AGI) to Singularity ... Curious Reasoning ... Emergence ... Moonshots ... Explainable AI ... Automated Learning
- Chain of Thought (CoT) ... Tree of Thoughts (ToT)
- Aviary ... fully free, cloud-based infrastructure designed to help developers choose and deploy the right technologies and approach for their LLM-based applications.
- Loss Curve
- Risk, Compliance and Regulation ... Ethics ... Privacy ... Law ... AI Governance ... AI Verification and Validation
- 8 Potentially Surprising Things To Know About Large Language Models (LLMs) | Dhanshree Shripad Shenwai - MarkTechPost
- This AI Paper Introduces SELF-REFINE: A Framework For Improving Initial Outputs From LLMs Through Iterative Feedback And Refinement | Aneesh Tickoo - MarkTechPost
- Meet LMQL: An Open Source Programming Language and Platform for Large Language Model (LLM) Interaction | Tanya Malhotra - MarkTechPost
- What Are Large Language Models Used For? | NVIDIA
- Does your company need its own LLM? | Jason Ly - TechTalks ... inspired many to ask how to get their hands on their ‘own LLM’, or sometimes more ambitiously, their ‘own ChatGPT’.
- Emerging Architectures for LLM Applications | M. Bornstein and R. Radovanovic - Andreessen Horowitz ... a reference architecture for the emerging LLM app stack
- A jargon-free explanation of how AI large language models work | Timothy B. Lee & Sean Trott - ARS Technica
- Essential Guide to Foundation Models and Large Language Models | Babar M Bhatti - Medium
- How AI Built This Substack ... covering latest LLM developments
A Large Language Model (LLM) is a Neural Network that learns skills, such as generating language and holding conversations, by analyzing vast amounts of text from across the internet. It is a Neural Network with many parameters (typically billions of weights or more), trained on large quantities of unlabeled text using Self-Supervised Learning or Semi-Supervised Learning. LLMs use deep neural networks, such as Transformers, to learn from billions or trillions of words and to produce text on any topic or domain. They are general-purpose models that excel at a wide range of tasks, as opposed to being trained for one specific task (such as Sentiment Analysis, Named Entity Recognition (NER), or Mathematical Reasoning), and they are capable of generating human-like text, from poetry to programming code.
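As a hedged illustration of the "trained once, used for many tasks" point, the sketch below generates text with a small open checkpoint; it assumes the Hugging Face transformers library is installed, and GPT-2 stands in here for a much larger LLM.

```python
# A minimal sketch of text generation with a small open model.
# Assumes the Hugging Face `transformers` library is installed; the public
# GPT-2 checkpoint stands in for a much larger LLM.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# A single prompt; larger LLMs handle far broader instructions than GPT-2 can.
result = generator(
    "A large language model is",
    max_new_tokens=40,
    do_sample=True,
    temperature=0.8,
)
print(result[0]["generated_text"])
```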
One of the more interesting, but seemingly academic, concerns about the new era of AI ingesting everything on the web was that AIs would eventually start to absorb other AI-generated content and regurgitate it in a self-reinforcing loop. Not so academic after all, it appears, because Bing just did it: when asked, it reproduced verbatim a COVID-19 conspiracy theory coaxed out of ChatGPT by disinformation researchers just last month. AI is eating itself: Bing’s AI quotes COVID disinfo sourced from ChatGPT | Devin Coldewey, Frederic Lardinois - TechCrunch
Contents
Multimodal
- Large Language Model (LLM) ... Multimodal ... Foundation Models (FM) ... Generative Pre-trained ... Transformer ... GPT-4 ... GPT-5 ... Attention ... GAN ... BERT
Large Multimodal Models (LMM) / Multimodal Language Models (MLM) / Multimodal Large Language Models (MLLM) are a type of Large Language Model (LLM) that combines text with other kinds of information, such as images, video, audio, and other sensory data. This allows LMMs to solve some of the problems of the current generation of LLMs and unlock new applications that were impossible with text-only models (a minimal captioning sketch with one such model follows the list below). What you need to know about multimodal language models | Ben Dickson - TechTalks
- GPT-4 | OpenAI ... can accept prompts of both text and images. This means that it can take images as well as text as input, giving it the ability to describe the humor in unusual images, summarize text from screenshots, and answer exam questions that contain diagrams. 1 trillion parameters.
- Kosmos-1 | Microsoft ... can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot). It can analyze images for content, solve visual puzzles, perform visual text recognition, and pass visual IQ tests. 1.6B
- PaLM-E | Google ... an Embodied Multimodal Language Model that directly incorporates real-world continuous sensor modalities into language models and thereby establishes the link between words and percepts. It was developed by Google to be a model for robotics and can solve a variety of tasks on multiple types of robots and for multiple modalities (images, robot states, and neural scene representations). PaLM-E is also a generally-capable vision-and-language model. It can perform visual tasks, such as describing images, detecting objects, or classifying scenes, and is also proficient at language tasks, like quoting poetry, solving math equations or generating code. 562B
- Gemini | Google ... (Generalized Multimodal Intelligence Network) a synergistic network of multiple separate AI models that work in unison to handle an astonishingly wide variety of tasks. >100PB, 1 trillion tokens
- Multimodal-CoT (Multimodal Chain-of-Thought Reasoning) GitHub ... incorporates language (text) and vision (images) modalities into a two-stage framework that separates rationale generation and answer inference. Under 1B
- BLIP-2 | Salesforce Research ... a generic and efficient pre-training strategy that bootstraps vision-language pre-training from off-the-shelf frozen pre-trained image encoders and frozen large language models. It achieves state-of-the-art performance on various vision-language tasks, despite having significantly fewer trainable parameters than existing methods.
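The sketch below is a minimal, hedged illustration of multimodal (image + text) input using BLIP-2 through the Hugging Face transformers library. The checkpoint name and the image file path are assumptions; the model download is several GB and inference is slow without a GPU.

```python
# A minimal sketch of image + text input with BLIP-2 via `transformers`.
# The checkpoint name is an assumed public listing; expect a large download.
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

model_id = "Salesforce/blip2-opt-2.7b"
processor = Blip2Processor.from_pretrained(model_id)
model = Blip2ForConditionalGeneration.from_pretrained(model_id)

image = Image.open("example.jpg")  # replace with any local image file

# A text prompt and an image go in together; the model answers in text.
inputs = processor(
    images=image,
    text="Question: what is in this picture? Answer:",
    return_tensors="pt",
)
output_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```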
Large Language Models (LLMs)
A Large Language Model (LLM) is a type of machine learning model that uses deep learning to process and understand language. LLMs are trained on large amounts of data to learn language patterns, so they can perform tasks such as translating text or responding in chatbot conversations. They are general-purpose models that excel at a wide range of tasks, as opposed to being trained for one specific task, and they can be accessed and used through an API or a platform.
- Everything you should know about AI models | Eray Eliaçık - Dataconomy
- AlexaTM | Amazon 20B
- Alpa ... makes serving large models like GPT-3 simple, affordable, accessible
- Bidirectional Encoder Representations from Transformers (BERT) 340M
- BioGPT ... Microsoft language model trained for biomedical tasks
- Baize ... 7B, 13B, 30B & 60B
- BLOOM ... Big Science Language Open-science Open-access Multilingual ... 176B
- BloombergGPT 50B ... trained on financial data
- Cedille ... open-source French language model 6B
- ChatGPT | OpenAI
- Chinchilla | DeepMind 70B
- ctrl ... a Conditional Transformer Language Model for Controllable Generation | Salesforce
- Codex | OpenAI ... translates natural language into code
- Dataflow-as-a-Service | SambaNova
- DialogGPT ... Microsoft Releases DialogGPT AI Conversation Model | Anthony Alford - InfoQ ... trained on over 147M dialogs
- Dolly | Databricks
- Flamingo | DeepMind ... Flamingo Pytorch 80B
- FLAN-T5-XXL | Google ... 11B
- Flan-U-PaLM | Google ... capable of generating executable Python code
- GLM-130B ... Open Bilingual Pre-Trained Model 130B
- GLaM | Google
- Gopher | DeepMind 280B
- GShard | Google ... Scaling Giant Models with Conditional Computation and Automatic Sharding
- GPT-2 | OpenAI 1.5B
- GPT-3 | OpenAI 175B
- GPT-Neo ... Open-source GPT-3 by EleutherAI 20B
- HTML-T5 | Google ... A domain-specific LLM trained on a massive corpus of HTML documents, specializing in understanding and summarizing website content.
- InstructGPT | OpenAI ... labelers preferred outputs from the 1.3B InstructGPT model over outputs from the 175B GPT-3 model
- Jurassic-1 ... huge 178B language model to rival OpenAI's GPT-3
- LaMDA | Google ... Language Model for Dialogue Applications; experimental language model 137B
- LLaMA ... Large Language Model Meta AI, 13B and 65B parameter versions
- Luminous ... Europe 200B
- Macaw | AI2 11B
- Med-PaLM ... aligned to the medical domain ... PaLM
- Megatron | nVidia ... Monolithic Transformer Language NLP Model 11B
- minGPT | Andrej Karpathy - GitHub
- Mistral ... Mixtral 8x7b ... Mixture-of-Experts (MoE)
- Muse ... VLM-4, a set of natively trained large Language Models in French, Italian, Spanish, German, and English
- nanoGPT ... for training/finetuning medium-sized GPTs
- NLLB | Meta ... NLLB-200, 54.5B parameters, covering 200 languages
- OpenChatKit | TogetherCompute ... the first open-source ChatGPT alternative released; a 20B-parameter chat model under the Apache-2.0 license, available for free on Hugging Face.
- OpenELM | Apple
- OpenGPT-X ... model for Europe
- OPT-175B ... Facebook-owner Meta opens access to AI large language model | Elizabeth Culliford - Reuters ... Facebook 175B ... BlenderBot 175B
- Palmyra | Hugging Face ... a privacy-first LLM for enterprises
- PaLM | Google ... Pathways Language Model ... 540B
- Phi-3 | Microsoft ... outperform much larger models in math and computer science ... Phi-3-mini 3.8B
- PLATO-XL | Baidu ... 11B
- RETRO | DeepMind
- StackLLaMA | Hugging Face 7B ... trained on Stack Exchange data using RLHF
- Switch Transformers | Google Brain ... trillion parameters
- Textless NLP ... Generating expressive speech from raw audio
- T0pp | Hugging Face
- Toolformer | Meta ... models can teach themselves to use tools and APIs
- Turing-NLG | Microsoft
- UnifiedQA ... single QA system
- WebGPT ... GPT-3 version that can search the web
- Wu Dao 1.0 (Enlightenment 1.0) ... China’s first homegrown super-scale intelligent model
- XGLM | Hugging Face 7.5B
- YaLM ... Yandex YaLM 100B
- Yuan 1.0 | Inspur ... 245B
Inside language models (from GPT-3 to PaLM) | Alan-D-Thompson ... PaLM
LLM Token / Parameter / Weight
YouTube ... Quora ... Google search ... Google News ... Bing News
- Embedding ... Fine-tuning ... Search ... Clustering ... Recommendation ... Anomaly Detection ... Classification ... Dimensionality Reduction ... find outliers
- Large Language Model (LLM) Evaluation
- NVIDIA A100 HPC (High-Performance Computing) Accelerator for ChatGPT
- Leaked LLaMA Unveils the Power of Open Source for AI | Anirudh VK - AIM
- LLM Parameter Counting | kipply's blob
- Numbers every LLM Developer should know | Waleed Kadous - Anyscale
In the context of a large language model (LLM), a token is a basic unit of meaning, such as a word, a punctuation mark, or a number. Parameters are the numerical values that define the behavior of the model. They are adjusted during training to optimize the model's ability to generate relevant and coherent text. Weights are a type of parameter that defines the strength of connections between neurons across different layers in the model. They are adjusted during training to optimize the model's ability to learn relationships between different tokens.
Here is a more detailed explanation of each term:
Token
A token is a basic unit of meaning in a language. In natural language processing, tokens are typically words, but they can also be punctuation marks, numbers, or other symbols. For example, the sentence "The quick brown fox jumps over the lazy dog" contains nine tokens when each word maps to one token; subword tokenizers may split it slightly differently (see the tokenizer sketch below).
- Scaling Transformer to 1M tokens and beyond with Recurrent Memory Transformer (RMT) | A. Bulatov, Y. Kuratov, & M. Burtsev - arXiv - Cornell University ... Researchers are designing ways for models like ChatGPT to handle 1M+ tokens by letting the model learn the meaning of groups of tokens instead of only individual tokens. ChatGPT only remembers a few thousand tokens (or word chunks) at a time; in other words, it has a small short-term memory.
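As a small illustration of how text becomes tokens, the sketch below runs a sentence through GPT-2's byte-pair-encoding tokenizer (assuming the Hugging Face transformers library is installed); different tokenizers split the same text into different numbers of tokens.

```python
# A small tokenization sketch using GPT-2's BPE tokenizer via `transformers`.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

sentence = "The quick brown fox jumps over the lazy dog"
tokens = tokenizer.tokenize(sentence)   # human-readable subword pieces
ids = tokenizer.encode(sentence)        # the integer IDs the model actually consumes

print(tokens)
print(len(tokens))  # 9 for this sentence with this tokenizer: each word is one token
print(ids)
```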
Parameter
A parameter is a numerical value that defines the behavior of a model. In the context of LLMs, parameters are adjusted during training to optimize the model's ability to generate relevant and coherent text. For example, a parameter might define the strength of a connection between two neurons in the model's architecture.
Weight
A weight is a type of parameter that defines the strength of connections between neurons across different layers in the model. Weights are adjusted during training to optimize the model's ability to learn relationships between different tokens. For example, a weight might define the strength of the connection between the neuron that represents the token "the" and the neuron that represents the token "quick".
- Embedding weights: These weights are associated with each token in the vocabulary and are used to represent the semantic meaning of the tokens.
- Self-attention weights: These weights are used to determine the influence of different tokens on each other within a sequence.
- Feedforward weights: These weights are used in the feedforward layers of the model to compute each layer's output, which is part of every block in the large language model (LLM).
- Bias weights: Bias weights are added to the outputs of various layers, including the embedding, self-attention, and feedforward layers, to help the model make more accurate predictions. (A small counting sketch of these weight groups follows this list.)
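To make the weight groups above concrete, here is a toy sketch (assuming PyTorch) of a single transformer-style block, counting how many weights and bias weights each group contributes; real LLMs stack many such blocks and reach billions of parameters.

```python
# A toy sketch of where the weight groups listed above live in one
# transformer-style block, and how many parameters each group holds.
import torch.nn as nn

vocab_size, d_model, d_ff, n_heads = 1000, 64, 256, 4

block = nn.ModuleDict({
    "embedding": nn.Embedding(vocab_size, d_model),              # embedding weights
    "self_attention": nn.MultiheadAttention(d_model, n_heads),   # self-attention weights
    "feedforward": nn.Sequential(                                # feedforward weights
        nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
    ),
})

for group_name, module in block.items():
    weights = sum(p.numel() for n, p in module.named_parameters() if "bias" not in n)
    biases = sum(p.numel() for n, p in module.named_parameters() if "bias" in n)
    print(f"{group_name}: {weights} weights, {biases} bias weights")
```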
Sharing
Sharing "weights" refers to the distribution of the parameters that determine the strength of connections between neurons in different layers of a neural network. These weights are crucial for the model's ability to process and generate information. During the training phase, these weights are adjusted to optimize the model's performance, allowing it to learn and understand the relationships between different tokens. This process involves using a learning rate, which is a hyperparameter that controls the size of the steps taken to update the weights. Additionally, techniques like weight pruning can be used to simplify the model by removing weights that have minimal impact on the output. Regularization methods such as L1 and L2 are also employed to prevent overfitting by adding a penalty term to the loss function based on the magnitude of the weights. When Meta shares the "weights" of the LLaMA model, they are providing the parameters that have been learned during the training process, which include embedding, self-attention, feedforward, and bias weights.
Risks
- How Risky Is Your Open-Source LLM Project? A New Research Explains The Risk Factors Associated With Open-Source LLMs | Anant Shahi - MarkTechPost
- Instead of AI sentience, focus on the current risks of large language models
- Ethical and social risks of harm from Language Models | DeepMind
Open-source LLMs (Large Language Models) are trained on massive amounts of data from the Internet, which makes them accessible and versatile, but also poses some risks. Some of the risk factors associated with open-source LLMs are:
- Bias and toxicity: LLMs can reflect and amplify the social biases and harmful language that exist in their training data, such as discrimination, exclusion, stereotypes, and hate speech. This can cause unfairness, offense, and harm to certain groups or individuals.
- Privacy and security: LLMs can leak or infer private or sensitive information from their training data or from user inputs, such as personal details, passwords, or intellectual property. This can compromise the confidentiality and integrity of the data and expose it to malicious actors.
- Misinformation and manipulation: LLMs can produce false or misleading information that can confuse or deceive users, such as inaccurate facts, bad advice, or fake news. This can affect the quality and trustworthiness of the information and influence users' decisions and actions.
- Malicious uses: LLMs can be used by adversaries to cause harm or disruption, such as spreading disinformation, creating scams or frauds, or generating malicious code or weapons. This can threaten the security and stability of individuals, organizations, and society.
- Human-computer interaction harms: LLMs can affect the psychological and social well-being of users who interact with them, for example by creating unrealistic expectations, reducing critical thinking, or diminishing human agency. This can impact users' identity, autonomy, and relationships.
Life or death isn’t an issue at Morgan Stanley, but producing highly accurate responses to financial and investing questions is important to the firm, its clients, and its regulators. The answers provided by the system were carefully evaluated by human reviewers before it was released to any users. Then it was piloted for several months by 300 financial advisors. As its primary approach to ongoing evaluation, Morgan Stanley has a set of 400 “golden questions” to which the correct answers are known. Every time any change is made to the system, employees test it with the golden questions to see if there has been any “regression,” or less accurate answers. - How to Train Generative AI Using Your Company’s Data | Tom Davenport & Maryam Alavi - Harvard Business Review
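The sketch below is a hedged illustration of the "golden questions" regression idea described above; the example questions, the ask_model function, and the substring-match grading are hypothetical placeholders, not Morgan Stanley's actual system, and a real evaluation would use stronger grading (human review, semantic similarity, or an LLM grader).

```python
# A hypothetical "golden questions" regression check for an LLM-backed system.
# `ask_model`, the questions, and the grading rule are placeholders.

golden_questions = [
    {"question": "What is the expense ratio of Fund X?", "expected": "0.45%"},
    {"question": "Are qualified Roth IRA withdrawals taxed?", "expected": "tax-free"},
]

def ask_model(question: str) -> str:
    """Placeholder for a call to the deployed LLM / retrieval system."""
    return "This is a stub answer; wire in the real model call here."

def regression_check(questions) -> float:
    """Return the share of golden questions whose expected answer appears in the reply."""
    correct = 0
    for item in questions:
        answer = ask_model(item["question"])
        if item["expected"].lower() in answer.lower():  # naive match; real grading is stricter
            correct += 1
    return correct / len(questions)

# Run after every change to the system; a drop in this score signals a regression.
print(f"accuracy on golden set: {regression_check(golden_questions):.1%}")
```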
Multi-step Multi-model Approach
The Multi-step Multi-model Approach with Large Language Models (LLMs) refers to the utilization of multiple LLMs in a sequential manner to tackle complex language processing tasks. As with any multi-model approach, there are considerations related to computational resources, deployment complexity, and potential challenges in combining the outputs effectively. However, when properly implemented, the Multi-step Multi-model Approach with LLMs can lead to significant improvements in various language-related applications.
In the context of the Multi-step Multi-model Approach with LLMs, the following steps are generally involved:
- Problem Formulation: Clearly define the language processing task you want to address. It could be natural language understanding (NLU), natural language generation (NLG), question-answering, sentiment analysis, language translation, or any other language-related challenge.
- Model Selection: Choose a set of diverse LLMs that are well-suited for the specific language processing tasks you want to address. Different LLMs might excel in different aspects of language understanding, so having a variety of models can be beneficial.
- Model Pre-training: Before fine-tuning the LLMs for your specific task, you need to pre-train them on a large corpus of text data. Pre-training helps the models learn the underlying patterns and structures in the language.
- Fine-tuning: After pre-training, the LLMs are fine-tuned on a task-specific dataset. Fine-tuning involves training the models on labeled data relevant to your target task. This process allows the models to adapt to the specific problem and make better predictions.
- Ensemble Construction: Once you have multiple LLMs that are fine-tuned for the target tasks, you can create an ensemble of these models. Ensemble methods combine the predictions of individual models to make the final decision. Techniques like averaging, voting, or more sophisticated approaches can be used to combine the outputs of the LLMs.
- Sequential Application: The ensemble of LLMs can be applied sequentially, where the output of one model becomes the input to the next model in the pipeline. Each LLM can focus on different aspects of the language processing task, contributing to the overall understanding and generation process (a minimal sketch of such a pipeline follows the benefits list below).
- Model Evaluation and Refinement: Evaluate the performance of the multi-step multi-model approach using appropriate metrics and validation techniques. If necessary, fine-tuning or retraining of individual LLMs can be performed to optimize the overall system.
Benefits of the Multi-step Multi-model Approach with LLMs:
- Enhanced Language Understanding: By leveraging multiple LLMs, you can benefit from their diverse capabilities and obtain a more comprehensive understanding of the language context.
- Improved Generation Quality: When generating responses or text, the ensemble of LLMs can produce more coherent and contextually appropriate outputs.
- Flexibility and Adaptability: The approach can be adapted to a wide range of language processing tasks, making it a versatile solution.
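The sketch below is a minimal, hedged illustration of the sequential idea: one model's output becomes the next model's input. The two small public checkpoints and the Hugging Face transformers pipelines are assumptions; a production pipeline would chain task-tuned LLMs and combine outputs more carefully.

```python
# A minimal multi-step, multi-model sketch: summarize with one model,
# then classify the summary with a second model. Checkpoints are small
# public stand-ins assumed to be available via `transformers`.
from transformers import pipeline

# Step 1: one model condenses the input text.
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

# Step 2: a second model classifies the condensed text.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

document = (
    "The quarterly report shows revenue growth across all regions, "
    "although supply-chain costs rose sharply and margins narrowed."
)

summary = summarizer(document, max_length=30, min_length=10)[0]["summary_text"]
verdict = classifier(summary)[0]

print("summary:", summary)
print("sentiment of summary:", verdict["label"], round(verdict["score"], 3))
```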
LM Studio: Discover, download, and run local LLMs
LM Studio is a desktop application that allows users to experiment with local and open-source Large Language Models (LLMs). It provides an easy-to-use platform for running LLMs locally on Mac, Windows, and potentially Linux. To download and install LM Studio, follow these steps:
1. Download LM Studio:
- Visit the LM Studio website at lmstudio.ai.
- Download the LM Studio desktop app for your specific platform (Mac or Windows).
- The download size is approximately 400MB, so it may take some time depending on your internet speed.
2. Choose a Model:
- After launching LM Studio, select a model to download from the available options provided within the application.
- Models like Zephyr-7B, Mixtral 8x7B, Google's Gemma model, and others are available for use.
3. Run LLMs Locally:
- Once you have chosen and downloaded a model, you can start using it locally through LM Studio.
- You can converse with the LLM by selecting your model and enabling GPU acceleration if desired.
System Requirements: Ensure your system meets the requirements to run LM Studio effectively:
- Apple Silicon Mac with macOS 13.6 or newer.
- Windows/Linux PC with a processor supporting AVX2.
- Recommended RAM of 16GB+ and VRAM of 6GB+ for PCs.
- NVIDIA/AMD GPUs are supported.
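Beyond the chat window, recent versions of LM Studio can expose the loaded model through a local, OpenAI-compatible server. The sketch below assumes that server is running on its default address (commonly http://localhost:1234/v1; check your install) and that the openai Python package (v1+) is available; the model name passed is a placeholder for whichever model you loaded.

```python
# A hedged sketch of talking to a model running in LM Studio's local server
# (an OpenAI-compatible endpoint; address and model name are assumptions).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # key is ignored locally

response = client.chat.completions.create(
    model="local-model",  # placeholder; LM Studio serves the model you loaded
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "In one sentence, what is a large language model?"},
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)
```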