[https://www.youtube.com/results?search_query=ai+Fine+tuning YouTube]
[https://www.quora.com/search?q=ai%20Fine%20tuning ... Quora]
[https://www.google.com/search?q=ai+Fine+tuning ...Google search]
[https://news.google.com/search?q=ai+Fine+tuning ...Google News]
[https://www.bing.com/news/search?q=ai+Fine+tuning&qft=interval%3d%228%22 ...Bing News]
  
* [[Embedding]] ... [[Fine-tuning]] ... [[Retrieval-Augmented Generation (RAG)|RAG]] ... [[Agents#AI-Powered Search|Search]] ... [[Clustering]] ... [[Recommendation]] ... [[Anomaly Detection]] ... [[Classification]] ... [[Dimensional Reduction]] ... [[...find outliers]]
* [[Prompting vs AI Model Fine-Tuning vs AI Embeddings]]
* [[Alpaca]]
* [[Train Large Language Model (LLM) From Scratch]]
* [https://arstechnica.com/information-technology/2023/08/you-can-now-train-chatgpt-on-your-own-documents-via-api/ You can now train ChatGPT on your own documents via API | Benj Edwards - ARS Technica] ... Developers can now bring their own data to customize GPT-3.5 Turbo outputs; running [[supervised]] fine-tuning to make this model perform better for their use cases by uploading documents using the command-line tool [https://en.wikipedia.org/wiki/CURL cURL] to query an API web address
** [https://platform.openai.com/docs/guides/fine-tuning Fine-tuning for GPT 3.5 Turbo | OpenAI]
* [https://towardsdatascience.com/fine-tuning-large-language-models-llms-23473d763b91 Fine-Tuning Large Language Models (LLMs) | Shawhin Talebi - Medium] ... A conceptual overview with example Python code
 
  
  
Fine-tuning is the process of retraining a language model on a new dataset. It can be used to improve the model's performance on a specific task, such as generating text, translating languages, or answering questions. Fine-tuning is a way to add new knowledge to an existing AI model: a relatively simple upgrade that lets the model learn new information.


Here is some more detail on fine-tuning:
  
* Fine-tuning is a relatively simple process. The first step is to select a pre-trained language model; many are available, such as GPT-3, RoBERTa, and XLNet. Once you have selected a pre-trained language model, you need to gather a dataset for fine-tuning that is relevant to the task you want the model to perform. For example, to fine-tune a language model for question answering, you would gather a dataset of questions and answers.
* The next step is to fine-tune the language model on that dataset, typically using supervised learning. In supervised learning, the model is given a set of labeled examples; in the question-answering case, the labels are the answers to the questions in the dataset. The model is then trained to predict the correct output for each input. A minimal code sketch appears after this list.
* Fine-tuning can be a time-consuming process, but it can significantly improve the performance of a language model on a specific task. For example, fine-tuning a language model on a dataset of questions and answers can improve the model's ability to answer new questions.
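Below is a minimal sketch of that supervised fine-tuning loop using the Hugging Face Transformers and Datasets libraries. The base model, dataset, and hyperparameters are illustrative placeholders, not recommendations.

<pre>
# A minimal sketch of supervised fine-tuning with Hugging Face Transformers.
# The base model, dataset, and hyperparameters below are illustrative only.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"          # any suitable pre-trained model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

dataset = load_dataset("imdb")                  # stand-in for your task-relevant data

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=tokenized["test"].select(range(500)),
)
trainer.train()                                 # supervised fine-tuning on labeled examples
</pre>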
<youtube>Ezz_5csCJqI</youtube>
<youtube>y9PHWGOa8HA</youtube>
<youtube>HMbctCYJLbw</youtube>
<youtube>LitybCiLhSc</youtube>
  
  
= Methods For Fine-tuning an LLM =
* [https://dr-bruce-cottman.medium.com/part-1-eight-major-methods-for-finetuning-an-llm-6f746c7259ee Part 1: Eight Major Methods For FineTuning an LLM | Bruce Cottman - Medium] ... Gradient-based, LoRA, QLoRA, and four others as advanced variations of ULMFiT: selecting a small subset of the available parameters in a trained LLM.
* [https://www.lakera.ai/insights/llm-fine-tuning-guide The Ultimate Guide to LLM Fine Tuning: Best Practices & Tools | Lakera]
* [https://www.simform.com/blog/completeguide-finetuning-llm/ A Complete Guide to Fine Tuning Large Language Models | Hiren Dhaduk]
* [https://www.analyticsvidhya.com/blog/2023/08/fine-tuning-large-language-models/ A Comprehensive Guide to Fine-Tuning Large Language Models | Babina Banjara]
* [https://research.aimultiple.com/llm-fine-tuning/ LLM Fine Tuning Guide for Enterprises in 2023 | Cem Dilmegani]
* [https://www.unite.ai/understanding-llm-fine-tuning-tailoring-large-language-models-to-your-unique-requirements/ Understanding LLM Fine-Tuning: Tailoring Large Language Models to Your Unique Requirements | Aayush Mittal]
  
  
Fine-tuning can be applied to various types of models, such as convolutional neural networks, recurrent neural networks, and large language models. There are different ways to fine-tune a model, depending on the amount and similarity of the data available for the new task, the complexity and size of the model, and the computational resources and time constraints. Here are some examples of fine-tuning:

* Fine-tuning OpenAI's base models such as Davinci, Curie, Babbage, and Ada to improve their performance on a variety of tasks, such as generating text, translating languages, and answering questions.
* Fine-tuning a binary classifier to rate each completion for truthfulness based on expert-labeled examples.
* Incorporating proprietary content into a language model to improve its ability to provide relevant answers to questions.
* Full-model fine-tuning: This method updates all the parameters of the pre-trained model on the new task. It can achieve high performance, but it is also computationally expensive and prone to overfitting if the new data is small or noisy.
* Partial-model fine-tuning: This method updates only a subset of the parameters of the pre-trained model, keeping the rest fixed or frozen. It reduces the computational cost and helps prevent overfitting, but it requires choosing which layers or modules to fine-tune and which to freeze. A common heuristic is to freeze the earlier layers, which capture general features, and fine-tune the later layers, which capture task-specific features (see the sketch after this list).
* Adapter-based fine-tuning: This method adds small neural networks, called adapters, to each layer or module of the pre-trained model and updates only the adapters' parameters on the new task, keeping the original parameters frozen. Because adapters have far fewer parameters than the original model, this achieves parameter-efficient fine-tuning while preserving the performance and robustness of the pre-trained model.
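Here is a minimal PyTorch sketch of the partial-model heuristic above: freeze everything, replace the task head, and unfreeze only the last block. The ResNet backbone and 10-class head are illustrative stand-ins.

<pre>
# Partial-model fine-tuning in PyTorch: freeze early layers, train later ones.
# The ResNet backbone and 10-class head are illustrative stand-ins.
import torch
from torchvision.models import resnet18

model = resnet18(weights="IMAGENET1K_V1")        # pre-trained backbone
for param in model.parameters():
    param.requires_grad = False                  # freeze everything

model.fc = torch.nn.Linear(model.fc.in_features, 10)  # fresh task head (trainable)
for param in model.layer4.parameters():
    param.requires_grad = True                   # unfreeze only the last block

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
</pre>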

Fine-tuning is a powerful technique for improving the performance of language models on a variety of tasks. If you are looking to improve a language model's performance on a specific task, fine-tuning is a good option to consider.
  
== Instruction Tuning ==
* [[Agents]] ... [[Robotic Process Automation (RPA)|Robotic Process Automation]] ... [[Assistants]] ... [[Personal Companions]] ... [[Personal Productivity|Productivity]] ... [[Email]] ... [[Negotiation]] ... [[LangChain]]
* [https://github.com/SinclairCoder/Instruction-Tuning-Papers Instruction-Tuning-Papers | GitHub]
* [https://self-supervised.cs.jhu.edu/sp2023/files/Instruction%20tuning%20of%20LLMs%20-%20Talk@JHU.pdf Instruction Tuning of Large Language Models | Yizhong Wang - Johns Hopkins University (JHU)]
* [https://arxiv.org/abs/2304.03277 Instruction Tuning with GPT-4 | B. Peng, C. Li, P. He, M. Galley, & J. Gao - arXiv]
* [https://smilegate.ai/en/2021/09/12/instruction-tuning-flan/ Instruction tuning – FLAN | Convergence Research Team Hongmae Shim - Smilegate AI]
* [https://sh-tsang.medium.com/brief-review-flan-palm-scaling-instruction-finetuned-language-models-79f47cbcb882 Brief Review — Flan-PaLM: Scaling Instruction-Finetuned Language Models | Sik-Ho Tsang - Medium] ... Flan-PaLM, PaLM Fine-Tuned Using FLAN
  

Instruction tuning is a technique that teaches a [[Large Language Model (LLM)]] to follow natural language instructions, such as prompts, examples, and constraints, so that it performs better on various [[Natural Language Processing (NLP)]] tasks. Instruction tuning can improve the capabilities and controllability of LLMs across different tasks, domains, and modalities. It can also enable [[Large Language Model (LLM)|LLMs]] to generalize to unseen tasks by using instructions as a bridge between the pretraining objective and the user's objective.

Instruction tuning involves fine-tuning [[Large Language Model (LLM)|LLMs]] with instructional data, which consists of pairs of human-written instructions and desired outputs. For example, an instruction could be “Write a summary of the following article in three sentences” and an output could be “The article discusses the benefits of instruction tuning for [[Large Language Model (LLM)|large language models]]. It presents a survey that covers the fundamentals, challenges, and applications of this technique. It also covers methods such as Self-Instruct, which leverage [[Large Language Model (LLM)|LLMs]] to generate instructional data for themselves.” Instructional data can be collected from various sources, such as existing NLP datasets, expert annotations, or even [[Large Language Model (LLM)|LLMs]] themselves.
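To make the data format concrete, here is the typical shape of a single instruction-tuning training example. The field names are illustrative and vary between datasets (the Alpaca dataset, for example, uses instruction/input/output).

<pre>
# Illustrative shape of one instruction-tuning training example.
# Field names vary by dataset; these are placeholders, not a standard.
example = {
    "instruction": "Write a summary of the following article in three sentences.",
    "input": "<article text>",
    "output": "<three-sentence summary written by a human or a stronger model>",
}

# During fine-tuning, instruction and input are concatenated into a prompt and
# the model is trained to produce `output`, usually with the loss computed
# only on the output tokens.
prompt = f"{example['instruction']}\n\n{example['input']}\n\n"
target = example["output"]
</pre>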
  
=== <span id="Implementing Chain of Thought (CoT)"></span>Implementing Chain of Thought (CoT) ===
Provide context, examples, and a prompt asking the model for [[Chain of Thought (CoT)]] output specific to the task, then use that output to tune a model that is smaller and/or cheaper to run at inference time.

To use [[Chain of Thought (CoT)|CoT]] output from a [[Large Language Model (LLM)]] to fine-tune a model, you can follow these steps (a code sketch follows the list):
  
# Generate [[Chain of Thought (CoT)|CoT]] demonstrations. This can be done by prompting a large [[Large Language Model (LLM)|LLM]] to solve complex questions via [[Few Shot Learning#Zero-Shot Prompting|zero-shot]] [[Chain of Thought (CoT)|CoT]] reasoning. For example, you could prompt the [[Large Language Model (LLM)|LLM]] to solve a math problem and then ask it to explain its reasoning step by step.
# Collect the generated [[Chain of Thought (CoT)|CoT]] demonstrations into a dataset. This dataset will be used to fine-tune your model.
# Fine-tune your model on the [[Chain of Thought (CoT)|CoT]] dataset. You can use any standard fine-tuning technique; for example, supervised learning that trains the model to predict the next step in a [[Chain of Thought (CoT)|CoT]] demonstration sequence.
# Evaluate your fine-tuned model on a held-out test set of [[Chain of Thought (CoT)|CoT]] demonstrations.
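A hedged sketch of steps 1 and 2, assuming a generic generate() wrapper around whatever LLM API you use (the wrapper and the sample question are placeholders):

<pre>
# Sketch of collecting zero-shot CoT demonstrations from a large LLM.
# `generate` is a placeholder for your LLM provider's completion call.
def generate(prompt: str) -> str:
    raise NotImplementedError("call your LLM API here")

questions = [
    "A train travels 60 km in 45 minutes. What is its average speed in km/h?",
]

demonstrations = []
for q in questions:
    # "Let's think step by step." elicits zero-shot chain-of-thought reasoning.
    rationale = generate(f"{q}\nLet's think step by step.")
    demonstrations.append({"question": q, "rationale": rationale})

# `demonstrations` then becomes the fine-tuning dataset for a smaller,
# cheaper model (steps 3 and 4).
</pre>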
  
== LoRA ==
* [https://arxiv.org/abs/2106.09685 LoRA: Low-Rank Adaptation of Large Language Models | E. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, & W. Chen]

Low-Rank Adaptation (LoRA) is a parameter-efficient method for fine-tuning [[Large Language Model (LLM)|LLMs]]. Rather than updating all of the weights of a pre-trained model, LoRA freezes them and injects a pair of small trainable rank-decomposition matrices into each layer of the Transformer architecture, so the weight update is represented as the product of two low-rank matrices. This greatly reduces the number of trainable parameters and the GPU [[memory]] required for fine-tuning, and because the low-rank update can be merged back into the frozen weights after training, it adds no extra inference latency.
LoRA was proposed by researchers at [[Microsoft]] in the paper “LoRA: Low-Rank Adaptation of Large Language Models” (linked above). The paper shows that fine-tuning GPT-3-scale models with LoRA can match the quality of full fine-tuning while training roughly 10,000 times fewer parameters and using about 3 times less GPU [[memory]].
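A minimal sketch of LoRA fine-tuning using the Hugging Face PEFT library; the base model, rank, and target modules are illustrative and model-dependent:

<pre>
# LoRA with Hugging Face PEFT: freeze the base model, train low-rank adapters.
# Base model, rank, and target modules below are illustrative choices.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(
    r=8,                        # rank of the update matrices A and B
    lora_alpha=16,              # scaling factor for the update
    target_modules=["c_attn"],  # attention projection in GPT-2
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)  # base weights frozen; only A and B train
model.print_trainable_parameters()    # a tiny fraction of the full model
</pre>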

A related line of work uses [[Large Language Model (LLM)|LLMs]] to generate instructional data for themselves (as in the Self-Instruct approach). Instructional data consists of pairs of instructions and desired outputs, which can be used to fine-tune [[Large Language Model (LLM)|LLMs]] to follow natural language instructions for various [[Natural Language Processing (NLP)]] tasks, such as [[Summarization]], [[Sentiment Analysis]], and question answering; LoRA is a popular mechanism for that fine-tuning step. The data generation is a two-step process:

* First, the LLM is prompted to generate instructions for various NLP tasks, such as summarization, sentiment analysis, and question answering. The instructions are natural language descriptions of what the LLM should do given an input text; for example, an instruction for summarization could be “Write a summary of the following article in three sentences”. A set of templates and heuristics guides the LLM to generate diverse and valid instructions.
* Second, the LLM is prompted to generate instances for each instruction, which are pairs of input texts and desired outputs. The input texts can be sampled from a large corpus of text, such as Wikipedia or Common Crawl. The desired outputs are generated by the LLM itself, following the instruction. For example, an instance for summarization could be:
  
<b>Input</b>: <i>The 2022 FIFA World Cup was the 22nd edition of the FIFA World Cup, the quadrennial international men's association football championship contested by the national teams of the member associations of FIFA. It took place in Qatar from 20 November to 18 December 2022. It was the first World Cup ever to be held in the Arab world and the first in a Muslim-majority country, and the second World Cup held entirely in Asia after the 2002 tournament in South Korea and Japan. In addition, the tournament was the last to involve 32 teams, with an increase to 48 teams scheduled for the 2026 tournament in Canada, Mexico and the United States.</i>

<b>Output</b>: <i>The 2022 FIFA World Cup was a global football tournament held in Qatar from November to December 2022. It was the first World Cup in the Arab world and in a Muslim-majority country, and the second held entirely in Asia. The tournament featured 32 teams for the last time before the expansion to 48 teams in 2026.</i>

The quality of the generated instructions and instances is ensured with several techniques, such as:

* Filtering out invalid or duplicate instructions based on syntactic and semantic criteria, such as length, readability, specificity, and uniqueness.
* Evaluating the quality of generated instances based on metrics such as fluency, coherence, relevance, and accuracy.
* Comparing the generated instances with human-written outputs from existing NLP datasets or expert annotations, and selecting the ones that have high similarity or agreement.
* Applying post-processing steps such as spelling correction, punctuation normalization, and capitalization to improve the readability and consistency of the generated instances.
  
== QLoRA ==
Quantized Low-Rank Adapters (QLoRA) is a method for efficient fine-tuning of quantized large language models (LLMs). Its key points are as follows (a code sketch follows the list):
* QLoRA combines low-rank adapters (LoRA) with 4-bit quantization to compress the weights of the LLM. The adapters are small trainable matrices added to each layer of the LLM and trained on a specific task, while the quantized LLM itself stays frozen.
* QLoRA dramatically reduces the [[memory]] usage of fine-tuning, enabling fine-tuning of LLMs with billions of parameters on a single GPU, which would otherwise require hundreds of GBs of [[memory]].
* QLoRA preserves the performance of full 16-bit fine-tuning on various tasks, such as instruction following and chatbot generation. QLoRA has been applied to fine-tune LLMs such as LLaMA and T5 on these tasks and has achieved state-of-the-art results.
* QLoRA introduces several innovations to save [[memory]] and improve speed, such as:
** 4-bit NormalFloat (NF4), a new data type that is information-theoretically optimal for normally distributed weights.
** Double Quantization, a technique that reduces the average [[memory]] footprint by quantizing the quantization constants.
** Paged Optimizers, a method that manages [[memory]] spikes by paging out optimizer states.
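A hedged sketch of how these pieces fit together using the Hugging Face Transformers, bitsandbytes, and PEFT libraries; the base model and hyperparameters are illustrative:

<pre>
# QLoRA-style setup: load the base model in 4-bit NF4 with double quantization,
# then attach trainable LoRA adapters. Model and hyperparameters are illustrative.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # 4-bit NormalFloat
    bnb_4bit_use_double_quant=True,       # quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b", quantization_config=bnb_config, device_map="auto")

adapters = LoraConfig(r=16, lora_alpha=32,
                      target_modules=["q_proj", "v_proj"],
                      task_type="CAUSAL_LM")
model = get_peft_model(base, adapters)  # 4-bit base frozen; adapters train in 16-bit
</pre>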
  
== ULMFiT ==
Universal Language Model Fine-tuning (ULMFiT) is a method for fine-tuning a pre-trained language model for a specific downstream task, such as text classification, sentiment analysis, or hate speech detection, and it can be applied to any task in natural language processing (NLP). ULMFiT has three main steps:
* <b>General language model pre-training</b>: Train a language model on a large and diverse corpus of text, such as Wikipedia, to learn general linguistic features and patterns. ULMFiT uses a 3-layer AWD-LSTM architecture for its language model.
* <b>Target language model fine-tuning</b>: Fine-tune the pre-trained language model on the text data of the target task, such as movie reviews or tweets, to adapt it to the specific domain and vocabulary. ULMFiT introduces several techniques to improve fine-tuning: discriminative fine-tuning, which adjusts the learning rate for each layer according to its importance; slanted triangular learning rates, which increase and then decrease the learning rate during training; and gradual unfreezing, which unfreezes and trains one layer at a time from the top to the bottom.
* <b>Target classifier fine-tuning</b>: Add a classifier layer on top of the fine-tuned language model and train it on the labeled data of the target task, such as positive or negative sentiment. ULMFiT uses the same techniques as in the previous step to fine-tune the classifier.
  

ULMFiT achieved state-of-the-art results on six text classification tasks, reducing the error by 18-24% on most datasets. It also showed that, with only 100 labeled examples, it can match the performance of training from scratch on 100 times more data.
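A minimal PyTorch sketch of two of the techniques above, discriminative fine-tuning and gradual unfreezing, on a toy stand-in for the embedding/encoder/classifier stack:

<pre>
# Discriminative fine-tuning and gradual unfreezing, ULMFiT-style, in PyTorch.
# The three-stage model is a toy stand-in for AWD-LSTM plus a classifier head.
import torch
import torch.nn as nn

class ToyClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(10_000, 64)
        self.encoder = nn.LSTM(64, 128, batch_first=True)
        self.classifier = nn.Linear(128, 2)

model = ToyClassifier()

# Discriminative fine-tuning: lower learning rates for earlier layers.
optimizer = torch.optim.AdamW([
    {"params": model.embedding.parameters(),  "lr": 1e-5},
    {"params": model.encoder.parameters(),    "lr": 5e-5},
    {"params": model.classifier.parameters(), "lr": 1e-4},
])

# Gradual unfreezing: start with only the classifier trainable, then
# re-enable the encoder, and finally the embedding, one stage at a time.
for p in model.embedding.parameters():
    p.requires_grad = False
for p in model.encoder.parameters():
    p.requires_grad = False
# ... train a stage, then unfreeze the next group and continue training.
</pre>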
== Gradient-based ==
Gradient-based fine-tuning adapts a pre-trained model to a specific task or domain by updating its parameters using gradient descent, an optimization algorithm that iteratively adjusts the parameters of a model to minimize a loss function.
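At its core, every gradient-based fine-tuning run is the following loop; the model and data here are toy stand-ins:

<pre>
# The core gradient-descent loop behind gradient-based fine-tuning.
# A toy linear model and random data stand in for a real pre-trained model.
import torch
import torch.nn as nn

model = nn.Linear(4, 1)                        # stand-in for a pre-trained model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
x, y = torch.randn(32, 4), torch.randn(32, 1)  # stand-in task data

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)  # how wrong is the model on the new task?
    loss.backward()              # gradients of the loss w.r.t. each parameter
    optimizer.step()             # adjust parameters to reduce the loss
</pre>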
  
Gradient-based fine-tuning can be applied to various types of models, such as large language models, object detection models, or image classification models. Its main advantage is that it can leverage the knowledge and generalization ability of the pre-trained model and improve its performance on the target task or domain. However, it also has some challenges and limitations:

* It can be computationally expensive and inefficient, especially for large models with many parameters.
* It can cause overfitting or catastrophic forgetting, meaning the fine-tuned model may lose its original capabilities or perform poorly on out-of-distribution data.
* It can be sensitive to the choice of hyperparameters, such as the learning rate, the number of fine-tuning steps, or the regularization techniques.

To address these challenges and limitations, researchers have proposed various methods and techniques to improve gradient-based fine-tuning, such as:
  
* Using sparse or local attention to reduce the computation cost and [[memory]] consumption of fine-tuning large language models with long context sizes.
* Learning trainable constraints or projection radii for each layer of the model to control the distance between the fine-tuned model and the pre-trained model.
* Meta-learning dedicated meta-models or hypermodels to generate task-specific parameters or loss functions for the downstream model.
* Customizing different learning rates or data augmentation strategies for each layer or sample of the model.

= <span id="Large Language Model (LLM) Ecosystem Explained"></span>Large Language Model (LLM) Ecosystem Explained =

The Large Language Model (LLM) ecosystem refers to the various commercial and open-source LLM providers, their offerings, and the tooling that helps accelerate their wide adoption. The functionality of LLMs can be segmented into five areas: Knowledge Answering, Translation, Text Generation, Response Generation, and Classification. There are many options to choose from for all types of language tasks.
  
  
<img src="https://PRIMO.AI/B_Roll/LLMEcosystemApril2023.png" width="900">
 
  
{| class="wikitable" style="width: 550px;"
||
<youtube>qu-vXAFUpLE</youtube>
<b>LLM Ecosystem explained: Your ultimate Guide to AI | [https://www.youtube.com/@code4AI code_your_own_AI]
</b><br>An introduction to the world of LLMs (Large Language Models) in April 2023, with detailed explanations of GPT-3.5, GPT-4, T5, and Flan-T5 through LLaMA, Alpaca, and KOALA, plus dataset sources and configurations. Covers ICL (in-context learning), adapter fine-tuning, PEFT LoRA, and classical fine-tuning of LLMs, and when to choose which type of dataset for which LLM job.

A comprehensive LLM/AI ecosystem is essential for the creation and implementation of sophisticated AI applications. It facilitates the efficient processing of large-scale data, the development of complex machine learning models, and the deployment of intelligent systems capable of performing complex tasks.

As the field of AI continues to evolve and expand, the importance of a well-integrated and cohesive AI ecosystem cannot be overstated.

A complete overview of today's LLMs and how you can train them for your needs.

* [https://www.youtube.com/hashtag/naturallanguageprocessing #naturallanguageprocessing]
* [https://www.youtube.com/hashtag/largelanguagemodels #LargeLanguageModels]
* [https://www.youtube.com/hashtag/chatgpttutorial #chatgpttutorial]
* [https://www.youtube.com/hashtag/finetuning #finetuning]
* [https://www.youtube.com/hashtag/finetune #finetune]
* [https://www.youtube.com/hashtag/ai #ai]
* [https://www.youtube.com/hashtag/chatgpt #chatgpt]
|}
