Large Language Model (LLM) - Revision history

BPeat: /* Large Language Models (LLM) */

2024-05-19T11:23:42Z

Large Language Models (LLM)

BPeat: /* Large Language Models (LLM) */

2024-05-19T11:21:34Z

Large Language Models (LLM)

BPeat: /* Large Language Models (LLM) */

2024-05-12T22:24:38Z

Large Language Models (LLM)

BPeat: /* Sharing */

2024-04-28T13:08:58Z

Sharing

BPeat: /* Sharing */

2024-04-28T13:07:09Z

Sharing

BPeat: /* Weight */

2024-04-28T11:17:49Z

Weight

BPeat at 10:59, 28 April 2024

2024-04-28T10:59:36Z

BPeat at 02:06, 27 April 2024

2024-04-27T02:06:12Z

BPeat at 20:26, 31 March 2024

2024-03-31T20:26:26Z

BPeat at 13:41, 23 March 2024

2024-03-23T13:41:44Z

@@ Line 128: / Line 128: @@
 ** [https://huggingface.co/Writer/palmyra-base  Palmyra |] [[Hugging Face]] ... a privacy-first LLM for enterprises
 ** [[PaLM]] | [[Google]] ... Pathways Language Model  ... 540B
-** [https://news.microsoft.com/source/features/ai/the-phi-3-small-language-models-with-big-potential/ Phi-3] | [[Microsoft]] ... outperform much larger models in math and computer science ... small language model
+** [https://news.microsoft.com/source/features/ai/the-phi-3-small-language-models-with-big-potential/ Phi-3] | [[Microsoft]] ... outperform much larger models in math and computer science ... Phi-3-mini 3.8B
 ** [https://research.baidu.com/Blog/index-view?id=163 PLATO-XL | Baidu]  ... 11B
 ** [[RETRO]] | [[Google | DeepMind]]

@@ Line 128: / Line 128: @@
 ** [https://huggingface.co/Writer/palmyra-base  Palmyra |] [[Hugging Face]] ... a privacy-first LLM for enterprises
 ** [[PaLM]] | [[Google]] ... Pathways Language Model  ... 540B
 ** [https://research.baidu.com/Blog/index-view?id=163 PLATO-XL | Baidu]  ... 11B
 ** [[RETRO]] | [[Google | DeepMind]]

@@ Line 123: / Line 123: @@
 ** [https://ai.facebook.com/blog/nllb-200-high-quality-machine-translation/ NLLB |] [[Meta]]  54.5B & 200B parameters; NLLB-200
 ** [https://www.together.xyz/blog/openchatkit OpenChatKit | TogetherCompute] ... The first open-source [[ChatGPT]] alternative released; a 20B chat-GPT model under the Apache-2.0 license, which is available for free on Hugging Face.
 ** [https://idw-online.de/en/news786967 OpenGPT-X]  ... model for Europe
 ** [https://www.reuters.com/technology/facebook-owner-meta-opens-access-ai-large-language-model-2022-05-03/ OPT-175B]...[[Meta|Facebook]]-owner Meta opens access to AI large language model | Elizabeth Culliford - Reuters ... [[Meta|Facebook]] 175B  ... BlenderBot   175B

Revision as of 13:08, 28 April 2024

@@ Line 182: / Line 182: @@
 === <span id="Sharing"></span>Sharing ===
+* [[LLaMA]] | [[Meta]]
-Sharing "weights" refers to the distribution of the parameters that determine the strength of connections between neurons in different layers of a neural network. These weights are crucial for the model's ability to process and generate information. During the training phase, these weights are adjusted to optimize the model's performance, allowing it to learn and understand the relationships between different tokens. This process involves using a learning rate, which is a hyperparameter that controls the size of the steps taken to update the weights. Additionally, techniques like weight pruning can be used to simplify the model by removing weights that have minimal impact on the output. Regularization methods such as L1 and L2 are also employed to prevent overfitting by adding a penalty term to the loss function based on the magnitude of the weights. When [[Meta]] shares the "weights" of the [[LLaMA]] model, they are providing the parameters that have been learned during the training process, which include embedding, self-attention, feedforward, and bias weights.
+Sharing "weights" refers to the distribution of the parameters that determine the strength of connections between neurons in different layers of a neural network. These weights are crucial for the model's ability to process and generate information. During the training phase, these weights are adjusted to optimize the model's performance, allowing it to learn and understand the relationships between different tokens. This process involves using a learning rate, which is a hyperparameter that controls the size of the steps taken to update the weights. Additionally, techniques like weight pruning can be used to simplify the model by removing weights that have minimal impact on the output. Regularization methods such as L1 and L2 are also employed to prevent overfitting by adding a penalty term to the loss function based on the magnitude of the weights.
+When [[Meta]] shares the "weights" of the [[LLaMA]] model, they are providing the parameters that have been learned during the training process, which include embedding, self-attention, feedforward, and bias weights.

@@ Line 183: / Line 183: @@
 === <span id="Sharing"></span>Sharing ===
-Sharing "weights" refers to the distribution of the parameters that determine the strength of connections between neurons in different layers of a neural network. These weights are crucial for the model's ability to process and generate information. During the training phase, these weights are adjusted to optimize the model's performance, allowing it to learn and understand the relationships between different tokens. This process involves using a learning rate, which is a hyperparameter that controls the size of the steps taken to update the weights. Additionally, techniques like weight pruning can be used to simplify the model by removing weights that have minimal impact on the output. Regularization methods such as L1 and L2 are also employed to prevent overfitting by adding a penalty term to the loss function based on the magnitude of the weights
+Sharing "weights" refers to the distribution of the parameters that determine the strength of connections between neurons in different layers of a neural network. These weights are crucial for the model's ability to process and generate information. During the training phase, these weights are adjusted to optimize the model's performance, allowing it to learn and understand the relationships between different tokens. This process involves using a learning rate, which is a hyperparameter that controls the size of the steps taken to update the weights. Additionally, techniques like weight pruning can be used to simplify the model by removing weights that have minimal impact on the output. Regularization methods such as L1 and L2 are also employed to prevent overfitting by adding a penalty term to the loss function based on the magnitude of the weights. When [[Meta]] shares the "weights" of the [[LLaMA]] model, they are providing the parameters that have been learned during the training process, which include embedding, self-attention, feedforward, and bias weights.
 = Risks =
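
The Sharing paragraph in the hunk above mentions a learning rate, L1 and L2 penalties, and weight pruning. A minimal sketch of those ideas on a plain list of weights might look like the following. The function names (`sgd_step`, `l1_penalty`, `l2_penalty`, `prune_by_magnitude`) and the parameter `l2_lambda` are hypothetical names chosen for illustration, not taken from the page or from any library, and magnitude thresholding is only one simple way to pick weights with "minimal impact".

```python
def l1_penalty(weights, lam):
    """L1 penalty term: lam * sum of absolute weight values."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 penalty term: lam * sum of squared weight values."""
    return lam * sum(w * w for w in weights)


def sgd_step(weights, grads, lr=0.1, l2_lambda=0.0):
    """One gradient-descent update.

    lr is the learning rate (step size). When l2_lambda > 0, the gradient of
    the L2 penalty above (2 * l2_lambda * w) is added to each loss gradient.
    """
    return [w - lr * (g + 2.0 * l2_lambda * w) for w, g in zip(weights, grads)]


def prune_by_magnitude(weights, threshold):
    """Zero out weights whose absolute value is below the threshold (assumed criterion)."""
    return [0.0 if abs(w) < threshold else w for w in weights]


if __name__ == "__main__":
    w = [1.0, -2.0, 0.01]
    g = [0.5, 0.5, 0.0]
    print(sgd_step(w, g, lr=0.1, l2_lambda=0.05))
    print(l1_penalty(w, 0.5), l2_penalty(w, 0.5))
    print(prune_by_magnitude(w, threshold=0.1))
```

A smaller `lr` takes smaller steps per update, and a larger `l2_lambda` pulls weights more strongly toward zero.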

@@ Line 176: / Line 176: @@
 == <span id="Weight"></span>Weight ==
 A weight is a type of parameter that defines the strength of connections between neurons across different layers in the model. Weights are adjusted during training to optimize the model's ability to learn relationships between different tokens. For example, a weight might define the strength of the connection between the neuron that represents the token "the" and the neuron that represents the token "quick".
-* <b>[[Embedding]]</b> weights: These weights are associated with each token in the vocabulary and are used to represent the meaning of the token.
+* <b>[[Embedding]]</b> weights: These weights are associated with each token in the vocabulary and are used to represent the semantic meaning of the tokens.
-* <b>Self-attention</b> weights: used to calculate the attention weights between each token in a sequence.
+* <b>Self-attention</b> weights: These weights are used to determine the influence of different tokens on each other within a sequence
-* <b>Feedforward</b> weights: used to calculate the output of the feedforward layer in each block of the LLM.
+* <b>Feedforward</b> weights: These weights are used in the feedforward layers of the model to compute the layer's output, which is a part of each block in the large language model (LLM)
-* <b>Bias</b> weights: added to the outputs of the [[embedding]] layer, the self-attention layer, and the feedforward layer.
+* <b>Bias</b> weights: Bias weights are added to the outputs of various layers, including the [[embedding]], self-attention, and feedforward layers, to help the model make more accurate predictions
 = Risks =
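
The four weight categories listed in the hunk above can be made concrete by counting parameters for a toy model. This is a rough sketch under stated assumptions: a single transformer block with query, key, value, and output projections, a two-layer feedforward network, and bias terms on the attention and feedforward projections only. The function name `count_block_weights` and the sizes used are hypothetical, and real models differ (some omit biases entirely).

```python
def count_block_weights(vocab_size, d_model, d_ff):
    """Count parameters by category for one toy transformer block.

    Assumptions (illustrative only):
    - embedding: one d_model-sized vector per vocabulary token
    - self-attention: four d_model x d_model matrices (query, key, value, output)
    - feedforward: d_model x d_ff followed by d_ff x d_model
    - bias: one bias vector per attention and feedforward projection
    """
    embedding = vocab_size * d_model
    self_attention = 4 * d_model * d_model
    feedforward = 2 * d_model * d_ff
    bias = 4 * d_model + d_ff + d_model
    return {
        "embedding": embedding,
        "self_attention": self_attention,
        "feedforward": feedforward,
        "bias": bias,
    }


if __name__ == "__main__":
    counts = count_block_weights(vocab_size=100, d_model=8, d_ff=32)
    for name, n in counts.items():
        print(f"{name}: {n}")
    print("total:", sum(counts.values()))
```

Sharing a model's weights, in this picture, amounts to distributing the learned values for every entry counted here.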

@@ Line 118: / Line 118: @@
 *** [https://arxiv.org/abs/2201.11990 Megatron Turing (MT-NLG)]  530B
 ** [https://github.com/karpathy/minGPT minGPT | Andrej Karpathy - GitHub]
-** [[Mistral]] ... Mixtral 8x7b
+** [[Mistral]] ... Mixtral 8x7b ... [[Mixture-of-Experts (MoE)]]
 ** [https://muse.lighton.ai/home Muse] ... VLM-4, a set of natively trained large Language Models in French, Italian, Spanish, German, and English
 ** [https://github.com/karpathy/nanoGPT nanoGPT] ... for training/finetuning medium-sized GPTs

@@ Line 31: / Line 31: @@
 * [[End-to-End Speech]] ... [[Synthesize Speech]] ... [[Speech Recognition]] ... [[Music]]
 * [[Analytics]] ... [[Visualization]] ... [[Graphical Tools for Modeling AI Components|Graphical Tools]] ... [[Diagrams for Business Analysis|Diagrams]] & [[Generative AI for Business Analysis|Business Analysis]] ... [[Requirements Management|Requirements]] ... [[Loop]] ... [[Bayes]] ... [[Network Pattern]]
-* [[Development]] ... [[Notebooks]] ... [[Development#AI Pair Programming Tools|AI Pair Programming]] ... [[Codeless Options, Code Generators, Drag n' Drop|Codeless, Generators, Drag n' Drop]] ... [[Algorithm Administration#AIOps/MLOps|AIOps/MLOps]] ... [[Platforms: AI/Machine Learning as a Service (AIaaS/MLaaS)|AIaaS/MLaaS]]
+* [[Development]] ... [[Notebooks]] ... [[Development#AI Pair Programming Tools|AI Pair Programming]] ... [[Codeless Options, Code Generators, Drag n' Drop|Codeless]] ... [[Hugging Face]] ... [[Algorithm Administration#AIOps/MLOps|AIOps/MLOps]] ... [[Platforms: AI/Machine Learning as a Service (AIaaS/MLaaS)|AIaaS/MLaaS]]
 * [[Prompt Engineering (PE)]] ... [[Prompt Engineering (PE)#PromptBase|PromptBase]] ... [[Prompt Injection Attack]]
 * [[Artificial General Intelligence (AGI) to Singularity]] ... [[Inside Out - Curious Optimistic Reasoning| Curious Reasoning]] ... [[Emergence]] ... [[Moonshots]] ... [[Explainable / Interpretable AI|Explainable AI]] ...  [[Algorithm Administration#Automated Learning|Automated Learning]]

@@ Line 26: / Line 26: @@
 * [[Conversational AI]] ... [[ChatGPT]] | [[OpenAI]] ... [[Bing/Copilot]] | [[Microsoft]] ... [[Gemini]] | [[Google]] ... [[Claude]] | [[Anthropic]] ... [[Perplexity]] ... [[You]] ... [[phind]] ... [[Ernie]] | [[Baidu]]
 * [[Cohere]]
-* [[Assistants]] ... [[Personal Companions]] ... [[Agents]]  ... [[Negotiation]] ... [[LangChain]]
+* [[Agents]] ... [[Robotic Process Automation (RPA)|Robotic Process Automation]] ... [[Assistants]] ... [[Personal Companions]] ... [[Personal Productivity|Productivity]] ... [[Email]] ... [[Negotiation]] ... [[LangChain]]
 * [[Excel]] ... [[LangChain#Documents|Documents]] ... [[Database|Database; Vector & Relational]] ... [[Graph]] ... [[LlamaIndex]]
 * [[Video/Image]] ... [[Vision]] ... [[Colorize]] ... [[Image/Video Transfer Learning]]