 
** [https://research.fb.com/category/machine-learning/ Facebook Research]
*** [https://ai.facebook.com/tools/ Tools for Advancing the World's AI]
* [[Large Language Model (LLM)]] ... [[Large Language Model (LLM)#Multimodal|Multimodal]] ... [[Foundation Models (FM)]] ... [[Generative Pre-trained Transformer (GPT)|Generative Pre-trained]] ... [[Transformer]] ... [[Attention]] ... [[Generative Adversarial Network (GAN)|GAN]] ... [[Bidirectional Encoder Representations from Transformers (BERT)|BERT]]
 
* [[Alpaca]]
* [[What is Artificial Intelligence (AI)? | Artificial Intelligence (AI)]] ... [[Generative AI]] ... [[Machine Learning (ML)]] ... [[Deep Learning]] ... [[Neural Network]] ... [[Reinforcement Learning (RL)|Reinforcement]] ... [[Learning Techniques]]
 
* [https://arstechnica.com/information-technology/2023/03/you-can-now-run-a-gpt-3-level-ai-model-on-your-laptop-phone-and-raspberry-pi/ You can now run a GPT-3 level AI model on your laptop, phone, and Raspberry Pi | Benj Edwards - Ars Technica] ... On Friday, a software developer named Georgi Gerganov created a tool called [https://github.com/ggerganov/llama.cpp "llama.cpp"] that can run Meta's new GPT-3-class AI large language model, LLaMA, locally on a Mac laptop. Soon thereafter, people worked out how to run LLaMA on Windows as well. Then someone showed it running on a Pixel 6 phone, and next came a [https://www.raspberrypi.org/ Raspberry Pi] (albeit running very slowly).
 
* [https://ai.meta.com/blog/meta-llama-3/ Introducing Meta Llama 3: The most capable openly available LLM to date | Meta]
* [https://github.com/TIBHannover/llama_index_mediawiki-service llama_index_mediawiki-service github] ... a container-virtualised service that aims to run a local Large Language Model (LLM) to assist [[wiki]] users.
* [https://techcrunch.com/2024/04/18/meta-releases-llama-3-claims-its-among-the-best-open-models-available/ Meta releases Llama 3, claims it’s among the best open models available | Kyle Wiggers - Techcrunch]
* [https://zapier.com/blog/llama-meta/ Meta AI: What is Llama 3 and why does it matter? | Harry Guinness - Zapier]
* [https://huggingface.co/meta-llama/Meta-Llama-3-8B Meta-Llama-3-8B | Hugging Face]
  
  
<b>LLaMA</b> is a family of [[Large Language Model (LLM)|Large Language Models (LLM)]] released by Meta Platforms Inc. (formerly Facebook Inc.). LLaMA represents a significant advance in openly available large language models, establishing Meta as a leader in this space with highly capable, scalable models that are widely accessible. The key points are:
  
* LLaMA 3 models are now available in 8 billion and 70 billion parameter sizes, a significant increase in scale and capability over the previous LLaMA 2 models.
* The LLaMA 3 models have been trained on over 15 trillion tokens of data, 7 times more than the LLaMA 2 models, including 4 times more code data. This has produced major improvements in performance on benchmarks such as MMLU, GSM-8K, and HumanEval.
* Key new capabilities of LLaMA 3 include enhanced reasoning, code generation, and instruction following, as well as improved safety features such as reduced false-refusal rates and increased response diversity.
* Meta is also currently training even larger LLaMA 3 models of over 400 billion parameters, which will add multimodal and multilingual capabilities.
* The LLaMA 3 models are being made openly available by Meta to the developer community, with support from major hardware providers such as Intel, Qualcomm, and AMD. This establishes LLaMA 3 as a leading open-source AI model.


= LLaMA 2 Long =

* [https://venturebeat.com/ai/meta-quietly-releases-llama-2-long-ai-that-outperforms-gpt-3-5-and-claude-2-on-some-tasks/ Meta quietly unveils Llama 2 Long AI that beats GPT-3.5 Turbo and Claude 2 on some tasks | Carl Franzen - VentureBeat]

Llama 2 Long is a version of Llama 2 that was further pretrained on additional long-document data to extend its context window. It can generate text, translate languages, write different kinds of creative content, and answer questions in an informative way, and on some long-context benchmarks it outperforms GPT-3.5 Turbo and Claude 2. Meta is making Llama 2 Long available for free for research and commercial use, a significant step forward in the development of open source AI models that is likely to lead to new and innovative applications for large language models.

= LLaMA 2 =

* [https://about.fb.com/news/2023/07/llama-2/ Meta and Microsoft Introduce the Next Generation of Llama | Meta]

[[Meta]] released LLaMA 2 in July 2023 in three sizes: LLaMA-2-7B, LLaMA-2-13B, and LLaMA-2-70B. The company hopes that by making LLaMA 2 open source it can improve the model through feedback from the wider community of developers. [[Microsoft]] and [[Meta]] are expanding their longstanding partnership, with [[Microsoft]] as the preferred partner for Llama 2. LLaMA is still under development, but it has already been used to create some impressive chatbots: for example, Allie provides customer support, while Galactica is designed for scientific research.

<youtube>zJBpRn2zTco</youtube>
<youtube>kXuHxI5ZcG0</youtube>
<youtube>6iHVJyX2e50</youtube>
<youtube>UMK_NTH4vwU</youtube>

= LLaMA <i>(initial)</i> =

Meta released LLaMA on February 24, 2023, saying it is democratizing access to LLMs, which it sees as one of the most important and beneficial forms of AI. The four foundation models of LLaMA are LLaMA-7B, LLaMA-13B, LLaMA-33B, and LLaMA-65B, with 7 billion, 13 billion, 33 billion, and 65 billion parameters respectively. The models are all based on the [[transformer]] architecture and trained on publicly available datasets. LLaMA-13B is remarkable because it can run on a single GPU and outperform GPT-3 (175 billion parameters) on most common-sense reasoning benchmarks. LLaMA-65B is competitive with the best models from other AI labs, such as Chinchilla 70B and [[PaLM]] 540B.

<youtube>r3DC_gjFCSA</youtube>
<youtube>RfIXVlMEi4c</youtube>
<youtube>GvPpTIQPWto</youtube>

= <span id="Implement a Chat with LLaMA"></span>Implement a Chat with LLaMA =

<youtube>E5OnoYF2oAk</youtube>
<youtube>NN1TsKZ4D24</youtube>
<youtube>0kRDs9BW2NU</youtube>
 
  
= <span id="Sharing with LLaMA"></span>Sharing with LLaMA =
+
When [[Meta]] shares the "weights" of the LLaMA model, they are providing the parameters that have been learned during the training process, which include embedding, self-attention, feedforward, and bias weights. These weights are essential for the model to function correctly and are what enable the model to process natural language and generate coherent and contextually relevant text.
  
When [[Meta]] shares the "weights" of the [[LLaMA]] model, they are providing the parameters that have been learned during the training process, which include embedding, self-attention, feedforward, and bias weights. These weights are essential for the model to function correctly and are what enable the model to process natural language and generate coherent and contextually relevant text.
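
To see concretely what these weights are, the snippet below (a minimal sketch, assuming the Hugging Face `transformers` library; the model ID is a placeholder for whatever checkpoint you have downloaded) lists a few of the learned tensors by name and shape:

<pre>
# List a few of the learned weight tensors the paragraph above describes.
# "meta-llama/Llama-2-7b-hf" is a placeholder model ID; any locally
# downloaded LLaMA checkpoint path works the same way.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
for name, tensor in list(model.named_parameters())[:8]:
    print(f"{name:55s} {tuple(tensor.shape)}")
# Prints names such as model.embed_tokens.weight (embedding) and
# model.layers.0.self_attn.q_proj.weight (self-attention).
</pre>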
To implement a chat locally on your machine using the weights from Meta for LLaMA, follow the steps below. You can then interact with the model and develop applications that leverage its natural language processing capabilities:
  
1. <b>Download the Pretrained Model Weights: </b>Obtain the pretrained model weights from official sources such as [[Meta]]'s webpage, GitHub, [[Hugging Face]], or Ollama.
2. <b>Set Up Your Local Environment: </b>Ensure that your local machine has the necessary hardware specifications, such as a strong CPU and a significant amount of GPU memory, to run a large language model like LLaMA. If you have enough GPU memory, you can run larger models with full precision.
  
3. <b>Install Required Libraries and Dependencies: </b>Use Python to write the script for setting up and running the model. Install the `transformers` and `accelerate` libraries from Hugging Face using the commands `pip install transformers` and `pip install accelerate`.
  
4. <b>Write Your Python Script: </b>Import necessary modules such as `LlamaForCausalLM`, `LlamaTokenizer`, `pipeline`, and `torch`. Load the LLaMA model with the downloaded weights, define and instantiate the tokenizer and pipeline, and run the pipeline to generate responses based on input prompts.
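
A minimal sketch of such a script, assuming the weights were obtained from [[Hugging Face]] (the model ID below is a placeholder; substitute the path or ID of your own weights):

<pre>
# chat_llama.py -- load LLaMA weights and generate a response.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer, pipeline

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model ID

tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # halves memory use on GPU
    device_map="auto",          # requires the accelerate library
)

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
result = generator("Why is the sky blue?", max_new_tokens=128)
print(result[0]["generated_text"])
</pre>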
  
5. <b>Run the Model Locally: </b>Save your Python script and execute it using the command `python <name of script>.py`. Provide different prompts as input to generate responses and test the model's performance.
  
6. <b>Use Open-Source Tools for Local Execution: </b>Utilize open-source tools like [[Hugging Face]]'s Transformers library to pull the models from the [[Hugging Face]] Hub. After installing the necessary libraries and upgrading the transformers library, you can install the model and start querying using the provided code snippet.
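
The referenced snippet is not reproduced here; a minimal equivalent using the generic Auto classes (same placeholder model ID as above) might look like:

<pre>
# Pull a model from the Hugging Face Hub and query it directly.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Why is the sky blue?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
</pre>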
  
7. <b>Interactive Chat Interface: </b>For an interactive chat interface, you can wrap the model inside Gradio. Install Gradio and run the provided code to create a demo of the Gradio app and LLaMA in action.  
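
As a rough sketch (the exact code in the referenced demo may differ), the `generator` pipeline from step 4 can be wrapped like this after `pip install gradio`:

<pre>
# Wrap the text-generation pipeline from step 4 in a simple chat UI.
import gradio as gr

def chat(message, history):
    # History is ignored for brevity; a real chat would fold previous
    # turns into the prompt using the model's chat template.
    output = generator(message, max_new_tokens=128)
    return output[0]["generated_text"]

gr.ChatInterface(chat).launch()  # serves a local web demo in the browser
</pre>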
  
8. <b>Running LLaMA through Ollama: </b>For Linux/MacOS users, Ollama is recommended for running LLaMA models locally. You can use the CLI command `ollama run llama3` or the API command `curl -X POST http://localhost:11434/api/generate -d '{ "model": "llama3", "prompt":"Why is the sky blue?" }'` to interact with the model.
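
The same endpoint can also be called from Python with only the standard library; a small sketch, assuming an Ollama server running on its default port 11434:

<pre>
# Query a local Ollama server; equivalent to the curl command above.
import json
import urllib.request

payload = {"model": "llama3", "prompt": "Why is the sky blue?", "stream": False}
request = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read())["response"])
</pre>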
  
9. <b>Quantization for Reduced Model Size: </b>If necessary, you can reduce the size of the LLM models while maintaining performance by quantizing the model's parameters, which can result in a significant reduction in model size.
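
One common approach is 4-bit quantization through `transformers`; a sketch, assuming the `bitsandbytes` library and a CUDA-capable GPU:

<pre>
# Load the model 4-bit quantized, cutting memory use roughly 4x
# versus float16 at a small cost in output quality.
# Requires: pip install bitsandbytes
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # placeholder model ID, as above
    quantization_config=quant_config,
    device_map="auto",
)
</pre>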
  
10. <b>Further Exploration: </b>To deepen your understanding of LLaMA, explore resources such as the paper on LLaMA 2, the model source from the LLaMA 2 GitHub repo, and the Meta AI website for more information on the model, benchmarks, technical specifications, and responsible use considerations.

= AI Agents with LLaMA =
  
<youtube>i-txsBoTJtI</youtube>
 
<youtube>JLmI0GJuGlY</youtube>
<youtube>lvQ96Ssesfk</youtube>
 
<youtube>-ROS6gfYIts</youtube>