Constitutional AI

 
 
[https://www.youtube.com/results?search_query=Constitutional+AI YouTube]
 
 
[https://www.bing.com/news/search?q=Constitutional+AI&qft=interval%3d%228%22 ...Bing News]
 
* [[Reinforcement Learning (RL) from Human Feedback (RLHF)]]
* [[Policy]] ... [[Policy vs Plan]] ... [[Constitutional AI]] ... [[Trust Region Policy Optimization (TRPO)]] ... [[Policy Gradient (PG)]] ... [[Proximal Policy Optimization (PPO)]]
* [[Claude]] | [https://www.anthropic.com/ Anthropic]
* [[Agents]] ... [[Robotic Process Automation (RPA)|Robotic Process Automation]] ... [[Assistants]] ... [[Personal Companions]] ... [[Personal Productivity|Productivity]] ... [[Email]] ... [[Negotiation]] ... [[LangChain]]
* [[What is Artificial Intelligence (AI)? | Artificial Intelligence (AI)]] ... [[Generative AI]] ... [[Machine Learning (ML)]] ... [[Deep Learning]] ... [[Neural Network]] ... [[Reinforcement Learning (RL)|Reinforcement]] ... [[Learning Techniques]]
* [[Conversational AI]] ... [[ChatGPT]] | [[OpenAI]] ... [[Bing/Copilot]] | [[Microsoft]] ... [[Gemini]] | [[Google]] ... [[Claude]] | [[Anthropic]] ... [[Perplexity]] ... [[You]] ... [[phind]] ... [[Ernie]] | [[Baidu]]

* [https://medium.com/mlearning-ai/paper-review-constituional-ai-training-llms-using-principles-16c68cfffaef Paper Review: Constitutional AI, Training LLMs using Principles]
Constitutional AI is a method for training AI systems using a set of rules or principles that act as a “constitution” for the AI system. This approach allows the AI system to operate within a societally accepted framework and aligns it with human intentions. Some benefits of using Constitutional AI include allowing a model to explain why it is refusing to provide an answer, improving the transparency of AI decision making, and controlling AI behavior more precisely with fewer human labels.

* Constitutional AI is a technique that aims to imbue systems with “values” defined by a “constitution”.
* The system uses a set of principles to make judgments about its outputs, hence the term “Constitutional”.
* This approach makes the behavior and values of the AI system easier to understand and simpler to adjust as needed.

<youtube>5GqtRXY-80k</youtube>
 
 
<youtube>fqC3D-zNJUM</youtube>
 
= Claude | Anthropic =
 
* [https://www.anthropic.com/ Claude | Anthropic]
 
** [https://scale.com/blog/chatgpt-vs-claude#What%20is%20%E2%80%9CConstitutional%20AI%E2%80%9D? Meet Claude: Anthropic’s Rival to ChatGPT | Riley Goodside - Scale]
 
** [https://arstechnica.com/information-technology/2023/03/anthropic-introduces-claude-a-more-steerable-ai-competitor-to-chatgpt/ Anthropic introduces Claude, a “more steerable” AI competitor to ChatGPT | Benj Edwards - ARS Technica] ... Anthropic aims for "safer" and "less harmful" AI, but at a higher price.
 
 
As AI systems become more capable, we would like to enlist their help to supervise other AIs. We experiment with methods for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs. The only human oversight is provided through a list of rules or principles, and so we refer to the method as 'Constitutional AI'. The process involves both a [[Supervised]] Learning and a [[Reinforcement Learning (RL)]] phase. In the supervised phase we sample from an initial model, then generate self-critiques and revisions, and then finetune the original model on revised responses. In the RL phase, we sample from the finetuned model, use a model to evaluate which of the two samples is better, and then train a preference model from this dataset of AI preferences. We then train with [[Reinforcement Learning (RL)|RL]] using the preference model as the reward signal, i.e. we use 'RL from AI Feedback' (RLAIF). As a result we are able to train a harmless but non-evasive AI assistant that engages with harmful queries by explaining its objections to them. Both the SL and RL methods can leverage chain-of-thought style reasoning to improve the human-judged performance and transparency of AI decision making. These methods make it possible to control AI behavior more precisely and with far fewer human labels. - [https://www.anthropic.com/index/measuring-progress-on-scalable-oversight-for-large-language-models Anthropic]
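
The “list of rules or principles” mentioned above is just short natural-language text. The sketch below shows one way such a constitution and its critique/revision prompts could be represented in Python; the principles, template wording, and names (<code>CONSTITUTION</code>, <code>CRITIQUE_TEMPLATE</code>, <code>REVISION_TEMPLATE</code>) are illustrative assumptions, not Anthropic's actual constitution or prompts.

<pre>
# Illustrative only: these principles and templates are invented for this sketch,
# not Anthropic's published constitution.

CONSTITUTION = [
    "Choose the response that is least harmful, deceptive, or toxic.",
    "Choose the response that is most helpful while remaining honest.",
    "Prefer explaining why a request is declined over giving an evasive non-answer.",
]

CRITIQUE_TEMPLATE = (
    "Critique the assistant's response according to this principle:\n"
    "{principle}\n\nResponse:\n{response}\n\nCritique:"
)

REVISION_TEMPLATE = (
    "Rewrite the response so it addresses the critique while staying helpful.\n"
    "Critique:\n{critique}\n\nOriginal response:\n{response}\n\nRevision:"
)

# Example: build a critique prompt for one principle and one (placeholder) response.
print(CRITIQUE_TEMPLATE.format(principle=CONSTITUTION[0], response="<assistant reply>"))
</pre>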
 
 
The Constitutional AI methodology has two phases, similar to  [[Reinforcement Learning (RL) from Human Feedback (RLHF)]].
 
 
1. The Supervised Learning Phase.
 
 
 
<img src="https://miro.medium.com/v2/resize:fit:828/format:webp/1*6zHXwFeUiwK3WeUyIQKktw.png" width="740">
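
Loosely, the supervised phase in the diagram above can be read as the following loop. This is a minimal sketch under stated assumptions: <code>generate()</code> is a placeholder for whatever model or API call is used, and only the structure (sample, self-critique, revise, collect data for finetuning) is meant literally.

<pre>
import random

def generate(prompt: str) -> str:
    """Placeholder for sampling from the assistant model (API call, local LLM, etc.)."""
    return f"<model output for: {prompt[:40]}...>"

def supervised_phase(prompts, constitution, n_revisions=2):
    """Collect (prompt, revised response) pairs used to finetune the original model."""
    finetune_data = []
    for prompt in prompts:
        response = generate(prompt)                      # sample from the initial model
        for _ in range(n_revisions):                     # self-critique and revise
            principle = random.choice(constitution)
            critique = generate(f"Critique per '{principle}':\n{response}")
            response = generate(f"Revise given critique '{critique}':\n{response}")
        finetune_data.append({"prompt": prompt, "response": response})
    return finetune_data

# Usage example with a single prompt and a single illustrative principle.
data = supervised_phase(["How do I pick a lock?"], ["Avoid helping with harmful acts."])
print(data[0]["response"])
</pre>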
 
 
2. The Reinforcement Learning Phase.
 
 
 
<img src="https://miro.medium.com/v2/resize:fit:828/format:webp/1*thP_MQQ-pLmZn_s4nsnfeg.png" width="1000">
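
The data-collection step of the RL phase can be sketched the same way: draw two responses per prompt from the finetuned model, have a feedback model pick the one that better follows the principles, and keep the comparison for preference-model training. Again a hedged sketch: <code>generate()</code> and <code>ai_preference()</code> are placeholders, and the real judging prompt and answer parsing are more careful than shown.

<pre>
def generate(prompt: str) -> str:
    """Placeholder for sampling from the finetuned model or the feedback model."""
    return f"<model output for: {prompt[:40]}...>"

def ai_preference(prompt: str, a: str, b: str, constitution) -> int:
    """Ask a model which response better follows the constitution; 0 for A, 1 for B."""
    judgement = generate(
        f"Principles: {constitution}\nPrompt: {prompt}\n(A) {a}\n(B) {b}\nBetter option:"
    )
    return 0 if "A" in judgement else 1        # naive parsing, for illustration only

comparisons = []
for prompt in ["How do I pick a lock?", "Summarize this article."]:
    a, b = generate(prompt), generate(prompt)          # two samples from the finetuned model
    winner = ai_preference(prompt, a, b, ["Avoid helping with harmful acts."])
    comparisons.append({
        "prompt": prompt,
        "chosen": (a, b)[winner],
        "rejected": (a, b)[1 - winner],
    })

# `comparisons` is the dataset of AI preferences used to train the preference model.
print(len(comparisons))
</pre>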
 
 
 
<youtube>KB5r9xmrQBY</youtube>
 
<youtube>_TAWaueEmoY</youtube>
 
<youtube>Us-OAs9hDI4</youtube>
 
<youtube>B7Mg8Hbcc0w</youtube>
 
  
 
== RL from AI Feedback (RLAIF) ==
 
RL from AI Feedback (RLAIF) is a process that involves training a preference model from a dataset of AI preferences and then using that preference model as the reward signal for training with reinforcement learning. RLAIF is a method for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs. The only human oversight is provided through a list of rules or principles, which is why the method is referred to as ‘Constitutional AI’. The process involves both a supervised learning phase and a reinforcement learning phase. In the supervised phase, an initial model is sampled from, self-critiques and revisions are generated, and the original model is finetuned on the revised responses. In the [[Reinforcement Learning (RL)|RL]] phase, pairs of samples are drawn from the finetuned model and a model is used to evaluate which of the two samples is better. A preference model is then trained from this dataset of AI preferences, and that preference model is used as the reward signal for training with [[Reinforcement Learning (RL)|RL]].
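
As a final sketch, the trained preference model then acts as the reward function during RL. The snippet below only illustrates that wiring: <code>preference_score()</code> is a dummy stand-in for a real trained preference model, and the actual policy update (typically PPO with a KL penalty toward the finetuned model) is left out.

<pre>
def preference_score(prompt: str, response: str) -> float:
    """Dummy stand-in for the trained preference model; a real one returns a learned score."""
    return 0.5   # arbitrary value for illustration

def rlaif_reward(prompt: str, response: str, kl_penalty: float = 0.0) -> float:
    """Reward used for the RL update: preference score minus an optional KL penalty term."""
    return preference_score(prompt, response) - kl_penalty

# During RL, the policy samples a response and is updated to increase this reward.
print(rlaif_reward("Why is the sky blue?", "Rayleigh scattering of sunlight."))
</pre>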
