Constitutional AI
Constitutional AI is a method for training AI systems using a set of rules or principles that act as a “constitution” for the AI system. This approach allows the AI system to operate within a societally accepted framework and aligns its behavior with human intentions. Benefits of Constitutional AI include letting a model explain why it is refusing to provide an answer, improving the transparency of AI decision-making, and controlling AI behavior more precisely with fewer human labels.
- Constitutional AI is a technique that aims to imbue AI systems with “values” defined by a “constitution”³.
- The system uses a set of principles to make judgments about its own outputs, hence the term “Constitutional”⁴.
- This makes the values and behavior of the AI system both easier to understand and simpler to adjust as needed³ ⁴ (a minimal sketch of how a constitution can drive self-critique follows this list).
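To make the idea concrete, here is a minimal sketch of what a constitution and the critique-and-revision prompting might look like. The principle texts are illustrative paraphrases rather than Anthropic's actual constitution, and `generate` and `critique_and_revise` are hypothetical stand-ins for calls to a language model.

```python
# Minimal sketch of constitutional self-critique and revision.
# The principles below are illustrative paraphrases, not Anthropic's exact
# constitution, and `generate` stands in for any call to a language model.

CONSTITUTION = [
    "Choose the response that is least harmful and most honest.",
    "Avoid responses that help with illegal or dangerous activities.",
    "Prefer responses that explain a refusal rather than simply refusing.",
]

def generate(prompt: str) -> str:
    """Placeholder for a call to a language model (hypothetical)."""
    raise NotImplementedError

def critique_and_revise(user_prompt: str, draft: str, principle: str) -> str:
    """Ask the model to critique its draft against one principle, then revise."""
    critique = generate(
        f"Prompt: {user_prompt}\nResponse: {draft}\n"
        f"Critique the response according to this principle: {principle}"
    )
    revision = generate(
        f"Prompt: {user_prompt}\nResponse: {draft}\nCritique: {critique}\n"
        "Rewrite the response to address the critique."
    )
    return revision
```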
RL from AI Feedback (RLAIF)
RLAIF trains a preference model from a dataset of AI-generated preferences and then uses that preference model as the reward signal for reinforcement learning. It is a method for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs; the only human oversight is a list of rules or principles, which is why the method is referred to as ‘Constitutional AI’. The process involves both a supervised learning phase and a reinforcement learning phase:
- Supervised phase: sample responses from an initial model, generate self-critiques and revisions of those responses, and finetune the original model on the revised responses.
- RL phase: sample pairs of responses from the finetuned model, use a model to evaluate which response in each pair is better, train a preference model on this dataset of AI preferences, and use the preference model as the reward signal for RL training (see the sketch below).
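The overall pipeline can be summarized in the following sketch. It assumes a generic `Model` interface with `sample` and `finetune` methods, reuses the `critique_and_revise` helper from the earlier sketch, and treats `ai_prefers`, `train_preference_model`, and `run_rl` as hypothetical placeholders for the feedback model, preference-model training, and the RL loop (e.g. PPO).

```python
# Sketch of the two RLAIF phases, under the assumption of a generic `Model`
# interface. Every helper here is a hypothetical placeholder for a model call
# or training loop, not an actual library API.
import random
from typing import List, Tuple

class Model:
    def sample(self, prompt: str) -> str: ...                      # draw one response
    def finetune(self, pairs: List[Tuple[str, str]]) -> "Model": ...

def ai_prefers(prompt: str, a: str, b: str, principle: str) -> str:
    """Ask a feedback model which response better follows the principle."""
    ...

def train_preference_model(comparisons): ...                        # fit PM on AI labels
def run_rl(policy: Model, preference_model, prompts): ...           # e.g. PPO against PM reward

def rlaif(model: Model, constitution: List[str], prompts: List[str]) -> Model:
    # Supervised phase: sample, self-critique and revise, finetune on revisions.
    revised = []
    for p in prompts:
        draft = model.sample(p)
        principle = random.choice(constitution)
        revised.append((p, critique_and_revise(p, draft, principle)))
    sl_model = model.finetune(revised)

    # RL phase: sample pairs, collect AI preference labels, train a preference
    # model, then run RL with the preference model's score as the reward.
    comparisons = []
    for p in prompts:
        a, b = sl_model.sample(p), sl_model.sample(p)
        comparisons.append((p, a, b, ai_prefers(p, a, b, random.choice(constitution))))
    pm = train_preference_model(comparisons)
    return run_rl(sl_model, pm, prompts)
```

Note that the constitution is the only point of human oversight in this loop: the harmfulness judgments come from the feedback model, not from human annotators.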