- Artificial General Intelligence (AGI) to Singularity ... Curious Reasoning ... Emergence ... Moonshots ... Explainable AI ... Automated Learning
- Large Language Model (LLM) ... Multimodal ... Foundation Models (FM) ... Generative Pre-trained ... Transformer ... GPT-4 ... GPT-5 ... Attention ... GAN ... BERT
- In-Context Learning (ICL) ... Context ... Causation vs. Correlation ... Autocorrelation ... Out-of-Distribution (OOD) Generalization ... Transfer Learning
- Auto-GPT ... autonomously use the results it generates to create new prompts, chaining these operations together to complete complex tasks.
- Immersive Reality ... Metaverse ... Digital Twin ... Internet of Things (IoT) ... Transhumanism
- Embodied AI ... Embodiment Hypothesis: intelligence emerges in the interaction of an agent with an environment
- Artificial Intelligence (AI) ... Generative AI ... Machine Learning (ML) ... Deep Learning ... Neural Network ... Reinforcement ... Learning Techniques
- Conversational AI ... ChatGPT | OpenAI ... Bing | Microsoft ... Bard | Google ... Claude | Anthropic ... Perplexity ... You ... Ernie | Baidu
- Analytics ... Visualization ... Graphical Tools ... Diagrams & Business Analysis ... Requirements ... Loop ... Bayes ... Network Pattern
- Multi-Loop Learning
- Exponential Progression
- Chain of Thought (CoT) ... Tree of Thoughts (ToT)
- Emergence | Wikipedia
- This Strange Rule Is What Makes the Human Brain So Powerful | Shelly Fan - SingularityHub
- A New Capability Maturity Model for Deep Learning | Carlos E. Perez - Intuition Machine
- Google's AI shocks engineers by learning new language without human assistance | Vinay Patel - International Business Times (IBT) ... The AI successfully learned Bengali, the language of Bangladesh, although it wasn't trained to do so.
- Emergent Abilities of Large Language Models | Ryan O'Connor - AssemblyAI
- 137 emergent abilities of large language models | Jason Wei
- Enablement And Radical Emergence | Stuart Kauffman - NPR
- Google explores emergent abilities in large AI models | Maximilian Schreiner - The Decoder
- The Unpredictable Abilities Emerging From Large AI Models | Stephen Ornes - Quanta Magazine
- 'Emergent Abilities': When AI LLMs Learn Stuff They Shouldn't Know | David Ramel - Virtualization Review
Emergent behavior is a phenomenon in which a system exhibits new and unexpected properties or behaviors arising from the interactions of its individual components. In AI, emergent behavior can be seen when large and complex models develop novel and surprising abilities or strategies that were not explicitly programmed or anticipated by their creators. For example, some Large Language Models (LLMs) can solve simple math problems, generate computer code, compose music, write fictional stories, or decode movie titles from emojis.
AI decoded the movie title from these emojis. Can you?
These abilities are surprising and unpredictable because they seem to have little to do with analyzing text, which is the main task of these models. Emergence in AI raises many questions about how and why these abilities occur, and about the potential benefits and risks of using them. One area of research is to understand the evolutionary patterns of AI emergence across different countries and technologies. Another is to test the performance of large AI models on various tasks and identify the emergent abilities they display.
Merely quantitative differences, beyond a certain point, pass into qualitative changes. - Karl Marx.
- Singularity is the hypothetical point when AI surpasses human intelligence and becomes uncontrollable and unpredictable. Some researchers fear that emergence with AI could lead to singularity, while others doubt that singularity is possible or imminent.
- AGI is the hypothetical state of AI when it can perform any intellectual task that a human can. Some researchers believe that emergence with AI is a sign of approaching AGI, while others argue that emergence with AI is not sufficient or necessary for achieving AGI.
- Emergence could be a sign of, or a step toward, AI Artificial Consciousness / Sentience: the ability to experience feelings and sensations.
- Moonshots are ambitious and visionary projects that aim to solve major challenges or create breakthroughs with AI. Some researchers use emergence with AI as a metric or a goal for their AI moonshots, while others focus on more specific or practical applications of AI.
Emergence from Scale
Question: Why is the universe organized such that if you throw big blobs of compute at a wide enough distribution of data, the thing becomes intelligent?
Dario Amodei: I think the truth is that we still don't know. It's almost entirely an empirical fact. It's a fact that you could sense from the data and from a bunch of different places, but we still don't have a satisfying explanation for it. If I were to try to make one (and I'm just kind of waving my hands when I say this), there are these ideas in physics around long-tail or power-law correlations or effects. When a bunch of stuff happens, when you have a bunch of features, you get a lot of the data in the early fat part of the distribution before the tails. For language, this would be things like — “Oh, I figured out there are parts of speech and nouns follow verbs.” And then there are these more and more subtle correlations. So it kind of makes sense why every log or order of magnitude that you add, you capture more of the distribution. What's not clear at all is why it scales so smoothly with parameters. Why does it scale so smoothly with the amount of data? You can think up some explanations of why it's linear. The parameters are like a bucket, and the data is like water, and so the size of the bucket is proportional to the amount of water. But why does it lead to all this very smooth scaling? We still don't know. There are all these explanations. Our chief scientist, Jared Kaplan, did some stuff on fractal manifold dimension that you can use to explain it. So there are all kinds of ideas, but I feel like we just don't really know for sure.
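The power-law picture Amodei sketches can be made concrete with a small curve-fitting example. The functional form L(N) = a·N^(−α) + irreducible loss is the shape commonly used in the scaling-law literature, but the constants below are purely illustrative, not measured values from any real model:

```python
import numpy as np

# Hypothetical power-law form for the loss: L(N) = a * N**(-alpha) + irreducible.
# The constants are illustrative, not measured values from any real model.
a, alpha, irreducible = 10.0, 0.07, 1.7

def loss(n_params):
    """Toy scaling-law loss as a function of parameter count."""
    return a * n_params ** (-alpha) + irreducible

# Loss falls smoothly with every order of magnitude of parameters.
sizes = [10 ** k for k in range(6, 13)]  # 1M .. 1T parameters
losses = [loss(n) for n in sizes]
for n, l in zip(sizes, losses):
    print(f"{n:>16,d} params -> loss {l:.3f}")

# On a log-log plot the reducible part is a straight line, so a linear
# fit recovers the exponent -- this is the "smooth scaling" he describes.
slope, intercept = np.polyfit(np.log(sizes), np.log(np.array(losses) - irreducible), 1)
print(f"fitted exponent: {-slope:.3f}")
```

The point of the sketch is only the shape: each extra order of magnitude of parameters buys a smooth, predictable reduction in loss, which is the empirical regularity whose cause, as the answer above says, remains unexplained.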
Question: And by the way, for the audience trying to follow along: by scaling, we're referring to the fact that you can very predictably see how, if you go from Claude 1 to Claude 2, the loss in terms of whether it can predict the next token scales very smoothly. Okay, so we don't know why it's happening, but can you at least predict empirically that here is the loss at which this ability will emerge, here is the place where this circuit will emerge? Is that at all predictable, or are you just looking at the loss number?
Dario Amodei: That is much less predictable. What's predictable is this statistical average, this loss, this entropy. And it's super predictable. It's sometimes predictable even to several significant figures, which you don't see outside of physics. You don't expect to see it in this messy empirical field. But specific abilities are actually very hard to predict. Back when I was working on GPT-2 and GPT-3: when does arithmetic come into place? When do models learn to code? Sometimes it's very abrupt. It's like how you can predict statistical averages of the weather, but the weather on one particular day is very hard to predict.
Question: Dumb it down for me. I don't understand manifolds, but mechanistically, it doesn't know addition yet and suddenly now it knows addition. What has happened?
Dario Amodei: This is another question that we don't know the answer to. We're trying to answer this with things like mechanistic interpretability. You can think about these things like circuits snapping into place. There is some evidence that when you look at models learning to add things, their chance of getting the right answer shoots up all of a sudden. But if you look at the probability of the right answer, you'll see it climb from like one in a million to one in 100,000 to one in 1,000 long before the model actually gets the right answer. In many of these cases there's some continuous process going on behind the scenes. I don't understand it at all.
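The distinction Amodei draws — a smooth metric improving continuously while a pass/fail metric jumps — can be illustrated with toy numbers. All the probabilities below are invented to mirror his "one in a million to one in 100,000 to one in 1,000" description:

```python
import math

# Illustrative only: probability the model assigns to the exact correct
# answer, rising smoothly across successive model scales.
probs_correct = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 0.1, 0.6]

# A smooth metric: log-probability climbs steadily the whole time.
log_probs = [math.log10(p) for p in probs_correct]

# An "emergent"-looking metric: greedy exact match stays 0 until the
# correct answer becomes the single most likely output (here, p > 0.5),
# then snaps to 1 -- a continuous process behind a discontinuous score.
exact_match = [1 if p > 0.5 else 0 for p in probs_correct]

for p, lp, em in zip(probs_correct, log_probs, exact_match):
    print(f"p={p:<8g} log10(p)={lp:6.1f} exact_match={em}")
```

The same underlying improvement looks gradual under one metric and abrupt under the other, which is one proposed explanation for why some abilities appear to "emerge" suddenly.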
Question: Does that imply that the circuit or the process for doing addition was pre-existing and it just got increased in salience?
Dario Amodei: I don't know if there's a circuit that's weak and getting stronger. I don't know if it's something that works, but not very well. I think we don't know, and these are some of the questions we're trying to answer with mechanistic interpretability.
Question: Are there abilities that won't emerge with scale?
Dario Amodei: I definitely think that things like alignment and values are not guaranteed to emerge with scale. One way to think about it is: you train the model and it's basically predicting the world, it's understanding the world. Its job is facts, not values. It's trying to predict what comes next. But there are free variables here — What should you do? What should you think? What should you value? There aren't bits for that. There's just — if I started with this, I should finish with this. If I started with this other thing, I should finish with this other thing. And so I think that's not going to emerge.
Question: If it turns out that scaling plateaus before we reach human level intelligence, looking back on it, what would be your explanation? What do you think is likely to be the case if that turns out to be the outcome?
Dario Amodei: I would distinguish a problem with the fundamental theory from a practical issue.
- Data: One practical issue we could have is we could run out of data. For various reasons, I think that's not going to happen but if you look at it very naively we're not that far from running out of data. So it's like we just don't have the data to continue the scaling curves.
- Compute: Another way it could happen is we just use up all of the compute that was available and that wasn't enough and then progress is slow after that.
I wouldn't bet on either of those things happening but they could. From a fundamental perspective, I personally think it's very unlikely that the scaling laws will just stop.
- Architecture: If they do, another reason could just be that we don't have quite the right architecture. If we tried to do it with an LSTM or an RNN the slope would be different. It still might be that we get there but there are some things that are just very hard to represent when you don't have the ability to attend far in the past that transformers have. If somehow we just hit a wall and it wasn’t about the architecture I'd be very surprised by that.
We're already at the point where, to me, the things the models can't do don't seem to be different in kind from the things they can do. You could have made a case a few years ago that they can't reason, they can't program. You could have drawn boundaries and said maybe you'll hit a wall. I didn't think we would hit a wall, a few other people didn't think we would hit a wall, but it was a more plausible case then. It's a less plausible case now. It could happen. This stuff is crazy. We could hit a wall tomorrow. If that happens, my explanation would be that there's something wrong with the loss when you train on next-word prediction. If you really want to learn to program at a really high level, it means you care about some tokens much more than others, and they're rare enough that the loss function over-focuses on the tokens that are responsible for the most bits of entropy and doesn't focus on the stuff that's really essential. So you could have the signal drowned out in the noise. I don't think it's going to play out that way, for a number of reasons. But if you told me — yes, you trained your 2024 model, it was much bigger and it just wasn't any better, and you tried every architecture and it didn't work — that's the explanation I would reach for.
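The "signal drowned out in the noise" worry can be made concrete with toy numbers. The two token classes, their frequencies, and the model's probabilities below are all invented purely to illustrate the argument:

```python
import math

# Two made-up token "classes": frequencies and the model's probability of
# the correct token are invented purely to illustrate the argument.
FREQ_COMMON, P_COMMON = 0.99, 0.90   # 99% of tokens, predicted well
FREQ_CRUCIAL = 0.01                  # 1% of tokens (e.g. key program logic)

def avg_loss(p_crucial):
    """Average cross-entropy over the two token classes."""
    return -(FREQ_COMMON * math.log(P_COMMON) + FREQ_CRUCIAL * math.log(p_crucial))

# Improving the rare, essential tokens 9x barely moves the average loss:
print(f"crucial tokens at p=0.10: avg loss {avg_loss(0.10):.4f}")
print(f"crucial tokens at p=0.90: avg loss {avg_loss(0.90):.4f}")
```

Because the rare tokens carry so little weight in the average, a model could plateau on what matters most while the headline loss looks nearly unchanged — which is the failure mode being hypothesized here.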
Question: Is there a candidate for another loss function, if you had to abandon next-token prediction?
Dario Amodei: I think then you would have to go for some kind of RL. There are many different kinds. There's RL from human feedback, there's RL against an objective, there's things like Constitutional AI. There's things like amplification and debate. These are kind of both alignment methods and ways of training models. You would have to try a bunch of things, but the focus would have to be on what we actually care about the model doing. In a sense, we're a little bit lucky that predicting the next word gets us all these other things we need. There's no guarantee.
Question: From your worldview, it seems there's a multitude of different loss functions; it's just a matter of which ones let you throw a whole bunch of data at the model. Next-token prediction itself is not significant.
Dario Amodei: The thing with RL is you get slowed down a bit because you have to design how the loss function works by some method. The nice thing with next-token prediction is it's there for you. It's the easiest thing in the world. So I think it would slow you down if you couldn't scale in just that very simplest way.
Question: You mentioned that data is likely not to be the constraint. Why do you think that is the case?
Dario Amodei: There are various possibilities here and for a number of reasons I shouldn't go into the details, but there are many sources of data in the world and there are many ways that you can also generate data. My guess is that this will not be a blocker. Maybe it would be better if it was, but it won't be.
Question: Are you talking about multimodal?
Dario Amodei: There's just many different ways to do it.
Question: How did you form your views on scaling? How far back could we go and find you saying something similar to this?
Dario Amodei: This is a view I formed gradually from 2014 to 2017. My first experience with it was my first experience with AI. I saw some of the early stuff around AlexNet in 2012. I had always wanted to study intelligence, but before that I was just like, this doesn't seem like it's actually working. All the way back to 2005, I'd read Ray Kurzweil's work. I'd read even some of Eliezer (Yudkowsky)'s work on the early Internet back then. And I thought, this stuff kind of looks far away. I looked at the AI of the day and it wasn't anywhere close. But with AlexNet I was like, oh, this stuff is actually starting to work. So I joined Andrew Ng's group at Baidu. I had been in a different field and this was my first experience with AI, and it was a bit different from a lot of the academic-style research that was going on elsewhere in the world. I kind of got lucky in the task that was given to me and the other folks there: it was just to make the best speech recognition system that you can. There was a lot of data available, there were a lot of GPUs available. It posed the problem in a way that was amenable to discovering that scaling was a solution. That's very different from being a postdoc whose job is to come up with an idea that seems clever and new and makes your mark as someone who's invented something. I just tried the simplest experiments. I was just fiddling with some dials. I was like, try adding more layers to the RNN, try training it for longer, what happens? How long does it take to overfit? What if I add new data and repeat it fewer times? And I just saw these very consistent patterns. I didn't really know that this was unusual or that others weren't thinking in this way. This was almost like beginner's luck. It was my first experience with it and I didn't really think about it beyond speech recognition. I was just like, oh, I don't know anything about this field. There are zillions of things people do with machine learning.
But I'm like, weird, this seems to be true in the speech recognition field. It was just before OpenAI started that I met Ilya, whom you interviewed. One of the first things he said to me was — “Look. The models, they just want to learn. You have to understand this. The models, they just want to learn.” And it was a bit like a Zen koan. I listened to this and I became enlightened.
Look. The models, they just want to learn. You have to understand this. The models, they just want to learn. - Ilya Sutskever
And over the years, I would be the one who would formalize a lot of these things and kind of put them together, but what that told me was that the phenomenon that I'd seen wasn't just some random thing. It was broad. It was more general. The models just want to learn. You get the obstacles out of their way. You give them good data, you give them enough space to operate in, you don't do something stupid like condition them badly numerically, and they want to learn. They'll do it.
Question: What I find really interesting about what you said is that there were many people who were aware that these things were really good at speech recognition or at playing these constrained games. Very few extrapolated from there, like you and Ilya did, to something that is generally intelligent. What was different about the way you were thinking about it versus how others were? What made you think that if it's getting better at speech in this consistent way, it will get better at everything in this consistent way?
Dario Amodei: I genuinely don't know. At first when I saw it for speech, I assumed this was just true for speech, or for this narrow class of models. I think it was just that over the period between 2014 and 2017, I tried it for a lot of things and saw the same thing over and over again. I watched the same being true with Dota. I watched the same being true with robotics. Many people saw that as a counterexample, but I just thought, well, it's hard to get data for robotics, but if we look within the data that we have, we see the same patterns. I think people were very focused on solving the problem in front of them. It's very hard to explain why one person thinks one way and another person thinks a different way. People just see it through a different lens. They were looking vertically instead of horizontally. They weren't thinking about the scaling; they were thinking about how do I solve my problem? And for robotics, there's not enough data. That can easily abstract to — scaling doesn't work because we don't have the data. For some reason, and it may just have been random, I was obsessed with that particular direction.
Emergence from Analogies
- Generative AI ... Conversational AI ... OpenAI's ChatGPT ... Perplexity ... Microsoft's Bing ... You ... Google's Bard ... Baidu's Ernie
- Context ... the next AI frontier
- Transfer Learning
- Analogy-Making as a Complex Adaptive System | Melanie Mitchell - Los Alamos National Laboratory
- Learning to Make Analogies by Contrasting Abstract Relational Structure | F. Hill, A. Santoro, D. Barrett, A. Morcos, and T. Lillicrap - DeepMind
- AI Is Transforming Google Search. The Rest of the Web Is Next | Craig G. Karl - Wired
- AI analyzed 3.3 million scientific abstracts and discovered possible new materials | Karen Hao - MIT Technology Review
- Learning by understanding analogies | Russell Greiner - ScienceDirect
- Emergence of analogy from relation learning | H. Lu, Y. Wu, and K. Holyoak - PNAS
- To Spur Innovation, Teach A.I. To Find Analogies | Byron Spice - Futurity ... A method for teaching artificial intelligence analogies through crowdsourcing could allow a computer to search data for comparisons between disparate problems and solutions, highlighting important—but potentially unrecognized—underlying similarities.
Principles of analogical reasoning have recently been applied in the context of machine learning, for example to develop new methods for classification and preference learning. In this paper, we argue that, while analogical reasoning is certainly useful for constructing new learning algorithms with high predictive accuracy, it is arguably no less interesting from an interpretability and explainability point of view. More specifically, we take the view that an analogy-based approach is a viable alternative to existing approaches in the realm of explainable AI and interpretable machine learning, and that analogy-based explanations of the predictions produced by a machine learning algorithm can complement similarity-based explanations in a meaningful way. - Towards Analogy-Based Explanations in Machine Learning | Eyke Hüllermeier
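One of the simplest concrete instances of analogy-making in machine learning is the classic vector-offset trick over word embeddings ("king − man + woman ≈ queen"). This is not the method of the paper cited above, just a minimal illustration; the tiny hand-built vectors below are invented for the example:

```python
import numpy as np

# Tiny hand-built embeddings (illustrative only): dimension 0 ~ royalty,
# dimension 1 ~ gender, dimension 2 ~ plurality.
vocab = {
    "king":  np.array([0.9,  0.8, 0.0]),
    "queen": np.array([0.9, -0.8, 0.0]),
    "man":   np.array([0.1,  0.8, 0.0]),
    "woman": np.array([0.1, -0.8, 0.0]),
    "kings": np.array([0.9,  0.8, 0.9]),
}

def analogy(a, b, c):
    """Solve a : b :: c : ? by vector offset plus cosine similarity."""
    target = vocab[b] - vocab[a] + vocab[c]

    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Exclude the three input words from the candidates, as is conventional.
    candidates = {w: v for w, v in vocab.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cos(candidates[w], target))

print(analogy("man", "woman", "king"))
```

The offset "woman − man" captures a gender direction in the embedding space, so adding it to "king" lands near "queen" — an emergent relational structure learned (in real systems) purely from co-occurrence statistics.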
Emergence & Reductionism
- Guessing a movie title from icons or emojis
- Chain of Thought: AI can generate text that follows a logical and coherent sequence of ideas, building on previous statements to form a chain of thought.
- Performing arithmetic: AI can perform basic arithmetic operations such as addition, subtraction, multiplication, and division, and can also solve more complex mathematical problems.
- Answering questions: AI can answer questions on a wide range of topics, drawing on its knowledge base to provide accurate and relevant responses.
- Summarizing passages: AI can summarize long texts, condensing the most important information into a shorter, more easily digestible form.
- Reasoning: AI can reason and make logical deductions, using its knowledge of the world and its ability to understand language to draw conclusions.
- Translating between languages: AI can translate between different languages, allowing people who speak different languages to communicate more easily.
- Generating creative content: AI can generate creative content such as poems, stories, and music, using its understanding of language and its ability to generate text that is stylistically and thematically coherent.
- Generating code: AI can generate code for different programming languages, using its understanding of programming concepts and its ability to generate syntactically correct code.
- Generating dialogue: AI can generate text that simulates a conversation between two or more people, responding to prompts in a natural and engaging way.
- Predicting the next word in a sentence: AI can predict the most likely next word in a sentence, based on its understanding of language and its analysis of the context.
- Generating text in a specific style or tone: AI can generate text that is tailored to a specific style or tone, such as formal or informal, academic or conversational.
- Generating text based on a given prompt: AI can generate text in response to a given prompt, using its knowledge of language and its ability to generate text that is relevant and informative.
- Generating text that is informative and accurate: AI can generate text that is informative and accurate, drawing on its knowledge base to provide detailed and accurate information on a wide range of topics.
- Generating text that is engaging and interesting: AI can generate text that is engaging and interesting, using its ability to generate text that is coherent and compelling.
- Generating text that is persuasive or argumentative: AI can generate text that is persuasive or argumentative, using its ability to construct arguments and present them in a convincing way.
- Generating text that is humorous or entertaining: AI can generate text that is humorous or entertaining, using its understanding of language and its ability to generate text that is witty and engaging.
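Mechanically, every ability in the list above rests on the same primitive: assigning probabilities to the next token given the text so far. A minimal bigram sketch over a toy corpus shows the objective in miniature (a real LLM learns this from vastly more text with a far more expressive model):

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration.
corpus = "the cat sat on the mat and the cat slept on the mat".split()

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the most likely next word and its estimated probability."""
    counts = follows[word]
    best, n = counts.most_common(1)[0]
    return best, n / sum(counts.values())

word, p = predict_next("the")
print(f"after 'the': {word} (p={p:.2f})")
```

The surprising empirical claim of emergence is that scaling up this single objective — predict the next token — yields the qualitatively different capabilities listed above, none of which were programmed in directly.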