Emergence




Emergent behavior is a phenomenon in which a system exhibits new and unexpected properties or behaviors that arise from the interactions of its individual components. In AI, emergent behavior can be seen when large and complex models develop novel and surprising abilities or strategies that were not explicitly programmed or anticipated by their creators; for example, some Large Language Models (LLMs) can solve simple math problems, generate computer code, compose music, write fictional stories, or decode movie titles from emojis.



AI decoded the movie title based on these emojis, can you?


These abilities are surprising and unpredictable because they seem to have little to do with analyzing text, which is the main task these models are trained on. Emergence in AI raises many questions about how and why these abilities occur, and what the potential benefits and risks of using them are. One area of research is to understand the evolutionary patterns of AI emergence across different countries and technologies. Another is to test the performance of large AI models on various tasks and identify the emergent abilities they display.



Merely quantitative differences, beyond a certain point, pass into qualitative changes. - Karl Marx.



Emergence is related to Singularity, Artificial Consciousness / Sentience, Artificial General Intelligence (AGI), & Moonshots ...

  • Singularity is the hypothetical point when AI surpasses human intelligence and becomes uncontrollable and unpredictable. Some researchers fear that emergence with AI could lead to singularity, while others doubt that singularity is possible or imminent.
  • AGI is the hypothetical state of AI when it can perform any intellectual task that a human can. Some researchers believe that emergence with AI is a sign of approaching AGI, while others argue that emergence with AI is not sufficient or necessary for achieving AGI.
  • Emergence could be a sign of, or a step toward, Artificial Consciousness / Sentience, which is the ability to experience feelings and sensations.
  • Moonshots are ambitious and visionary projects that aim to solve major challenges or create breakthroughs with AI. Some researchers use emergence with AI as a metric or a goal for their AI moonshots, while others focus on more specific or practical applications of AI.



Emergence from Scale

Excerpt from Dwarkesh Patel's The Lunar Society interview with Dario Amodei, the CEO of Anthropic:

Question: Why is the universe organized such that if you throw big blobs of compute at a wide enough distribution of data, the thing becomes intelligent?

Dario Amodei: I think the truth is that we still don't know. It's almost entirely an empirical fact. It's a fact that you could sense from the data and from a bunch of different places but we still don't have a satisfying explanation for it. If I were to try to make one and I'm just kind of waving my hands when I say this, there's these ideas in physics around long tail or power law of correlations or effects. When a bunch of stuff happens, when you have a bunch of features, you get a lot of the data in the early fat part of the distribution before the tails. For language, this would be things like — “Oh, I figured out there are parts of speech and nouns follow verbs.” And then there are these more and more subtle correlations. So it kind of makes sense why every log or order of magnitude that you add, you capture more of the distribution. What's not clear at all is why does it scale so smoothly with parameters? Why does it scale so smoothly with the amount of data? You can think up some explanations of why it's linear. The parameters are like a bucket, and the data is like water, and so size of the bucket is proportional to size of the water. But why does it lead to all this very smooth scaling? We still don't know. There's all these explanations. Our chief scientist, Jared Kaplan, did some stuff on fractal manifold dimension that you can use to explain it. So there's all kinds of ideas, but I feel like we just don't really know for sure.
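The smooth scaling Amodei describes is usually summarized with a power-law fit of loss against model size or data. The sketch below is a minimal illustration in Python: the functional form L(N) = L_inf + a * N^(-alpha), the loss measurements, and the fitted constants are all assumptions for illustration, not Anthropic's actual scaling-law data or methodology.

# Minimal sketch (illustrative only): fit a power-law scaling curve
# L(N) = L_inf + a * N**(-alpha) to made-up loss-vs-parameter-count points.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n_params, l_inf, a, alpha):
    # Power-law form commonly used to describe how loss falls with model size.
    return l_inf + a * n_params ** (-alpha)

# Hypothetical measurements: parameter counts and evaluation losses.
n = np.array([1e7, 1e8, 1e9, 1e10, 1e11])
loss = np.array([4.2, 3.5, 3.0, 2.6, 2.3])

popt, _ = curve_fit(scaling_law, n, loss, p0=[1.5, 30.0, 0.15], maxfev=20000)
l_inf, a, alpha = popt
print(f"fitted irreducible loss ~ {l_inf:.2f}, exponent ~ {alpha:.3f}")

# The smooth curve predicts the average loss at a larger scale,
# but says nothing about which specific abilities appear there.
print("predicted loss at 1e12 parameters:", scaling_law(1e12, *popt))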

Question: And by the way, for the audience who is trying to follow along. By scaling, we're referring to the fact that you can very predictably see how if you go from Claude 1 to Claude 2 that the loss in terms of whether it can predict the next token scales very smoothly. Okay, so we don't know why it's happening, but can you at least predict empirically that here is the loss at which this ability will emerge, here is the place where this circuit will emerge? Is that at all predictable or are you just looking at the loss number?

Dario Amodei: That is much less predictable. What's predictable is this statistical average, this loss, this entropy. And it's super predictable. It's sometimes predictable even to several significant figures which you don't see outside of physics. You don't expect to see it in this messy empirical field. But specific abilities are actually very hard to predict. Back when I was working on GPT-2 and GPT-3, when does arithmetic come in place? When do models learn to code? Sometimes it's very abrupt. It's like how you can predict statistical averages of the weather, but the weather on one particular day is very hard to predict.

Question: Dumb it down for me. I don't understand manifolds, but mechanistically, it doesn't know addition yet and suddenly now it knows addition. What has happened?

Dario Amodei: This is another question that we don't know the answer to. We're trying to answer this with things like mechanistic interpretability. You can think about these things like circuits snapping into place. There is some evidence that when you look at models being able to add things, their chance of getting the right answer shoots up all of a sudden. But if you look at the probability of the right answer, you'll see it climb from like one in a million to one in 100,000 to one in 1,000 long before it actually gets the right answer. In many of these cases there's some continuous process going on behind the scenes. I don't understand it at all.
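One way to make this "continuous process behind an abrupt jump" concrete is a toy calculation (the answer length and per-token probabilities below are invented for illustration): if a correct answer needs every one of k tokens to be right, a steadily improving per-token probability keeps exact-match accuracy near zero for a long stretch and then sends it shooting up.

# Toy illustration (not from the interview): a smoothly improving per-token
# probability produces an abrupt-looking jump in exact-match accuracy.
# Assume a correct answer is k tokens long and token errors are independent.
k = 8  # assumed answer length in tokens

# Per-token probability of the right token improving smoothly with "scale".
per_token = [0.18, 0.30, 0.45, 0.60, 0.75, 0.88, 0.96, 0.99]

for p in per_token:
    exact_match = p ** k          # probability the whole answer is right
    print(f"per-token p = {p:.2f}  ->  exact-match accuracy ~ {exact_match:.6f}")
# The underlying quantity (p) climbs steadily, so exact-match goes roughly
# 1e-6 -> 1e-4 -> 1e-2 before the pass/fail metric finally "snaps" upward.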

Question: Does that imply that the circuit or the process for doing addition was pre-existing and it just got increased in salience?

Dario Amodei: I don't know if there's this circuit that's weak and getting stronger. I don't know if it's something that works, but not very well. I think we don't know and these are some of the questions we're trying to answer with mechanistic interpretability.

Question: Are there abilities that won't emerge with scale?

Dario Amodei: I definitely think that things like alignment and values are not guaranteed to emerge with scale. One way to think about it is you train the model and it's basically predicting the world, it's understanding the world. Its job is facts, not values. It's trying to predict what comes next. But there's free variables here — What should you do? What should you think? What should you value? There aren't bits for that. There's just — if I started with this I should finish with this. If I started with this other thing I should finish with this other thing. And so I think that's not going to emerge.

Question: If it turns out that scaling plateaus before we reach human level intelligence, looking back on it, what would be your explanation? What do you think is likely to be the case if that turns out to be the outcome?

Dario Amodei: I would distinguish some problem with the fundamental theory from some practical issue:

  • Data: One practical issue we could have is we could run out of data. For various reasons, I think that's not going to happen but if you look at it very naively we're not that far from running out of data. So it's like we just don't have the data to continue the scaling curves.
  • Compute: Another way it could happen is we just use up all of the compute that was available and that wasn't enough and then progress is slow after that.


I wouldn't bet on either of those things happening but they could. From a fundamental perspective, I personally think it's very unlikely that the scaling laws will just stop.


  • Architecture: If they do, another reason could just be that we don't have quite the right architecture. If we tried to do it with an LSTM or an RNN the slope would be different. It still might be that we get there but there are some things that are just very hard to represent when you don't have the ability to attend far in the past that transformers have. If somehow we just hit a wall and it wasn’t about the architecture I'd be very surprised by that.

We're already at the point where to me the things the models can't do don't seem to be different in kind from the things they can do. You could have made a case a few years ago that they can't reason, they can't program. You could have drawn boundaries and said maybe you'll hit a wall. I didn't think we would hit a wall, a few other people didn't think we would hit a wall, but it was a more plausible case then. It's a less plausible case now. It could happen. This stuff is crazy. We could hit a wall tomorrow. If that happens my explanation would be there's something wrong with the loss when you train on next word prediction. If you really want to learn to program at a really high level, it means you care about some tokens much more than others and they're rare enough that the loss function over focuses on the appearance, the things that are responsible for the most bits of entropy, and instead they don't focus on this stuff that's really essential. So you could have the signal drowned out in the noise. I don't think it's going to play out that way for a number of reasons. But if you told me — Yes, you trained your 2024 model. It was much bigger and it just wasn't any better, and you tried every architecture and didn't work, that's the explanation I would reach for.
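The "signal drowned out in the noise" concern can also be put in toy numbers (all assumed): if the tokens that carry a rare skill are a tiny fraction of the corpus, mastering them barely moves the average next-token loss.

# Toy illustration with assumed numbers: the average next-token loss is
# dominated by frequent tokens, so being badly wrong on rare but important
# tokens barely changes the overall training objective.
frequent_share, rare_share = 0.999, 0.001   # share of tokens in the corpus
loss_frequent = 1.5    # nats per token: model already decent on common text
loss_rare_bad = 8.0    # nats per token: model is very wrong on the rare skill
loss_rare_good = 1.5   # nats per token: model has mastered the rare skill

avg_bad = frequent_share * loss_frequent + rare_share * loss_rare_bad
avg_good = frequent_share * loss_frequent + rare_share * loss_rare_good
print(f"average loss while the skill is missing:  {avg_bad:.4f} nats")
print(f"average loss once the skill is mastered:  {avg_good:.4f} nats")
print(f"difference the optimizer sees: {avg_bad - avg_good:.4f} nats")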

Question: Is there a candidate for another loss function? If you had to abandon next token prediction.

Dario Amodei: I think then you would have to go for some kind of RL. There's many different kinds. There's RL from human feedback, there's RL against an objective, there's things like Constitutional AI. There's things like amplification and debate. These are kind of both alignment methods and ways of training models. You would have to try a bunch of things, but the focus would have to be on what do we actually care about the model doing? In a sense, we're a little bit lucky that predict the next word gets us all these other things we need. There's no guarantee.

Question: From your worldview it seems there's a multitude of different loss functions; it's just a matter of which one lets you throw a whole bunch of data at it. Next token prediction itself is not significant.

Dario Amodei: The thing with RL is you get slowed down a bit because you have to design how the loss function works by some method. The nice thing with the next token prediction is it's there for you. It's the easiest thing in the world. So I think it would slow you down if you couldn't scale in just that very simplest way.
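For readers following along, the objective Amodei calls "the easiest thing in the world" is cross-entropy on the next token. The sketch below (plain numpy, with toy logits over a five-token vocabulary, both invented for illustration) shows that the data itself supplies the training target, which is what makes the objective so easy to scale.

# Minimal sketch of the next-token prediction objective (toy numbers):
# given the model's logits over the vocabulary at each position, the loss is
# the average negative log-probability assigned to the token that actually
# came next. No reward model or human feedback is needed; the data supplies
# the target "for free".
import numpy as np

def next_token_loss(logits, next_tokens):
    # Mean cross-entropy of the true next token under softmax(logits).
    logits = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    picked = log_probs[np.arange(len(next_tokens)), next_tokens]
    return -picked.mean()

# Toy example: 3 positions, vocabulary of 5 tokens, made-up logits.
logits = np.array([[2.0, 0.1, -1.0, 0.3, 0.0],
                   [0.2, 1.5,  0.0, 0.1, -0.5],
                   [0.0, 0.0,  3.0, 0.2,  0.1]])
next_tokens = np.array([0, 1, 2])   # the tokens that actually followed
print(f"next-token cross-entropy: {next_token_loss(logits, next_tokens):.3f}")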

Question: You mentioned that data is likely not to be the constraint. Why do you think that is the case?

Dario Amodei: There's various possibilities here and for a number of reasons I shouldn't go into the details, but there's many sources of data in the world and there's many ways that you can also generate data. My guess is that this will not be a blocker. Maybe it would be better if it was, but it won't be.

Question: Are you talking about multimodal?

Dario Amodei: There's just many different ways to do it.

Question: How did you form your views on scaling? How far back can we go and still find you saying basically something similar to this?

Dario Amodei: This is a view that I formed gradually from 2014 to 2017. My first experience with it was my first experience with AI. I saw some of the early stuff around AlexNet in 2012. I always had wanted to study intelligence but before I was just like, this doesn't seem like it's actually working. All the way back to 2005, I'd read Ray Kurzweil's work. I'd read even some of Eliezer (Yudkowsky)'s work on the early Internet back then. And I thought this stuff kind of looks far away. I look at the AI stuff of today and it's not anywhere close. But with AlexNet I was like, oh, this stuff is actually starting to work. So I joined Andrew Ng's group at Baidu. I had been in a different field and this was my first experience with AI and it was a bit different from a lot of the academic style research that was going on elsewhere in the world. I kind of got lucky with the task that was given to me and the other folks there: it was just to make the best speech recognition system that you can. There was a lot of data available, there were a lot of GPUs available. It posed the problem in a way that was amenable to discovering that kind of scaling was a solution. That's very different from being a postdoc whose job is to come up with an idea that seems clever and new and makes your mark as someone who's invented something. I just tried the simplest experiments. I was just fiddling with some dials. I was like, try adding more layers to the RNN, try training it for longer, what happens? How long does it take to overfit? What if I add new data and repeat it less times? And I just saw these very consistent patterns. I didn't really know that this was unusual or that others weren't thinking in this way. This was almost like beginner's luck. It was my first experience with it and I didn't really think about it beyond speech recognition. I was just like, oh, I don't know anything about this field. There are zillions of things people do with machine learning. But I'm like, weird, this seems to be true in the speech recognition field. It was just before OpenAI started that I met Ilya, who you interviewed. One of the first things he said to me was — “Look. The models, they just want to learn. You have to understand this. The models, they just want to learn.” And it was a bit like a Zen koan. I listened to this and I became enlightened.



Look. The models, they just want to learn. You have to understand this. The models, they just want to learn. - Ilya Sutskever



And over the years, I would be the one who would formalize a lot of these things and kind of put them together, but what that told me was that the phenomenon that I'd seen wasn't just some random thing. It was broad. It was more general. The models just want to learn. You get the obstacles out of their way. You give them good data, you give them enough space to operate in, you don't do something stupid like condition them badly numerically, and they want to learn. They'll do it.

Question: What I find really interesting about what you said is there were many people who were aware that these things are really good at speech recognition or at playing these constrained games. Very few extrapolated from there like you and Ilya did to something that is generally intelligent. What was different about the way you were thinking about it versus how others were thinking about it? What made you think that if it's getting better at speech in this consistent way, it will get better at everything in this consistent way?

Dario Amodei: I genuinely don't know. At first when I saw it for speech, I assumed this was just true for speech or for this narrow class of models. I think it was just that over the period between 2014 and 2017, I tried it for a lot of things and saw the same thing over and over again. I watched the same being true with Dota. I watched the same being true with robotics. Many people saw that as a counterexample, but I just thought, well, it's hard to get data for robotics, but if we look within the data that we have, we see the same patterns. I think people were very focused on solving the problem in front of them. It's very hard to explain why one person thinks one way and another person thinks a different way. People just see it through a different lens. They are looking vertically instead of horizontally. They're not thinking about the scaling, they're thinking about how do I solve my problem? And for robotics, there's not enough data. That can easily abstract to — scaling doesn't work because we don't have the data. For some reason, and it may just have been random, I was obsessed with that particular direction.

Emergence from Analogies


Principles of analogical reasoning have recently been applied in the context of machine learning, for example to develop new methods for classification and preference learning. In this paper, we argue that, while analogical reasoning is certainly useful for constructing new learning algorithms with high predictive accuracy, it is arguably not less interesting from an interpretability and explainability point of view. More specifically, we take the view that an analogy-based approach is a viable alternative to existing approaches in the realm of explainable AI and interpretable machine learning, and that analogy-based explanations of the predictions produced by a machine learning algorithm can complement similarity-based explanations in a meaningful way. Towards Analogy-Based Explanations in Machine Learning | Eyke Hüllermeier
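As a simplified, concrete instance of analogical reasoning in machine learning, here is the classic word-embedding "parallelogram" demonstration; note this is a generic illustration with made-up vectors, not the method proposed in the cited paper.

# Simplified illustration of analogical reasoning with embeddings (the classic
# word-vector "parallelogram" demo, not the method of the cited paper).
# "a is to b as c is to ?" is answered by the vector closest to b - a + c.
import numpy as np

# Tiny made-up 3-d embeddings, chosen so the analogy works by construction.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.8, 0.9]),
    "man":   np.array([0.1, 0.2, 0.1]),
    "woman": np.array([0.1, 0.2, 0.9]),
    "apple": np.array([0.5, 0.0, 0.2]),
}

def analogy(a, b, c, vocab):
    # Return the word whose embedding is closest (by cosine) to b - a + c.
    target = vocab[b] - vocab[a] + vocab[c]
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    candidates = {w: v for w, v in vocab.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cos(candidates[w], target))

print(analogy("man", "woman", "king", emb))   # expected: "queen"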

Analogies
This video is part of the Udacity course "Deep Learning". Watch the full course at https://www.udacity.com/course/ud730

Complexity Concepts, Abstraction, & Analogy in Natural and Artificial Intelligence, Melanie Mitchell
Complexity Concepts, Abstraction, & Analogy in Natural and Artificial Intelligence, a talk by Melanie Mitchell at the GoodAI Meta-Learning & Multi-Agent Learning Workshop. See other talks from the workshop.

Conceptual Abstraction and Analogy in Natural and Artificial Intelligence
Melanie Mitchell, Santa Fe Institute; Portland State University
While AI has made dramatic progress over the last decade in areas such as vision, natural language processing, and game-playing, current AI systems still wholly lack the abilities to create humanlike conceptual abstractions and analogies. It can be argued that the lack of humanlike concepts in AI systems is the cause of their brittleness—the inability to reliably transfer knowledge to new situations—as well as their vulnerability to adversarial attacks. Much AI research on conceptual abstraction and analogy has used visual-IQ-like tests or other idealized domains as arenas for developing and evaluating AI systems, and in several of these tasks AI systems have performed surprisingly well, in some cases outperforming humans. In this talk I will review some very recent (and some much older) work along these lines, and discuss the following questions: Do these domains actually require abilities that will transfer and scale to real-world tasks? And what are the systems that succeed on these idealized domains actually learning?

Melanie Mitchell: "Can Analogy Unlock AI’s Barrier of Meaning?"
UCSB College of Engineering
Speaker Bio: Melanie Mitchell is the Davis Professor of Complexity at the Santa Fe Institute and Professor of Computer Science (currently on leave) at Portland State University. Her current research focuses on conceptual abstraction, analogy-making, and visual recognition in artificial intelligence systems. She is the author or editor of six books and numerous scholarly papers in the fields of artificial intelligence, cognitive science, and complex systems. Her latest book is Artificial Intelligence: A Guide for Thinking Humans.
Abstract: In 1986, the mathematician and philosopher Gian-Carlo Rota wrote, “I wonder whether or when artificial intelligence will ever crash the barrier of meaning.” Here, the phrase “barrier of meaning” refers to a belief about humans versus machines: humans are able to “actually understand” the situations they encounter, whereas it can be argued that AI systems (at least current ones) do not possess such understanding. Some cognitive scientists have proposed that analogy-making is a central mechanism for concept formation and concept understanding in humans. Douglas Hofstadter called analogy-making “the core of cognition”, and Hofstadter and co-author Emmanuel Sander noted, “Without concepts there can be no thought, and without analogies there can be no concepts.” In this talk I will reflect on the role played by analogy-making at all levels of intelligence, and on how analogy-making abilities will be central in developing AI systems with humanlike intelligence.

Emergence & Reductionism




Examples

  • Guessing movie titles: AI can decode a movie title from a set of icons or emojis.
  • Chain of Thought: AI can generate text that follows a logical and coherent sequence of ideas, building on previous statements to form a chain of thought (see the prompt sketch after this list).
  • Performing arithmetic: AI can perform basic arithmetic operations such as addition, subtraction, multiplication, and division, and can also solve more complex mathematical problems.
  • Answering questions: AI can answer questions on a wide range of topics, drawing on its knowledge base to provide accurate and relevant responses.
  • Summarizing passages: AI can summarize long texts, condensing the most important information into a shorter, more easily digestible form.
  • Reasoning: AI can reason and make logical deductions, using its knowledge of the world and its ability to understand language to draw conclusions.
  • Translating between languages: AI can translate between different languages, allowing people who speak different languages to communicate more easily.
  • Generating creative content: AI can generate creative content such as poems, stories, and music, using its understanding of language and its ability to generate text that is stylistically and thematically coherent.
  • Generating code: AI can generate code for different programming languages, using its understanding of programming concepts and its ability to generate syntactically correct code.
  • Generating dialogue: AI can generate text that simulates a conversation between two or more people, responding to prompts in a natural and engaging way.
  • Predicting the next word in a sentence: AI can predict the most likely next word in a sentence, based on its understanding of language and its analysis of the context.
  • Generating text in a specific style or tone: AI can generate text that is tailored to a specific style or tone, such as formal or informal, academic or conversational.
  • Generating text based on a given prompt: AI can generate text in response to a given prompt, using its knowledge of language and its ability to generate text that is relevant and informative.
  • Generating text that is informative and accurate: AI can generate text that is informative and accurate, drawing on its knowledge base to provide detailed and accurate information on a wide range of topics.
  • Generating text that is engaging and interesting: AI can generate text that is engaging and interesting, using its ability to generate text that is coherent and compelling.
  • Generating text that is persuasive or argumentative: AI can generate text that is persuasive or argumentative, using its ability to construct arguments and present them in a convincing way.
  • Generating text that is humorous or entertaining: AI can generate text that is humorous or entertaining, using its understanding of language and its ability to generate text that is witty and engaging.
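The chain-of-thought item above can be made concrete with a minimal prompting sketch; the prompt text below is illustrative and is not tied to any particular model or API.

# Illustrative chain-of-thought prompt (the model, API, and exact output are
# not specified here; this only shows the prompt pattern, with one worked
# example followed by a new question to be reasoned through step by step).
prompt = """Q: A shop sells pens at 3 dollars each. How much do 4 pens cost?
A: Each pen costs 3 dollars. 4 pens cost 4 x 3 = 12 dollars. The answer is 12.

Q: A train travels 60 miles per hour for 2.5 hours. How far does it go?
A: Let's think step by step."""

print(prompt)  # send this text to any instruction-following LLM of your choice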