  
 
* [[Artificial General Intelligence (AGI) to Singularity]] ... [[Inside Out - Curious Optimistic Reasoning| Curious Reasoning]] ... [[Emergence]] ... [[Moonshots]] ... [[Explainable / Interpretable AI|Explainable AI]] ... [[Algorithm Administration#Automated Learning|Automated Learning]]
* [[Large Language Model (LLM)]] ... [[Large Language Model (LLM)#Multimodal|Multimodal]] ... [[Foundation Models (FM)]] ... [[Generative Pre-trained Transformer (GPT)|Generative Pre-trained]] ... [[Transformer]] ... [[Attention]] ... [[Generative Adversarial Network (GAN)|GAN]] ... [[Bidirectional Encoder Representations from Transformers (BERT)|BERT]]
* [[Perspective]] ... [[Context]] ... [[In-Context Learning (ICL)]] ... [[Transfer Learning]] ... [[Out-of-Distribution (OOD) Generalization]]
* [[Causation vs. Correlation]] ... [[Autocorrelation]] ... [[Convolution vs. Cross-Correlation (Autocorrelation)]]
* [[Agents#Auto-GPT|Auto-GPT]] ... autonomously use the results it generates to create new prompts, chaining these operations together to complete complex tasks.
* [[Immersive Reality]] ... [[Metaverse]] ... [[Omniverse]] ... [[Transhumanism]] ... [[Religion]]
* [[Life~Meaning]] ... [[Consciousness]] ... [[Loop#Feedback Loop - Creating Consciousness|Creating Consciousness]] ... [[Quantum#Quantum Biology|Quantum Biology]] ... [[Orch-OR]] ... [[TAME]] ... [[Protein Folding & Discovery|Proteins]]
* [[Telecommunications]] ... [[Computer Networks]] ... [[Telecommunications#5G|5G]] ... [[Satellite#Satellite Communications|Satellite Communications]] ... [[Quantum Communications]] ... [[Agents#Communication | Communication Agents]] ... [[Smart Cities]] ... [[Digital Twin]] ... [[Internet of Things (IoT)]]
* [[Embodied AI]] ... Embodiment Hypothesis: intelligence emerges in the interaction of an agent with an environment
* [[What is Artificial Intelligence (AI)? | Artificial Intelligence (AI)]] ... [[Generative AI]] ... [[Machine Learning (ML)]] ... [[Deep Learning]] ... [[Neural Network]] ... [[Reinforcement Learning (RL)|Reinforcement]] ... [[Learning Techniques]]
* [[Conversational AI]] ... [[ChatGPT]] | [[OpenAI]] ... [[Bing/Copilot]] | [[Microsoft]] ... [[Gemini]] | [[Google]] ... [[Claude]] | [[Anthropic]] ... [[Perplexity]] ... [[You]] ... [[phind]] ... [[Ernie]] | [[Baidu]]
* [[Creatives]] ... [[History of Artificial Intelligence (AI)]] ... [[Neural Network#Neural Network History|Neural Network History]] ... [[Rewriting Past, Shape our Future]] ... [[Archaeology]] ... [[Paleontology]]
* [[Analytics]] ... [[Visualization]] ... [[Graphical Tools for Modeling AI Components|Graphical Tools]] ... [[Diagrams for Business Analysis|Diagrams]] & [[Generative AI for Business Analysis|Business Analysis]] ... [[Requirements Management|Requirements]] ... [[Loop]] ... [[Bayes]] ... [[Network Pattern]]
* [[Loop#Multi-Loop Learning|Multi-Loop Learning]]
* [[Artificial General Intelligence (AGI) to Singularity#Exponential Progression|Exponential Progression]]
* [[Chain of Thought (CoT)]] ... [[Chain of Thought (CoT)#Tree of Thoughts (ToT)|Tree of Thoughts (ToT)]]
* [[Gaming]] ... [[Game-Based Learning (GBL)]] ... [[Games - Security|Security]] ... [[Game Development with Generative AI|Generative AI]] ... [[Metaverse#Games - Metaverse|Games - Metaverse]] ... [[Games - Quantum Theme|Quantum]] ... [[Game Theory]] ... [[Game Design | Design]]
* [[Game Design#Emergent Gameplay | Emergent Gameplay]]

* [https://en.wikipedia.org/wiki/Emergence Emergence | Wikipedia]
* [https://singularityhub.com/2019/10/15/this-strange-rule-is-what-makes-the-human-brain-so-powerful/ This Strange Rule Is What Makes the Human Brain So Powerful | Shelly Fan - SingularityHub]
* [https://www.quantamagazine.org/the-unpredictable-abilities-emerging-from-large-ai-models-20230316 The Unpredictable Abilities Emerging From Large AI Models | Stephen Ornes - Quanta Magazine]
* [https://virtualizationreview.com/articles/2023/04/21/llm-emergence.aspx 'Emergent Abilities': When AI LLMs Learn Stuff They Shouldn't Know | David Ramel - Virtualization Review]
* [https://medium.com/intuitionmachine/10-novel-insights-about-undecidability-092705596d3b 10 Novel Insights about Undecidability | Carlos E. Perez - Intuition Machine - Medium] ... undecidability is not the limitation preventing the performance but the open stage enabling the play

Emergent behavior is a phenomenon where a system exhibits new and unexpected properties or behaviors that arise from the interactions of its individual components. In AI, emergent behavior can be seen when large and complex models develop novel and surprising abilities or strategies that were not explicitly programmed or anticipated by their creators; for example, some Large Language Model (LLM) systems can solve simple math problems, generate computer code, compose music, generate fictional stories, or decode movie titles based on emojis.

<hr><b><i>
Emergence is a placeholder for human ignorance, used when researchers fail to account for all variables in a complex system.
</i></b><hr><br>

These abilities are surprising and unpredictable because they seem to have little to do with analyzing text, which is the main task of these models. Emergence with AI raises many questions about how and why these abilities occur, and what the potential benefits and risks of using them are. [https://arxiv.org/abs/2102.00233 One area of research is to understand the evolutionary patterns of AI emergence across different countries and technologies]. [https://www.pewresearch.org/short-reads/2023/02/22/how-americans-view-emerging-uses-of-artificial-intelligence-including-programs-to-generate-text-or-art Another area of research is to test the performance of large AI models on various tasks and identify the emergent abilities they display].

<hr><b><i>
Merely quantitative differences, beyond a certain point, pass into qualitative changes. - Karl Marx
</i></b><hr>

Emergence is related to Singularity, Artificial Consciousness / Sentience, Artificial General Intelligence (AGI), & Moonshots ...

* Singularity is the hypothetical point when AI surpasses human intelligence and becomes uncontrollable and unpredictable. Some researchers fear that emergence with AI could lead to singularity, while others doubt that singularity is possible or imminent.
* AGI is the hypothetical state of AI when it can perform any intellectual task that a human can. Some researchers believe that emergence with AI is a sign of approaching AGI, while others argue that emergence with AI is not sufficient or necessary for achieving AGI.
* Emergence could be a sign of, or a step toward, AI Artificial Consciousness / Sentience, which is the ability to experience feelings and sensations.
* Moonshots are ambitious and visionary projects that aim to solve major challenges or create breakthroughs with AI. Some researchers use emergence with AI as a metric or a goal for their AI moonshots, while others focus on more specific or practical applications of AI.
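The idea above, that simple components interacting locally can produce behavior nobody programmed in, can be made concrete with a toy simulation. The sketch below is an illustrative aside, not part of the original article: it runs Conway's Game of Life, where two local rules about live neighbors are enough to produce a "glider", a five-cell pattern that travels across the grid even though nothing in the rules mentions movement.

<pre>
# Minimal Conway's Game of Life in plain Python: a moving "glider"
# emerges from two purely local rules. Illustrative sketch only.
from collections import Counter

def step(live):
    """One generation of Life over a set of live (row, col) cells."""
    neighbor_counts = Counter(
        (r + dr, c + dc)
        for (r, c) in live
        for dr in (-1, 0, 1) for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # A cell is alive next step if it has exactly 3 live neighbors (birth),
    # or 2 live neighbors and is already alive (survival).
    return {cell for cell, n in neighbor_counts.items()
            if n == 3 or (n == 2 and cell in live)}

# Five live cells arranged as the classic glider.
glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}

cells = glider
for _ in range(4):          # advance four generations
    cells = step(cells)

# The same shape reappears shifted one cell down and one cell right:
# global, directed motion that no single rule describes.
print(cells == {(r + 1, c + 1) for (r, c) in glider})  # True
</pre>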
<youtube>GVL2Y5z2jLU</youtube>
<youtube>If2Fw0z6uxY</youtube>
<youtube>Bpgloy1dDn0</youtube>
<youtube>16W7c0mb-rE</youtube>
<youtube>ZCBaQpEq1U8</youtube>
  
 
= <span id="Emergence from Scale"></span>Emergence from Scale =

* [[Claude]] 2 | [[Anthropic]]

Excerpt from [https://www.dwarkeshpatel.com/podcast Dwarkesh Patel; The Lunar Society] interview with [[Creatives#Darlo Amodei |Dario Amodei]], the CEO of [[Anthropic]]:
  
 
<b>Question</b>: Why is the universe organized such that if you throw big blobs of compute at a wide enough distribution of data, the thing becomes intelligent?
  
<b>[[Creatives#Darlo Amodei |Dario Amodei]]</b>: I think the truth is that we still don't know. It's almost entirely an empirical fact. It's a fact that you could sense from the data and from a bunch of different places but we still don't have a satisfying explanation for it. If I were to try to make one and I'm just kind of waving my hands when I say this, there's these ideas in physics around long tail or power law of correlations or effects. When a bunch of stuff happens, when you have a bunch of features, you get a lot of the data in the early fat part of the distribution before the tails. For language, this would be things like — “Oh, I figured out there are parts of speech and nouns follow verbs.” And then there are these more and more subtle correlations. So it kind of makes sense why every log or order of magnitude that you add, you capture more of the distribution. What's not clear at all is why does it scale so smoothly with parameters? Why does it scale so smoothly with the amount of data? You can think up some explanations of why it's linear. The parameters are like a bucket, and the data is like water, and so size of the bucket is proportional to size of the water. But why does it lead to all this very smooth scaling? We still don't know. There's all these explanations. Our chief scientist, Jared Kaplan did some stuff on fractal manifold dimension that you can use to explain it. So there's all kinds of ideas, but I feel like we just don't really know for sure.
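As a reference point for the "smooth scaling" being discussed, published scaling-law work (for example Kaplan et al., 2020) fits loss as a power law in parameter count. The sketch below only illustrates that functional form; the constants are placeholders loosely patterned on such fits, not numbers from Anthropic or from any Claude model.

<pre>
# Illustrative power-law scaling curve: loss falls smoothly and
# predictably as model size N grows. Constants are placeholders,
# not measured values.

def predicted_loss(n_params, n_c=8.8e13, alpha=0.076, irreducible=1.7):
    """Toy scaling law of the form L(N) = irreducible + (N_c / N) ** alpha."""
    return irreducible + (n_c / n_params) ** alpha

for n in (1e8, 1e9, 1e10, 1e11, 1e12):
    print(f"N = {n:.0e} parameters -> predicted loss {predicted_loss(n):.3f}")

# Each 10x increase in N removes a similar, predictable sliver of loss,
# which is the "smooth" part; which specific abilities appear at a given
# loss level is the part that stays hard to predict.
</pre>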
  
<b>Question</b>: And by the way, for the audience who is trying to follow along. By scaling, we're referring to the fact that you can very predictably see how if you go from [[Claude]] 1 to [[Claude]] 2 that the loss in terms of whether it can predict the next token scales very smoothly. Okay, so we don't know why it's happening, but can you at least predict empirically that here is the loss at which this ability will emerge, here is the place where this circuit will emerge? Is that at all predictable or are you just looking at the loss number?
  
<b>[[Creatives#Darlo Amodei |Dario Amodei]]</b>: That is much less predictable. What's predictable is this statistical average, this loss, this entropy. And it's super predictable. It's sometimes predictable even to several significant figures which you don't see outside of physics. You don't expect to see it in this messy empirical field. But specific abilities are actually very hard to predict. Back when I was working on GPT-2 and GPT-3, when does arithmetic come in place? When do models learn to code? Sometimes it's very abrupt. It's like how you can predict statistical averages of the weather, but the weather on one particular day is very hard to predict.
  
 
<b>Question</b>: Dumb it down for me. I don't understand [[Manifold Hypothesis|manifold]]s, but mechanistically, it doesn't know addition yet and suddenly now it knows addition. What has happened?
  
<b>[[Creatives#Darlo Amodei |Dario Amodei]]</b>: This is another question that we don't know the answer to. We're trying to answer this with things like mechanistic interpretability. You can think about these things like circuits snapping into place. Although there is some evidence that when you look at the models being able to add things, its chance of getting the right answer shoots up all of a sudden. But if you look at what's the probability of the right answer? You'll see it climb from like one in a million to one in 100,000 to one in a 1000 long before it actually gets the right answer. In many of these cases there's some continuous process going on behind the scenes. I don't understand it at all.
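The observation that the probability of the right answer climbs smoothly long before the model "suddenly" gets it right has a simple arithmetic side: when the metric is exact match over a multi-token answer, a steadily improving per-token probability still produces a sharp-looking jump. A toy calculation, with illustrative numbers that are not from the interview:

<pre>
# Toy illustration: per-token probability improves gradually, but the
# probability of getting an entire 8-token answer exactly right looks
# like an abrupt, "emergent" jump. Numbers are invented for the sketch.

ANSWER_LEN = 8  # all 8 tokens must be correct for exact match

print(f"{'per-token p':>12}  {'exact-match p = p**8':>22}")
for p in (0.10, 0.30, 0.50, 0.70, 0.80, 0.90, 0.95, 0.99):
    print(f"{p:12.2f}  {p ** ANSWER_LEN:22.6f}")

# Left column: smooth, continuous progress (what the loss sees).
# Right column: near zero, then a rapid climb (what a hard pass/fail
# benchmark sees), which can read as an ability appearing suddenly.
</pre>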
  
 
<b>Question</b>: Does that imply that the circuit or the process for doing addition was pre-existing and it just got increased in salience?

<b>[[Creatives#Darlo Amodei |Dario Amodei]]</b>: I don't know if there's this circuit that's weak and getting stronger. I don't know if it's something that works, but not very well. I think we don't know and these are some of the questions we're trying to answer with mechanistic interpretability.

<b>Question</b>: Are there abilities that won't emerge with scale?
  
<b>[[Creatives#Darlo Amodei |Dario Amodei]]</b>: I definitely think that things like alignment and values are not guaranteed to emerge with scale. One way to think about it is you train the model and it's basically predicting the world, it's understanding the world. Its job is facts not values. It's trying to predict what comes next. But there's free variables here — What should you do? What should you think? What should you value? There aren't bits for that. There's just — if I started with this I should finish with this. If I started with this other thing I should finish with this other thing. And so I think that's not going to emerge.

<b>Question</b>: If it turns out that scaling plateaus before we reach human level intelligence, looking back on it, what would be your explanation? What do you think is likely to be the case if that turns out to be the outcome?

<b>[[Creatives#Darlo Amodei |Dario Amodei]]</b>: I would distinguish some problem with the fundamental theory from some practical issue.

* <b>Data</b>: One practical issue we could have is we could run out of data. For various reasons, I think that's not going to happen but if you look at it very naively we're not that far from running out of data. So it's like we just don't have the data to continue the scaling curves.
* <b>Compute</b>: Another way it could happen is we just use up all of the compute that was available and that wasn't enough and then progress is slow after that.

<center><i>
I wouldn't bet on either of those things happening but they could. From a fundamental perspective, I personally think it's very unlikely that the scaling laws will just stop.
</i></center>

* <b>Architecture</b>: If they do, another reason could just be that we don't have quite the right architecture. If we tried to do it with an [[Long Short-Term Memory (LSTM)|LSTM]] or an [[Recurrent Neural Network (RNN)|RNN]] the slope would be different. It still might be that we get there but there are some things that are just very hard to represent when you don't have the ability to attend far in the past that [[transformer]]s have. If somehow we just hit a wall and it wasn’t about the architecture I'd be very surprised by that.

We're already at the point where to me the things <b>the models can't do don't seem to be different in kind from the things they can do</b>. You could have made a case a few years ago that they can't reason, they can't program. You could have drawn boundaries and said maybe you'll hit a wall. I didn't think we would hit a wall, a few other people didn't think we would hit a wall, but it was a more plausible case then. It's a less plausible case now. It could happen. This stuff is crazy. We could hit a wall tomorrow. If that happens my explanation would be there's something wrong with the loss when you train on next word prediction. If you really want to learn to program at a really high level, it means you care about some tokens much more than others and they're rare enough that the loss function over-focuses on the appearance, the things that are responsible for the most bits of entropy, and instead doesn't focus on this stuff that's really essential. So you could have the signal drowned out in the noise. I don't think it's going to play out that way for a number of reasons. But if you told me — Yes, you trained your 2024 model. It was much bigger and it just wasn't any better, and you tried every architecture and it didn't work, that's the explanation I would reach for.
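The "signal drowned out in the noise" concern can be put in numbers: next-word cross-entropy is a token-weighted average, so a small fraction of rare but decisive tokens can barely move the headline loss even if the model goes from failing them to mastering them. A toy calculation with invented frequencies and probabilities:

<pre>
# Toy illustration of how an average next-token loss can under-weight
# the rare tokens that matter most. All numbers are invented.
import math

def avg_cross_entropy(segments):
    """Token-weighted average of -log p over (fraction, probability) pairs."""
    return sum(frac * -math.log(p) for frac, p in segments)

common_filler = (0.99, 0.60)   # 99% of tokens: routine text the model handles
decisive_code = (0.01, 0.05)   # 1% of tokens: the ones that decide correctness

before = avg_cross_entropy([common_filler, decisive_code])
after = avg_cross_entropy([common_filler, (0.01, 0.95)])  # master the rare tokens

print(f"average loss before mastering decisive tokens: {before:.4f}")
print(f"average loss after  mastering decisive tokens: {after:.4f}")
# The headline number barely moves, which is the worry about the loss
# over-focusing on whatever carries the most bits of entropy overall.
</pre>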
  
<b>Question</b>: Is there a candidate for another loss function? If you had to abandon next token prediction.

<b>[[Creatives#Darlo Amodei |Dario Amodei]]</b>: I think then you would have to go for some kind of RL. There's many different kinds. There's RL from human feedback, there's RL against an objective, there's things like Constitutional AI. There's things like amplification and debate. These are kind of both alignment methods and ways of training models. You would have to try a bunch of things, but the focus would have to be on what do we actually care about the model doing? In a sense, we're a little bit lucky that predict the next word gets us all these other things we need. There's no guarantee.

<b>Question</b>: From your worldview it seems there's a multitude of different loss functions that it's just a matter of what can allow you to just throw a whole bunch of data at it. Next token prediction itself is not significant.

<b>[[Creatives#Darlo Amodei |Dario Amodei]]</b>: The thing with RL is you get slowed down a bit because you have to design how the loss function works by some method. The nice thing with the next token prediction is it's there for you. It's the easiest thing in the world. So I think it would slow you down if you couldn't scale in just that very simplest way.
  
 
<b>Question</b>: You mentioned that data is likely not to be the constraint. Why do you think that is the case?

<b>[[Creatives#Darlo Amodei |Dario Amodei]]</b>: There's various possibilities here and for a number of reasons I shouldn't go into the details, but there's many sources of data in the world and there's many ways that you can also generate data. My guess is that this will not be a blocker. Maybe it would be better if it was, but it won't be.

<b>Question</b>: Are you talking about multimodal?

<b>[[Creatives#Darlo Amodei |Dario Amodei]]</b>: There’s just many different ways to do it.

<b>Question</b>: How did you form your views on scaling? How far back can we go? And then you would be basically saying something similar to this.

<b>[[Creatives#Darlo Amodei |Dario Amodei]]</b>: This view that I have formed gradually from 2014 to 2017. My first experience with it was my first experience with AI. I saw some of the early stuff around [[Datasets|AlexNet]] in 2012. I always had wanted to study intelligence but before I was just like, this doesn’t seem like it’s actually working. All the way back to 2005. I'd read [[Creatives#Ray Kurzweil|Ray Kurzweil]]’s work. I'd read even some of Eliezer (Yudkowsky)’s work on the early Internet back then. And I thought this stuff kind of looks far away. I look at the AI stuff of today and it’s not anywhere close. But with [[Datasets|AlexNet]] I was like, oh, this stuff is actually starting to work. So I joined [[Creatives#Andrew Ng|Andrew Ng]]’s group at [[Baidu]]. I had been in a different field and this was my first experience with AI and it was a bit different from a lot of the academic style research that was going on elsewhere in the world. I kind of got lucky in that the task that was given to me and the other folks there. It was just to make the best speech recognition system that you can. There was a lot of data available, there were a lot of GPUs available. It posed the problem in a way that was amenable to discovering that kind of scaling was a solution. That's very different from being a postdoc whose job is to come up with an idea that seems clever and new and makes your mark as someone who's invented something. I just tried the simplest experiments. I was just fiddling with some dials. I was like, try adding more layers to the [[Recurrent Neural Network (RNN)|RNN]], try training it for longer, what happens? How long does it take to overfit? What if I add new data and repeat it less times? And I just saw these very consistent patterns. I didn't really know that this was unusual or that others weren't thinking in this way. This was almost like beginner's luck. It was my first experience with it and I didn't really think about it beyond speech recognition. I was just like, oh, I don't know anything about this field. There are zillions of things people do with machine learning. But I'm like, weird, this seems to be true in the speech recognition field. It was just before [[OpenAI]] started that I met Ilya, who you interviewed. One of the first things he said to me was — “Look. The models, they just want to learn. You have to understand this. The models, they just want to learn.” And it was a bit like a Zen Koan. I listened to this and I became enlightened.

<hr><center><b><i>

Look. The models, they just want to learn. You have to understand this. The models, they just want to learn. - [[Creatives#Ilya Sutskever |Ilya Sutskever]]

</i></b></center><hr>

And over the years, I would be the one who would formalize a lot of these things and kind of put them together, but what that told me was that the phenomenon that I'd seen wasn't just some random thing. It was broad. It was more general. The models just want to learn. You get the obstacles out of their way. You give them good data, you give them enough space to operate in, you don't do something stupid like condition them badly numerically, and they want to learn. They'll do it.

<b>Question</b>: What I find really interesting about what you said is there were many people who were aware that these things are really good at speech recognition or at playing these constrained games. Very few extrapolated from there like you and Ilya did to something that is generally intelligent. What was different about the way you were thinking about it versus how others were thinking about it? What made you think it's getting better at speech in this consistent way, it will get better at everything in this consistent way?

<b>[[Creatives#Darlo Amodei |Dario Amodei]]</b>: I genuinely don't know. At first when I saw it for speech, I assumed this was just true for speech or for this narrow class of models. I think it was just that over the period between 2014 and 2017, I tried it for a lot of things and saw the same thing over and over again. I watched the same being true with Dota. I watched the same being true with robotics. Many people thought that as a counterexample, but I just thought, well, it's hard to get data for robotics, but if we look within the data that we have, we see the same patterns. I think people were very focused on solving the problem in front of them. It's very hard to explain why one person thinks one way and another person thinks a different way. People just see it through a different lens. They are looking vertically instead of horizontally. They're not thinking about the scaling, they're thinking about how do I solve my problem? And for robotics, there's not enough data. That can easily abstract to — scaling doesn't work because we don't have the data. For some reason, and it may just have been random, I was obsessed with that particular direction.
<youtube>Nlkk3glap_U</youtube>
  
 
= <span id="Emergence from Analogies"></span>Emergence from Analogies =


Principles of analogical reasoning have recently been applied in the context of machine learning, for example to develop new methods for classification and preference learning. In this paper, we argue that, while analogical reasoning is certainly useful for constructing new learning algorithms with high predictive accuracy, it is arguably no less interesting from an interpretability and explainability point of view. More specifically, we take the view that an analogy-based approach is a viable alternative to existing approaches in the realm of explainable AI and interpretable machine learning, and that analogy-based explanations of the predictions produced by a machine learning algorithm can complement similarity-based explanations in a meaningful way. Towards Analogy-Based Explanations in Machine Learning | Eyke Hüllermeier

Analogies
This video is part of the Udacity course "Deep Learning". Watch the full course at https://www.udacity.com/course/ud730
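As a concrete illustration of analogy in learned representations, here is a minimal sketch of the classic vector-arithmetic analogy "man is to king as woman is to ?" over word embeddings. The tiny 3-dimensional vectors are invented for illustration only; real systems use learned, high-dimensional embeddings, and this sketch is not the method of any of the works cited above.

  # Minimal sketch of analogy by vector arithmetic over word embeddings.
  # The toy 3-d vectors below are invented; real embeddings are learned and high-dimensional.
  import numpy as np

  embeddings = {
      "king":  np.array([0.80, 0.65, 0.15]),
      "queen": np.array([0.78, 0.10, 0.75]),
      "man":   np.array([0.60, 0.70, 0.05]),
      "woman": np.array([0.58, 0.12, 0.68]),
  }

  def analogy(a, b, c):
      """Solve 'a is to b as c is to ?' by the nearest cosine neighbour of b - a + c."""
      target = embeddings[b] - embeddings[a] + embeddings[c]
      def cos(u, v):
          return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
      candidates = {w: v for w, v in embeddings.items() if w not in (a, b, c)}
      return max(candidates, key=lambda w: cos(candidates[w], target))

  print(analogy("man", "king", "woman"))  # prints "queen" with these toy vectors

The point of the sketch is that the relational structure (male-to-female, here encoded as a direction in the vector space) transfers across pairs of words, which is one simple, mechanical sense in which "analogy" can emerge from learned representations.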

Complexity Concepts, Abstraction, & Analogy in Natural and Artificial Intelligence, Melanie Mitchell
Complexity Concepts, Abstraction, & Analogy in Natural and Artificial Intelligence, a talk by Melanie Mitchell at the GoodAI Meta-Learning & Multi-Agent Learning Workshop. See other talks from the workshop.

Conceptual Abstraction and Analogy in Natural and Artificial Intelligence
Melanie Mitchell, Santa Fe Institute; Portland State University While AI has made dramatic progress over the last decade in areas such as vision, natural language processing, and game-playing, current AI systems still wholly lack the abilities to create humanlike conceptual abstractions and analogies. It can be argued that the lack of humanlike concepts in AI systems is the cause of their brittleness—the inability to reliably transfer knowledge to new situations—as well as their vulnerability to adversarial attacks. Much AI research on conceptual abstraction and analogy has used visual-IQ-like tests or other idealized domains as arenas for developing and evaluating AI systems, and in several of these tasks AI systems have performed surprisingly well, in some cases outperforming humans. In this talk I will review some very recent (and some much older) work along these lines, and discuss the following questions: Do these domains actually require abilities that will transfer and scale to real-world tasks? And what are the systems that succeed on these idealized domains actually learning?

Melanie Mitchell: "Can Analogy Unlock AI’s Barrier of Meaning?"
UCSB College of Engineering Speaker Bio: Melanie Mitchell is the Davis Professor of Complexity at the Santa Fe Institute and Professor of Computer Science (currently on leave) at Portland State University. Her current research focuses on conceptual abstraction, analogy-making, and visual recognition in artificial intelligence systems. She is the author or editor of six books and numerous scholarly papers in the fields of artificial intelligence, cognitive science, and complex systems. Her latest book is Artificial Intelligence: A Guide for Thinking Humans. Abstract: In 1986, the mathematician and philosopher Gian-Carlo Rota wrote, “I wonder whether or when artificial intelligence will ever crash the barrier of meaning.” Here, the phrase “barrier of meaning” refers to a belief about humans versus machines: humans are able to “actually understand” the situations they encounter, whereas it can be argued that AI systems (at least current ones) do not possess such understanding. Some cognitive scientists have proposed that analogy-making is a central mechanism for concept formation and concept understanding in humans. Douglas Hofstadter called analogy-making “the core of cognition”, and Hofstadter and co-author Emmanuel Sander noted, “Without concepts there can be no thought, and without analogies there can be no concepts.” In this talk I will reflect on the role played by analogy-making at all levels of intelligence, and on how analogy-making abilities will be central in developing AI systems with humanlike intelligence.

Emergence & Reductionism

Youtube search...



Examples

  • Guessing movie titles: AI can identify a movie title from a sequence of icons or emojis provided as a prompt.
  • Chain of Thought: AI can generate text that follows a logical and coherent sequence of ideas, building on previous statements to form a chain of thought (see the prompting sketch after this list).
  • Performing arithmetic: AI can perform basic arithmetic operations such as addition, subtraction, multiplication, and division, and can also solve more complex mathematical problems.
  • Answering questions: AI can answer questions on a wide range of topics, drawing on its knowledge base to provide accurate and relevant responses.
  • Summarizing passages: AI can summarize long texts, condensing the most important information into a shorter, more easily digestible form.
  • Reasoning: AI can reason and make logical deductions, using its knowledge of the world and its ability to understand language to draw conclusions.
  • Translating between languages: AI can translate between different languages, allowing people who speak different languages to communicate more easily.
  • Generating creative content: AI can generate creative content such as poems, stories, and music, using its understanding of language and its ability to generate text that is stylistically and thematically coherent.
  • Generating code: AI can generate code for different programming languages, using its understanding of programming concepts and its ability to generate syntactically correct code.
  • Generating dialogue: AI can generate text that simulates a conversation between two or more people, responding to prompts in a natural and engaging way.
  • Predicting the next word in a sentence: AI can predict the most likely next word in a sentence, based on its understanding of language and its analysis of the context.
  • Generating text in a specific style or tone: AI can generate text that is tailored to a specific style or tone, such as formal or informal, academic or conversational.
  • Generating text based on a given prompt: AI can generate text in response to a given prompt, using its knowledge of language and its ability to generate text that is relevant and informative.
  • Generating text that is informative and accurate: AI can generate text that is informative and accurate, drawing on its knowledge base to provide detailed and accurate information on a wide range of topics.
  • Generating text that is engaging and interesting: AI can generate text that is engaging and interesting, using its ability to generate text that is coherent and compelling.
  • Generating text that is persuasive or argumentative: AI can generate text that is persuasive or argumentative, using its ability to construct arguments and present them in a convincing way.
  • Generating text that is humorous or entertaining: AI can generate text that is humorous or entertaining, using its understanding of language and its ability to generate text that is witty and engaging.
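Several of the abilities listed above (chain of thought, arithmetic, question answering) are typically elicited purely through prompting rather than through any change to the model. The sketch below shows the general shape of a chain-of-thought prompt; `generate` is a hypothetical placeholder for whatever text-generation call your model or API provides, not a specific library function, and the worked example in the prompt is invented for illustration.

  # Minimal sketch of chain-of-thought prompting. `generate` is a hypothetical
  # stand-in for any large-language-model completion call; swap in your own API.

  def generate(prompt: str) -> str:
      """Hypothetical LLM call; returns the model's continuation of `prompt`."""
      raise NotImplementedError("Replace with a real model or API call.")

  # A one-shot example that demonstrates step-by-step reasoning, followed by the
  # question we actually want answered. The phrase "Let's think step by step"
  # nudges the model to emit intermediate reasoning before its final answer.
  prompt = (
      "Q: A cafe sells 3 muffins for $5. How much do 12 muffins cost?\n"
      "A: Let's think step by step. 12 muffins is 4 groups of 3. "
      "Each group costs $5, so 4 * 5 = $20. The answer is $20.\n\n"
      "Q: A train travels 60 km per hour for 2.5 hours. How far does it go?\n"
      "A: Let's think step by step."
  )

  if __name__ == "__main__":
      print(generate(prompt))  # expected: reasoning steps ending in 150 km

The design point is that the emergent behavior is triggered by the structure of the input alone: the same underlying model, given the bare question, often answers less reliably than when it is shown a worked example and invited to reason step by step.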