Self-Supervised


Self-supervised learning refers to an unsupervised learning problem that is framed as a supervised learning problem so that supervised learning algorithms can be applied to it. The supervised algorithms solve an alternate, or pretext, task; the result is a model or representation that can then be used in the solution of the original (actual) modeling problem. A common example comes from computer vision, where a corpus of unlabeled images can be used to train a supervised model: for instance, converting images to grayscale and having a model predict a color representation (colorization), or removing blocks of an image and having a model predict the missing parts (inpainting). 14 Different Types of Learning in Machine Learning | Jason Brownlee - Machine Learning Mastery
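The inpainting pretext task described above amounts to a label-generation step: each unlabeled image yields a (corrupted input, held-out target) training pair for free. The following is a minimal NumPy sketch of that step; the function name, block size, and random "image" are illustrative, not from the article.

```python
import numpy as np

def make_inpainting_pair(image, block=8, seed=0):
    """Turn one unlabeled image into a supervised (input, label) pair:
    the input has a square block zeroed out, and the label is the
    original content of that block (the 'free' self-supervised label)."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - block + 1))
    left = int(rng.integers(0, w - block + 1))
    target = image[top:top + block, left:left + block].copy()
    corrupted = image.copy()
    corrupted[top:top + block, left:left + block] = 0.0
    return corrupted, target, (top, left)

# Any corpus of unlabeled images works; a random array stands in here.
img = np.random.default_rng(1).random((32, 32))
x, y, (top, left) = make_inpainting_pair(img)
```

A model trained to predict `y` from `x` never sees a human-written label; the supervision comes entirely from the removed block.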


Given a task and enough labels, supervised learning can solve it very well. Good performance usually requires a decent amount of labels, but collecting manual labels is expensive (e.g. ImageNet) and hard to scale up. Considering that the amount of unlabelled data (e.g. free text, all the images on the Internet) is substantially larger than the limited number of human-curated labelled datasets, it is rather wasteful not to use it. However, unsupervised learning is not easy and usually works much less efficiently than supervised learning. What if we could get labels for unlabelled data for free and train on it in a supervised manner? We can achieve this by framing a supervised learning task in a special form: predict only a subset of the information using the rest. In this way, all the information needed, both inputs and labels, is provided by the data itself. This is known as self-supervised learning. The idea has been widely used in language modeling: the default task for a language model is to predict the next word given the past sequence. Bidirectional Encoder Representations from Transformers (BERT) adds two other auxiliary tasks, both of which rely on self-generated labels.
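The language-modeling case makes "predict a subset of the information using the rest" concrete: every position in a token sequence supplies a (context, next-token) pair with no human annotation. A minimal sketch, using whitespace tokenization as a simplification:

```python
def next_word_pairs(text):
    """Self-generated labels for language modeling: each prefix of the
    token sequence is an input, and the token that follows is its label."""
    tokens = text.split()
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

pairs = next_word_pairs("the cat sat on the mat")
```

Here the first pair is `(["the"], "cat")`, the second `(["the", "cat"], "sat")`, and so on; the text supervises itself.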

Why Self-Supervised Learning?

Self-supervised learning empowers us to exploit a variety of labels that come with the data for free. The motivation is quite straightforward. Producing a dataset with clean labels is expensive but unlabeled data is being generated all the time. To make use of this much larger amount of unlabeled data, one way is to set the learning objectives properly so as to get supervision from the data itself. The self-supervised task, also known as the pretext task, guides us to a supervised loss function. However, we usually don't care about the final performance of this invented task. Rather, we are interested in the learned intermediate representation, with the expectation that this representation can carry good semantic or structural meanings and can be beneficial to a variety of practical downstream tasks. Self-Supervised Representation Learning | Lilian Weng - Lil'Log



A self-supervised loss is a loss function used to train a predictive model on a self-supervised task. It is designed to exploit the supervisory signal provided by the pretext task, which allows the algorithm to learn representative features of the data that can be reused in downstream tasks. The precise form of the loss depends on the specifics of the self-supervised task, but it typically involves minimizing the difference between the predicted output and the actual output.
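For a reconstruction-style pretext task such as inpainting, "minimizing the difference between the predicted output and the actual output" can be written as a distance over the held-out region. A sketch using mean squared error, which is one common choice rather than the only one:

```python
import numpy as np

def masked_mse_loss(pred, target, mask):
    """Self-supervised reconstruction loss: mean squared error between
    the prediction and the held-out ground truth, computed only where
    the mask marks removed (label-bearing) input."""
    diff = (pred - target) * mask
    return float((diff ** 2).sum() / mask.sum())
```

A perfect reconstruction gives zero loss; errors outside the masked region are deliberately ignored, since the model was given that content as input.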



Towards Autonomous Machine Intelligence

Dr. Yann LeCun's paper, "A Path Towards Autonomous Machine Intelligence," proposes an ambitious framework for developing AI that mimics human and animal learning more closely. LeCun's framework represents a significant shift towards more autonomous, flexible, and intelligent systems capable of complex reasoning and decision-making without extensive external supervision. Here are some of the key elements and hypotheses from his work:

1. Self-Supervised Learning (SSL): LeCun emphasizes the importance of SSL, where machines learn to understand the world by predicting parts of their input from other parts without needing extensive labeled data. This approach is crucial for developing AI that can reason and understand in a human-like manner.

2. Joint Embedding Predictive Architecture (JEPA): The central concept in LeCun's proposal is JEPA, an energy-based model that captures dependencies between inputs by mapping them into a representation space. This architecture allows the model to make predictions based on abstract representations rather than direct data, making it more efficient and scalable for tasks requiring long-term prediction and reasoning.

3. Intrinsic Motivation: LeCun suggests that autonomous AI should be driven by intrinsic objectives rather than external rewards or hard-coded programs. This idea is inspired by how animals and humans are motivated to explore and learn from their environment naturally.

4. Modular Architecture: The proposed framework consists of several modules:

  * Configurator Module: This acts as a manager, orchestrating other modules based on the task at hand.
  * Perception Module: It estimates the current state and represents it in a stratified manner.
  * World Model Module: This is the core of the system, using SSL to create a predictive model of the world.
  * Cost Module: It defines a cost function to evaluate the performance of predictions.
  * Memory Modules: These include short-term and long-term memory to manage and store information about states and corresponding predictions.

5. Hierarchical Representation and Prediction: By stacking JEPA models, the architecture can handle both short-term and long-term predictions at varying levels of abstraction. For example, a high-level representation might predict broad actions like a cook preparing a dish, while lower levels detail specific movements and actions.
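The JEPA idea in points 2 and 5 can be sketched as follows: both inputs are mapped into a representation space, a predictor guesses the target's embedding from the source's, and a (squared-distance) energy scores their compatibility; stacking such predictors at coarser time scales would give the hierarchy. The encoders and predictor below are toy stand-ins, not LeCun's actual architecture.

```python
import numpy as np

def jepa_energy(x, y, enc_x, enc_y, predictor):
    """Energy-based compatibility score in representation space:
    low energy means the embedding of y predicted from x's embedding
    is close to y's actual embedding."""
    s_x = enc_x(x)            # abstract representation of the context
    s_y_hat = predictor(s_x)  # prediction made in embedding space
    s_y = enc_y(y)            # abstract representation of the target
    return float(np.sum((s_y_hat - s_y) ** 2))

# Toy stand-ins: with identity encoders and an identity predictor,
# identical inputs have zero energy (perfectly compatible).
identity = lambda v: np.asarray(v, dtype=float)
```

The key design choice is that the prediction happens between abstract representations rather than raw data, so irrelevant detail in `y` need not be predicted.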


Positions/Conjectures/Crazy Hypotheses

Air Force Research Laboratory (AFRL)/Air Force Office of Scientific Research (AFOSR) Chief Scientist Distinguished Lecture Series featuring Dr. Yann LeCun. In this talk, we discuss a cognitive architecture for autonomous learning agents capable of planning and reasoning. The centerpiece of the architecture is a hierarchical world model, trained with self-supervised learning, with which the system can predict the consequences of its actions and plan sequences of actions to fulfill an objective. The world model uses a new type of deep learning architecture called H-JEPA (Hierarchical Joint Embedding Predictive Architecture), which can perform predictions at multiple time scales and multiple levels of abstraction.

  • Prediction is the essence of intelligence
    • Learning predictive models of the world is the basis of common sense
  • Almost everything is learned
    • Low-level features, space, objects, physics, abstract representations...
  • H-JEPA with non-contrastive training is the thing
    • Probabilistic generative models and contrastive methods are doomed
  • Almost everything is learned through self-supervised learning
    • Almost nothing is learned through reinforcement, supervised, or imitation learning
  • Intrinsic cost and architecture drive behavior & determine what we learn
  • Emotions are necessary for autonomous intelligence
    • They are anticipations of outcomes by the critic or the world model + intrinsic cost
  • Reasoning is an extension of simulation/prediction and planning
  • Consciousness exists because we have only one world-model engine