Foundation Models (FM)

The term Foundation Model (FM) was coined by Stanford researchers to introduce a new category of ML models. They defined FMs as models trained on broad data, generally using self-supervision at scale, that can be adapted to a wide range of downstream tasks. As Wikipedia puts it, a foundation model is a large artificial intelligence model trained on a vast quantity of unlabeled data at scale, usually by self-supervised learning, resulting in a model that can be adapted to a wide range of downstream tasks; foundation models have helped bring about a major transformation in how AI systems are built.

An impactful Large Language Model (LLM) that can be adapted to specific use cases is commonly referred to as a foundation model.

Artificial intelligence (AI) has come a long way in recent years, with the development of increasingly sophisticated models that can perform a wide range of tasks. One of the most exciting developments in this field is the emergence of foundation models. A foundation model is a “paradigm for building AI systems” in which a model trained on a large amount of unlabeled data can be adapted to many applications: instead of building a separate AI model for each specific task, a single foundation model can be used for multiple tasks with minimal fine-tuning. Parameters, in machine learning terms, are the variables a model learns during training and then uses to infer new content.


Why use Foundation Models?

One of the main advantages of foundation models is their flexibility: because they are pre-trained on broad data at scale and designed to be adapted to various downstream cognitive tasks, they can serve a wide range of applications. Another advantage is their reusability: rather than building a new AI model from scratch for each task, teams can fine-tune a single foundation model for many tasks, which saves time and resources and makes AI systems easier to develop and deploy.

  • A foundation model is a type of deep learning algorithm that has been pre-trained with large data sets scraped from the public internet.
  • Foundation models are trained with a wide variety of data and can transfer knowledge from one task to another.
  • They can be fine-tuned to complete different types of tasks and are primarily used for natural language processing and generation.
  • Compared to standalone, task-oriented machine learning models, foundation models help create reliable AI solutions faster and more cheaply, with less task-specific data and minimal fine-tuning.


How to use Foundation Models?

Foundation models are trained on enormous quantities of unlabeled data through self-supervised learning: the model learns by predicting missing information in the data, without the need for explicit labels. Once the foundation model has been trained, it can be used for various tasks through transfer learning, which adapts the model to a new task by fine-tuning it on a smaller amount of labeled data specific to that task.

Some examples of foundation models include Generative Pre-trained Transformer (GPT)-3, Bidirectional Encoder Representations from Transformers (BERT), and DALL-E 2. These models have shown impressive capabilities in natural language processing and generation, as well as image generation.

The use of foundation models has the potential to revolutionize many industries and applications. For example, they could be used to develop more sophisticated digital assistants, improve medical diagnosis, or generate new works of art.
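
For illustration, here is a minimal sketch of that transfer-learning recipe, assuming the Hugging Face transformers library and PyTorch; the pre-trained checkpoint is BERT, and the tiny labeled sentiment dataset and hyperparameters are purely hypothetical.

    # A minimal transfer-learning sketch: adapting the pre-trained BERT
    # foundation model to a small labeled classification task. Assumes
    # the Hugging Face "transformers" library and PyTorch; the tiny
    # sentiment dataset and hyperparameters are illustrative only.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2  # new task head; body stays pre-trained
    )

    # Tiny labeled dataset specific to the downstream task (hypothetical).
    texts = ["a delightful film", "a tedious, boring mess"]
    labels = torch.tensor([1, 0])
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
    model.train()
    for _ in range(3):  # a few gradient steps nudge the pre-trained weights
        loss = model(**batch, labels=labels).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

Only the small classification head is newly initialized here; the bulk of the pre-trained parameters are merely adjusted by a few gradient steps, which is why fine-tuning needs far less data than training from scratch.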

Visual Foundation Models (VFM)

Visual Foundation Models (VFMs) are a type of foundation model that focuses on the creation or analysis of images and videos. They take advantage of the capabilities of foundation models such as LLMs and apply them to visual data.

VFMs are trained on large datasets of images and videos, and they can be used for a variety of tasks, including the following (a zero-shot classification sketch follows this list):

  • Image classification: VFMs can be used to classify images into different categories, such as cats, dogs, cars, and people.
  • Object detection: VFMs can be used to detect objects in images, such as faces, cars, and text.
  • Image segmentation: VFMs can be used to segment images into different parts, such as the background, foreground objects, and text.
  • Image generation: VFMs can be used to generate new images, such as realistic images of people or objects that do not exist in the real world.
  • Video analysis: VFMs can be used to analyze videos, such as tracking objects in motion or identifying people in a crowd.
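
As a concrete example of the first task above, here is a minimal sketch of zero-shot image classification with CLIP, assuming the Hugging Face transformers and Pillow libraries; the image path and candidate labels are hypothetical.

    # A minimal zero-shot image classification sketch with CLIP, a visual
    # foundation model. Assumes the Hugging Face "transformers" and Pillow
    # libraries; the image path and candidate labels are hypothetical.
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    image = Image.open("photo.jpg")  # hypothetical input image
    labels = ["a cat", "a dog", "a car", "a person"]

    inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
    probs = model(**inputs).logits_per_image.softmax(dim=-1)  # one score per label
    for label, p in zip(labels, probs[0].tolist()):
        print(f"{label}: {p:.2f}")

Because CLIP scores images against arbitrary label texts, the candidate classes can be changed at inference time without any retraining.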

VFMs are still under development, but they have the potential to revolutionize the way we interact with images and videos. For example, VFMs could be used to create new forms of art, to improve the accuracy of medical diagnoses, or to make self-driving cars safer.

Here are some examples of VFMs (a text-to-image sketch follows this list):

  • DALL-E: DALL-E is a VFM that can be used to generate images from text descriptions.
  • Flamingo: Flamingo is a VFM that can be used to translate images into text descriptions.
  • Florence: Florence is a VFM that can be used to answer questions about images.
  • SAM: The Segment Anything Model (SAM) is a VFM that can segment objects in images.
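
DALL-E itself is accessed through OpenAI's hosted API; as an open-weight stand-in for the same text-to-image technique, here is a sketch using Stable Diffusion via the Hugging Face diffusers library. The prompt and output path are illustrative, and a CUDA GPU is assumed.

    # Text-to-image generation with Stable Diffusion via the Hugging Face
    # "diffusers" library, as an open-weight stand-in for DALL-E. The
    # prompt and output path are illustrative; a CUDA GPU is assumed.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    image = pipe("an astronaut riding a horse, watercolor").images[0]
    image.save("astronaut.png")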

As VFMs continue to develop, we can expect to see even more innovative applications for them in the future.

Offerings

Amazon AWS

Amazon Bedrock provides multiple foundation models designed to let companies customize and create their own generative AI applications for targeted use cases and commercial use. With Bedrock’s serverless experience, you can get started quickly, privately customize FMs with your own data, and easily integrate and deploy them into your applications using familiar AWS tools and capabilities (including integrations with Amazon SageMaker ML features such as Experiments, to test different models, and Pipelines, to manage FMs at scale), all without having to manage any infrastructure.
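
As a rough illustration, here is a minimal sketch of invoking a Bedrock-hosted FM, assuming the boto3 SDK and model access enabled in the account; the region, model ID, and request-body fields are illustrative and vary by model provider.

    # A minimal sketch of invoking a Bedrock-hosted foundation model,
    # assuming the boto3 SDK and model access enabled in the account.
    # The region, model ID, and request-body fields are illustrative
    # and vary by model provider.
    import json
    import boto3

    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    body = json.dumps({"inputText": "Summarize the benefits of foundation models."})

    response = client.invoke_model(
        modelId="amazon.titan-text-express-v1",  # example Amazon Titan model ID
        contentType="application/json",
        accept="application/json",
        body=body,
    )
    print(json.loads(response["body"].read()))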

The initial set of foundation models supported by the service included models from AI21 Labs, Anthropic, Stability AI, and Amazon itself.

Microsoft

Microsoft Research has developed widely used visual architectures such as the Swin Transformer series, as well as popular self-supervised learning methods such as PixPro and SimMIM. As of November 2021, it had also trained SwinV2-G, a 3-billion-parameter model that was then the largest dense visual model.
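
As a sketch of putting one of these architectures to work, here is a minimal example of image classification with a pre-trained Swin Transformer, assuming the Hugging Face transformers and Pillow libraries; the checkpoint name and image path are illustrative.

    # A minimal sketch of image classification with a pre-trained Swin
    # Transformer checkpoint, assuming the Hugging Face "transformers"
    # and Pillow libraries; the checkpoint and image path are illustrative.
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    ckpt = "microsoft/swin-tiny-patch4-window7-224"
    processor = AutoImageProcessor.from_pretrained(ckpt)
    model = AutoModelForImageClassification.from_pretrained(ckpt)

    image = Image.open("photo.jpg")  # hypothetical input image
    inputs = processor(images=image, return_tensors="pt")
    logits = model(**inputs).logits
    print(model.config.id2label[logits.argmax(-1).item()])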

IBM