Aviary

Revision as of 21:03, 14 June 2023 by BPeat

Aviary is an Anyscale tool that helps developers work with large language models (LLMs). Anyscale describes it as the “first fully free, cloud-based infrastructure designed to help developers choose and deploy the right technologies and approach for their LLM-based applications.” Like Ray, Aviary is being released as an open-source project.

Ray

Ray is an open-source distributed computing framework for scaling machine learning and Python workloads. It is developed by Anyscale, which also offers a fully managed Ray platform for developing and deploying AI/ML workloads at scale. With Ray, developers can scale compute-intensive workloads from a laptop to any cloud with minimal code changes, and its ecosystem of distributed libraries and integrations makes it easier to scale existing workloads.

Ray has a wide range of use cases for scaling machine learning and Python workloads. Some common use cases include:

  • Large language models (LLMs) and generative AI: Ray provides a distributed compute framework for scaling these models, allowing developers to train and deploy models faster and more efficiently. With specialized libraries for data streaming, training, fine-tuning, hyperparameter tuning, and serving, Ray simplifies the process of developing and deploying large-scale AI models.
  • Batch Inference: Ray can be used for batch inference, which is the process of generating model predictions on a large “batch” of input data. Ray for batch inference works with any cloud provider and ML framework, and is fast and cheap for modern deep learning applications. It scales from single machines to large clusters with minimal code changes.
  • Many Model Training: Many-model training is common in ML use cases such as time series forecasting, which require fitting models on multiple data batches corresponding to locations, products, etc. The focus is on training many models on subsets of a dataset, in contrast to training a single model on the entire dataset. When any given model fits on a single GPU, Ray can assign each training run to a separate Ray task.

Ray provides a higher-level API for parallel and pipelined data processing, while internally handling data batching, task parallelism and pipelining, and memory management. Ray takes functions and classes and translates them to the distributed setting as tasks and actors. This allows developers to easily parallelize their workloads and take advantage of the distributed computing capabilities of Ray.

In Ray, an actor is a stateful worker that can be used to encapsulate state and methods. An actor is also a “Ray worker” but is instantiated at runtime (upon `actor_cls.remote()`). All of its methods will run on the same process, using the same resources (designated when defining the Actor). This allows developers to create distributed objects with methods that can be invoked remotely. Actors are useful for implementing distributed systems, such as parameter servers, simulators, and databases.

Ray handles task scheduling and placement through a combination of resource requirements, scheduling strategies, and placement groups. For each task or actor, Ray will choose a node to run it based on the specified resource requirements and the availability of resources on the nodes in the cluster. Ray supports a `DEFAULT` scheduling strategy that schedules tasks or actors onto a group of the top k nodes based on resource utilization and locality. Developers can also specify a custom scheduling strategy using the `scheduling_strategy` option when defining tasks or actors.

Placement groups allow users to atomically reserve groups of resources across multiple nodes (i.e., gang scheduling). They can then be used to schedule Ray tasks and actors packed as close together as possible for locality (PACK), or spread apart (SPREAD). Placement groups are generally used for gang-scheduling actors, but also support tasks.