Aviary


Aviary is an open-source project launched by Anyscale, the AI infrastructure company founded by the creators of Ray. It is designed to simplify the process of choosing and integrating open-source large language models (LLMs) into applications. Anyscale describes Aviary as the first fully free, cloud-based infrastructure for helping developers choose and deploy the right technologies and approach for their LLM-based applications. It provides an extensive suite of pre-configured open-source LLMs with reasonable defaults that work out of the box, along with libraries, tooling, examples, documentation, and sample code, all available as open source and readily adaptable for small experiments or large evaluations.



You can deploy Aviary yourself on a cloud service, or simply use Anyscale's hosted version.



Aviary is built on top of Ray Serve, Anyscale’s popular open-source offering for serving and scaling AI applications, including LLMs. Beyond deployment, it also aims to help with model selection: with the growing number of models, it is hard for anyone to know which one is best for a specific use case. By making open-source LLMs easier to deploy, Aviary also makes it easier for organizations to compare them on accuracy, latency, and cost. The goal of Aviary is to enable developers to identify the best open-source platform to fine-tune and scale an LLM application. Developers can submit test prompts to a pre-selected set of LLMs, including Llama, CarperAI, Dolly 2.0, Vicuna, StabilityAI, and Amazon’s LightGPT.

Aviary Explorer

Aviary Explorer is a tool provided by Anyscale that allows developers to submit one prompt to multiple LLMs and evaluate the quality of responses. It is an interactive front-end that helps developers review and assess the performance and compute costs of different LLMs. Aviary Explorer is built on Ray Serve, Anyscale's popular open-source offering for serving and scaling AI applications, including LLMs.
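The idea of sending one prompt to several models and comparing the results can be sketched in a few lines of plain Python. This is only an illustration of the fan-out pattern, not Aviary's API: the two "models" below are hypothetical stand-in functions, and the names `MODELS`, `query`, and `compare` are invented for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real LLM endpoints (assumption: not Aviary code).
def model_a(prompt):
    return prompt.upper()

def model_b(prompt):
    return prompt[::-1]

MODELS = {"model-a": model_a, "model-b": model_b}

def query(name, fn, prompt):
    """Call one model and record its response and wall-clock latency."""
    start = time.perf_counter()
    text = fn(prompt)
    return {
        "model": name,
        "response": text,
        "latency_s": time.perf_counter() - start,
    }

def compare(prompt, models=MODELS):
    """Send the same prompt to every model concurrently; keep input order."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = [pool.submit(query, n, f, prompt) for n, f in models.items()]
        return [f.result() for f in futures]

report = compare("hello")
for row in report:
    print(row["model"], repr(row["response"]), f"{row['latency_s']:.6f}s")
```

In a real comparison each stand-in would be replaced by a call to a deployed model, and a cost estimate could be added as another field next to the latency.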


Ray

Ray is an open-source distributed computing framework for scaling machine learning and Python workloads. It is developed by Anyscale, which also offers a fully managed Ray platform for developers and AI teams who want to speed up development and deploy AI/ML workloads at scale. With Ray, developers can scale compute-intensive workloads from a laptop to any cloud with minimal code changes. Ray has a strong ecosystem of distributed libraries and integrations that make it easy to scale existing workloads.

Ray has a wide range of use cases for scaling machine learning and Python workloads. Some common use cases include:

  • Large language models (LLMs) and generative AI: Ray provides a distributed compute framework for scaling these models, allowing developers to train and deploy models faster and more efficiently. With specialized libraries for data streaming, training, fine-tuning, hyperparameter tuning, and serving, Ray simplifies the process of developing and deploying large-scale AI models.
  • Batch Inference: Ray can be used for batch inference, which is the process of generating model predictions on a large “batch” of input data. Ray for batch inference works with any cloud provider and ML framework, and is fast and cheap for modern deep learning applications. It scales from single machines to large clusters with minimal code changes.
  • Many Model Training: Training many models is common in ML use cases such as time series forecasting, which require fitting models on multiple data batches corresponding to locations, products, etc. The focus is on training many models on subsets of a dataset, in contrast to training a single model on the entire dataset. When any given model you want to train can fit on a single GPU, Ray can assign each training run to a separate Ray Task.

Ray provides a higher-level API for parallel and pipelined data processing, while internally handling data batching, task parallelism and pipelining, and memory management. Ray takes functions and classes and translates them to the distributed setting as tasks and actors. This allows developers to easily parallelize their workloads and take advantage of the distributed computing capabilities of Ray.

In Ray, an actor is a stateful worker that encapsulates state and the methods that operate on it. An actor is also a “Ray worker”, but it is instantiated at runtime (when `actor_cls.remote()` is called). All of its methods run in the same process and use the same resources, which are designated when the actor is defined. This allows developers to create distributed objects whose methods can be invoked remotely. Actors are useful for implementing distributed systems components such as parameter servers, simulators, and databases.

Ray handles task scheduling and placement through a combination of resource requirements, scheduling strategies, and placement groups. For each task or actor, Ray will choose a node to run it based on the specified resource requirements and the availability of resources on the nodes in the cluster. Ray supports a `DEFAULT` scheduling strategy that schedules tasks or actors onto a group of the top k nodes based on resource utilization and locality. Developers can also specify a custom scheduling strategy using the `scheduling_strategy` option when defining tasks or actors.

Placement groups allow users to atomically reserve groups of resources across multiple nodes (i.e., gang scheduling). They can then be used to schedule Ray tasks and actors either packed as close together as possible for locality (PACK) or spread apart (SPREAD). Placement groups are generally used for gang-scheduling actors, but they also support tasks.