AIOps / MLOps

Machine learning capabilities give IT operations teams contextual, actionable insights to make better decisions on the job. More importantly, AIOps is an approach that transforms how systems are automated, detecting important signals in vast amounts of data and relieving the operator of the headache of managing according to outdated runbooks or policies. In the AIOps future, the environment is continually improving. The administrator can get out of the impossible business of refactoring rules and policies that are outdated almost as soon as they are written in today’s IT environments. With AI and machine learning technologies embedded into IT operations systems, the game changes drastically. AI and machine learning-enhanced automation will bridge the gap between DevOps and IT Ops teams, helping the latter solve issues faster and more accurately to keep pace with business goals and user needs.
How AIOps Helps IT Operators on the Job | Ciaran Byrne - Toolbox

MLOps #28 ML Observability // Aparna Dhinakaran - Chief Product Officer at Arize AI
As more and more machine learning models are deployed into production, it is imperative that we have better observability tools to monitor, troubleshoot, and explain their decisions. In this talk, Aparna Dhinakaran, Co-Founder and CPO of Arize AI (a Berkeley-based startup focused on ML Observability), will discuss the state of the commonly seen ML production workflow and its challenges. She will focus on the lack of model observability, its impacts, and how Arize AI can help. This talk highlights common challenges seen in models deployed in production, including model drift, data quality issues, distribution changes, outliers, and bias. The talk will also cover best practices to address these challenges and where observability and explainability can help identify model issues before they impact the business. Aparna will be sharing a demo of how the Arize AI platform can help companies validate their models' performance, provide real-time performance monitoring and alerts, and automate troubleshooting of slices of model performance with explainability. The talk will cover best practices in ML Observability and how companies can build more transparency and trust around their models.

Aparna Dhinakaran is Chief Product Officer at Arize AI, a startup focused on ML Observability. She was previously an ML engineer at Uber, Apple, and TubeMogul (acquired by Adobe). During her time at Uber, she built a number of core ML infrastructure platforms, including Michelangelo. She has a bachelor's degree from Berkeley's Electrical Engineering and Computer Science program, where she published research with Berkeley's AI Research group. She is on a leave of absence from the Computer Vision PhD program at Cornell University.
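As one concrete illustration of the distribution-change monitoring this talk description mentions, the sketch below computes a Population Stability Index (PSI) between a reference sample and a production sample of a single numeric feature. This is a generic technique, not the Arize AI method; the function name, the bin count, and the sample data are assumptions made for illustration.

```python
import math
from typing import Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a reference sample and a production sample (hypothetical helper)."""
    lo, hi = min(expected), max(expected)
    # Equal-width bins over the reference range; fall back to 1.0 if the range is empty.
    width = (hi - lo) / bins or 1.0

    def fractions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp values outside the reference range
            counts[idx] += 1
        # Floor each fraction at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Illustrative data only: a uniform reference and a production sample shifted upward.
reference = [i / 100 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]
psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, shifted)
```

A common rule of thumb treats a PSI above roughly 0.25 as a significant shift worth an alert, but the right threshold depends on the feature and the model.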

Building an MLOps Toolchain: The Fundamentals
Artificial intelligence and machine learning are the latest “must-have” technologies in helping organizations realize better business outcomes. However, most organizations don’t have a structured process for rolling out AI-infused applications. Data scientists create AI models in isolation from IT, which then needs to insert those models into applications—and ensure their security—to deliver any business value. In this ebook/webinar, we examine the best way to set up an MLOps process to ensure successful delivery of AI-infused applications.

Model Versioning - ModelDB

  • ModelDB: An open-source system for Machine Learning model versioning, metadata, and experiment management

Continuous Machine Learning (CML)

MLOps Tutorial #1: Intro to Continuous Integration for ML
DVCorg Learn how to use one of the most powerful ideas from the DevOps revolution, continuous integration, in your data science and machine learning projects. This hands-on tutorial shows you how to create an automatic model training and testing setup using GitHub Actions and Continuous Machine Learning (CML), two free and open-source tools in the Git ecosystem. Designed for total beginners!

MLOps Tutorial #3: Track ML models with Git & GitHub Actions
DVCorg In this tutorial, we'll compare ML models across two different Git branches of a project, and we'll do it in a continuous integration system (GitHub Actions) for automation superpowers! We'll cover:

- Why comparing model metrics takes more than a git diff
- How pipelines, a method for making model training more reproducible, help you standardize model comparisons across Git branches
- How to display a table comparing model performance to the main branch in a GitHub Pull Request

Helpful links:
Dataset: Data on farmers’ adoption of climate change mitigation measures, individual characteristics, risk attitudes and social influences in a region of Switzerland
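The idea of a table comparing model performance against the main branch can be sketched in plain Python. This is not the DVC or CML implementation; it only assumes that each branch produces a flat dictionary of metric names to values, and the function name and metric values are hypothetical.

```python
from typing import Dict, Optional


def metrics_diff_table(main: Dict[str, float], branch: Dict[str, float]) -> str:
    """Build a Markdown table comparing branch metrics to main (hypothetical helper)."""

    def fmt(value: Optional[float]) -> str:
        return "n/a" if value is None else f"{value:.4f}"

    lines = ["| metric | main | branch | change |", "|---|---|---|---|"]
    # Use the union of metric names so a metric missing on one side is still shown.
    for name in sorted(set(main) | set(branch)):
        old, new = main.get(name), branch.get(name)
        change = "n/a" if old is None or new is None else f"{new - old:+.4f}"
        lines.append(f"| {name} | {fmt(old)} | {fmt(new)} | {change} |")
    return "\n".join(lines)


# Illustrative values only, not results from the tutorial.
table = metrics_diff_table(
    {"accuracy": 0.81, "f1": 0.75},
    {"accuracy": 0.84},
)
print(table)
```

In a continuous integration job, a string like this could be written to a report file and posted as a pull request comment, so reviewers see metric changes alongside the code diff.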

Model Deployment Scoring

ML Model Deployment and Scoring on the Edge with Automatic ML & DF / Flink2Kafka
Recorded on June 18, 2020.

Machine Learning Model Deployment and Scoring on the Edge with Automatic Machine Learning and Data Flow: Deploying machine learning models to the edge can present significant ML/IoT challenges centered around the need for low-latency, accurate scoring in resource-constrained environments. Driverless AI AutoML and Cloudera Data Flow work well together to solve this challenge. Driverless AI automates the building of accurate machine learning models, which are deployed as small-footprint, low-latency Java or C++ artifacts known as MOJOs (Model Object, Optimized). Cloudera Data Flow leverages Apache NiFi, which offers a data flow framework to host MOJOs and make predictions on data moving at the edge. Speakers: James Medel (Technical Community Maker) and Greg Keys (Solution Engineer).

Kafka 2 Flink - An Apache Love Story: This project was heavily inspired by two existing efforts: Data In Motion's FLaNK Stack and Data Artisans' blog on stateful streaming applications. The goal of this project is to provide insight into connecting Apache Flink applications to Apache Kafka. Speaker: Ian R Brooks, PhD (Cloudera - Senior Solutions Engineer & Data)

Shawn Scully: Production and Beyond: Deploying and Managing Machine Learning Models
PyData NYC 2015 Machine learning has become the key component in building intelligence-infused applications. However, as companies increase the number of such deployments, the number of machine learning models that need to be created, maintained, monitored, tracked, and improved grows at a tremendous pace. This growth has led to a huge (and well-documented) accumulation of technical debt. Developing a machine learning application is an iterative process that involves building multiple models over a dataset. The dataset itself evolves over time as new features and new data points are collected. Furthermore, once deployed, the models require updates over time. Changes in models and datasets become difficult to track, and one can quickly lose track of which version of the model used which data and why it was subsequently replaced. In this talk, we outline some of the key challenges in large-scale deployments of many interacting machine learning models. We then describe a methodology for management, monitoring, and optimization of such models in production, which helps mitigate the technical debt. In particular, we demonstrate how to:

- Track models and versions, and visualize their quality over time
- Track the provenance of models and datasets, and quantify how changes in data impact the models being served
- Optimize model ensembles in real time, based on changing data, and provide alerts when such ensembles no longer provide the desired accuracy
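The tracking and alerting goals listed in this talk description can be illustrated with a minimal in-memory registry. This is a sketch under stated assumptions, not the methodology presented in the talk: the class names, the dataset fingerprinting scheme, and the accuracy threshold are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, List


def fingerprint(rows: List[dict]) -> str:
    """Short, stable hash of a dataset so each model version records its training data."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass
class ModelVersion:
    name: str
    version: int
    data_fingerprint: str
    metrics: Dict[str, float]


@dataclass
class Registry:
    versions: List[ModelVersion] = field(default_factory=list)

    def register(self, name: str, rows: List[dict], metrics: Dict[str, float]) -> ModelVersion:
        # Version numbers count up per model name, starting at 1.
        version = 1 + sum(1 for v in self.versions if v.name == name)
        record = ModelVersion(name, version, fingerprint(rows), dict(metrics))
        self.versions.append(record)
        return record

    def history(self, name: str) -> List[ModelVersion]:
        return [v for v in self.versions if v.name == name]

    def needs_alert(self, name: str, metric: str, minimum: float) -> bool:
        # Alert when the latest version of the model falls below the required metric value.
        hist = self.history(name)
        return bool(hist) and hist[-1].metrics.get(metric, float("-inf")) < minimum


# Illustrative usage with made-up data and metric values.
registry = Registry()
v1 = registry.register("churn", [{"x": 1}], {"accuracy": 0.90})
v2 = registry.register("churn", [{"x": 1}, {"x": 2}], {"accuracy": 0.70})
alert = registry.needs_alert("churn", "accuracy", minimum=0.80)
```

Because each record stores a data fingerprint next to its metrics, a drop in quality between two versions can be traced to whether the underlying dataset changed. A production system would persist these records rather than keep them in memory.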