Algorithm Administration




  • Google AutoML automatically builds and deploys state-of-the-art machine learning models
  • SageMaker | Amazon
  • MLOps | Microsoft ...model management, deployment, and monitoring with Azure
  • Ludwig - a Python toolbox from Uber that allows users to train and test deep learning models
  • TPOT - a Python library that automatically creates and optimizes full machine learning pipelines using genetic programming. Not suited to NLP; strings must be encoded as numerics.
  • H2O Driverless AI for automated Visualization, feature engineering, model training, hyperparameter optimization, and explainability.
  • alteryx: Feature Labs, Featuretools
  • MLBox Fast reading and distributed data preprocessing/cleaning/formatting. Highly robust feature selection and leak detection. Accurate hyper-parameter optimization in high-dimensional space. State-of-the-art predictive models for classification and regression (Deep Learning, Stacking, LightGBM,…). Prediction with models interpretation. Primarily Linux.
  • auto-sklearn - algorithm selection and hyperparameter tuning. It leverages recent advances in Bayesian optimization, meta-learning, and ensemble construction, providing a Bayesian hyperparameter optimization layer on top of scikit-learn. Not for large datasets.
  • Auto Keras is an open-source Python package for neural architecture search.
  • ATM (Auto Tune Models) - a multi-tenant, multi-data system for automated machine learning (model selection and tuning). ATM is an open-source software library under the Human Data Interaction (HDI) project at MIT.
  • Auto-WEKA is a Bayesian hyperparameter optimization layer on top of Weka. Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.
  • TransmogrifAI - an AutoML library for building modular, reusable, strongly typed machine learning workflows. A Scala/SparkML library created by Salesforce for automated data cleansing, feature engineering, model selection, and hyperparameter optimization
  • RECIPE - a framework based on grammar-based genetic programming that builds customized scikit-learn classification pipelines.
  • AutoMLC Automated Multi-Label Classification. GA-Auto-MLC and Auto-MEKAGGP are freely-available methods that perform automated multi-label classification on the MEKA software.
  • Databricks MLflow an open source framework to manage the complete Machine Learning lifecycle using Managed MLflow as an integrated service with the Databricks Unified Analytics Platform... ...manage the ML lifecycle, including experimentation, reproducibility and deployment
  • SAS Viya automates the process of data cleansing, data transformations, feature engineering, algorithm matching, model training and ongoing governance.
  • Comet ML ...self-hosted and cloud-based meta machine learning platform allowing data scientists and teams to track, compare, explain and optimize experiments and models
  • Domino Model Monitor (DMM) | Domino ...monitor the performance of all models across your entire organization
  • Weights and Biases ...experiment tracking, model optimization, and dataset versioning
  • SigOpt ...optimization platform and API designed to unlock the potential of modeling pipelines. This fully agnostic software solution accelerates, amplifies, and scales the model development process
  • DVC ...Open-source Version Control System for Machine Learning Projects
  • ModelOp Center | ModelOp
  • Moogsoft and Red Hat Ansible Tower
  • DSS | Dataiku
  • Model Manager | SAS
  • Machine Learning Operations (MLOps) | DataRobot highly accurate predictive models with full transparency
  • Metaflow, Netflix and AWS open source Python library
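Most of the tools above automate the same core loop: fit several candidate models, score each on held-out data, and keep the winner. The following is a toy pure-Python sketch of that loop; the candidate models (a constant predictor and a one-dimensional least-squares line) are illustrative stand-ins, not any library's actual API.

```python
# Minimal sketch of what AutoML tools automate: try several candidate
# models, score each on held-out data, and keep the best performer.

def fit_constant(xs, ys):
    mean = sum(ys) / len(ys)
    return lambda x: mean

def fit_line(xs, ys):
    # Closed-form simple linear regression: y = a*x + b.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    b = my - a * mx
    return lambda x: a * x + b

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def auto_select(candidates, train, valid):
    scored = []
    for name, fit in candidates:
        model = fit(*train)
        scored.append((mse(model, *valid), name, model))
    scored.sort(key=lambda t: t[0])
    return scored[0]  # (validation MSE, name, fitted model)

train = ([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])   # roughly y = 2x
valid = ([5, 6], [10.1, 11.9])
best = auto_select([("constant", fit_constant), ("line", fit_line)], train, valid)
print(best[1])  # the line fits far better than the constant
```

Real AutoML systems add cross-validation, hyperparameter search, and ensembling around this same skeleton.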

Master Data Management (MDM)

Feature Store / Data Lineage / Data Catalog

How is AI changing the game for Master Data Management?
Tony Brownlee talks about the ability to inspect and find data quality issues as one of several ways cognitive computing technology is influencing master data management.

Introducing Roxie. Data Management Meets Artificial Intelligence.
Introducing Roxie, Rubrik's Intelligent Personal Assistant. A hackathon project by Manjunath Chinni. Created in 10 hours with the power of Rubrik APIs.

DAS Webinar: Master Data Management – Aligning Data, Process, and Governance
Getting MDM “right” requires a strategic mix of Data Architecture, business process, and Data Governance.

How to manage Artificial Intelligence Data Collection [Enterprise AI Governance Data Management ]
Mind Data AI - AI researcher Brian Ka Chan's AI/ML/DL introduction series. Collecting data is an important step in the success of an artificial intelligence program in the fourth industrial revolution. Amid the current advancement of artificial intelligence technologies, machine learning has always been associated with AI, and in many cases machine learning is considered equivalent to artificial intelligence. Machine learning is actually a subset of artificial intelligence; the discipline relies on data to perform AI training, supervised or unsupervised. On average, 80% of the time my team spends on AI or data science projects is about preparing data. Preparing data includes, but is not limited to: identifying the data required, identifying the availability and location of that data, profiling the data, sourcing the data, integrating the data, cleansing the data, and preparing the data for learning.

What is Data Governance?
Understand what problems a Data Governance program is intended to solve and why the Business Users must own it. Also learn some sample roles that each group might need to play.

Top 10 Mistakes in Data Management
Come learn about the mistakes we most often see organizations make in managing their data. Also learn more about Intricity's Data Management Health Check.



How to manage model and data versions
Raj Ramesh Managing data versions and model versions is critical in deploying machine learning models. This is because if you want to re-create the models or go back to fix them, you will need both the data that went into training the model as well as the model hyperparameters themselves. In this video I explain that concept. Here's what I can do to help you. I speak on the topics of architecture and AI, help you integrate AI into your organization, educate your team on what AI can or cannot do, and make things simple enough that you can take action from your new knowledge. I work with your organization to understand the nuances and challenges that you face, and together we can understand, frame, analyze, and address challenges in a systematic way so you see improvement in your overall business that is aligned with your strategy, and most importantly, you and your organization can incrementally change to transform and thrive in the future. If any of this sounds like something you might need, please reach out to me at, and we'll get back in touch within a day. Thanks for watching my videos and for subscribing.

Version Control for Data Science Explained in 5 Minutes (No Code!)
In this code-free, five-minute explainer for complete beginners, we'll teach you about Data Version Control (DVC), a tool for adapting Git version control to machine learning projects.

- Why data science and machine learning badly need tools for versioning - Why Git version control alone will fall short - How DVC helps you use Git with big datasets and models - Cool features in DVC, like metrics, pipelines, and plots

Check out the DVC open source project on GitHub:
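The core idea behind tools like DVC can be sketched in a few lines: version large files by content hash rather than storing them in Git, keep the bytes in a cache, and commit only the tiny pointer. This is a hypothetical illustration of the concept, not DVC's actual implementation or API.

```python
# Content-addressed data versioning in miniature: the cache dict stands
# in for DVC's on-disk cache or remote storage; the returned hash is the
# small pointer file that actually gets committed to Git.
import hashlib

cache = {}

def dvc_add(data: bytes) -> str:
    """Store data under its content hash; return the pointer to commit."""
    digest = hashlib.sha256(data).hexdigest()
    cache[digest] = data
    return digest

def dvc_checkout(pointer: str) -> bytes:
    """Restore the exact bytes a past commit pointed at."""
    return cache[pointer]

v1 = dvc_add(b"label,pixel\ncat,0.1\n")
v2 = dvc_add(b"label,pixel\ncat,0.1\ndog,0.9\n")   # dataset grew
print(v1 != v2)  # different content, different version pointers
```

Because identical content always hashes to the same pointer, unchanged datasets cost nothing to "re-version", and any historical version can be restored exactly.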

How to easily set up and version your Machine Learning pipelines, using Data Version Control (DVC) and Machine Learning Versioning (MLV)-tools | PyData Amsterdam 2019
Stephanie Bracaloni, Sarah Diot-Girard Have you ever heard about Machine Learning versioning solutions? Have you ever tried one of them? And what about automation? Come with us and learn how to easily build versionable pipelines! This tutorial explains through small exercises how to setup a project using DVC and MLV-tools.

Alessia Marcolini: Version Control for Data Science | PyData Berlin 2019
Track:PyData Are you versioning your Machine Learning project as you would do in a traditional software project? How are you keeping track of changes in your datasets? Recorded at the PyConDE & PyData Berlin 2019 conference.

Introduction to Pachyderm
Joey Zwicker A high-level introduction to the core concepts and features of Pachyderm as well as a quick demo. Learn more at:

E05 Pioneering version control for data science with Pachyderm co-founder and CEO Joe Doliner
5 years ago, Joe Doliner and his co-founder Joey Zwicker decided to focus on the hard problems in data science, rather than building just another dashboard on top of the existing mess. It's been a long road, but it's really paid off. Last year, after an adventurous journey, they closed a $10m Series A led by Benchmark. In this episode, Erasmus Elsner is joined by Joe Doliner to explore what Pachyderm does and how it scaled from just an idea into a fast growing tech company. Listen to the podcast version

Model Versioning - ModelDB

  • ModelDB: An open-source system for Machine Learning model versioning, metadata, and experiment management

Model Versioning Done Right: A ModelDB 2.0 Walkthrough
In a field that is rapidly evolving but lacks infrastructure to operationalize and govern models, ModelDB 2.0 provides the ability to version the full modeling process including the underlying data and training configurations, ensuring that teams can always go back and re-create a model, whether to remedy a production incident or to answer a regulatory query.

Model Versioning Why, When, and How
Models are the new code. While machine learning models are increasingly being used to make critical product and business decisions, the process of developing and deploying ML models remains ad hoc. In the “wild-west” of data science and ML tools, versioning, management, and deployment of models are massive hurdles in making ML efforts successful. As creators of ModelDB, an open-source model management solution developed at MIT CSAIL, we have helped manage and deploy a host of models ranging from cutting-edge deep learning models to traditional ML models in finance. In each of these applications, we have found that the key to enabling production ML is an often-overlooked but critical step: model versioning. Without a means to uniquely identify, reproduce, or rollback a model, production ML pipelines remain brittle and unreliable. In this webinar, we draw upon our experience with ModelDB and Verta to present best practices and tools for model versioning and how having a robust versioning solution (akin to Git for code) can streamline DS/ML, enable rapid deployment, and ensure high quality of deployed ML models.
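The "Git for models" idea above boils down to tying a model id to everything needed to re-create it: the data it saw and the configuration it was trained with. The record format below is a hypothetical sketch of that principle, not ModelDB's actual API.

```python
# A model version record: the id is derived from the data hash plus the
# hyperparameters, so it changes whenever either one changes, and two
# runs of the same data+config resolve to the same id.
import hashlib
import json

def version_model(train_data: bytes, hyperparams: dict, metrics: dict) -> dict:
    record = {
        "data_hash": hashlib.sha256(train_data).hexdigest(),
        "hyperparams": hyperparams,
        "metrics": metrics,
    }
    key = json.dumps([record["data_hash"], hyperparams], sort_keys=True)
    record["model_id"] = hashlib.sha256(key.encode()).hexdigest()[:12]
    return record

run_a = version_model(b"rows-v1", {"lr": 0.1}, {"auc": 0.81})
run_b = version_model(b"rows-v1", {"lr": 0.1}, {"auc": 0.81})  # same inputs
run_c = version_model(b"rows-v2", {"lr": 0.1}, {"auc": 0.83})  # new data
print(run_a["model_id"] == run_b["model_id"], run_a["model_id"] != run_c["model_id"])
```

Answering a regulatory query or rolling back a production incident then reduces to looking up the record for a given `model_id` and re-running training with the stored data version and configuration.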



In machine learning, a hyperparameter is a parameter whose value is set before the learning process begins. By contrast, the values of other parameters are derived via training. Different model training algorithms require different hyperparameters, while some simple algorithms (such as ordinary least squares regression) require none. Given these hyperparameters, the training algorithm learns the parameters from the data. Hyperparameter (machine learning) | Wikipedia

Machine learning algorithms train on data to find the best set of weights for each independent variable that affects the predicted value or class. The algorithms themselves have variables, called hyperparameters. They’re called hyperparameters, as opposed to parameters, because they control the operation of the algorithm rather than the weights being determined. The most important hyperparameter is often the learning rate, which determines the step size used when finding the next set of weights to try when optimizing. If the learning rate is too high, the gradient descent may quickly converge on a plateau or suboptimal point. If the learning rate is too low, the gradient descent may stall and never completely converge. Many other common hyperparameters depend on the algorithms used. Most algorithms have stopping parameters, such as the maximum number of epochs, or the maximum time to run, or the minimum improvement from epoch to epoch. Specific algorithms have hyperparameters that control the shape of their search. For example, a Random Forest (or Random Decision Forest) classifier has hyperparameters for minimum samples per leaf, max depth, minimum samples at a split, minimum weight fraction for a leaf, and about 8 more. Machine learning algorithms explained | Martin Heller - InfoWorld
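The learning-rate behavior described above can be made concrete with plain gradient descent on f(x) = x², whose gradient is 2x. The step-size values here are chosen purely for illustration: a moderate rate converges toward the minimum at 0, while an overly large one overshoots further on every step and diverges.

```python
# Gradient descent on f(x) = x^2: each step applies x := x - lr * f'(x).
# With lr = 0.1 the iterate shrinks by a factor of 0.8 per step; with
# lr = 1.1 it is multiplied by -1.2 per step and blows up.

def gradient_descent(lr, x0=1.0, steps=50):
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x
    return x

good = gradient_descent(lr=0.1)   # converges: |x| -> ~0
bad = gradient_descent(lr=1.1)    # diverges: |x| grows each step
print(abs(good) < 1e-3, abs(bad) > 1.0)
```

The same trade-off drives real training runs, which is why the learning rate is usually the first hyperparameter worth tuning.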


Hyperparameter Tuning

Hyperparameters are the variables that govern the training process. Your model parameters are optimized (you could say "tuned") by the training process: you run data through the operations of the model, compare the resulting prediction with the actual value for each data instance, evaluate the accuracy, and adjust until you find the best combination to handle the problem. These algorithms automatically adjust (learn) their internal parameters based on data. However, there is a subset of parameters that is not learned and that have to be configured by an expert. Such parameters are often referred to as “hyperparameters” — and they have a big impact ...For example, the tree depth in a decision tree model and the number of layers in an artificial Neural Network are typical hyperparameters. The performance of a model can drastically depend on the choice of its hyperparameters. Machine learning algorithms and the art of hyperparameter selection - A review of four optimization strategies | Mischa Lisovyi and Rosaria Silipo - TNW

There are four commonly used optimization strategies for hyperparameters:

  1. Bayesian optimization
  2. Grid search
  3. Random search
  4. Hill climbing

Bayesian optimization tends to be the most efficient. You would think that tuning as many hyperparameters as possible would give you the best answer. However, unless you are running on your own personal hardware, that could be very expensive. There are diminishing returns, in any case. With experience, you’ll discover which hyperparameters matter the most for your data and choice of algorithms. Machine learning algorithms explained | Martin Heller - InfoWorld
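Two of the four strategies above, grid search and random search, are simple enough to sketch directly. The objective function below is a hypothetical validation score with a known peak, used only so the example is self-contained.

```python
# Grid search enumerates a fixed lattice of hyperparameter values;
# random search samples the space instead, which often covers more
# distinct values per dimension for the same budget.
import itertools
import random

def objective(lr, reg):
    # Hypothetical validation score, peaking at lr=0.05, reg=0.01.
    return -((lr - 0.05) ** 2 + (reg - 0.01) ** 2)

def grid_search(lrs, regs):
    return max(itertools.product(lrs, regs), key=lambda p: objective(*p))

def random_search(n_trials, rng):
    trials = [(rng.uniform(0, 0.2), rng.uniform(0, 0.1)) for _ in range(n_trials)]
    return max(trials, key=lambda p: objective(*p))

best_grid = grid_search([0.01, 0.05, 0.1], [0.0, 0.01, 0.1])
best_rand = random_search(50, random.Random(0))
print(best_grid)  # (0.05, 0.01) - the grid happens to contain the optimum
```

Bayesian optimization improves on both by fitting a surrogate model to past trials and proposing the next point where improvement is most likely; hill climbing instead perturbs the current best point locally.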

Hyperparameters commonly exposed for optimization include:


  • Optimizer type
  • Learning rate (fixed or not)
  • Epochs
  • Regularization rate (or not)
  • Type of Regularization - L1, L2, ElasticNet
  • Search type for local minima
    • Gradient descent
    • Simulated annealing
    • Evolutionary
  • Decay rate (or not)
  • Momentum (fixed or not)
  • Nesterov Accelerated Gradient momentum (or not)
  • Batch size
  • Fitness measurement type
  • Stop criteria
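The checklist above amounts to a training configuration record. One hypothetical way to capture it, with a light sanity check before a run starts (the field names and ranges are illustrative, not any framework's schema):

```python
# A training configuration covering the hyperparameters listed above,
# plus a validator that catches obviously bad values before training.
config = {
    "optimizer": "sgd",
    "learning_rate": 0.01,
    "lr_fixed": False,                      # decays during training
    "epochs": 20,
    "regularization": {"type": "L2", "rate": 1e-4},
    "search": "gradient_descent",           # vs. simulated annealing, evolutionary
    "decay_rate": 0.9,
    "momentum": 0.9,
    "nesterov": True,
    "batch_size": 32,
    "fitness": "validation_accuracy",
    "stop": {"max_epochs": 20, "min_delta": 1e-4},
}

def validate(cfg):
    assert 0 < cfg["learning_rate"] < 1, "learning rate out of range"
    assert cfg["batch_size"] > 0 and cfg["epochs"] > 0
    assert cfg["regularization"]["type"] in ("L1", "L2", "ElasticNet")
    return True

print(validate(config))
```

Keeping the whole configuration in one versionable record is also what makes the model-versioning practices discussed earlier possible.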

Automated Learning


Several production machine-learning platforms now offer automatic hyperparameter tuning. Essentially, you tell the system what hyperparameters you want to vary, and possibly what metric you want to optimize, and the system sweeps those hyperparameters across as many runs as you allow. (Google Cloud hyperparameter tuning extracts the appropriate metric from the TensorFlow model, so you don’t have to specify it.)

An emerging class of data science toolkit that is finally making machine learning accessible to business subject matter experts. We anticipate that these innovations will mark a new era in data-driven decision support, where business analysts will be able to access and deploy machine learning on their own to analyze hundreds and thousands of dimensions simultaneously. Business analysts at highly competitive organizations will shift from using visualization tools as their only means of analysis, to using them in concert with AML. Data visualization tools will also be used more frequently to communicate model results, and to build task-oriented user interfaces that enable stakeholders to make both operational and strategic decisions based on output of scoring engines. They will also continue to be a more effective means for analysts to perform inverse analysis when one is seeking to identify where relationships in the data do not exist. 'Five Essential Capabilities: Automated Machine Learning' | Gregory Bonnette

H2O Driverless AI automatically performs feature engineering and hyperparameter tuning, and claims to perform as well as Kaggle masters. Amazon SageMaker supports hyperparameter optimization. Microsoft Azure Machine Learning AutoML automatically sweeps through features, algorithms, and hyperparameters for basic machine learning algorithms; a separate Azure Machine Learning hyperparameter tuning facility allows you to sweep specific hyperparameters for an existing experiment. Google Cloud AutoML implements automatic deep transfer learning (meaning that it starts from an existing Deep Neural Network (DNN) trained on other data) and neural architecture search (meaning that it finds the right combination of extra network layers) for language pair translation, natural language classification, and image classification. Review: Google Cloud AutoML is truly automated machine learning | Martin Heller

Hyperparameter Tuning with Amazon SageMaker's Automatic Model Tuning - AWS Online Tech Talks
Learn how to use Automatic Model Tuning with Amazon SageMaker to get the best machine learning model for your dataset. Training machine learning models requires choosing seemingly arbitrary hyperparameters like learning rate and regularization to control the learning algorithm. Traditionally, finding the best values for the hyperparameters requires manual trial-and-error experimentation. Amazon SageMaker makes it easy to get the best possible outcomes for your machine learning models by providing an option to create hyperparameter tuning jobs. These jobs automatically search over ranges of hyperparameters to find the best values. Using sophisticated Bayesian optimization, a meta-model is built to accurately predict the quality of your trained model from the hyperparameters. Learning Objectives:

- Understand what hyperparameters are and what they do for training machine learning models
- Learn how to use Automatic Model Tuning with Amazon SageMaker for creating hyperparameter tuning of your training jobs
- Strategies for choosing and iterating on tuning ranges of a hyperparameter tuning job with Amazon SageMaker 

Automatic Hyperparameter Optimization in Keras for the MediaEval 2018 Medico Multimedia Task
Rune Johan Borgli, Pål Halvorsen, Michael Riegler, Håkon Kvale Stensland, Automatic Hyperparameter Optimization in Keras for the MediaEval 2018 Medico Multimedia Task. Proc. of MediaEval 2018, 29-31 October 2018, Sophia Antipolis, France. Abstract: This paper details the approach to the MediaEval 2018 Medico Multimedia Task made by the Rune team. The decided upon approach uses a work-in-progress hyperparameter optimization system called Saga. Saga is a system for creating the best hyperparameter finding in Keras, a popular machine learning framework, using Bayesian optimization and transfer learning. In addition to optimizing the Keras classifier configuration, we try manipulating the dataset by adding extra images in a class lacking in images and splitting a commonly misclassified class into two classes. Presented by Rune Johan Borgli



New cloud software suite of machine learning tools. It’s based on Google’s state-of-the-art research in image recognition called Neural Architecture Search (NAS). NAS is basically an algorithm that, given your specific dataset, searches for the most optimal neural network to perform a certain task on that dataset. AutoML is then a suite of machine learning tools that will allow one to easily train high-performance deep networks, without requiring the user to have any knowledge of deep learning or AI; all you need is labelled data! Google will use NAS to then find the best network for your specific dataset and task. AutoKeras: The Killer of Google’s AutoML | George Seif - KDnuggets

Automatic Machine Learning (AML)


DARTS: Differentiable Architecture Search




Machine learning capabilities give IT operations teams contextual, actionable insights to make better decisions on the job. More importantly, AIOps is an approach that transforms how systems are automated, detecting important signals from vast amounts of data and relieving the operator from the headaches of managing according to tired, outdated runbooks or policies. In the AIOps future, the environment is continually improving. The administrator can get out of the impossible business of refactoring rules and policies that are immediately outdated in today’s modern IT environment. Now that we have AI and machine learning technologies embedded into IT operations systems, the game changes drastically. AI and machine learning-enhanced automation will bridge the gap between DevOps and IT Ops teams: helping the latter solve issues faster and more accurately to keep pace with business goals and user needs. How AIOps Helps IT Operators on the Job | Ciaran Byrne - Toolbox

MLOps #28: ML Observability // Aparna Dhinakaran - Chief Product Officer at Arize AI
As more and more machine learning models are deployed into production, it is imperative we have better observability tools to monitor, troubleshoot, and explain their decisions. In this talk, Aparna Dhinakaran, Co-Founder and CPO of Arize AI (a Berkeley-based startup focused on ML Observability), will discuss the state of the commonly seen ML production workflow and its challenges. She will focus on the lack of model observability, its impacts, and how Arize AI can help. This talk highlights common challenges seen in models deployed in production, including model drift, data quality issues, distribution changes, outliers, and bias. The talk will also cover best practices to address these challenges and where observability and explainability can help identify model issues before they impact the business. Aparna will be sharing a demo of how the Arize AI platform can help companies validate their models' performance, provide real-time performance monitoring and alerts, and automate troubleshooting of slices of model performance with explainability. The talk will cover best practices in ML Observability and how companies can build more transparency and trust around their models. Aparna Dhinakaran is Chief Product Officer at Arize AI, a startup focused on ML Observability. She was previously an ML engineer at Uber, Apple, and Tubemogul (acquired by Adobe). During her time at Uber, she built a number of core ML infrastructure platforms, including Michelangelo. She has a bachelor's degree from Berkeley's Electrical Engineering and Computer Science program, where she published research with Berkeley's AI Research group. She is on a leave of absence from the Computer Vision PhD program at Cornell University.

DevOps for AI - Microsoft
DOES18 Las Vegas — Because the AI field is young compared to traditional software development, best practices and solutions around life cycle management for these AI systems have yet to solidify. This talk will discuss how we did this at Microsoft in different departments (one of them being Bing). DevOps for AI - Microsoft Gabrielle Davelaar, Data Platform Solution Architect/A.I., Microsoft Jordan Edwards, Senior Program Manager, Microsoft Gabrielle Davelaar is a Data Platform Solution Architect specialized in Artificial Intelligence solutions at Microsoft. She was originally trained as a computational neuroscientist. Currently she helps Microsoft’s top 15 Fortune 500 customers build trustworthy and scalable platforms able to create the next generation of A.I. applications. While helping customers with their digital A.I. transformation, she started working with engineering to tackle one key issue: A.I. maturity. The demand for this work is high, and Gabrielle is now working on bringing together the right people to create a full offering. Her aspirations are to be a technical leader in the healthcare digital transformation. Empowering people to find new treatments using A.I. while insuring privacy and taking data governance in consideration. Jordan Edwards is a Senior Program Manager on the Azure AI Platform team. He has worked on a number of highly performant, globally distributed systems across Bing, Cortana and Microsoft Advertising and is currently working on CI/CD experiences for the next generation of Azure Machine Learning. Jordan has been a key driver of dev-ops modernization in AI+R, including but not limited to: moving to Git, moving the organization at large to CI/CD, packaging and build language modernization, movement from monolithic services to microservice platforms and driving for a culture of friction free devOps and flexible engineering culture. 
His passion is to continue driving Microsoft towards a culture which enables our engineering talent to do and achieve more.

Productionizing Machine Learning with AI Platform Pipelines (Cloud AI Huddle)
In this virtual edition of AI Huddle, Yufeng introduces Cloud AI Platform Pipelines, a newly launched product from GCP. We'll first cover the basics of what it is and how to get it set up, and then dive deeper into how to operationalize your ML pipeline in this environment, using the Kubeflow Pipelines SDK. All the code we'll be using is already in the open, so boot up your GCP console and try it out yourself!

How AIOps is transforming customer experience by breaking down DevOps silos at KPN
DevOps means dev fails fast and ops fails never. The challenge is that the volume of data exceeds human ability to analyze it and take action before user experience degrades. In this session, Dutch telecom KPN shares how AIOps from Broadcom is helping connect operational systems and increase automation across their toolchain to break down DevOps silos and provide faster feedback loops by preventing problems from getting into production.

Getting to Sub-Second Incident Response with AIOps (FutureStack19)
Machine learning and artificial intelligence are revolutionizing the way operations teams work. In this talk, we’ll explore the incident management practices that can get you toward zero downtime, and talk about how New Relic can help you along that path. Speaker: Dor Sasson, Senior Product Manager AIOps, New Relic

AWS re:Invent 2018: [REPEAT 1] AIOps: Steps Towards Autonomous Operations (DEV301-R1)
In this session, learn how to architect a predictive and preventative remediation solution for your applications and infrastructure resources. We show you how to collect performance and operational intelligence, understand and predict patterns using AI & ML, and fix issues. We show you how to do all this by using AWS native solutions: Amazon SageMaker and Amazon CloudWatch.

Machine Learning & AIOps: Why IT Operations & Monitoring Teams Should Care
In this webinar, we break down what machine learning is, what it can do for your organization, and questions to ask when evaluating ML tools. For more information, please visit our product pages.

Anthony Bulk @ NTT DATA, "AI/ML Democratization"
North Technology People would like to welcome you to the CAMDEA Digital Forum for Tuesday 15th September. Presenter: Anthony Bulk. Job title: Director, AI & Data. Presentation: AI/ML Democratization.

AIOps: The next 5 years
A Conversation with Will Cappelli, CTO of EMEA, Moogsoft & Rich Lane, Senior Analyst, Forrester In this webinar, the speakers discuss the role that AIOps can and will play in the enterprise of the future, how the scope of AIOps platforms will expand, and what new functionality may be deployed. Watch this webinar to learn how AIOps will enable and support the digitalization of key business processes, and what new AI technologies and algorithms are likely to have the most impact on the continuing evolution of AIOps. For more information, visit our website

Building an MLOps Toolchain The Fundamentals
Artificial intelligence and machine learning are the latest “must-have” technologies in helping organizations realize better business outcomes. However, most organizations don’t have a structured process for rolling out AI-infused applications. Data scientists create AI models in isolation from IT, which then needs to insert those models into applications—and ensure their security—to deliver any business value. In this ebook/webinar, we examine the best way to set up an MLOps process to ensure successful delivery of AI-infused applications.

Webinar: MLOps automation with Git Based CI/CD for ML
Deploying AI/ML based applications is far from trivial. On top of the traditional DevOps challenges, you need to foster collaboration between multidisciplinary teams (data scientists, data/ML engineers, software developers and DevOps), handle model and experiment versioning, data versioning, etc. Most ML/AI deployments involve significant manual work, but this is changing with the introduction of new frameworks that leverage cloud-native paradigms, Git and Kubernetes to automate the process of ML/AI-based application deployment. In this session we will explain how ML Pipelines work, the main challenges and the different steps involved in producing models and data products (data gathering, preparation, training/AutoML, validation, model deployment, drift monitoring and so on). We will demonstrate how the development and deployment process can be greatly simplified and automated. We’ll show how you can: a. maximize the efficiency and collaboration between the various teams, b. harness Git review processes to evaluate models, and c. abstract away the complexity of Kubernetes and DevOps. We will demo how to enable continuous delivery of machine learning to production using Git, CI frameworks (e.g. GitHub Actions) with hosted Kubernetes, Kubeflow, MLOps orchestration tools (MLRun), and Serverless functions (Nuclio) using real-world application examples. Presenter: Yaron Haviv, Co-Founder and CTO @Iguazio
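The pipeline stages described above (gather, prepare, train, validate, deploy) can be sketched as composable steps. This is a deliberately tiny stand-in, with a toy one-parameter model, for what systems like Kubeflow Pipelines or MLRun express declaratively; the function names and data are illustrative only.

```python
# An ML pipeline as plain composable functions: each stage consumes the
# previous stage's output, and deployment is gated on validation.

def gather():
    return [(0, 0), (1, 2), (2, 4), (3, 6)]       # toy (x, y) rows

def prepare(rows):
    return [(x, y) for x, y in rows if y is not None]

def train(rows):
    # Least-squares slope of y = a*x through the origin.
    return sum(x * y for x, y in rows) / sum(x * x for x, y in rows)

def validate_model(a, rows):
    err = sum((a * x - y) ** 2 for x, y in rows) / len(rows)
    return err < 1e-6                              # gate: only deploy good models

def deploy(a):
    return lambda x: a * x                         # the "serving endpoint"

rows = prepare(gather())
model_a = train(rows)
if validate_model(model_a, rows):
    serve = deploy(model_a)
    print(serve(5))
```

CI/CD for ML then amounts to re-running this chain automatically whenever the code or data version changes, with the validation gate deciding whether the new model reaches production.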

How RealPage Leveraged Full Stack Visibility and Integrated AIOps for SaaS Innovation and Customer S
Development and operation teams continue to struggle with having a unified view of their applications and infrastructure. In this webinar, you'll learn how RealPage, the industry leader in SaaS-based Property Management Solutions, leverages an integrated AppDynamics and Virtana AIOps solution to deliver a superior customer experience by managing the performance of its applications as well as its infrastructure. RealPage must ensure its applications and infrastructure are always available and continuously performing, while constantly innovating to deliver new capabilities to sustain their market leadership. The combination of their Agile development process and highly virtualized infrastructure environment only adds to the complexity of managing both. To meet this challenge, RealPage is leveraging the visibility and AIOps capabilities delivered by the integrated solutions from AppDynamics and Virtana.

Machine Learning on Kubernetes with Kubeflow
Google Cloud Platform - Join Fei and Ivan as they talk to us about the benefits of running your TensorFlow models in Kubernetes using Kubeflow. Working on a cool project and want to get in contact with us? Fill out this form. Don't forget to subscribe to the channel!

PipelineAI End-to-End TensorFlow Model Training + Deploying + Monitoring + Predicting (Demo)
100% open source and reproducible on your laptop through Docker - or in production through Kubernetes! End-to-end pipeline demo: train and deploy a TensorFlow model from research to live production. Includes full metrics and insight into the offline training and online predicting phases.

Deep Learning Pipelines: Enabling AI in Production
Deep learning has shown tremendous successes, yet it often requires a lot of effort to leverage its power. Existing deep learning frameworks require writing a lot of code to run a model, let alone in a distributed manner. Deep Learning Pipelines is an Apache Spark Package library that makes practical deep learning simple based on the Spark MLlib Pipelines API. Leveraging Spark, Deep Learning Pipelines scales out many compute-intensive deep learning tasks. In this talk, we discuss the philosophy behind Deep Learning Pipelines, as well as the main tools it provides, how they fit into the deep learning ecosystem, and how they demonstrate Spark's role in deep learning. About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.

PipelineAI: High Performance Distributed TensorFlow AI + GPU + Model Optimizing Predictions
We will each build an end-to-end, continuous TensorFlow AI model training and deployment pipeline on our own GPU-based cloud instance. At the end, we will combine our cloud instances to create the LARGEST Distributed TensorFlow AI Training and Serving Cluster in the WORLD! Pre-requisites: just a modern browser and an internet connection. We'll provide the rest! Agenda:

* Spark ML
* TensorFlow AI
* Storing and Serving Models with HDFS
* Trade-offs of CPU vs. GPU, Scale Up vs. Scale Out
* CUDA + cuDNN GPU Development Overview
* TensorFlow Model Checkpointing, Saving, Exporting, and Importing
* Distributed TensorFlow AI Model Training (Distributed TensorFlow)
* TensorFlow's Accelerated Linear Algebra Framework (XLA)
* TensorFlow's Just-in-Time (JIT) Compiler and Ahead-of-Time (AOT) Compiler
* Centralized Logging and Visualizing of Distributed TensorFlow Training (TensorBoard)
* Distributed TensorFlow AI Model Serving/Predicting (TensorFlow Serving)
* Centralized Logging and Metrics Collection (Prometheus, Grafana)
* Continuous TensorFlow AI Model Deployment (TensorFlow, Airflow)
* Hybrid Cross-Cloud and On-Premise Deployments (Kubernetes)
* High-Performance and Fault-Tolerant Micro-services (NetflixOSS)

Bringing Your Data Pipeline into The Machine Learning Era - Chris Gaun & Jörg Schad, Mesosphere
Want to view more sessions and keep the conversations going? Join us for KubeCon + CloudNativeCon North America in Seattle, December 11 - 13, 2018, or in Shanghai, November 14 - 15. Bringing Your Data Pipeline into The Machine Learning Era - Chris Gaun, Mesosphere (Intermediate Skill Level). Kubeflow is a new tool that makes it easy to run distributed machine learning solutions (e.g. Tensorflow) on Kubernetes. However, much of the data that can feed machine learning algorithms is already in existing distributed data stores. This presentation shows how to connect existing distributed data services running on Apache Mesos to Tensorflow on Kubernetes using the Kubeflow tool. Chris Gaun will show you how this existing data can now leverage machine learning, such as Tensorflow, on Kubernetes using the Kubeflow tool. These lessons can be extrapolated to any local distributed data. Chris Gaun is a CNCF ambassador and product marketing manager at Mesosphere. He has presented at KubeCon in 2016 and has put on over 40 free Kubernetes workshops across the US and EU in 2017. About Jörg: He is a technical lead at Mesosphere. The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy and all of the other CNCF-hosted projects.

Deep Dive into Deep Learning Pipelines - Sue Ann Hong & Tim Hunter
Deep learning has shown tremendous successes, yet it often requires a lot of effort to leverage its power. Existing deep learning frameworks require writing a lot of code to run a model, let alone in a distributed manner. Deep Learning Pipelines is a Spark Package library that makes practical deep learning simple based on the Spark MLlib Pipelines API. Leveraging Spark, Deep Learning Pipelines scales out many compute-intensive deep learning tasks. In this talk we dive into:

* the various use cases of Deep Learning Pipelines, such as prediction at massive scale, transfer learning, and hyperparameter tuning, many of which can be done in just a few lines of code
* how to work with complex data such as images in Spark and Deep Learning Pipelines
* how to deploy deep learning models through familiar Spark APIs such as MLlib and Spark SQL to empower everyone from machine learning practitioners to business analysts

Finally, we discuss integration with popular deep learning frameworks. About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.

How to Build Machine Learning Pipelines in a Breeze with Docker
Docker Let's review and see in action a few projects based on Docker containers that could help you prototype ML-based projects (detect faces, nudity, evaluate sentiment, build a smart bot) within a few hours. Let's see in practice where containers could help in these types of projects.

End to End Streaming ML Recommendation Pipeline: Spark 2.0, Kafka, TensorFlow Workshop
End to End Streaming ML Recommendation Pipeline with Spark 2.0, Kafka, and TensorFlow. Presented at Bangalore Apache Spark Meetup by Chris Fregly on 10/12/2016.

Machine Learning with caret: Building a pipeline
Building a Machine Learning model with RandomForest and caret in R.

Let’s Write a Pipeline - Machine Learning Recipes #4
Google Developer In this episode, we’ll write a basic pipeline for supervised learning with just 12 lines of code. Along the way, we'll talk about training and testing data. Then, we’ll work on our intuition for what it means to “learn” from data. Check out TensorFlow Playground:

Kevin Goetsch | Deploying Machine Learning using sklearn pipelines
PyData Chicago 2016 Sklearn pipeline objects provide a framework that simplifies the lifecycle of data science models. This talk will cover the how and why of encoding feature engineering, estimators, and model ensembles in a single deployable object.
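As a minimal illustration of the single-deployable-object idea (a generic scikit-learn sketch on toy data, not code from the talk), a Pipeline bundles feature engineering and an estimator so the whole inference path can be serialized and shipped as one artifact:

```python
# Hedged sketch: a scikit-learn Pipeline couples preprocessing with an
# estimator so they travel together at deploy time.
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),                 # feature engineering step
    ("clf", LogisticRegression(max_iter=1000)),  # estimator
])
pipe.fit(X, y)

# One serialized blob carries scaling parameters and model coefficients,
# so the scoring service cannot drift out of sync with preprocessing.
restored = pickle.loads(pickle.dumps(pipe))
print(restored.predict(X[:3]))
```

Because the scaler's fitted statistics live inside the same object as the model, redeployment is a single artifact swap rather than a coordinated update of separate preprocessing and model code.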

Building online churn prediction ML model using XGBoost, Spark, Featuretools, Python and GCP
Mariusz Jacyno This video shows step by step how to build, evaluate and deploy a churn prediction model using Python, Spark, automated feature engineering (Featuretools), extreme gradient boosting (XGBoost) and Google ML service.

MLOps #34 Owned By Statistics: How Kubeflow & MLOps Can Help Secure ML Workloads // David Aronchick
While machine learning is spreading like wildfire, very little attention has been paid to the ways that it can go wrong when moving from development to production. Even when models work perfectly, they can be attacked and/or degrade quickly if the data changes. Having a well understood MLOps process is necessary for ML security! Using Kubeflow, we demonstrated the common ways machine learning workflows go wrong, and how to mitigate them using MLOps pipelines to provide reproducibility, validation, versioning/tracking, and safe/compliant deployment. We also talked about the direction for MLOps as an industry, and how we can use it to move faster, with less risk, than ever before. David leads Open Source Machine Learning Strategy at Azure. This means he spends most of his time helping humans to convince machines to be smarter. He is only moderately successful at this. Previously, he led product management for Kubernetes on behalf of Google, launched Google Kubernetes Engine, and co-founded the Kubeflow project. He has also worked at Microsoft, Amazon and Chef and co-founded three startups. When not spending too much time in service of electrons, he can be found on a mountain (on skis), traveling the world (via restaurants) or participating in kid activities, of which there are a lot more than he remembers from when he was that age.

An introduction to MLOps on Google Cloud
The enterprise machine learning life cycle is expanding as firms increasingly look to automate their production ML systems. MLOps is an ML engineering culture and practice that aims at unifying ML system development and ML system operation, enabling shorter development cycles, increased deployment velocity, and more dependable releases in close alignment with business objectives. Learn how to construct your systems to standardize and manage the life cycle of machine learning in production with MLOps on Google Cloud. Speaker: Nate Keating Watch more: Google Cloud Next ’20: OnAir

MLOps: Planning for the new normal
Join us for a conversation with Diego Oppenheimer, CEO of Algorithmia, and Kahini Shah, Investor at Gradient Ventures, as they discuss the impacts that the global pandemic is having on AI/ML.

Artificial Intelligence (AI) Operations in ServiceNow | Share The Wealth
Mike Desmond of GlideFast Consulting explains AI Operations in ServiceNow in this Share the Wealth session.

GitOps for MLOps - David Aronchick
GitOps Days 2020 Day 1, Session 8 (May 20) While DevOps and GitOps have made huge traction in recent years, many customers struggle to apply these practices to ML workloads. This talk will focus on some best practices to bring GitOps to MLOps for production-grade applications and possibilities for the future. David Aronchick, Head of Open Source ML Strategy, Microsoft David Aronchick leads open source machine learning strategy at Azure. He spends most of his time helping humans convince machines to be smarter. (He’s only moderately successful at this.) Previously, he led product management for Kubernetes on behalf of Google, launched Google Kubernetes Engine, and cofounded the Kubeflow project. He’s also worked at Microsoft, Amazon, and Chef and cofounded three startups. When not spending too much time in service of electrons, he can be found on a mountain (on skis), traveling the world (via restaurants), or participating in kid activities, of which there are a lot more than he remembers when he was that age.

MLOps Demo — A Day in the Life
DataRobot MLOps is the way to manage all your production models. In this demo, we will walk through a typical day in the life of a financial services company running a Center of Excellence for delivering ML in production. We'll see the types of administrative activities our MLOps engineer George performs, and understand how he works with Mary, our lead Data Scientist. George and Mary rely on each other to effectively build, deploy, and manage models for their company over their full lifecycle. Visit the MLOps product page to find out more:

MLOps topics in the cloud
Pragmatic AI Labs A lecture on emerging MLOps topics in the cloud, including Azure, GCP and SageMaker.

Commit Virtual 2020: MLOps DevOps for Machine Learning
GitLab Speaker: Monmayuri Ray The practice of DevOps - developing software and operationalizing the development cycle - has been evolving for over a decade. Now, a new addition has joined this holistic development cycle: machine predictions. The emerging art and science of machine learning algorithms integrated into current operational systems is opening new possibilities for engineers, scientists, and architects in the tech world. This presentation will take the audience on a journey in understanding the fundamentals of orchestrating machine predictions using MLOps in this ever-changing, agile world of software development. You’ll learn how to excel at the craft of DevOps for Machine Learning (ML). Monmayuri will unpack the theoretical constructs and show how they apply to real-world scenarios.

AIDevFest20: Machine Learning Design Patterns for MLOps
Speaker: Valliappa Lakshmanan (LAK), Google Design patterns are formalized best practices to solve common problems when designing a software system. As machine learning moves from being a research discipline to a software one, it is useful to catalog tried-and-proven methods to help engineers tackle frequently occurring problems that crop up during the ML process. In this talk, I will cover five patterns (Workflow Pipelines, Transform, Multimodal Input, Feature Store, Cascade) that are useful in the context of adding flexibility, resilience and reproducibility to ML in production. For data scientists and ML engineers, these patterns provide a way to apply hard-won knowledge from hundreds of ML experts to your own projects. Anyone designing infrastructure for machine learning will have to be able to provide easy ways for the data engineers, data scientists, and ML engineers to implement these, and other, design patterns. Website:

Python Lunch Break presents MLOps
Deploy your Chatbot using CI/CD by William Arias Leverage DevOps practices in your next Chatbot deployments: create pipelines for training and testing your Chatbot before automating the deployment to staging or production. NLP developers can focus on Chatbot development and worry less about infrastructure and environment configuration. Bio: Born and raised in Colombia, I studied Electronic Engineering and specialized in Digital Electronics, and was an exchange student at Czech Technical University in the faculty of Data Science. I started my career at Intel as a Field Engineer, then moved to Oracle in Bogota. After 5 years at Oracle as a Solution Architect I decided to deepen my knowledge of Machine Learning and Databases and relocated to Prague in the Czech Republic, where I studied Data Science from the academic perspective for a while, then worked as a Solution Architect at CA Technologies, and later as a Developer of Automations and NLP at Adecco. Currently I work at GitLab as a Technical Marketing Engineer, teaching and evangelizing developers about DevOps.

Continuous Machine Learning (CML)

MLOps Tutorial #1: Intro to Continuous Integration for ML
DVCorg Learn how to use one of the most powerful ideas from the DevOps revolution, continuous integration, in your data science and machine learning projects. This hands-on tutorial shows you how to create an automatic model training & testing setup using GitHub Actions and Continuous Machine Learning (CML), two free and open-source tools in the Git ecosystem. Designed for total beginners!
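A workflow of the kind the tutorial describes might look roughly like the following GitHub Actions file (a hypothetical sketch: the script names, `requirements.txt`, and `metrics.json` are placeholders, not taken from the tutorial):

```yaml
# .github/workflows/train.yml (hypothetical)
name: train-model
on: [push]
jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt   # project dependencies
      - run: python train.py                   # trains and writes metrics.json
      - run: cat metrics.json                  # surface metrics in the CI log
```

Every push then retrains the model and reports its metrics in CI, which is the core continuous-integration loop the tutorial builds on.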

MLOps Tutorial #3: Track ML models with Git & GitHub Actions
DVCorg In this tutorial, we'll compare ML models across two different Git branches of a project - and we'll do it in a continuous integration system (GitHub Actions) for automation superpowers! We'll cover:

* Why comparing model metrics takes more than a git diff
* How pipelines, a method for making model training more reproducible, help you standardize model comparisons across Git branches
* How to display a table comparing model performance to the main branch in a GitHub Pull Request

Helpful links: Dataset: Data on farmers’ adoption of climate change mitigation measures, individual characteristics, risk attitudes and social influences in a region of Switzerland


YouTube search... ...Google search

DevSecOps (also known as SecDevOps and DevOpsSec) is the process of integrating secure development best practices and methodologies into continuous design, development, deployment and integration processes.

Dull, Dangerous, and/or Dirty

DevSecOps : What, Why and How
Black Hat In this talk, we shall focus on how a DevOps pipeline can easily be metamorphosed into a DevSecOps and the benefits which can be achieved with this transformation. The talk (assisted with various demos) will focus on developing a DevSecOps pipeline using free/open-source tools in various deployment platforms, i.e. on-premise, cloud native and hybrid scenarios. By Anant Shrivastava Full Abstract & Presentation Material

DevSecOps State of the Union
Clint Gibler, Research Director, NCC Group It’s tough to keep up with the DevSecOps resources out there, or even know where to start. This talk will summarize and distill the unique tips and tricks, lessons learned, and tools discussed in dozens of blog posts and more than 50 conference talks over the past few years, and combine it with knowledge gained from in-person discussions with security leaders at companies with mature security programs. Pre-Requisites: General understanding of the fundamental areas of modern application security programs, including threat modeling, secure code reviews, security training, building security culture/developing security champions, security scanning (static and dynamic analysis tools), monitoring and logging in production, etc. Understanding of how software generally moves from development to production in agile environments that embrace CI/CD practices. Basic understanding of the principles of network/infrastructure and cloud security.

Webinar: Critical DevSecOps considerations for multi-cloud Kubernetes
Cloud Native Computing Foundation (CNCF) The distributed nature of Kubernetes has turned both legacy infrastructure and traditional cybersecurity approaches on their heads. Organizations building cloud-native environments in their own data centers grapple with operationalizing and scaling Kubernetes clusters, and then ensuring system-wide security from the infrastructure layer all the way up to each container. In this webinar, you’ll hear from two cloud-native experts in infrastructure and security who will offer up valuable insights on:

* How containerized applications use compute, storage, and network resources differently than do legacy applications
* Why hyper-converged infrastructure is suited for Kubernetes
* How a Kubernetes stack should be instrumented for observability
* Best practices for implementing system-wide security for multi-cloud Kubernetes

Presented by Nutanix and Sysdig. Presenters: Sylvain Huguet, Sr. Product Manager - Karbon/Kubernetes @Nutanix; Loris Degioanni, CTO & Founder @Sysdig.

DevOps for Beginners Where to Start
DevOpsTV Tim Beattie, Global Head of Product, Open Innovation Labs, Red Hat If you're beginning to learn more about DevOps, you may be confused about where to start or where to focus. The term “DevOps” refers to this idea that we no longer have pure and separated development streams and operations streams – and to a set of DevOps practices that optimizes and improves both functions. Join this webinar to explore how to get going with this cross-functional way of working that breaks down walls, removes bottlenecks, improves speed of delivery, and increases experimentation.

DevSecOps Days DC 2020
DevSecOps Days is a global series of one and two day conferences helping to educate, evolve, and explore concepts around developing security as code. Join us online to meet fellow practitioners integrating security into their DevOps practices, learn about their journeys, share ideas on integrating security into your teams, and trade insights on automating security tools in your DevOps pipeline.

Welcome (Hasan Yasar) 00:00:00--00:10:47

The Need for Threat Modeling in a DevSecOps World (Simone Curzi) 00:10:48--00:44:50

Close the Gap: Bringing Engineering & Security/Compliance Together (Sare Ehmann) 00:49:15--01:20:29

Delivering DevSecOps Value Faster? Use Metrics and Standards (Steven Woodward) 01:24:15--01:54:13

Navigating DevOps Requirements with Risk Management (Victoria (Vicky) Hailey) 02:02:03--02:25:05

Make it Personal to Make it Happen (Ruth Lennon) 02:34:11--03:03:00

Keynote: Successful DevSecOps Implementation for Production Environments is a Journey Not a Milestone (Christopher Brazier) 03:44:11--04:29:24

Evolve Your DevSecOps to Manage Speed and Risk (Altaz Valani) 04:34:09--05:03:12

How Software Acquisition and DevSecOps Can Improve the Lethality of the DOD (Sean Brady) 05:14:05--05:42:48

Why Organizations Need DevSecOps Now More Than Ever (Krishna Guru) 05:47:41--06:14:05

Wrap-up (Hasan Yasar) 06:14:17--06:26:29

DevSecOps Days Pittsburgh 2020
Software Engineering Institute | Carnegie Mellon University Speaker Presentations (with timestamps)

Welcome and Introductions Hasan Yasar 00:00:32 – 00:17:13

Keynote: Core Values and Customer-Focused Delivery: DevOps in Special Operations Jason Turse 00:27:57 – 01:11:05

Are We Building Just Containers or Secure Containers? Arun Chaudhary 01:15:28 – 01:43:48

Are your Pipelines Secure? Lock-down Unsecured CI/CD Pipelines Angel Rivera 01:47:25 – 02:20:05

Risk: A Bias Toward Action Jason Lamb 02:30:25 – 02:55:33

The Innovation Ninja Amber Vanderburg 03:00:30 – 03:30:34

Security is an Awesome Product Feature Mark P. Hahn 04:16:57 – 04:43:08

Exhilarating Journey of Transformation into Digital Edge Himanshu Patel 04:47:57 – 05:16:12

Invited Speaker: Intentional Culture Architecture Jeff Shupack 05:20:11 – 05:40:17

Service Mess to Service Mesh Rob Richardson 05:59:56 – 06:26:35

Code to Kubernetes: Deployment Shouldn’t be an Afterthought Lakmal Warusawithana 06:32:25 – 06:53:30

Design Thinking, Security, and Your Life Alex Cowan 06:59:56 – 07:23:28

Keynote: How to Use Hacker Personas to Successfully Build DevSecOps Pipeline Robin Yeman 07:29:54 – 08:00:45

Outbrief and Closing Remarks Suzanne Miller • Hasan Yasar 08:12:18 – 08:26:42

Meet fellow practitioners integrating security into their DevOps practices. Learn about their journeys, share ideas on integrating security into your teams, and trade insights on automating security tools in your DevOps pipeline.

DevSecOps in Government

DOD Enterprise DevSecOps Initiative
CSIAC The current Department of Defense (DOD) software acquisition process is not responsive to the needs of our warfighters. Therefore, it is difficult for the DOD to keep pace with our potential adversaries and avoid falling behind them. To address this situation, the DOD is pursuing a new software development activity called the DOD Enterprise DevSecOps Initiative. This webinar will present the vision for transforming DOD software acquisition into secure, responsive software factories. It will examine and explore the utilization of modern software development processes and tools to revolutionize the Department’s ability to provide responsive, timely, and secure software capabilities for our warfighters. The focus of the effort involves exploiting automated software tools, services, and standards so warfighters can rapidly create, deploy, and operate software applications in a secure, flexible, and interoperable manner. Slides

DevSecOps Implementation in the DoD: Barriers and Enablers
Software Engineering Institute | Carnegie Mellon University Today's DOD software development and deployment is not responsive to warfighter needs. As a result, the DOD's ability to keep pace with potential adversaries is falling behind. In this webcast, panelists discuss potential enablers of and barriers to using modern software development techniques and processes in the DOD or similar segregated environments. These software development techniques and processes are commonly known as DevSecOps.

Andrew Weiss/Harshal Dharia: Government DevSecOps with Azure and GitHub
Andrew Weiss, Federal Solutions Engineer, GitHub, and Harshal Dharia, Senior Azure Specialist, Microsoft Federal, present Government DevSecOps with Azure and GitHub at the Azure Government February 2020 meetup.

Infrastructure: Foundations of the Future
GovernmentCIO Media & Research As agencies continue to modernize and move away from their legacy systems, how are they reshaping their IT infrastructure and leveraging data to transform their enterprise? Hear from top government IT leaders at the VA, DOD, and HHS about how they’re adopting cloud and data center infrastructure, strategies and capabilities to move their organizations toward the future.

Strangler Fig / Strangler Pattern

Strangulation of a legacy or undesirable solution is a safe way to phase one thing out for something better, cheaper, or more expandable. You make something new that obsoletes a small percentage of something old, and put them live together. You do some more work in the same style, and go live again (rinse, repeat). Strangler Applications | Paul Hammant studies
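Mechanically, strangulation usually comes down to a routing facade in front of the old system: as functionality is rewritten, more paths are switched over to the new service. A minimal sketch (the path prefixes and backend names are hypothetical):

```python
# Hypothetical strangler facade: requests for migrated URL prefixes are
# served by the new system; everything else still falls through to the
# legacy monolith.
MIGRATED_PREFIXES = ["/billing", "/reports"]  # grows with each go-live

def route(path: str) -> str:
    """Return which backend should handle the given request path."""
    if any(path.startswith(prefix) for prefix in MIGRATED_PREFIXES):
        return "new-service"
    return "legacy-monolith"

print(route("/billing/invoice/42"))  # handled by the new service
print(route("/orders/7"))            # still handled by the monolith
```

In practice the facade is a reverse proxy or API gateway rather than application code, but the rinse-and-repeat loop is the same: migrate a slice, add its prefix, go live again.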

"Choking the Monolith − The Strangler Pattern Applied" by Tobias Goeschel (@w3ltraumpirat)
The so-called "Strangler Fig" (aka "Strangler") pattern is a much-cited strategy for replacing legacy systems with new, often microservice-based architectures. However, it's not actually a microservice pattern, and there are several - quite different - ways to implement it. Are you confused yet? Fear not: we will have a look at the theory, and then explore together how Strangler Fig could be used to improve and replace a project worthy of being called "The Most Horrible Piece Of Code in the World". === As a principal consultant at codecentric, Tobias has seen, improved and survived enough of "the code that earns our money" to enjoy it. Slides

AWS New York Summit 2019: Migrating Monolithic Applications with the Strangler Pattern (FSV303)
Learn more about Amazon AWS Global Summits at – “Lifting and shifting” an enterprise-scale application will yield some of the benefits of the cloud, but elasticity and agility may still be limited. Conversely, rewriting that application to be cloud-native can be costly in terms of both time and money and could cause you to miss market opportunities. This session explores the challenges financial institutions face when migrating their existing portfolio of applications to the cloud. Then, we share practical tips to migrate applications to realize cloud-native architectural benefits incrementally using the strangler pattern.

Model Monitoring

YouTube search... ...Google search

Monitoring production systems is essential to keeping them running well. For ML systems, monitoring becomes even more important, because their performance depends not just on factors that we have some control over, like infrastructure and our own software, but also on data, which we have much less control over. Therefore, in addition to monitoring standard metrics like latency, traffic, errors and saturation, we also need to monitor model prediction performance. An obvious challenge with monitoring model performance is that we usually don’t have a verified label to compare our model’s predictions to, since the model works on new data. In some cases we might have some indirect way of assessing the model’s effectiveness, for example by measuring click rate for a recommendation model. In other cases, we might have to rely on comparisons between time periods, for example by calculating a percentage of positive classifications hourly and alerting if it deviates by more than a few percent from the average for that time. Just like when validating the model, it’s also important to monitor metrics across slices, and not just globally, to be able to detect problems affecting specific segments. ML Ops: Machine Learning as an Engineering Discipline | Cristiano Breuel - Towards Data Science
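The hourly positive-rate check described above can be sketched as follows (illustrative only; the 24-hour baseline and 5% threshold are assumptions, not prescriptions, and in practice the same check would also run per data slice):

```python
from collections import deque

class PositiveRateMonitor:
    """Track the hourly share of positive predictions and alert when it
    deviates from the recent average - a proxy metric, since verified
    labels are usually unavailable for fresh data."""

    def __init__(self, history_hours=24, threshold=0.05):
        self.history = deque(maxlen=history_hours)  # recent hourly rates
        self.threshold = threshold                  # "a few percent"

    def observe_hour(self, predictions):
        """predictions: iterable of 0/1 model outputs from one hour."""
        preds = list(predictions)
        rate = sum(preds) / len(preds)
        alert = False
        if self.history:
            baseline = sum(self.history) / len(self.history)
            alert = abs(rate - baseline) > self.threshold
        self.history.append(rate)
        return rate, alert
```

Each `observe_hour` call returns the hour's positive rate plus an alert flag once a baseline exists, which is the kind of signal that would feed an alerting system alongside latency, traffic, errors and saturation.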

Josh Wills: Visibility and Monitoring for Machine Learning Models
Josh started our Meetup with a short talk on deploying machine learning models into production. He's worked as the Director of Data Science at Cloudera, he wrote the Java version of Google's AB testing framework, and he recently held the position of Director of Data Engineering at Slack. In his opinion the most important question is: "How often do you want to deploy this?" You should never deploy a machine learning model once. If the problem is not important enough to keep working on it and deploy new models, then its not important enough to pay the cost of putting it into production in the first place. Watch his talk to get his thoughts on testing machine learning models in production.

Concept Drift: Monitoring Model Quality in Streaming Machine Learning Applications
Most machine learning algorithms are designed to work on stationary data. Yet, real-life streaming data is rarely stationary. Models lose prediction accuracy over time if they are not retrained. Without model quality monitoring, retraining decisions are suboptimal and costly. Here, we review the monitoring methods and evaluate them for applicability in modern fast data and streaming applications.
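One common retraining trigger of the kind reviewed here compares quality over a sliding window of recently labeled events against the accuracy measured at validation time (a generic sketch, not the speakers' method; the window size and tolerance are made up, and it assumes true labels eventually arrive for scored events):

```python
from collections import deque

class DriftMonitor:
    """Signal concept drift when windowed live accuracy falls measurably
    below the accuracy the model achieved on its validation set."""

    def __init__(self, baseline_accuracy, window=500, tolerance=0.05):
        self.baseline = baseline_accuracy
        self.window = deque(maxlen=window)  # 1 if a prediction was correct
        self.tolerance = tolerance

    def record(self, prediction, label):
        self.window.append(prediction == label)

    def drift_detected(self):
        if len(self.window) < self.window.maxlen:
            return False  # wait for a full window of evidence
        current = sum(self.window) / len(self.window)
        return self.baseline - current > self.tolerance
```

A `drift_detected()` result of `True` would drive the retraining decision, replacing ad-hoc retraining schedules with one based on observed model quality.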

Monitoring models in production - Jannes Klaas
PyData Amsterdam 2018 A Data Scientists work is not done once machine learning models are in production. In this talk, Jannes will explain ways of monitoring Keras neural network models in production, how to track model decay and set up alerting using Flask, Docker and a range of self-built tools.

Machine Learning Models in Production
Data Scientists and Machine Learning practitioners nowadays seem to be churning out models by the dozen, and they continuously experiment to find ways to improve their accuracies. They also use a variety of ML and DL frameworks & languages, and a typical organization may find that this results in a heterogeneous, complicated bunch of assets that require different types of runtimes, resources and sometimes even specialized compute to operate efficiently. But what does it mean for an enterprise to actually take these models to "production"? How does an organization scale inference engines out and make them available for real-time applications without significant latencies? Different techniques are needed for batch, offline inferences and for instant, online scoring. Data needs to be accessed from various sources, and cleansing and transformations of data need to be enabled prior to any predictions. In many cases, there may be no substitute for customized data handling with scripting either. Enterprises also require additional auditing and authorizations built in, approval processes, and still support a "continuous delivery" paradigm whereby a data scientist can enable insights faster. Not all models are created equal, nor are consumers of a model - so enterprises require both metering and allocation of compute resources for SLAs. In this session, we will take a look at how machine learning is operationalized in IBM Data Science Experience (DSX), a Kubernetes-based offering for the Private Cloud and optimized for the Hortonworks Hadoop Data Platform. DSX essentially brings typical software engineering development practices to Data Science, organizing the dev-test-production for machine learning assets in much the same way as typical software deployments.
We will also see what it means to deploy, monitor accuracies and even roll back models & custom scorers, as well as how API-based techniques enable consuming business processes and applications to remain relatively stable amidst all the chaos. Speaker: Piotr Mierzejewski, Program Director Development IBM DSX Local, IBM

A/B Testing

YouTube search... ...Google search

A/B testing (also known as bucket testing or split-run testing) is a user experience research methodology. A/B tests consist of a randomized experiment with two variants, A and B. It includes application of statistical hypothesis testing or "two-sample hypothesis testing" as used in the field of statistics. A/B testing is a way to compare two versions of a single variable, typically by testing a subject's response to variant A against variant B, and determining which of the two variants is more effective. A/B testing | Wikipedia
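For conversion-rate experiments, the "two-sample hypothesis testing" mentioned above is often a two-proportion z-test. A minimal standard-library sketch (the counts below are invented for illustration):

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)    # pooled rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))
    return z, p_value

# Hypothetical results: variant A converts 120/1000, variant B 150/1000.
z, p = two_proportion_ztest(120, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A p-value below the chosen significance level (commonly 0.05) is then read as evidence that the two variants differ, subject to the usual fixed-horizon caveats discussed below.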

A randomized controlled trial (or randomized control trial; RCT) is a type of scientific (often medical) experiment that aims to reduce certain sources of bias when testing the effectiveness of new treatments; this is accomplished by randomly allocating subjects to two or more groups, treating them differently, and then comparing them with respect to a measured response. One group—the experimental group—receives the intervention being assessed, while the other—usually called the control group—receives an alternative treatment, such as a placebo or no intervention. The groups are monitored under conditions of the trial design to determine the effectiveness of the experimental intervention, and efficacy is assessed in comparison to the control. There may be more than one treatment group or more than one control group. Randomized controlled trial | Wikipedia

Stanford Seminar: Peeking at A/B Tests - Why It Matters and What to Do About It
Ramesh Johari, Stanford University. I'll describe a novel statistical methodology that has been deployed by the commercial A/B testing platform Optimizely to communicate experimental results to their customers. Our methodology addresses the issue that traditional p-values and confidence intervals give unreliable inference. This is because users of A/B testing software are known to continuously monitor these measures as the experiment is running. We provide always valid p-values and confidence intervals that are provably robust to this effect. Not only does this make it safe for a user to continuously monitor, but it empowers her to detect true effects more efficiently. I'll cover why continuous monitoring is a problem, and how our solution addresses it. The talk will be presented at a level that does not presume much statistical background.

Ramesh Johari is an Associate Professor at Stanford University, with a full-time appointment in the Department of Management Science and Engineering (MS&E), and courtesy appointments in the Departments of Computer Science (CS) and Electrical Engineering (EE). He is a member of the Operations Research group and the Social Algorithms Lab (SOAL) in MS&E, the Information Systems Laboratory (ISL) in EE, and the Institute for Computational and Mathematical Engineering (ICME).
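The peeking problem this talk addresses can be demonstrated with a small simulation: repeatedly checking a classical p-value as data with no true effect accumulates rejects the null far more often than the nominal 5% level, while a single fixed-horizon look stays near it. This is an illustrative sketch of the problem only, not Optimizely's always-valid method:

```python
import math
import random

def pvalue(mean, n):
    """Two-sided p-value for the mean of n i.i.d. N(0,1) draws under H0."""
    z = mean * math.sqrt(n)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(0)
trials, horizon = 500, 200
peeked, fixed = 0, 0
for _ in range(trials):
    total = 0.0
    rejected_early = False
    for n in range(1, horizon + 1):
        total += random.gauss(0, 1)              # data with NO true effect
        if n >= 10 and pvalue(total / n, n) < 0.05:
            rejected_early = True                # analyst "peeks" and would stop
    peeked += rejected_early
    fixed += pvalue(total / horizon, horizon) < 0.05   # single look at the end

peek_rate = peeked / trials    # well above the nominal 0.05
fixed_rate = fixed / trials    # close to the nominal 0.05
```

Always-valid p-values are designed so that the continuously monitored rate stays controlled at the nominal level.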

Facebook's A/B Platform: Interactive Analysis in Realtime - @Scale 2014 - Data
Itamar Rosenn, Engineering Manager at Facebook. This talk presents Facebook's platform for streamlined and automated A/B test analysis. We will discuss the system architecture, how it enables interactive analysis on near-realtime results, and the challenges we've faced in scaling the system to meet our customer needs.

Scoring Deployed Models

YouTube search... ...Google search

ML Model Deployment and Scoring on the Edge with Automatic ML & DF / Flink2Kafka
Recorded on June 18, 2020. Machine Learning Model Deployment and Scoring on the Edge with Automatic Machine Learning and Data Flow. Deploying machine learning models to the edge can present significant ML/IoT challenges centered around the need for low-latency, accurate scoring in minimal-resource environments. H2O's Driverless AI AutoML and Cloudera Data Flow work nicely together to solve this challenge. Driverless AI automates the building of accurate machine learning models, which are deployed as light-footprint, low-latency Java or C++ artifacts, also known as MOJOs (Model Object, Optimized). Cloudera Data Flow leverages Apache NiFi, which offers an innovative data flow framework to host MOJOs and make predictions on data moving on the edge. Speakers: James Medel (Technical Community Maker), Greg Keys (Solution Engineer).

Kafka 2 Flink - An Apache Love Story. This project has been heavily inspired by two existing efforts: Data In Motion's FLaNK Stack and Data Artisans' blog on stateful streaming applications. The goal of this project is to provide insight into connecting Apache Flink applications to Apache Kafka. Speaker: Ian R Brooks, PhD (Cloudera - Senior Solutions Engineer & Data)
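The edge-scoring pattern described above can be roughly sketched in Python. Here an in-memory queue stands in for a Kafka/NiFi topic and a trivial threshold rule stands in for a compiled MOJO; both are assumptions for illustration, not the Driverless AI or NiFi APIs:

```python
from queue import Queue

def mojo_score(row):
    # stand-in for a MOJO's predict call (hypothetical threshold rule)
    return 1.0 if row["sensor"] > 0.8 else 0.0

def run_flow(inbound: Queue, outbound: Queue):
    """Drain an inbound 'topic', score each record, publish predictions."""
    while not inbound.empty():
        record = inbound.get()
        outbound.put({**record, "prediction": mojo_score(record)})

# usage: three sensor readings flow in, scored records flow out
inbound, outbound = Queue(), Queue()
for v in (0.2, 0.9, 0.5):
    inbound.put({"sensor": v})
run_flow(inbound, outbound)
```

In a real deployment the scorer would be the compiled Java/C++ MOJO artifact hosted inside a NiFi processor, keeping per-record latency low on constrained edge hardware.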

Shawn Scully: Production and Beyond: Deploying and Managing Machine Learning Models
PyData NYC 2015. Machine learning has become the key component in building intelligence-infused applications. However, as companies increase the number of such deployments, the number of machine learning models that need to be created, maintained, monitored, tracked, and improved grows at a tremendous pace. This growth has led to a huge (and well-documented) accumulation of technical debt. Developing a machine learning application is an iterative process that involves building multiple models over a dataset. The dataset itself evolves over time as new features and new data points are collected. Furthermore, once deployed, the models require updates over time. Changes in models and datasets become difficult to track, and one can quickly lose track of which version of the model used which data and why it was subsequently replaced. In this talk, we outline some of the key challenges in large-scale deployments of many interacting machine learning models. We then describe a methodology for management, monitoring, and optimization of such models in production, which helps mitigate the technical debt. In particular, we demonstrate how to:
  • Track models and versions, and visualize their quality over time
  • Track the provenance of models and datasets, and quantify how changes in data impact the models being served
  • Optimize model ensembles in real time, based on changing data, and provide alerts when such ensembles no longer provide the desired accuracy
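The version and provenance tracking described in this talk can be illustrated with a toy in-memory registry. The class names and fields below are invented for the sketch and do not correspond to any particular product's API:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ModelVersion:
    name: str
    version: int
    dataset_id: str   # provenance: which dataset built this model
    metric: float     # e.g. validation accuracy at training time
    created: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

class Registry:
    """Toy registry tracking model versions, provenance, and quality."""

    def __init__(self):
        self._models = {}

    def register(self, name, dataset_id, metric):
        """Record a new version of a model and the data it came from."""
        versions = self._models.setdefault(name, [])
        mv = ModelVersion(name, len(versions) + 1, dataset_id, metric)
        versions.append(mv)
        return mv

    def latest(self, name):
        return self._models[name][-1]

    def history(self, name):
        """Quality over time: (version, dataset, metric) per release."""
        return [(m.version, m.dataset_id, m.metric) for m in self._models[name]]

# usage: two releases of the same model trained on evolving data
reg = Registry()
reg.register("churn", dataset_id="data-v1", metric=0.81)
reg.register("churn", dataset_id="data-v2", metric=0.84)
```

A production system would persist these records and join them with live serving metrics, so that a drop in quality can be traced back to the model version and dataset that produced it.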