Evaluation


Revision as of 14:59, 5 September 2020


How to Assess an Artificial Intelligence Product or Solution (Even if You’re Not an AI Expert) | Daniel Faggella - Emerj
Many products today leverage artificial intelligence across a wide range of industries, from healthcare to marketing. However, most business leaders who must make strategic and procurement decisions about these technologies have no formal AI background or academic training in data science. The purpose of this article is to give business people with no AI expertise a general guideline on how to assess an AI-related product and decide whether it is potentially relevant to their business.

Evaluating AI- and ML-Based Security Products
Panelists: Anup Ghosh, Founder and CEO, Invincea; Liam Randall, President, Critical Stack, A Division of Capital One; Chad Skipper, VP Competitive Intelligence and Product Testing, Cylance; Mike Spanbauer, Vice President of Research and Strategy, NSS Labs. With endless AI or machine learning product claims, buyers are left bewildered about how to test those claims. It falls to independent third-party test organizations to develop and update traditional test protocols to validate AI and ML product capability claims. This panel tackles the key issues that third-party testing must address to validate AI and ML security products.

How Should We Evaluate Machine Learning for AI?: Percy Liang
Machine learning has undoubtedly been hugely successful in driving progress in AI, but it implicitly brings with it the train-test evaluation paradigm. This standard evaluation only encourages behavior that is good on average; it does not ensure robustness, as demonstrated by adversarial examples, and it breaks down for tasks such as dialogue that are interactive or do not have a correct answer. In this talk, I will describe alternative evaluation paradigms with a focus on natural language understanding tasks, and discuss ramifications for guiding progress in AI in meaningful directions.

Percy Liang is an Assistant Professor of Computer Science at Stanford University (B.S. from MIT, 2004; Ph.D. from UC Berkeley, 2011). His research spans machine learning and natural language processing, with the goal of developing trustworthy agents that can communicate effectively with people and improve over time through interaction. Specific topics include question answering, dialogue, program induction, interactive learning, and reliable machine learning. His awards include the IJCAI Computers and Thought Award (2016), an NSF CAREER Award (2016), a Sloan Research Fellowship (2015), and a Microsoft Research Faculty Fellowship (2014).
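The gap between average-case and robust evaluation that the talk highlights can be illustrated with a small sketch. The data and the `worst_group_accuracy` helper below are illustrative, not from the talk: the idea is simply that a model can look good on average while failing completely on a subgroup.

```python
# Average accuracy can hide poor performance on subgroups;
# a robust evaluation reports the worst-case group instead.

def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def worst_group_accuracy(preds, labels, groups):
    """Minimum accuracy over any group -- a simple robustness metric."""
    scores = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        scores[g] = accuracy([preds[i] for i in idx],
                             [labels[i] for i in idx])
    return min(scores.values()), scores

# Hypothetical predictions over two groups of examples.
preds  = [1, 1, 1, 1, 1, 1, 0, 1]
labels = [1, 1, 1, 1, 1, 1, 1, 0]
groups = ["a", "a", "a", "a", "a", "a", "b", "b"]

avg = accuracy(preds, labels)                     # 0.75 on average
worst, per_group = worst_group_accuracy(preds, labels, groups)
print(avg, worst)                                 # group "b" scores 0.0
```

Optimizing only `avg` rewards the model for its majority group while group "b" fails entirely, which is exactly what average-only evaluation cannot detect.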

ML Test Score

Creating reliable, production-level machine learning systems brings a host of concerns not found in small toy examples or even large offline research experiments. Testing and monitoring are key considerations for ensuring the production-readiness of an ML system and for reducing its technical debt. But it can be difficult to formulate specific tests, given that the actual prediction behavior of any given model is difficult to specify a priori. In this paper, we present 28 specific tests and monitoring needs, drawn from experience with a wide range of production ML systems, to help quantify these issues and provide an easy-to-follow road map for improving production readiness and paying down ML technical debt. The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction | E. Breck, S. Cai, E. Nielsen, M. Salib, and D. Sculley - Google Research

ML Test Score (2) - Testing & Deployment - Full Stack Deep Learning
How can you test your machine learning system? A Rubric for Production Readiness and Technical Debt Reduction is an exhaustive framework/checklist from practitioners at Google.
- The paper presents a rubric as a set of 28 actionable tests and offers a scoring system to measure how ready for production a given machine learning system is. These are categorized into 4 sections: (1) data tests, (2) model tests, (3) ML infrastructure tests, and (4) monitoring tests.
- The scoring system gives ML system developers an incentive to achieve stable levels of reliability by providing a clear indicator of readiness and clear guidelines for how to improve.
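The rubric's scoring can be sketched in a few lines. Per the paper, a test earns half a point if performed manually and a full point if automated, and the overall score is the minimum over the four sections, so the weakest area determines readiness. The section contents below are illustrative stand-ins, not the paper's actual test names:

```python
# Minimal sketch of the ML Test Score scoring scheme (Breck et al.).
# 0.5 per test done manually, 1.0 if automated; overall score is the
# MINIMUM across the four sections.

NOT_DONE, MANUAL, AUTOMATED = 0.0, 0.5, 1.0

def section_score(tests):
    """Total points earned by the tests in one section."""
    return sum(tests.values())

def ml_test_score(sections):
    """Overall score = minimum section score (weakest link)."""
    return min(section_score(t) for t in sections.values())

# Illustrative test names, not the rubric's actual 28 tests.
sections = {
    "data":           {"feature_distributions_checked": AUTOMATED,
                       "privacy_controls_reviewed": MANUAL},
    "model":          {"baseline_comparison": AUTOMATED,
                       "hyperparameters_tuned": AUTOMATED},
    "infrastructure": {"training_reproducible": MANUAL,
                       "rollback_tested": NOT_DONE},
    "monitoring":     {"serving_skew_monitored": AUTOMATED,
                       "staleness_alerts": MANUAL},
}

print(ml_test_score(sections))  # 0.5 -- infrastructure is the weak spot
```

Taking the minimum rather than the sum is the interesting design choice: a team cannot compensate for untested infrastructure by piling up data tests.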

What is Your ML Score? - Tania Allard
Developer Advocate at Microsoft Using machine learning in real-world applications and production systems is complex. Testing, monitoring, and logging are key considerations for assessing the decay, current status, and production-readiness of machine learning systems. Where do you get started? Who is responsible for testing and monitoring? I’ll discuss the most frequent issues encountered in real-life ML applications and how you can make systems more robust. I’ll also provide a rubric with actionable examples to ensure quality and adequacy of a model in production.

Buying

Build or buy AI? You're asking the wrong question
Evan Kohn, chief business officer and head of marketing at Pypestream, talks with Tonya Hall about why companies need to turn to staffing for AI and building data sets.

Why you should Buy Open-Source AI
Considering an AI assistant in your home? Before you auto-buy that pretty picture in front of you, be sure to check out the open-source offerings as well.

Best Practices

Rules of ML
Google research scientist Martin Zinkevich

Best Practices of In-Platform AI/ML Webinar
Productive use of machine learning and artificial intelligence technologies is impossible without a platform that allows autonomous functioning of AI/ML mechanisms. In-platform AI/ML has a number of advantages that can be obtained via best practices by InterSystems. In this webinar, we will present:
• MLOps as the natural paradigm for in-platform AI/ML
• A full cycle of AI/ML content development and in-platform deployment (including bidirectional integration of Jupyter with InterSystems IRIS)
• A new toolset added to ML Toolkit: integration and orchestration for the Julia mathematical modeling environment
• Automated AI/ML model selection and parameter determination via an SQL query
• Cloud-enhanced ML
• Featured use case demo: hospital readmission prediction (addresses running, in InterSystems IRIS, models trained outside the platform's control)

Model Deployment and Scoring

ML Model Deployment and Scoring on the Edge with Automatic ML & DF / Flink2Kafka
Recorded on June 18, 2020. Machine Learning Model Deployment and Scoring on the Edge with Automatic Machine Learning and Data Flow: Deploying machine learning models to the edge presents significant ML/IoT challenges centered around the need for low-latency, accurate scoring in minimal-resource environments. H2O.ai's Driverless AI AutoML and Cloudera Data Flow work nicely together to solve this challenge. Driverless AI automates the building of accurate machine learning models, which are deployed as light-footprint, low-latency Java or C++ artifacts known as MOJOs (Model Object, Optimized). Cloudera Data Flow leverages Apache NiFi, which offers an innovative data flow framework to host MOJOs and make predictions on data moving at the edge. Speakers: James Medel (H2O.ai - Technical Community Maker), Greg Keys (H2O.ai - Solution Engineer).

Kafka 2 Flink - An Apache Love Story: This project was heavily inspired by two existing efforts, Data In Motion's FLaNK Stack and the data Artisans blog on stateful streaming applications. The goal of the project is to provide insight into connecting an Apache Flink application to Apache Kafka. Speaker: Ian R Brooks, PhD (Cloudera - Senior Solutions Engineer & Data)

Shawn Scully: Production and Beyond: Deploying and Managing Machine Learning Models
PyData NYC 2015. Machine learning has become the key component in building intelligence-infused applications. However, as companies increase the number of such deployments, the number of machine learning models that need to be created, maintained, monitored, tracked, and improved grows at a tremendous pace. This growth has led to a huge (and well-documented) accumulation of technical debt. Developing a machine learning application is an iterative process that involves building multiple models over a dataset. The dataset itself evolves over time as new features and new data points are collected. Furthermore, once deployed, the models require updates over time. Changes in models and datasets become difficult to track, and one can quickly lose track of which version of the model used which data and why it was subsequently replaced. In this talk, we outline some of the key challenges in large-scale deployments of many interacting machine learning models. We then describe a methodology for the management, monitoring, and optimization of such models in production, which helps mitigate the technical debt. In particular, we demonstrate how to:
- Track models and versions, and visualize their quality over time
- Track the provenance of models and datasets, and quantify how changes in data impact the models being served
- Optimize model ensembles in real time, based on changing data, and provide alerts when such ensembles no longer provide the desired accuracy
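The version and provenance tracking the talk describes can be sketched as a minimal in-memory model registry. All class and field names below are illustrative, not an API from the talk; the point is that recording dataset IDs and replacement links per version is enough to answer "which data trained which model, and when was it replaced?"

```python
# Minimal sketch of a model registry tracking versions and
# dataset provenance for each registered model.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ModelVersion:
    name: str
    version: int
    dataset_id: str                  # provenance: which dataset trained it
    quality: float                   # e.g. held-out accuracy
    replaced_by: Optional[int] = None

@dataclass
class Registry:
    versions: dict = field(default_factory=dict)  # name -> [ModelVersion]

    def register(self, name, dataset_id, quality):
        history = self.versions.setdefault(name, [])
        v = ModelVersion(name, len(history) + 1, dataset_id, quality)
        if history:
            history[-1].replaced_by = v.version   # link old -> new
        history.append(v)
        return v

    def current(self, name):
        return self.versions[name][-1]

    def lineage(self, name):
        """(version, dataset, quality) triples -- quality over time."""
        return [(v.version, v.dataset_id, v.quality)
                for v in self.versions[name]]

reg = Registry()
reg.register("churn", dataset_id="2015-06", quality=0.81)
reg.register("churn", dataset_id="2015-09", quality=0.84)  # retrained on new data
print(reg.current("churn").dataset_id)   # "2015-09"
print(reg.lineage("churn"))
```

A real deployment would persist this to a database and attach alerts when `quality` degrades, but even the in-memory form captures the lineage questions the talk raises.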