ML Test Score

** [[Bias and Variances]]
** [[Algorithm Administration#Model Monitoring|Model Monitoring]]
* [[AI Solver]] ... [[Algorithms]] ... [[Algorithm Administration|Administration]] ... [[Model Search]] ... [[Discriminative vs. Generative]] ... [[Train, Validate, and Test]]
* [[Risk, Compliance and Regulation]] ... [[Ethics]] ... [[Privacy]] ... [[Law]] ... [[AI Governance]] ... [[AI Verification and Validation]]
* [[Artificial General Intelligence (AGI) to Singularity]] ... [[Inside Out - Curious Optimistic Reasoning| Curious Reasoning]] ... [[Emergence]] ... [[Moonshots]] ... [[Explainable / Interpretable AI|Explainable AI]] ... [[Algorithm Administration#Automated Learning|Automated Learning]]

YouTube search... ...Google search

Creating reliable, production-level machine learning systems brings on a host of concerns not found in small toy examples or even large offline research experiments. Testing and monitoring are key considerations for ensuring the production-readiness of an ML system and for reducing its technical debt, but it can be difficult to formulate specific tests, given that the actual prediction behavior of any given model is difficult to specify a priori. The paper presents 28 specific tests and monitoring needs, drawn from experience with a wide range of production ML systems, to help quantify these issues and to provide an easy-to-follow road map for improving production readiness and paying down ML technical debt. The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction | E. Breck, S. Cai, E. Nielsen, M. Salib, and D. Sculley - Google Research
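The scoring scheme behind the rubric is small enough to sketch in code. Below is a minimal sketch in Python, assuming the convention described in the paper: each test earns 0.5 points if it is executed manually with the results documented and 1.0 point if it runs automatically on a repeated basis, and the overall ML Test Score is the minimum of the four section totals. The test names and results in the example are hypothetical placeholders, not the paper's actual 28 tests.

<pre>
# Sketch of the ML Test Score scoring scheme (section names follow the rubric;
# individual test names and results below are hypothetical placeholders).
from typing import Dict

# Result per test: 0.0 = not done, 0.5 = run manually and documented,
# 1.0 = automated and run repeatedly.
ResultsBySection = Dict[str, Dict[str, float]]

def ml_test_score(results: ResultsBySection) -> float:
    """Overall score = minimum of the per-section totals, so the weakest
    section caps the whole score."""
    section_totals = {name: sum(tests.values()) for name, tests in results.items()}
    return min(section_totals.values())

if __name__ == "__main__":
    results = {
        "data": {"feature_schema_checked": 1.0, "feature_cost_reviewed": 0.5},
        "model": {"baseline_comparison": 1.0, "hyperparameters_tuned": 0.5},
        "infrastructure": {"training_reproducible": 0.5, "canary_before_serving": 0.0},
        "monitoring": {"serving_data_drift": 0.5, "prediction_quality_tracked": 0.5},
    }
    print(ml_test_score(results))  # 0.5 -- capped by the infrastructure section
</pre>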


Figure: systems_comparison.PNG

ML Test Score (2) - Testing & Deployment - Full Stack Deep Learning
How can you test your machine learning system? A Rubric for Production Readiness and Technical Debt Reduction is an exhaustive framework/checklist from practitioners at Google. The paper presents the rubric as a set of 28 actionable tests, categorized into four sections: (1) data tests, (2) model tests, (3) ML infrastructure tests, and (4) monitoring tests, and it offers a scoring system to measure how ready for production a given machine learning system is. The scoring system gives ML system developers an incentive to reach stable levels of reliability by providing a clear indicator of readiness and clear guidelines for how to improve.
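To make the four categories concrete, here is a hedged example of what one automated check in the data tests section might look like, written as a pytest-style test; the file name, feature names, and value bounds are hypothetical and are not taken from the paper.

<pre>
# Illustrative automated data test: feature values should match the
# expectations captured in a schema. All names and bounds are hypothetical.
import pandas as pd

def test_feature_values_match_schema():
    df = pd.read_csv("training_data.csv")  # assumed latest training extract

    assert df["age"].between(0, 120).all(), "age outside expected range"
    assert df["click_rate"].between(0.0, 1.0).all(), "click_rate must be a probability"
    assert df["user_id"].notna().all(), "user_id must never be missing"
</pre>

Moving a check like this from a one-off manual inspection to a test that runs on every data refresh is the kind of step the scoring system rewards with a higher score.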

What is Your ML Score? - Tania Allard
Developer Advocate at Microsoft. Using machine learning in real-world applications and production systems is complex. Testing, monitoring, and logging are key considerations for assessing the decay, current status, and production-readiness of machine learning systems. Where do you get started? Who is responsible for testing and monitoring? I’ll discuss the most frequent issues encountered in real-life ML applications and how you can make systems more robust. I’ll also provide a rubric with actionable examples to ensure the quality and adequacy of a model in production.
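As one hedged illustration of the kind of monitoring check described above (not taken from the talk; the statistical test, threshold, and data are assumptions made for the example), a simple decay signal is a drift test comparing recent serving predictions against a training-time baseline:

<pre>
# Illustrative monitoring check: flag drift between baseline prediction
# scores (e.g., from validation) and recent serving scores.
import numpy as np
from scipy import stats

def predictions_have_drifted(baseline, recent, alpha=0.01):
    """Two-sample Kolmogorov-Smirnov test; report drift if the distributions differ."""
    result = stats.ks_2samp(baseline, recent)
    return result.pvalue < alpha

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    baseline = rng.beta(2, 5, size=10_000)  # scores seen during validation
    recent = rng.beta(2, 3, size=2_000)     # scores from the latest serving window
    if predictions_have_drifted(baseline, recent):
        print("ALERT: prediction distribution shift detected; consider retraining")
</pre>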


Figure: data.PNG (data tests)

Figure: model.PNG (model tests)

Figure: infra.PNG (ML infrastructure tests)

Figure: monitor.PNG (monitoring tests)

Figure: score.PNG (scoring system)