Difference between revisions of "ML Test Score"

Latest revision as of 21:39, 26 April 2024

Strategy & Tactics ... Project Management ... Best Practices ... Checklists ... Project Check-in ... Evaluation ... Measures
AI Solver ... Algorithms ... Administration ... Model Search ... Discriminative vs. Generative ... Train, Validate, and Test
Risk, Compliance and Regulation ... Ethics ... Privacy ... Law ... AI Governance ... AI Verification and Validation
Artificial General Intelligence (AGI) to Singularity ... Curious Reasoning ... Emergence ... Moonshots ... Explainable AI ... Automated Learning
Cybersecurity: Evaluating & Selling
Data Science ... Governance ... Preprocessing ... Exploration ... Interoperability ... Master Data Management (MDM) ... Bias and Variances ... Benchmarks ... Datasets
Automated Scoring
Development ... Notebooks ... AI Pair Programming ... Codeless ... Hugging Face ... AIOps/MLOps ... AIaaS/MLaaS
Machine Learning: The High Interest Credit Card of Technical Debt | D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, and M. Young - Google Research
Hidden Technical Debt in Machine Learning Systems | D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, J. Crespo, and D. Dennison - Google Research

Creating reliable, production-level machine learning systems brings on a host of concerns not found in small toy examples or even large offline research experiments. Testing and monitoring are key considerations for ensuring the production-readiness of an ML system, and for reducing technical debt of ML systems. But it can be difficult to formulate specific tests, given that the actual prediction behavior of any given model is difficult to specify a priori. In this paper, we present 28 specific tests and monitoring needs, drawn from experience with a wide range of production ML systems to help quantify these issues and present an easy to follow road-map to improve production readiness and pay down ML technical debt.
The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction | E. Breck, S. Cai, E. Nielsen, M. Salib, and D. Sculley - Google Research

Full Stack Deep Learning

ML Test Score (2) - Testing & Deployment - Full Stack Deep Learning
How can you test your machine learning system? A Rubric for Production Readiness and Technical Debt Reduction is an exhaustive framework/checklist from practitioners at Google.
- The paper presents a rubric as a set of 28 actionable tests and offers a scoring system to measure how ready for production a given machine learning system is. These are categorized into 4 sections: (1) data tests, (2) model tests, (3) ML infrastructure tests, and (4) monitoring tests.
- The scoring system provides a vector for incentivizing ML system developers to achieve stable levels of reliability by providing a clear indicator of readiness and clear guidelines for how to improve.
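
The summary above names the four sections and the scoring system but not the arithmetic. A minimal sketch of how such a score could be tallied follows. It assumes the scoring rule as described in the Breck et al. paper: half a point for a test that is run manually with documented results, a full point for a test that is automated, points summed per section, and the final score taken as the minimum over the four sections. The names `Status`, `SECTIONS`, `section_score`, `ml_test_score`, and the example data are hypothetical and chosen only for illustration.

```python
from enum import Enum


class Status(Enum):
    """How one rubric test is currently carried out (assumed point values)."""
    NOT_DONE = 0.0   # test is not performed
    MANUAL = 0.5     # run by hand, results documented and shared
    AUTOMATED = 1.0  # run automatically on a repeated basis


# The four rubric sections named in the summary above.
SECTIONS = ("data", "model", "infrastructure", "monitoring")


def section_score(statuses):
    """Sum the points earned by the tests of one section."""
    return sum(status.value for status in statuses)


def ml_test_score(results):
    """Return the overall score: the minimum of the four section scores.

    `results` maps each section name to a list of Status values,
    one per test in that section.
    """
    missing = [name for name in SECTIONS if name not in results]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    return min(section_score(results[name]) for name in SECTIONS)


# Hypothetical self-assessment with seven tests per section (28 in total).
example_results = {
    "data": [Status.AUTOMATED] * 4 + [Status.MANUAL] * 2 + [Status.NOT_DONE],
    "model": [Status.AUTOMATED] * 3 + [Status.MANUAL] * 2 + [Status.NOT_DONE] * 2,
    "infrastructure": [Status.MANUAL] * 7,
    "monitoring": [Status.AUTOMATED] * 2 + [Status.MANUAL] + [Status.NOT_DONE] * 4,
}

if __name__ == "__main__":
    for name in SECTIONS:
        print(name, section_score(example_results[name]))
    print("ML Test Score:", ml_test_score(example_results))
```

Taking the minimum rather than the sum means a strong section cannot compensate for a neglected one. In this made-up example the weak monitoring section caps the overall score.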

What is Your ML Score? - Tania Allard, Developer Advocate at Microsoft
Using machine learning in real-world applications and production systems is complex. Testing, monitoring, and logging are key considerations for assessing the decay, current status, and production-readiness of machine learning systems. Where do you get started? Who is responsible for testing and monitoring? I’ll discuss the most frequent issues encountered in real-life ML applications and how you can make systems more robust. I’ll also provide a rubric with actionable examples to ensure quality and adequacy of a model in production.

@@ Line 2: / Line 2: @@
 |title=PRIMO.ai
 |titlemode=append
-|keywords=artificial, intelligence, machine, learning, models, algorithms, data, singularity, moonshot, Tensorflow, Google, Nvidia, Microsoft, Azure, Amazon, AWS
+|keywords=ChatGPT, artificial, intelligence, machine, learning, GPT-4, GPT-5, NLP, NLG, NLC, NLU, models, data, singularity, moonshot, Sentience, AGI, Emergence, Moonshot, Explainable, TensorFlow, Google, Nvidia, Microsoft, Azure, Amazon, AWS, Hugging Face, OpenAI, Tensorflow, OpenAI, Google, Nvidia, Microsoft, Azure, Amazon, AWS, Meta, LLM, metaverse, assistants, agents, digital twin, IoT, Transhumanism, Immersive Reality, Generative AI, Conversational AI, Perplexity, Bing, You, Bard, Ernie, prompt Engineering LangChain, Video/Image, Vision, End-to-End Speech, Synthesize Speech, Speech Recognition, Stanford, MIT |description=Helpful resources for your journey with artificial intelligence; videos, articles, techniques, courses, profiles, and tools
-|description=Helpful resources for your journey with artificial intelligence; videos, articles, techniques, courses, profiles, and tools
+<!-- Google tag (gtag.js) -->
+<script async src="https://www.googletagmanager.com/gtag/js?id=G-4GCWLBVJ7T"></script>
+<script>
+  window.dataLayer = window.dataLayer || [];
+  function gtag(){dataLayer.push(arguments);}
+  gtag('js', new Date());
+  gtag('config', 'G-4GCWLBVJ7T');
+</script>
 }}
 [http://www.youtube.com/results?search_query=ML+Test+Score+artificial+intelligence+Deep+Machine+Learning YouTube search...]
 [http://www.google.com/search?q=ML+Test+Score+artificial+intelligence+Deep+Machine+Learning ...Google search]
-* [[Evaluation]]
+* [[Strategy & Tactics]] ... [[Project Management]] ... [[Best Practices]] ... [[Checklists]] ... [[Project Check-in]] ... [[Evaluation]] ... [[Evaluation - Measures|Measures]]
-** [[Evaluation - Measures]]
+** [[Evaluation - Measures#Accuracy|Accuracy]]
-*** [[Evaluation - Measures#Accuracy|Accuracy]]
+** [[Evaluation - Measures#Precision & Recall (Sensitivity)|Precision & Recall (Sensitivity)]]
-*** [[Evaluation - Measures#Precision & Recall (Sensitivity)|Precision & Recall (Sensitivity)]]
+** [[Evaluation - Measures#Specificity|Specificity]]
-*** [[Evaluation - Measures#Specificity|Specificity]]
+** [[Benchmarks]]
-*** [[Benchmarks]]
 ** [[Bias and Variances]]
-** [[Explainable Artificial Intelligence (XAI)]]
+** [[Algorithm Administration#Model Monitoring|Model Monitoring]]
-** [[Train, Validate, and Test]]
+* [[AI Solver]] ... [[Algorithms]] ... [[Algorithm Administration|Administration]] ... [[Model Search]] ... [[Discriminative vs. Generative]] ... [[Train, Validate, and Test]]
-** [[AI Verification and Validation]]
+* [[Risk, Compliance and Regulation]] ... [[Ethics]] ... [[Privacy]] ... [[Law]] ... [[AI Governance]] ... [[AI Verification and Validation]]
-** [[Model Monitoring]]
+* [[Artificial General Intelligence (AGI) to Singularity]] ... [[Inside Out - Curious Optimistic Reasoning| Curious Reasoning]] ... [[Emergence]] ... [[Moonshots]] ... [[Explainable / Interpretable AI|Explainable AI]] ...  [[Algorithm Administration#Automated Learning|Automated Learning]]
-* [[Trust]]
 * [[Cybersecurity: Evaluating & Selling]]
-* [[Strategy & Tactics]]
+* [[Data Science]] ... [[Data Governance|Governance]] ... [[Data Preprocessing|Preprocessing]] ... [[Feature Exploration/Learning|Exploration]] ... [[Data Interoperability|Interoperability]] ... [[Algorithm Administration#Master Data Management (MDM)|Master Data Management (MDM)]] ... [[Bias and Variances]] ... [[Benchmarks]] ... [[Datasets]]
-* [[Checklists]]
-* [[AI Governance]]
-** [[Data Governance]]
-*** [[Data Science]]
-*** [[Master Data Management (MDM) / Feature Store / Data Lineage / Data Catalog]]
 * [[Automated Scoring]]
-* [[Risk, Compliance and Regulation]]
+* [[Development]] ... [[Notebooks]] ... [[Development#AI Pair Programming Tools|AI Pair Programming]] ... [[Codeless Options, Code Generators, Drag n' Drop|Codeless]] ... [[Hugging Face]] ... [[Algorithm Administration#AIOps/MLOps|AIOps/MLOps]] ... [[Platforms: AI/Machine Learning as a Service (AIaaS/MLaaS)|AIaaS/MLaaS]]
-* [[AIOps / MLOps]]
 * [http://research.google/pubs/pub43146/ Machine Learning: The High Interest Credit Card of Technical Debt | | D. Sculley, G Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, and M. Young -] [[Google]] Research
 * [http://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf Hidden Technical Debt in Machine Learning Systems D. Sculley, G Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, J. Crespo, and D. Dennison -] [[Google]] Research
 Creating reliable, production-level machine learning systems brings on a host of concerns not found in small toy examples or even large offline research experiments. Testing and monitoring are key considerations for ensuring the production-readiness of an ML system, and for reducing technical debt of ML systems. But it can be difficult to formulate specific tests, given that the actual prediction behavior of any given model is difficult to specify a priori. In this paper, we present 28 specific tests and monitoring needs, drawn from experience with a wide range of production ML systems to help quantify these issues and present an easy to follow road-map to improve production readiness and pay down ML technical debt. [http://research.google/pubs/pub46555/ The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction | E. Breck, S. Cai, E. Nielsen, M. Salib, and D. Sculley -] [[Google]] Research   Full Stack Deep Learning
+http://millengustavo.github.io/assets/images/ml_test_production/systems_comparison.PNG
 {|<!-- T -->
@@ Line 53: / Line 57: @@
 |}
 |}<!-- B -->
+{|<!-- T -->
+| valign="top" |
+{| class="wikitable" style="width: 550px;"
+||
+http://millengustavo.github.io/assets/images/ml_test_production/data.PNG
+|}
+|<!-- M -->
+| valign="top" |
+{| class="wikitable" style="width: 550px;"
+||
+http://millengustavo.github.io/assets/images/ml_test_production/model.PNG
+|}
+|}<!-- B -->
+{|<!-- T -->
+| valign="top" |
+{| class="wikitable" style="width: 550px;"
+||
+http://millengustavo.github.io/assets/images/ml_test_production/infra.PNG
+|}
+|<!-- M -->
+| valign="top" |
+{| class="wikitable" style="width: 550px;"
+||
+http://millengustavo.github.io/assets/images/ml_test_production/monitor.PNG
+|}
+|}<!-- B -->
+http://millengustavo.github.io/assets/images/ml_test_production/score.PNG
