Difference between revisions of "ML Test Score"
m |
m |
||
| (16 intermediate revisions by the same user not shown) | |||
| Line 2: | Line 2: | ||
|title=PRIMO.ai | |title=PRIMO.ai | ||
|titlemode=append | |titlemode=append | ||
| − | |keywords=artificial, intelligence, machine, learning, models | + | |keywords=ChatGPT, artificial, intelligence, machine, learning, GPT-4, GPT-5, NLP, NLG, NLC, NLU, models, data, singularity, moonshot, Sentience, AGI, Emergence, Moonshot, Explainable, TensorFlow, Google, Nvidia, Microsoft, Azure, Amazon, AWS, Hugging Face, OpenAI, Tensorflow, OpenAI, Google, Nvidia, Microsoft, Azure, Amazon, AWS, Meta, LLM, metaverse, assistants, agents, digital twin, IoT, Transhumanism, Immersive Reality, Generative AI, Conversational AI, Perplexity, Bing, You, Bard, Ernie, prompt Engineering LangChain, Video/Image, Vision, End-to-End Speech, Synthesize Speech, Speech Recognition, Stanford, MIT |description=Helpful resources for your journey with artificial intelligence; videos, articles, techniques, courses, profiles, and tools |
| − | |description=Helpful resources for your journey with artificial intelligence; videos, articles, techniques, courses, profiles, and tools | + | |
| + | <!-- Google tag (gtag.js) --> | ||
| + | <script async src="https://www.googletagmanager.com/gtag/js?id=G-4GCWLBVJ7T"></script> | ||
| + | <script> | ||
| + | window.dataLayer = window.dataLayer || []; | ||
| + | function gtag(){dataLayer.push(arguments);} | ||
| + | gtag('js', new Date()); | ||
| + | |||
| + | gtag('config', 'G-4GCWLBVJ7T'); | ||
| + | </script> | ||
}} | }} | ||
[http://www.youtube.com/results?search_query=ML+Test+Score+artificial+intelligence+Deep+Machine+Learning YouTube search...] | [http://www.youtube.com/results?search_query=ML+Test+Score+artificial+intelligence+Deep+Machine+Learning YouTube search...] | ||
[http://www.google.com/search?q=ML+Test+Score+artificial+intelligence+Deep+Machine+Learning ...Google search] | [http://www.google.com/search?q=ML+Test+Score+artificial+intelligence+Deep+Machine+Learning ...Google search] | ||
| − | * [[Evaluation]] | + | * [[Strategy & Tactics]] ... [[Project Management]] ... [[Best Practices]] ... [[Checklists]] ... [[Project Check-in]] ... [[Evaluation]] ... [[Evaluation - Measures|Measures]] |
| − | + | ** [[Evaluation - Measures#Accuracy|Accuracy]] | |
| − | + | ** [[Evaluation - Measures#Precision & Recall (Sensitivity)|Precision & Recall (Sensitivity)]] | |
| − | + | ** [[Evaluation - Measures#Specificity|Specificity]] | |
| − | + | ** [[Benchmarks]] | |
| − | |||
** [[Bias and Variances]] | ** [[Bias and Variances]] | ||
| − | ** [[ | + | ** [[Algorithm Administration#Model Monitoring|Model Monitoring]] |
| − | * | + | * [[AI Solver]] ... [[Algorithms]] ... [[Algorithm Administration|Administration]] ... [[Model Search]] ... [[Discriminative vs. Generative]] ... [[Train, Validate, and Test]] |
| − | * | + | * [[Risk, Compliance and Regulation]] ... [[Ethics]] ... [[Privacy]] ... [[Law]] ... [[AI Governance]] ... [[AI Verification and Validation]] |
| − | * | + | * [[Artificial General Intelligence (AGI) to Singularity]] ... [[Inside Out - Curious Optimistic Reasoning| Curious Reasoning]] ... [[Emergence]] ... [[Moonshots]] ... [[Explainable / Interpretable AI|Explainable AI]] ... [[Algorithm Administration#Automated Learning|Automated Learning]] |
| − | |||
* [[Cybersecurity: Evaluating & Selling]] | * [[Cybersecurity: Evaluating & Selling]] | ||
| − | * [[ | + | * [[Data Science]] ... [[Data Governance|Governance]] ... [[Data Preprocessing|Preprocessing]] ... [[Feature Exploration/Learning|Exploration]] ... [[Data Interoperability|Interoperability]] ... [[Algorithm Administration#Master Data Management (MDM)|Master Data Management (MDM)]] ... [[Bias and Variances]] ... [[Benchmarks]] ... [[Datasets]] |
| − | |||
| − | |||
| − | |||
| − | |||
| − | |||
* [[Automated Scoring]] | * [[Automated Scoring]] | ||
| − | * [[ | + | * [[Development]] ... [[Notebooks]] ... [[Development#AI Pair Programming Tools|AI Pair Programming]] ... [[Codeless Options, Code Generators, Drag n' Drop|Codeless]] ... [[Hugging Face]] ... [[Algorithm Administration#AIOps/MLOps|AIOps/MLOps]] ... [[Platforms: AI/Machine Learning as a Service (AIaaS/MLaaS)|AIaaS/MLaaS]] |
| − | |||
* [http://research.google/pubs/pub43146/ Machine Learning: The High Interest Credit Card of Technical Debt | | D. Sculley, G Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, and M. Young -] [[Google]] Research | * [http://research.google/pubs/pub43146/ Machine Learning: The High Interest Credit Card of Technical Debt | | D. Sculley, G Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, and M. Young -] [[Google]] Research | ||
* [http://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf Hidden Technical Debt in Machine Learning Systems D. Sculley, G Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, J. Crespo, and D. Dennison -] [[Google]] Research | * [http://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf Hidden Technical Debt in Machine Learning Systems D. Sculley, G Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, J. Crespo, and D. Dennison -] [[Google]] Research | ||
Creating reliable, production-level machine learning systems brings on a host of concerns not found in small toy examples or even large offline research experiments. Testing and monitoring are key considerations for ensuring the production-readiness of an ML system, and for reducing technical debt of ML systems. But it can be difficult to formulate specific tests, given that the actual prediction behavior of any given model is difficult to specify a priori. In this paper, we present 28 specific tests and monitoring needs, drawn from experience with a wide range of production ML systems to help quantify these issues and present an easy to follow road-map to improve production readiness and pay down ML technical debt. [http://research.google/pubs/pub46555/ The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction | E. Breck, S. Cai, E. Nielsen, M. Salib, and D. Sculley -] [[Google]] Research Full Stack Deep Learning | Creating reliable, production-level machine learning systems brings on a host of concerns not found in small toy examples or even large offline research experiments. Testing and monitoring are key considerations for ensuring the production-readiness of an ML system, and for reducing technical debt of ML systems. But it can be difficult to formulate specific tests, given that the actual prediction behavior of any given model is difficult to specify a priori. In this paper, we present 28 specific tests and monitoring needs, drawn from experience with a wide range of production ML systems to help quantify these issues and present an easy to follow road-map to improve production readiness and pay down ML technical debt. [http://research.google/pubs/pub46555/ The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction | E. Breck, S. Cai, E. Nielsen, M. Salib, and D. Sculley -] [[Google]] Research Full Stack Deep Learning | ||
| + | |||
| + | |||
| + | http://millengustavo.github.io/assets/images/ml_test_production/systems_comparison.PNG | ||
{|<!-- T --> | {|<!-- T --> | ||
| Line 53: | Line 57: | ||
|} | |} | ||
|}<!-- B --> | |}<!-- B --> | ||
| + | |||
| + | |||
| + | {|<!-- T --> | ||
| + | | valign="top" | | ||
| + | {| class="wikitable" style="width: 550px;" | ||
| + | || | ||
| + | http://millengustavo.github.io/assets/images/ml_test_production/data.PNG | ||
| + | |} | ||
| + | |<!-- M --> | ||
| + | | valign="top" | | ||
| + | {| class="wikitable" style="width: 550px;" | ||
| + | || | ||
| + | http://millengustavo.github.io/assets/images/ml_test_production/model.PNG | ||
| + | |} | ||
| + | |}<!-- B --> | ||
| + | {|<!-- T --> | ||
| + | | valign="top" | | ||
| + | {| class="wikitable" style="width: 550px;" | ||
| + | || | ||
| + | http://millengustavo.github.io/assets/images/ml_test_production/infra.PNG | ||
| + | |} | ||
| + | |<!-- M --> | ||
| + | | valign="top" | | ||
| + | {| class="wikitable" style="width: 550px;" | ||
| + | || | ||
| + | http://millengustavo.github.io/assets/images/ml_test_production/monitor.PNG | ||
| + | |} | ||
| + | |}<!-- B --> | ||
| + | |||
| + | http://millengustavo.github.io/assets/images/ml_test_production/score.PNG | ||
Latest revision as of 20:39, 26 April 2024
YouTube search... ...Google search
- Strategy & Tactics ... Project Management ... Best Practices ... Checklists ... Project Check-in ... Evaluation ... Measures
- AI Solver ... Algorithms ... Administration ... Model Search ... Discriminative vs. Generative ... Train, Validate, and Test
- Risk, Compliance and Regulation ... Ethics ... Privacy ... Law ... AI Governance ... AI Verification and Validation
- Artificial General Intelligence (AGI) to Singularity ... Curious Reasoning ... Emergence ... Moonshots ... Explainable AI ... Automated Learning
- Cybersecurity: Evaluating & Selling
- Data Science ... Governance ... Preprocessing ... Exploration ... Interoperability ... Master Data Management (MDM) ... Bias and Variances ... Benchmarks ... Datasets
- Automated Scoring
- Development ... Notebooks ... AI Pair Programming ... Codeless ... Hugging Face ... AIOps/MLOps ... AIaaS/MLaaS
- Machine Learning: The High Interest Credit Card of Technical Debt | | D. Sculley, G Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, and M. Young - Google Research
- Hidden Technical Debt in Machine Learning Systems D. Sculley, G Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, J. Crespo, and D. Dennison - Google Research
Creating reliable, production-level machine learning systems brings on a host of concerns not found in small toy examples or even large offline research experiments. Testing and monitoring are key considerations for ensuring the production-readiness of an ML system, and for reducing technical debt of ML systems. But it can be difficult to formulate specific tests, given that the actual prediction behavior of any given model is difficult to specify a priori. In this paper, we present 28 specific tests and monitoring needs, drawn from experience with a wide range of production ML systems to help quantify these issues and present an easy to follow road-map to improve production readiness and pay down ML technical debt. The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction | E. Breck, S. Cai, E. Nielsen, M. Salib, and D. Sculley - Google Research Full Stack Deep Learning
|
|
|
|
|
|