Difference between revisions of "Evaluation"

 
(16 intermediate revisions by the same user not shown)
 
|title=PRIMO.ai
 
|titlemode=append
 
|keywords=ChatGPT, artificial, intelligence, machine, learning, GPT-4, GPT-5, NLP, NLG, NLC, NLU, models, data, singularity, moonshot, Sentience, AGI, Emergence, Moonshot, Explainable, TensorFlow, Google, Nvidia, Microsoft, Azure, Amazon, AWS, Hugging Face, OpenAI, Tensorflow, OpenAI, Google, Nvidia, Microsoft, Azure, Amazon, AWS, Meta, LLM, metaverse, assistants, agents, digital twin, IoT, Transhumanism, Immersive Reality, Generative AI, Conversational AI, Perplexity, Bing, You, Bard, Ernie, prompt Engineering LangChain, Video/Image, Vision, End-to-End Speech, Synthesize Speech, Speech Recognition, Stanford, MIT
|description=Helpful resources for your journey with artificial intelligence; videos, articles, techniques, courses, profiles, and tools
 
 
}}
 
[https://www.youtube.com/results?search_query=ai+Technical+Assessment+Evaluation+project+review+performance YouTube]
[https://www.quora.com/search?q=ai%20Technical%20Assessment%20Evaluation%20project%20review%20performance ... Quora]
 
[https://www.google.com/search?q=ai+Technical+Assessment+Evaluation+project+review+performance ...Google search]
 
 
[https://news.google.com/search?q=ai+Technical+Assessment+Evaluation+project+review+performance ...Google News]
 
 
[https://www.bing.com/news/search?q=ai+Technical+Assessment+Evaluation+project+review+performance&qft=interval%3d%228%22 ...Bing News]
 
* [[Strategy & Tactics]] ... [[Project Management]] ... [[Best Practices]] ... [[Checklists]] ... [[Project Check-in]] ... [[Evaluation]] ... [[Evaluation - Measures|Measures]]
<hr>
  
 
<center><b>Prompts for assessing AI projects</b></center>  
 
 
* What mission outcome(s) will benefit from the [[What is AI?|AI investment]], e.g. increased revenue ([[Marketing|marketing]]), greater competitiveness ([[Moonshots|gain capability]]), increased performance ([[Anomaly Detection|detection]], [[Robotics|automation]], discovery), reduced costs ([[Agriculture|optimization]], [[Operations & Maintenance|predictive maintenance]], [[Forecasting|reduce inventory]]), [[Drug Discovery|time reduction]], personalization ([[Recommendation|recommendations]]), [[Risk, Compliance and Regulation|avoiding the risk of non-compliance]], better communication ([[Assistants|user interface]], [[Natural Language Processing (NLP)#Natural Language Understanding (NLU)|natural-language understanding]], [[Telecommunications|telecommunications]]), broader and better integration ([[Internet of Things (IoT)]], [[Smart Cities|smart cities]]), or [[Case Studies|other outcome(s)]]?
 
* Would you classify the AI investment as being [https://www.linkedin.com/pulse/you-disruptive-evolutionary-revolutionary-so-should-survey-d-eon/ evolutionary, revolutionary, or disruptive]?  
 
* Was market research performed? If so, what were the results? What [[Emergence#Emergence from Analogies|similar functionality exists in other solutions, and what lessons can be applied to the AI investment?]] Can the hypothesis be tested? Playing devil's advocate, could there be a flaw in the analogical reasoning?
 
* Have opportunistic AI aspects of the [[Enterprise Architecture (EA)|end-to-end mission process(es)]] been reviewed?
 
** Was a [[Context|knowledge-based]] approach used for the review? Was AI used to optimize or simulate the process?
 
** For each aspect, [[Human-in-the-Loop (HITL) Learning#Augmented Intelligence|how does the AI augment human users?]]
 
* Does the [[Strategy & Tactics#Business Case|business case]] for the AI investment define clear objectives?   
 
 
* Does [[AI Governance]] implement a risk-based approach, e.g. greater consideration or controls for high-risk use cases?
 
* What are the [[Enterprise Architecture (EA)|AI architecture]] specifics, e.g. [[Ensemble Learning]] methods used, [[Graph Convolutional Network (GCN), Graph Neural Networks (Graph Nets), Geometric Deep Learning]], [[Digital Twin]],  [[Decentralized: Federated & Distributed]]?
 
* Is the [[Neuroscience|wetware/brain]] or hardware involved, e.g. [[Internet of Things (IoT)]]; [[Signal Processing|physical sensors]], [[JavaScript|mobile phones]], [[Screening; Passenger, Luggage, & Cargo|screening devices]], [[Vision|cameras/surveillance]], [[Healthcare|medical instrumentation]], [[Robotics|robots]], [[Transportation (Autonomous Vehicles)|autonomous vehicles]], [[Autonomous Drones|drones]], [[Quantum|quantum computing]], [[Assistants|assistants/chatbots]]?
 
* What [[Learning Techniques|learning technique(s) are or will be implemented?]] If [[Transfer Learning|a transfer process]] is used, which model(s) and which mission-specific [[Datasets|dataset(s)]] are used to tune the AI model?
* What [[PRIMO.ai#Algorithms|AI algorithms/model type(s) are used?]]  [[Regression]], [[K-Nearest Neighbors (KNN)]], [[Neural Network#Deep Neural Network (DNN)|Deep Neural Network (DNN)]], [[Natural Language Processing (NLP)]], [[Association Rule Learning]], etc.   
 
* Do requirements trace to tests?  
 
 
** If using [[Evaluating Machine Learning Models|machine learning, how are the models evaluated?]]  
 
 
*** Will any [[Data Quality#Data Augmentation, Data Labeling, and Auto-Tagging|data labeling be required?  Is the data augmented? Is auto-tagging used?]] What data augmentation tools are/will be used?   
 
 
*** Have the key features/data attributes to be used in the AI model been identified?
 
*** Will the labeling be enabled by merging domain knowledge with [[Graph#Ontology|ontologies]]?  If so, have concepts and associations been identified?  
 
*** How good is the quality of the data labeling? How close is the [[Data Science#Ground Truth|ground truth]] to being a gold standard?
 
 
*** What [[Feature Exploration/Learning|data/feature exploration/engineering processes and tools are in place or being considered?]]  
 
 
* What tool(s) are used or will be used for model management?
 
 
** How are [[Algorithm Administration#Hyperparameter|Hyperparameters]] managed? What optimizers are used, e.g. [[Algorithm Administration#Automated Learning|automated learning (AutoML)]]?  
 
** What components, e.g. optimizer, tuner, training, [[Algorithm Administration#Versioning|versioning]], model dependencies; e.g. training data, [[Datasets|dataset(s)]], historical lineage, [[Writing/Publishing#Model Publishing|publishing]], performance evaluations, and model storing are integrated in the model management tool(s)?  
 
** What is the reuse strategy? Is there a single POC for the reuse process/tools?  
 
 
*** Are the AI models published (repo, marketplace) for reuse? If so, where?
 
* [[PRIMO.ai#Development & Implementation|What is the development & implementation plan]]?     
 
 
** What foundational capabilities are defined or in place for the AI investment, e.g. infrastructure platform, cloud resources?
*** What languages & scripting are/will be used? e.g. [[Python]], [[JavaScript]], [[PyTorch]]  
 
*** What [[Libraries & Frameworks]] are used?
 
 
*** Are [[Notebooks|notebooks]] used?  If so, is [[Jupyter]] supported?
 
 
*** What tools are used for the [[Algorithm Administration#AIOps/MLOps|AIOps]]? Please identify which are on-premises and which are online services.
 
*** Are the AI languages, libraries, scripting, and [[Algorithm Administration#AIOps/MLOps|AIOps]] applications registered in the organization?
 
** Are the processes and decisions [[Enterprise Architecture (EA)|architecture]]-driven to allow end-to-end visibility and dependency management? Is information mapped to its intended use so that [[Analytics|analytics]] and [[Visualization|visualizations]] are framed in [[context]]?
 
*** Does the AI investment depict the [[Algorithm Administration#AIOps/MLOps|AIOps]] pipeline/toolchain applications in its architecture, e.g. tech stack?
 
*** Does the [[Algorithm Administration#AIOps/MLOps|SecDevOps]] architecture depict the AI investment, and how are its health metrics depicted?
 
** [[Evaluation - Measures#Specificity|Specificity]]
 
 
** [[Project Management#Return on Investment (ROI)|Return on Investment (ROI)]]
 
* [[Predictive Analytics]] ... [[Operations & Maintenance|Predictive Maintenance]] ... [[Forecasting]] ... [[Market Trading]] ... [[Sports Prediction]] ... [[Marketing]] ... [[Politics]] ... [[Excel#Excel - Forecasting|Excel]]
 
* [[AI Governance]] / [[Algorithm Administration]]
 
 
* [[AI Verification and Validation]]
 
 
* [[ML Test Score]]
 
 
* [[Cybersecurity: Evaluating & Selling]]
 
 
 
* [[Automated Scoring]]
 
 
* [[Risk, Compliance and Regulation]]
 

Latest revision as of 22:09, 5 December 2023




Prompts for assessing AI projects



What challenge does the AI investment solve?

How does the AI meet the challenge?

Who is providing leadership?

  • Is leadership's AI strategy documented and articulated well?
  • Does the AI investment strategy align with the organization's overall strategy, culture, and values? Does the organization appreciate experimental processes?
  • Is there a time constraint? Does the schedule meet the Technology Readiness Level (TRL) of the AI investment?
  • Is the AI investment properly resourced, e.g. budgeted, with trained staff and key positions filled?
  • Are responsibilities clearly defined and communicated for AI research, data science, applied machine intelligence engineering, quality assurance, software development, implementing foundational capabilities, user experience, change management, configuration management, security, backup/contingency, domain expertise, and project management?
  • Which of these responsibilities are outsourced, and in which situations? What strategy is in place to transfer knowledge of the AI investment to the organization?
  • Is the organization positioned or positioning to scale its current state with AI?

Are best practices being followed?

What Laws, Regulations and Policies (LRPs) pertain, e.g. GDPR?

  • Are use cases testable and traceable to requirements, including LRPs?
  • When was the last time compliance requirements and regulations were examined? What adjustments were/must be made?
  • Does the AI investment require testing by external assessors to ensure compliance and/or auditing requirements?

What portion of the AI is developed in-house and what is/will be procured?

  • If the AI is procured/outsourced, e.g. embedded in a sensor product, what items are included in the contract to future-proof the solution?
  • What contract items protect the organization's data reuse rights?
  • Do the acceptance criteria include a proof of capability?
  • How well do a vendor's service/product(s) and/or client references compare with the AI investment objectives?
  • How is/was the effort estimated? If procured AI, what factors were used to approximate the needed integration resources?

How is AI success measured?

  • What are the significant measures that indicate success? Is the tradeoff rationale documented, e.g. accuracy vs. speed?
  • Are the ways the mission is being measured clear, realistic, and documented? Specifically, what are the AI investment's performance measures?
  • What is the Return on Investment (ROI)? Is the AI investment on track with the original ROI target?
  • If there is/was an Analysis of Alternatives, how were these measures used? What were the findings?
  • What mission metrics will be impacted with the AI investment? What drivers/measures have the most bearing? Of these performance indicators which can be used as leading indicators of the health of the AI investment?
  • What are the specific decisions and activities to impact each driver/measure?
  • What assumptions are being made? Of these assumptions, what constraints are anticipated?
  • Where does the AI investment fit in the portfolio? Are there possible synergies with other aligned efforts in the portfolio? Are there other related AI investments? If so, is this AI investment dependent on the other investment(s)? Which investments depend on this AI investment to succeed, and how? Are there mitigation plans in place?
  • How would you be able to tell if the AI investment was working properly?
  • Is A/B testing or multivariate testing being performed, or will it be?
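Two of the measures above, ROI against its target and the outcome of an A/B test, come down to simple arithmetic. The sketch below is a minimal illustration, not a prescribed method: it assumes ROI is expressed as the fraction (benefit - cost) / cost, and it uses a standard two-sided two-proportion z-test (normal approximation, large samples, a single pre-planned comparison) to compare the success rate of variant A, e.g. the current process, with variant B, e.g. the AI-assisted process. The function names `roi`, `on_track`, and `two_proportion_z_test` and all figures are hypothetical.

```python
import math


def roi(benefit, cost):
    """Return on investment as a fraction: (benefit - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (benefit - cost) / cost


def on_track(benefit, cost, target_roi):
    """True if the realized ROI meets or exceeds the original target."""
    return roi(benefit, cost) >= target_roi


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference in success rates of variants A and B.

    Uses the pooled-proportion standard error; assumes samples are large
    enough for the normal approximation to hold.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("all outcomes identical; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical figures: benefit 150,000 on a cost of 100,000, target ROI 40%.
print(roi(150_000, 100_000), on_track(150_000, 100_000, 0.40))

# Hypothetical A/B results: A converts 200 of 2000, B converts 260 of 2000.
z, p = two_proportion_z_test(200, 2000, 260, 2000)
print(f"z={z:.2f}, p={p:.4f}")
```

A multivariate test with many variants would need a correction for multiple comparisons, which this sketch does not attempt.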

What AI governance is in place?

What is the algorithm administration strategy?

  • What is the deployment vision? What attributes are being used to size the investment, e.g. count of users, queries, installations? What is the Minimum Viable Product (MVP) version of the AI investment, i.e. the version with enough features to satisfy early users and provide feedback for future development? If the rollout is incremental, what is the strategy, e.g. by portion of users, markets, locations, or capabilities?
  • What tool(s) are used or will be used for model management?
    • How are Hyperparameters managed? What optimizers are used, e.g. automated learning (AutoML)?
    • What components, e.g. optimizer, tuner, training, versioning, model dependencies; e.g. training data, dataset(s), historical lineage, publishing, performance evaluations, and model storing are integrated in the model management tool(s)?
    • What is the reuse strategy? Is there a single POC for the reuse process/tools?
      • Are the AI models published (repo, marketplace) for reuse? If so, where?
      • Is the AI model reused from a repository (repo, marketplace)? If so, which one(s)?
    • Is Master Data Management (MDM) in place? What tools are available or being considered?
      • Is data lineage managed?
      • What data cataloging capabilities exist today? What future capabilities are planned?
      • How are data versions controlled?
      • How are the dataset(s) used for AI training, testing and validation managed?
      • Are logs kept on which data is used for different executions/training so that the information used is traceable?
      • How is access to the information guaranteed? Are the dataset(s) for AI published (repo, marketplace) for reuse? If so, where?
  • What is the development & implementation plan?
    • What foundational capabilities are defined or in place for the AI investment, e.g. infrastructure platform, cloud resources?
    • How is the AI investment deployed?
      • What is the plan for model serving? For each use case, is the serving batched or streamed? If applicable, have REST endpoints been defined and exposed?
      • Is the AI investment implementing an AIOps pipeline/toolchain?
      • What tools are used for the AIOps? Please identify which are on-premises and which are online services.
      • Are the AI languages, libraries, scripting, and AIOps applications registered in the organization?
    • Are the processes and decisions architecture-driven to allow end-to-end visibility and dependency management? Is information mapped to its intended use so that analytics and visualizations are framed in context?
      • Does the AI investment depict the AIOps pipeline/toolchain applications in its architecture, e.g. tech stack?
      • Does the SecDevOps architecture depict the AI investment, and how are its health metrics depicted?
      • Is algorithm administration reflected in the AIOps pipeline/toolchain processes/architecture?
    • How is production readiness determined?
      • Does the team use ML Test Score for production readiness?
        • What are the minimum scores for Data, Model, ML Infrastructure, and Monitoring tests?
        • What score qualifies to pass into production? What is the rationale for passing if less than exceptional (score of >5)?
        • What were the lessons learned? Were adjustments made to move to a higher score? What were the adjustments?
      • Who makes the determination when the AI investment is deployed/refreshed?
      • How does the team prepare for cybersecurity? Does it use the MITRE ATT&CK™ Framework or the GSA DevSecOps Guide?
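The production-readiness questions above refer to the ML Test Score's Data, Model, ML Infrastructure, and Monitoring sections. As we understand the published rubric, each test earns 0.5 points if it is run manually with results documented and distributed, and 1 point if it runs automatically on a repeated basis; points are summed per section, and the overall score is the minimum of the four section sums, so the weakest section sets the score. The sketch below assumes that rule. The interpretation bands are paraphrased, and the names (`CheckStatus`, `ml_test_score`, `interpret`) and the shortened per-section test lists are hypothetical.

```python
from enum import Enum


class CheckStatus(Enum):
    """Points per test under the scoring rule assumed above."""
    NOT_DONE = 0.0
    MANUAL = 0.5      # executed manually, results documented and distributed
    AUTOMATED = 1.0   # run automatically on a repeated basis


SECTIONS = ("data", "model", "infrastructure", "monitoring")

# Upper bound of each band with a paraphrased reading; checked in order.
BANDS = [
    (0.0, "more of a research project than a productionized system"),
    (1.0, "not totally untested, but serious holes in reliability are possible"),
    (2.0, "first pass at basic productionization; more investment may be needed"),
    (3.0, "reasonably tested; more tests and procedures could be automated"),
    (5.0, "strong automated testing and monitoring"),
]


def section_score(statuses):
    """Sum of points for one section's tests."""
    return sum(status.value for status in statuses)


def ml_test_score(results):
    """Overall score: the minimum of the four section sums."""
    missing = [s for s in SECTIONS if s not in results]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    return min(section_score(results[s]) for s in SECTIONS)


def interpret(score):
    """Map a score to its paraphrased band."""
    for upper, reading in BANDS:
        if score <= upper:
            return reading
    return "exceptional levels of automated testing and monitoring"


# Hypothetical, shortened test lists (the rubric lists more tests per section).
A, M = CheckStatus.AUTOMATED, CheckStatus.MANUAL
example = {
    "data": [A, A, M],               # 2.5
    "model": [A, M, M],              # 2.0
    "infrastructure": [A, A, A, M],  # 3.5
    "monitoring": [M, M],            # 1.0
}
score = ml_test_score(example)
print(score, interpret(score))
```

In this example, monitoring (1.0) caps the overall score even though infrastructure scores 3.5, which is one way to frame the question of what rationale justifies passing a less-than-exceptional score.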

How are changes identified and managed?




References


Nature of risks inherent to AI applications: We believe that the challenge in governing AI is less about dealing with completely new types of risk and more about existing risks either being harder to identify in an effective and timely manner, given the complexity and speed of AI solutions, or manifesting themselves in unfamiliar ways. As such, firms do not require completely new processes for dealing with AI, but they will need to enhance existing ones to take into account AI and fill the necessary gaps. The likely impact on the level of resources required, as well as on roles and responsibilities, will also need to be addressed. AI and risk management: Innovating with confidence | Deloitte

How Should We Evaluate Machine Learning for AI?: Percy Liang
YouTube: https://www.youtube.com/watch?v=7CcSm0PAr-Y

Machine learning has undoubtedly been hugely successful in driving progress in AI, but it implicitly brings with it the train-test evaluation paradigm. This standard evaluation only encourages behavior that is good on average; it does not ensure robustness as demonstrated by adversarial examples, and it breaks down for tasks such as dialogue that are interactive or do not have a correct answer. In this talk, I will describe alternative evaluation paradigms with a focus on natural language understanding tasks, and discuss ramifications for guiding progress in AI in meaningful directions. Percy Liang is an Assistant Professor of Computer Science at Stanford University (B.S. from MIT, 2004; Ph.D. from UC Berkeley, 2011). His research spans machine learning and natural language processing, with the goal of developing trustworthy agents that can communicate effectively with people and improve over time through interaction. Specific topics include question answering, dialogue, program induction, interactive learning, and reliable machine learning. His awards include the IJCAI Computers and Thought Award (2016), an NSF CAREER Award (2016), a Sloan Research Fellowship (2015), and a Microsoft Research Faculty Fellowship (2014).
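One simple way to see the "good on average" limitation described in the talk is to compare overall accuracy with the accuracy of the worst-performing subgroup of the test set. This is an illustrative sketch, not one of the alternative evaluation paradigms Liang proposes; the subgroup labels, data, and function names are hypothetical.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Accuracy computed separately for each subgroup label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}


def average_vs_worst(y_true, y_pred, groups):
    """Return (overall accuracy, worst subgroup accuracy, per-group accuracies)."""
    if not y_true or not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("inputs must be non-empty and of equal length")
    overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    per_group = accuracy_by_group(y_true, y_pred, groups)
    return overall, min(per_group.values()), per_group


# Hypothetical test set: 8 "common" cases all correct, 2 "rare" cases both wrong.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
groups = ["common"] * 8 + ["rare"] * 2

overall, worst, per_group = average_vs_worst(y_true, y_pred, groups)
print(f"overall={overall:.2f} worst={worst:.2f} {per_group}")
```

A respectable average (0.8 here) can hide a subgroup on which the model always fails, which a single train-test accuracy figure does not reveal.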

Machine Learning, Technical Debt, and You: D. Sculley (Google)
YouTube: https://www.youtube.com/watch?v=V18AsBIHlWs

Machine Learning offers a fantastically powerful toolkit for building useful complex prediction systems quickly. In this talk, we'll argue it is dangerous to think of these quick wins as coming for free. Using the software engineering framework of technical debt, we find it is common to incur massive ongoing maintenance costs in real-world ML systems. We explore several ML-specific risk factors to account for in system design. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, configuration issues, changes in the external world, and a variety of system-level anti-patterns. We then show how to pay down ML technical debt by following a set of recommended best practices for testing and monitoring needed for real world systems. D. Sculley is a Senior Staff Software Engineer at Google.
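The talk lists changes in the external world among the ML-specific risks and monitoring among the remedies, but it does not prescribe a specific check. As one swapped-in example, the Population Stability Index (PSI) compares a feature's binned distribution in live data with its training-time baseline: PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i), where e_i and a_i are the expected (baseline) and actual (live) proportions in bin i. The bin counts and function names are hypothetical, the small epsilon that guards against empty bins is an assumption, and the thresholds used (below 0.1 stable, 0.1 to 0.25 moderate shift, above 0.25 significant shift) are a common rule of thumb rather than a standard.

```python
import math


def population_stability_index(expected_counts, actual_counts, eps=1e-6):
    """PSI between a baseline (expected) and a live (actual) binned distribution."""
    if not expected_counts or len(expected_counts) != len(actual_counts):
        raise ValueError("count lists must be non-empty and the same length")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    if expected_total <= 0 or actual_total <= 0:
        raise ValueError("counts must sum to a positive number")
    psi = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Floor proportions at eps so empty bins do not produce log(0).
        e_p = max(e / expected_total, eps)
        a_p = max(a / actual_total, eps)
        psi += (a_p - e_p) * math.log(a_p / e_p)
    return psi


def drift_level(psi):
    """Rule-of-thumb bands (an assumption; tune thresholds per feature)."""
    if psi < 0.1:
        return "stable"
    if psi <= 0.25:
        return "moderate shift"
    return "significant shift"


baseline = [250, 250, 250, 250]  # hypothetical training-time bin counts
live = [100, 200, 300, 400]      # hypothetical production bin counts
score = population_stability_index(baseline, live)
print(round(score, 3), drift_level(score))
```

Run on a schedule per feature, a check like this turns "changes in the external world" into an alert rather than a silent accuracy decline.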