Evaluation


Questions



What challenge does the AI investment solve?

How does the AI meet the challenge?

Is the right leadership in place?

  • Is leadership's AI strategy documented and articulated well?
  • Does the AI investment strategy align with the organization's overall strategy, culture, and values?
  • Is there a time constraint? Does the schedule meet the Technology Readiness Level (TRL) of the AI investment?
  • Is the AI investment properly resourced: budgeted, with trained staff and key positions filled?
  • Is responsibility clearly defined and communicated for AI research, data science, applied machine intelligence engineering, quality assurance, software development, implementing foundational capabilities, user experience, change management, configuration management, security, backup/contingency, domain expertise, and project management?
  • Of these identified responsibilities, which are outsourced, and in what situations? What strategy is in place to convey the AI investment knowledge to the organization?
  • Is the organization positioned, or positioning itself, to scale its current state with AI?

Are best practices being followed?

What Laws, Regulations, and Policies (LRPs) pertain, e.g. GDPR?

What portion of the AI is developed in-house, and what is or will be procured?

  • If the AI is procured/outsourced, e.g. embedded in a sensor product, what items are included in the contract to future-proof the solution?
  • Are contract items included to protect the organization's rights to reuse the data?
  • Do the acceptance criteria include a proof of capability?
  • How well do a vendor's service/product(s) and/or client references compare with the AI investment objectives?
  • How is/was the effort estimated? If the AI is procured, what factors were used to approximate the needed integration resources?

How is AI success measured?

  • What are the significant measures that indicate success?
  • Are the ways the mission is being measured clear, realistic, and documented? Specifically what are the AI investment's performance measures?
  • Are the measures being used correctly?
  • What is the Return on Investment (ROI)? Is the AI investment on track with original ROI target?
  • If there is/was an Analysis of Alternatives, how were these measures used? What were the findings?
  • What mission metrics will be impacted by the AI investment? What drivers/measures have the most bearing? Of these performance indicators, which can be used as leading indicators of the health of the AI investment?
  • What are the specific decisions and activities to impact each driver/measure?
  • What assumptions are being made? Of these assumptions, what constraints are anticipated?
  • Are there other related AI investments? If so, is this AI investment dependent on the other investment(s)? What investments require this AI investment to be successful? If so, how? Are there mitigation plans in place?
  • How would you be able to tell if the AI investment was working properly?
  • Against what benchmarks is the AI model compared/scored, e.g. Global Vectors for Word Representation (GloVe)?
  • How perfect does AI have to be to trust it?
  • What is the inference/prediction rate performance metric for the AI investment?
  • Is A/B testing or multivariate testing performed, or will it be? (A minimal significance-check sketch follows this list.)
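
To make the A/B testing question above concrete, here is a minimal sketch of a two-proportion z-test comparing a baseline against an AI-assisted variant. The conversion counts, sample sizes, and the scipy dependency are illustrative assumptions, not part of the questionnaire.

```python
# Hypothetical A/B test readout: conversions out of impressions for the
# baseline (A) and the AI-assisted variant (B). All numbers are made up.
from math import sqrt
from scipy.stats import norm

conv_a, n_a = 412, 10_000   # baseline
conv_b, n_b = 468, 10_000   # AI-assisted variant

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0: no difference
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se
p_value = 2 * norm.sf(abs(z))                     # two-sided test

print(f"lift: {p_b - p_a:+.4f}  z: {z:.2f}  p-value: {p_value:.4f}")
```

A small p-value (e.g. below 0.05) suggests the observed lift is unlikely to be chance alone; multivariate testing extends the same idea to several factors varied at once.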

What AI governance is in place?

What is the data governance process?

  • Is there a data management plan (or planning process)? Does data planning address metadata for dataflows and data transitions?
  • Has the data been identified for current AI investment? For future use AI investment(s)?
    • What are the possible constraints or challenges in accessing or incorporating the identified data?
    • Are the internal data resources available and accessible? What processes need to change to best obtain the data?
    • For external data resources, have they been sourced with contracts in place to make the data available and accessible?
    • Are permissions in place to use the data, with privacy and security restrictions considered and mitigated?
    • What is the expected size of the data to be used for training? What is the ratio of observations(rows) to features (columns)?
    • How good is the quality of the data: skewness, completeness, cleanliness? If there is a data management plan, does it include a section on data quality?
    • How are the dataset(s) used assured to represent the problem space?
    • What Key Performance Indicators (KPI) can the data potentially drive to achieve key mission objective(s)? What data is missing in order to establish the Key Performance Indicators (KPI)?
    • Is a sufficient amount of data available? If the model is temporal, does the data have a rich history? Does the historical data cover periodic and other critical events?
    • Does the data have a refresh schedule? Is the data punctual; does it arrive on time, or is it ready to be pulled?
    • Is there an effort to identify unintended feedback loop(s)?
  • For each data set, has the information been determined to be structured, semi-structured, unstructured?
  • What data quality checks are in place? What tools are in place or being considered? (A minimal profiling sketch follows this list.)
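
As a starting point for the data quality questions above (completeness, duplicates, class skew, and the rows-to-features ratio), the sketch below profiles a dataset with pandas. The file name, the label column, and the particular checks are illustrative assumptions rather than a prescribed toolset.

```python
# Minimal data-quality profile with pandas; the file name, column names, and
# checks are illustrative assumptions only.
import pandas as pd

df = pd.read_csv("training_data.csv")             # hypothetical training dataset

report = {
    "rows": len(df),
    "columns": df.shape[1],
    "row_to_feature_ratio": len(df) / df.shape[1],
    "duplicate_rows": int(df.duplicated().sum()),
    "missing_per_column": df.isna().mean().round(3).to_dict(),   # completeness
}

# Class skew check for a hypothetical label column.
if "label" in df.columns:
    report["label_distribution"] = df["label"].value_counts(normalize=True).round(3).to_dict()

for key, value in report.items():
    print(key, ":", value)
```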

What is the model management strategy?

  • What tools are used or will be used for model management?
    • How are hyperparameters managed? What optimizers are used, e.g. automated machine learning (AutoML)?
    • What components are integrated in the model management tool, e.g. optimizer, tuner, training, versioning, model dependencies (training data, test datasets), publishing, performance evaluation, and storage? (An experiment-tracking sketch follows this list.)
    • Are the AI models published (repo, marketplace) for reuse? If so, where?
    • Is the AI model reused from a repository (repo, marketplace)? If so, which one? How are you notified of updates? How often is the repository checked for updates?
    • Is Master Data Management (MDM) in place? What tools are available or being considered?
      • Is data lineage managed?
      • What data cataloging capabilities exist today? What future capabilities are planned?
      • How are data versions controlled?
      • How are the dataset(s) used for AI training, testing and validation managed?
      • Are logs kept on which data is used for different executions/training so that the information used is traceable?
      • How is access to the information guaranteed? Are the dataset(s) for AI published (repo, marketplace) for reuse? If so, where?
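
The experiment-tracking sketch referenced above shows one way to tie hyperparameters, the training data version, evaluation metrics, and the stored model to a single tracked run so they remain traceable. MLflow is used purely as an example of a model management tool; the parameter values and the dataset version tag are assumptions.

```python
# One way to tie hyperparameters, data versions, metrics, and the trained model
# together in a single tracked run. MLflow is only an example of a model
# management tool; the dataset tag and parameter values are made up.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

params = {"C": 0.5, "max_iter": 500}              # hyperparameters under management

with mlflow.start_run():
    mlflow.set_tag("training_data_version", "v1.3")   # hypothetical dataset version
    mlflow.log_params(params)

    model = LogisticRegression(**params).fit(X_train, y_train)
    mlflow.log_metric("test_accuracy", accuracy_score(y_test, model.predict(X_test)))

    # Store the model artifact so it can be versioned, published, and reused.
    mlflow.sklearn.log_model(model, "model")
```

A model registry or marketplace entry can then point back at the logged run, answering the traceability questions about which data and hyperparameters produced a given published model.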

What is the development & implementation strategy?

How is production readiness determined?

  • Does the team use the ML Test Score rubric for production readiness? (A scoring sketch follows this list.)
    • What are the minimum scores for Data, Model, ML Infrastructure, and Monitoring tests?
    • What score qualifies to pass into production? What is the rationale for passing if less than exceptional (score of >5)?
    • What were the lessons learned? Were adjustments made to move to a higher score? What were the adjustments?
  • Who makes the determination when the AI investment is deployed/refreshed?
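
The scoring sketch referenced above follows one common reading of the ML Test Score rubric: each test earns 0 points if not performed, 0.5 if performed manually, and 1.0 if automated, and the final score is the minimum of the four section totals, so the weakest section gates production readiness. The individual test scores below are invented for illustration.

```python
# Minimal aggregation of an ML Test Score, assuming 0 / 0.5 / 1.0 per test
# (not done / manual / automated) and a final score equal to the minimum of
# the four section totals. All individual scores are invented.
section_scores = {
    "data":              [1.0, 0.5, 0.5, 1.0, 0.0, 0.5, 1.0],
    "model":             [0.5, 0.5, 1.0, 0.0, 0.5, 0.5, 1.0],
    "ml_infrastructure": [1.0, 1.0, 0.5, 0.5, 0.5, 0.0, 1.0],
    "monitoring":        [0.5, 0.5, 0.5, 1.0, 1.0, 0.5, 0.0],
}

totals = {section: sum(scores) for section, scores in section_scores.items()}
final_score = min(totals.values())                # weakest section gates readiness

for section, total in totals.items():
    print(f"{section:<18} {total:.1f}")
print(f"final ML Test Score: {final_score:.1f}  (>5 would be 'exceptional')")
```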

How are changes identified and managed?




References


Nature of risks inherent to AI applications: We believe that the challenge in governing AI is less about dealing with completely new types of risk and more about existing risks either being harder to identify in an effective and timely manner, given the complexity and speed of AI solutions, or manifesting themselves in unfamiliar ways. As such, firms do not require completely new processes for dealing with AI, but they will need to enhance existing ones to take into account AI and fill the necessary gaps. The likely impact on the level of resources required, as well as on roles and responsibilities, will also need to be addressed.
AI and risk management: Innovating with confidence | Deloitte

How Should We Evaluate Machine Learning for AI?: Percy Liang
YouTube: http://www.youtube.com/watch?v=7CcSm0PAr-Y

Machine learning has undoubtedly been hugely successful in driving progress in AI, but it implicitly brings with it the train-test evaluation paradigm. This standard evaluation only encourages behavior that is good on average; it does not ensure robustness as demonstrated by adversarial examples, and it breaks down for tasks such as dialogue that are interactive or do not have a correct answer. In this talk, I will describe alternative evaluation paradigms with a focus on natural language understanding tasks, and discuss ramifications for guiding progress in AI in meaningful directions.

Percy Liang is an Assistant Professor of Computer Science at Stanford University (B.S. from MIT, 2004; Ph.D. from UC Berkeley, 2011). His research spans machine learning and natural language processing, with the goal of developing trustworthy agents that can communicate effectively with people and improve over time through interaction. Specific topics include question answering, dialogue, program induction, interactive learning, and reliable machine learning. His awards include the IJCAI Computers and Thought Award (2016), an NSF CAREER Award (2016), a Sloan Research Fellowship (2015), and a Microsoft Research Faculty Fellowship (2014).

Machine Learning, Technical Debt, and You - D. Sculley (Google)
YouTube: http://www.youtube.com/watch?v=V18AsBIHlWs

Machine Learning offers a fantastically powerful toolkit for building useful complex prediction systems quickly. In this talk, we'll argue it is dangerous to think of these quick wins as coming for free. Using the software engineering framework of technical debt, we find it is common to incur massive ongoing maintenance costs in real-world ML systems. We explore several ML-specific risk factors to account for in system design. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, configuration issues, changes in the external world, and a variety of system-level anti-patterns. We then show how to pay down ML technical debt by following a set of recommended best practices for testing and monitoring needed for real world systems.

D. Sculley is a Senior Staff Software Engineer at Google