 
|title=PRIMO.ai
|titlemode=append
|keywords=ChatGPT, artificial, intelligence, machine, learning, GPT-4, GPT-5, NLP, NLG, NLC, NLU, models, data, singularity, moonshot, Sentience, AGI, Emergence, Explainable, TensorFlow, Google, Nvidia, Microsoft, Azure, Amazon, AWS, Hugging Face, OpenAI, Meta, LLM, metaverse, assistants, agents, digital twin, IoT, Transhumanism, Immersive Reality, Generative AI, Conversational AI, Perplexity, Bing, You, Bard, Ernie, Prompt Engineering, LangChain, Video/Image, Vision, End-to-End Speech, Synthesize Speech, Speech Recognition, Stanford, MIT
|description=Helpful resources for your journey with artificial intelligence; videos, articles, techniques, courses, profiles, and tools

<!-- Google tag (gtag.js) -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-4GCWLBVJ7T"></script>
<script>
  window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag('js', new Date());

  gtag('config', 'G-4GCWLBVJ7T');
</script>
}}
[https://www.youtube.com/results?search_query=ai+Technical+Assessment+Evaluation+project+review+performance YouTube]
[https://www.quora.com/search?q=ai%20Technical%20Assessment%20Evaluation%20project%20review%20performance ... Quora]
[https://www.google.com/search?q=ai+Technical+Assessment+Evaluation+project+review+performance ...Google search]
[https://news.google.com/search?q=ai+Technical+Assessment+Evaluation+project+review+performance ...Google News]
[https://www.bing.com/news/search?q=ai+Technical+Assessment+Evaluation+project+review+performance&qft=interval%3d%228%22 ...Bing News]
  
* [[Strategy & Tactics]] ... [[Project Management]] ... [[Best Practices]] ... [[Checklists]] ... [[Project Check-in]] ... [[Evaluation]] ... [[Evaluation - Measures|Measures]]
 
  
  
 
<hr>
<center><b>Prompts for assessing AI projects</b></center>
<hr>
=== What challenge does the AI investment solve? ===
* Has the problem been clearly defined?
* Which mission outcome(s) will benefit from the [[What is AI?|AI investment]], e.g. increased revenue ([[Marketing|marketing]]), greater competitiveness ([[Moonshots|gain capability]]), increased performance ([[Anomaly Detection|detection]], [[Robotics|automation]], discovery), reduced costs ([[Agriculture|optimization]], [[Operations & Maintenance|predictive maintenance]], [[Forecasting|reduce inventory]]), [[Drug Discovery|time reduction]], personalization ([[Recommendation|recommendations]]), [[Risk, Compliance and Regulation|avoiding the risk of non-compliance]], better communication ([[Assistants|user interface]], [[Natural Language Processing (NLP)#Natural Language Understanding (NLU)|natural-language understanding]], [[Telecommunications|telecommunications]]), broader and better integration ([[Internet of Things (IoT)]], [[Smart Cities|smart cities]]), or [[Case Studies|other outcome(s)]]?
* Would you classify the AI investment as being [https://www.linkedin.com/pulse/you-disruptive-evolutionary-revolutionary-so-should-survey-d-eon/ evolutionary, revolutionary, or disruptive]?
* Was market research performed? What were the results? What [[Emergence#Emergence from Analogies|similar functionality exists in other solutions where lessons can be applied to the AI investment?]] Can the hypothesis be tested? Playing devil's advocate, could there be a flaw in the analogical reasoning?
* Have opportunistic AI aspects of the [[Enterprise Architecture (EA)|end-to-end mission process(es)]] been reviewed?
** Was a [[Context|knowledge-based]] approach used for the review? Was AI used for optimizing or simulating the process?
** For each aspect, [[Human-in-the-Loop (HITL) Learning#Augmented Intelligence|how does the AI augment human users?]]
* Does the [[Strategy & Tactics#Business Case|business case]] for the AI investment define clear objectives?
* Whose need(s) is the AI investment addressing?
* Is there a brochure-type version of the [[Requirements Management|requirements]] shared with stakeholders? Is dialog with stakeholders ongoing?

=== How does the AI meet the challenge? ===
* What AI is being implemented? [[What is AI?|Descriptive (what happened?), Diagnostic (why did it happen?), Predictive/Preventive (what could happen?), Prescriptive (what should happen?), Cognitive (what steps should be taken?)]], [[Cybersecurity]]?
* What [[PRIMO.ai#Algorithms|algorithms are used or are being considered?]] How was/will the choice be made?
* What [[Learning Techniques|learning techniques]] are planned for the AI investment, e.g. [[Human-in-the-Loop (HITL) Learning]]?
* How was feasibility determined? Were there AI pilot(s) prior to the current investment?

=== Who is providing [[Leadership|leadership]]? ===
* Is [[Leadership|leadership]]'s [[Strategy & Tactics|AI strategy]] documented and articulated well?
* Does the AI investment strategy align with the organization's overall strategy, culture, and values? Does the organization appreciate experimental processes?
* Is there a time constraint? Does the schedule fit the [https://en.wikipedia.org/wiki/Technology_readiness_level Technology Readiness Level (TRL)] of the AI investment?
* Is the AI investment properly resourced: budgeted, with [[Education|trained]] staff and [[Human Resources (HR)|key positions filled]]?
* Is responsibility clearly defined and communicated for AI research, performing data science, applied machine intelligence engineering, quality assurance, software [[development]], implementing foundational capabilities, user experience, change management, configuration management, security, backup/contingency, domain expertise, and project management?
* Which of these identified responsibilities are outsourced? What strategy is in place to convey the AI investment knowledge back to the organization?
* Is the organization positioned, or positioning itself, to scale its current state with AI?

=== Are [[Best Practices|best practices]] being followed? ===
* Are [[Best Practices|best practices]] documented/referenced?
* Is [[Cybersecurity: Evaluating & Selling|cybersecurity]] a component of [[Best Practices|best practices]]?
* Is the team [[Courses & Certifications|trained]] in the [[Best Practices|best practices]], e.g. [[AI Governance]], [[Data Governance]], [[Algorithm Administration#AIOps/MLOps|AIOps]]?
* What [[Checklists|checklists]] are used?
* Is there a [[Project Management#Product Roadmap|product roadmap?]]
  
=== What [[Law#Artificial Intelligence Law|Laws, Regulations and Policies (LRPs)]] pertain, e.g. [[Privacy#General Data Protection Regulations (GDPR)|GDPR]]? ===
* Are use cases testable and traceable to requirements, including [[Law#Artificial Intelligence Law|LRPs]]?
* When was the last time [[Risk, Compliance and Regulation|compliance requirements and regulations]] were examined? What adjustments were/must be made?
* Does the AI investment require testing by external assessors to ensure compliance and/or meet auditing requirements?
  
=== What portion of the AI is developed in-house and what is/will be [[Procuring|procured]]? ===
* If the AI is [[Procuring|procured/outsourced]], e.g. embedded in a sensor product, what items are included in the contract to future-proof the solution?
* Are contract items in place to protect the organization's data reuse rights?
* Does the acceptance criteria include a proof of capability?
* How well do a vendor's service/product(s) and/or client references compare with the AI investment objectives?
* How is/was the effort estimated? If the AI is procured, what factors were used to approximate the needed integration resources?

=== How is AI success measured? ===
* What are the significant [[Evaluation - Measures|measures]] that indicate success? Is the tradeoff rationale documented, e.g. accuracy vs. speed?
* Are the ways the mission is being measured clear, realistic, and documented? Specifically, what are the AI investment's performance measures?
* What is the [[Project Management#Return on Investment (ROI)|Return on Investment (ROI)]]? Is the AI investment on track with the original [[Project Management#Return on Investment (ROI)|ROI]] target?
* If there is/was an [https://en.wikipedia.org/wiki/Analysis_of_Alternatives Analysis of Alternatives], how were these measures used? What were the findings?
* What mission metrics will be impacted by the AI investment? What drivers/[[Evaluation - Measures|measures]] have the most bearing? Which of these performance indicators can be used as [https://sloanreview.mit.edu/projects/leading-with-next-generation-key-performance-indicators/ leading indicators] of the health of the AI investment?
* What are the specific decisions and activities to impact each driver/[[Evaluation - Measures|measure]]?
* What assumptions are being made? Of these assumptions, what constraints are anticipated?
* Where does the AI investment fit in the portfolio? Are there possible synergies with other aligned efforts in the portfolio? Are there other related AI investments? If so, is this AI investment dependent on the other investment(s)? What investments require this AI investment to be successful? If so, how? Are there mitigation plans in place?
* How would you be able to tell if the AI investment was working properly?
** Have the baseline(s) for model performance been established? Against what [[Benchmarks|benchmarks]] is the AI model compared/scored? e.g. [[Global Vectors for Word Representation (GloVe)]]
** How perfect does the AI have to be to [[Explainable / Interpretable AI#Trust|trust]] it?
** What is the inference/prediction rate performance metric for the AI investment?
*** What is the current inference/prediction/[[Evaluation - Measures#Receiver Operating Characteristic (ROC) | True Positive Rate (TPR)]]?
*** What is the [[Evaluation - Measures#Receiver Operating Characteristic (ROC) | False Positive Rate (FPR)]]? How does the AI reduce false positives without increasing false negatives?
*** Is there a [[Evaluation - Measures#Receiver Operating Characteristic (ROC) |Receiver Operating Characteristic (ROC) curve]], plotting the [[Evaluation - Measures#Receiver Operating Characteristic (ROC) | True Positive Rate (TPR)]] against the [[Evaluation - Measures#Receiver Operating Characteristic (ROC) | False Positive Rate (FPR)]]? (See the sketch after this list for how these measures are computed.)
* Is/will [[AI Verification and Validation#A/B Testing|A/B testing]] or [[AI Verification and Validation#Multivariate Testing|multivariate testing]] be performed?
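The TPR, FPR, and ROC questions above can be answered directly from a model's scored predictions on a held-out [[Train, Validate, and Test|test set]]. Below is a minimal sketch, assuming a binary classifier, [[Python]] with scikit-learn, and hypothetical label/score arrays; substitute the AI investment's own validation data and decision threshold.

<pre>
# Minimal sketch: TPR, FPR, and an ROC curve for a binary classifier (hypothetical data).
import numpy as np
from sklearn.metrics import confusion_matrix, roc_curve, roc_auc_score

y_true  = np.array([0, 0, 1, 1, 0, 1, 0, 1])                    # hypothetical ground-truth labels
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7])   # hypothetical model scores

# Point estimates at a single decision threshold (0.5)
y_pred = (y_score >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
tpr = tp / (tp + fn)   # True Positive Rate (sensitivity/recall)
fpr = fp / (fp + tn)   # False Positive Rate
print(f"TPR={tpr:.2f}  FPR={fpr:.2f}")

# Full ROC curve: TPR vs. FPR across all thresholds, summarized by the area under the curve
fprs, tprs, thresholds = roc_curve(y_true, y_score)
print(f"AUC={roc_auc_score(y_true, y_score):.2f}")
</pre>

Reporting the full curve (or its area) rather than a single threshold makes the accuracy-vs.-false-alarm tradeoff visible to stakeholders.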
 
=== What [[AI Governance|AI governance]] is in place? ===
* Does [[AI Governance]] implement a risk-based approach, e.g. greater consideration or controls for high-risk use cases?
* What are the [[Enterprise Architecture (EA)|AI architecture]] specifics, e.g. [[Ensemble Learning]] methods used, [[Graph Convolutional Network (GCN), Graph Neural Networks (Graph Nets), Geometric Deep Learning]], [[Digital Twin]], [[Decentralized: Federated & Distributed]]?
* Is the [[Neuroscience|wetware/brain]] or hardware involved, e.g. [[Internet of Things (IoT)]]: [[Signal Processing|physical sensors]], [[JavaScript|mobile phones]], [[Screening; Passenger, Luggage, & Cargo|screening devices]], [[Vision|cameras/surveillance]], [[Healthcare|medical instrumentation]], [[Robotics|robots]], [[Transportation (Autonomous Vehicles)|autonomous vehicles]], [[Autonomous Drones|drones]], [[Quantum|quantum computing]], [[Assistants|assistants/chatbots]]?
* What [[Learning Techniques|learning technique(s) are or will be implemented?]] If [[Transfer Learning|a transfer process]] is used, which model(s) and what mission-specific [[Datasets|dataset(s)]] are used to tune the AI model?
* What [[PRIMO.ai#Algorithms|AI algorithms/model type(s) are used?]] [[Regression]], [[K-Nearest Neighbors (KNN)]], [[Neural Network#Deep Neural Network (DNN)|Deep Neural Network (DNN)]], [[Natural Language Processing (NLP)]], [[Association Rule Learning]], etc.
* Do requirements trace to tests?
** If using [[Evaluating Machine Learning Models|machine learning, how are the models evaluated?]]
** Has an error analysis been performed to reveal failure scenarios?
** How is troubleshooting accomplished? How transparent is the [[development]] process?
** How is [[Bias and Variances|bias accounted for in the AI process? What assurance is provided that the model (algorithm) is not biased?]]
** Is one of the mission's goals to be able to understand the AI in terms of its inputs and how their relationships impact the outcome (prediction)? Is the model [[Explainable / Interpretable AI|(implemented or to be implemented) explainable?]] [[Explainable / Interpretable AI#Interpretable|Interpretable?]] How so? Are stakeholders involved? How?
* What is the [[Data Governance|data governance]] process? How are data silos governed? What data controls and policies are in place today? Planned?
** Is there a data management plan (or planning effort)? Does data planning address metadata for dataflows and data transitions?
*** Are the internal data resources available and accessible? What processes need to change to best obtain the data?
*** For external data resources, have they been sourced with contracts in place to make the data available and accessible?
** Has [[Data Science#Ground Truth|ground truth]] been defined? Have the source(s) of [[Data Quality#Sourcing Data|data been identified for the current AI investment]], addressing ambiguous data, and for future AI investment(s)?
*** What are the possible constraints or challenges in accessing or incorporating the identified data?
*** Are permissions in place to use the data, with [[privacy]] respected?
*** Are security restrictions considered and mitigated? What data needs protection? How is it protected with remote work?
*** What is the expected size of the data to be used for training? What is the ratio of observations (rows) to features (columns)?
*** How good is the [[Data Quality|quality of the data]]: [[Data Quality#Skewed Data|skewed]], [[Data Quality#Data Completeness|complete]], duplicated, timely (vs. outdated), [[Data Quality#Data Cleaning|clean]]? If there is a data management plan, does it have a section on [[Data Quality|data quality?]]
*** How are the [[Datasets|dataset(s)]] used assured to represent the problem space?
*** How does the (proposed) process eliminate the injection of fake data into the process?
*** What Key Performance Indicators (KPI) can the data potentially drive to achieve key mission objective(s)? What data is missing in order to establish the Key Performance Indicators (KPI)?
*** Is there a sufficient amount of data available? For a temporal model, does the data have a rich history? Does the historical data cover periodic and other critical events?
*** Does the data have a refresh schedule? Is the data punctual; does it arrive on time, or is it ready to be pulled?
*** Is there an effort to identify unintended feedback loop(s)?
** For each data content, has the information been determined to be [[Data Science#Structured, Semi-Structured, and Unstructured|structured, semi-structured, or unstructured?]]
*** Will any [[Data Quality#Data Augmentation, Data Labeling, and Auto-Tagging|data labeling be required? Is the data augmented? Is auto-tagging used?]] What data augmentation tools are/will be used?
*** Have the key features/data attributes to be used in the AI model been identified?
*** Will the labeling be enabled by merging domain knowledge with [[Graph#Ontology|ontologies]]? If so, have concepts and associations been identified?
*** How good is the quality of the data labeling? How close is the [[Data Science#Ground Truth|ground truth]] to being a gold standard?
*** What [[Feature Exploration/Learning|data/feature exploration/engineering processes and tools are in place or being considered?]]
*** If needed, what algorithms are used to combine AI features? What is the approximate number of features used?
*** What is the process for removing features/data believed not to be relevant?
** What [[Data Quality|data quality checks]] are in place? What tools are in place or being considered? (See the sketch after this list for a starting point.)
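One lightweight way to begin on the data-quality checks asked about above is an automated profiling pass over each training extract. The sketch below assumes [[Python]] with pandas, a hypothetical file named <code>training_data.csv</code>, and a hypothetical label column named <code>label</code>; adapt the names and add mission-specific rules as needed.

<pre>
# Minimal data-quality profiling sketch (hypothetical file and column names).
import pandas as pd

df = pd.read_csv("training_data.csv")   # hypothetical training extract

report = {
    "rows": len(df),
    "columns": df.shape[1],
    "rows_per_feature": len(df) / max(df.shape[1], 1),                     # observations-to-features ratio
    "duplicate_rows": int(df.duplicated().sum()),                          # duplication
    "missing_fraction_by_column": df.isna().mean().round(3).to_dict(),     # completeness
    "label_balance": df["label"].value_counts(normalize=True).to_dict(),   # skew / class imbalance
}
for check, value in report.items():
    print(check, ":", value)
</pre>

Running such a report on every refresh, and logging it alongside the dataset version, also supports the traceability questions above.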
 
=== What is the [[Algorithm Administration|algorithm administration]] strategy? ===
* What is the [https://en.wikipedia.org/wiki/Full_operating_capability deployment vision]? What attributes are being used to size the investment: count of users, queries, installations, etc.? What is the [https://en.wikipedia.org/wiki/Minimum_viable_product Minimum Viable Product (MVP)] version of the AI investment that has enough features to satisfy early users and provide feedback for future investment [[development]]? If an [https://en.wikipedia.org/wiki/Initial_operating_capability incremental rollout], what is the strategy: portion of the users, markets, locations, capabilities?
* What tool(s) are used or will be used for model management?
** How are [[Algorithm Administration#Hyperparameter|hyperparameters]] managed? What optimizers are used, e.g. [[Algorithm Administration#Automated Learning|automated learning (AutoML)]]?
** What components are integrated in the model management tool(s), e.g. optimizer, tuner, training, [[Algorithm Administration#Versioning|versioning]], model dependencies (e.g. training data, [[Datasets|dataset(s)]], historical lineage), [[Writing/Publishing#Model Publishing|publishing]], performance evaluations, and model storage?
** What is the reuse strategy? Is there a single POC for the reuse process/tools?
*** Are the AI models published (repo, marketplace) for reuse? If so, where?
*** Is the AI model reused from a repository (repo, marketplace)? If so, which one(s)?
** Is [[Algorithm Administration#Master Data Management (MDM)|Master Data Management (MDM)]] in place? What tools are available or being considered?
*** Is data lineage managed?
*** What data cataloging capabilities exist today? Future capabilities?
*** How are [[Algorithm Administration#Versioning|data versions controlled?]]
*** How are the [[Datasets|dataset(s)]] used for AI [[Train, Validate, and Test|training, testing, and validation]] managed?
*** Are logs kept on which data is used for different executions/training so that the information used is traceable?
*** How is access to the information guaranteed? Are the [[Datasets|dataset(s)]] for AI published (repo, marketplace) for reuse? If so, where?
* [[PRIMO.ai#Development & Implementation|What is the development & implementation plan]]?
** What foundational capabilities are defined or in place for the AI investment, e.g. infrastructure platform, cloud resources?
*** What languages & scripting are/will be used? e.g. [[Python]], [[JavaScript]], [[PyTorch]]
*** What [[Libraries & Frameworks]] are used?
*** Are [[Notebooks|notebooks]] used? If so, is [[Jupyter]] supported?
*** What [[Visualization|visualizations]] are used for [[development]]? [[Graphical Tools for Modeling AI Components|For AI investment user(s)?]]
*** Will the AI investment leverage [[Platforms: AI/Machine Learning as a Service (AIaaS/MLaaS)|Machine Learning as a Service (MLaaS)? Or be offered as a MLaaS?]]
** How is the AI investment deployed?
*** What is the plan for model serving? For each use case, is the serving batched or streamed? If applicable, have REST endpoints been defined and exposed?
*** Is the AI investment implementing an [[Algorithm Administration#AIOps/MLOps|AIOps]] pipeline/toolchain?
*** What tools are used for [[Algorithm Administration#AIOps/MLOps|AIOps]]? Please identify both on-premises and online services.
*** Are the AI languages, libraries, scripting, and [[Algorithm Administration#AIOps/MLOps|AIOps]] applications registered in the organization?
** Are the processes and decisions [[Enterprise Architecture (EA)|architecture]] driven to allow for end-to-end visibility and dependency management? Is information mapped to the intended use to allow [[Analytics|analytics]] and [[Visualization|visualizations]] framed in [[context]]?
*** Does the AI investment depict the [[Algorithm Administration#AIOps/MLOps|AIOps]] pipeline/toolchain applications in its architecture, e.g. tech stack?
*** Does the [[Algorithm Administration#AIOps/MLOps|SecDevOps]] architecture depict the AI investment, and how are its health metrics depicted?
*** Is [[Algorithm Administration|algorithm administration]] reflected in the [[Algorithm Administration#AIOps/MLOps|AIOps]] pipeline/toolchain processes/architecture?
** How is production readiness determined?
*** Does the team use the [[ML Test Score]] for production readiness? (See the toy scoring sketch after this section.)
**** What are the minimum scores for the Data, Model, ML Infrastructure, and Monitoring tests?
**** What score qualifies to pass into production? What is the rationale for passing if less than exceptional (score of >5)?
**** What were the lessons learned? Were adjustments made to move to a higher score? What were the adjustments?
*** Who makes the determination when the AI investment is deployed/refreshed?
*** How does the team ready itself for cybersecurity? Does it use the [[Cybersecurity#MITRE ATT&CK™|MITRE ATT&CK™ Framework]]? ...the [[Algorithm Administration#DevSecOps in Government|GSA DevSecOps Guide]]?
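For the [[ML Test Score]] questions above: as we read the Breck et al. rubric, each of the 28 tests is scored 0 (not done), 0.5 (performed manually with documented results), or 1 (automated and run repeatedly); each of the four sections (Data, Model, ML Infrastructure, Monitoring) is summed; and the overall score is the minimum of the four section totals. The toy sketch below only illustrates that aggregation; the per-test points are hypothetical placeholders, not a scored system.

<pre>
# Toy aggregation of an ML Test Score; per-test points below are hypothetical placeholders.
# Assumed rubric: 0 = not done, 0.5 = manual, 1 = automated & repeated; overall = minimum section total.
section_points = {
    "Data":              [1.0, 0.5, 0.5, 1.0],
    "Model":             [0.5, 0.5, 1.0],
    "ML Infrastructure": [1.0, 1.0, 0.5, 0.5],
    "Monitoring":        [0.5, 0.0, 0.5],
}

section_totals = {section: sum(points) for section, points in section_points.items()}
overall = min(section_totals.values())   # the weakest section gates production readiness

for section, total in section_totals.items():
    print(f"{section}: {total:g}")
print(f"Overall ML Test Score: {overall:g}  (a score above 5 is 'exceptional' per the rubric)")
</pre>

Taking the minimum, rather than the average, is what surfaces a weak area (here, Monitoring) that an otherwise strong system might hide.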
 
=== How are changes identified and managed? ===
* What capabilities are in place to identify, track, and notify of changes?
* How often is the deployed AI process [[Algorithm Administration#Model Monitoring|monitored or its measures re-evaluated?]]
* Is the deployed AI investment collecting learning data? If so, how frequently are the algorithms updated?
* What aspects of the AI investment are being monitored, e.g. [[Evaluation - Measures|performance]], [[Algorithm Administration#Model Monitoring|model]] functionality, [[Algorithm Administration#AIOps/MLOps|system]], [[Data Governance|data]] (pipeline)?
* If the AI model is reused from a repository (repo, marketplace), how is the team notified of updates? How often is the repository checked for updates?
* Are the end-to-end visibility and bottleneck risks for the [[Algorithm Administration#AIOps/MLOps|AIOps]] pipeline/toolchain reflected in the risk register with a mitigation strategy for each risk? Do mitigations tend to address symptoms only, or do the mitigations lead to improving root causes via analysis?
* When the AI model is updated, how is it [[AI Verification and Validation|determined that the performance indeed improved?]]
* What capabilities are in place to perceive, notify, and address operational environment changes? [[Algorithm Administration#Model Monitoring|Detect and remediate drift, when the AI degrades over time due to data and the model is no longer effective in the environment?]] (See the drift-check sketch after this list.)
* Is there a mechanism (automated, assisted, or manual) to provide change/event [[Causation vs. Correlation|causation]]? Does the mechanism use AI, e.g. [[Anomaly Detection|anomaly detection]]?
* Are response plans, procedures, and training in place to address AI attack or failure incidents? How are the AI investment's models audited for security vulnerabilities?
* How is the team notified of changes? Active flagging/messaging (push) and passive health dashboard (polling)? How does the team use the end-to-end information to optimize the organization's resources and process/service(s)?
* Has [[Human Resources (HR)#AI to Identify Skill Gaps and Surface Hidden Expertise|role/job displacement due to automation and/or AI implementation been addressed?]]
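For the drift question above, one simple starting point is to compare the distribution of each numeric feature in recent production data against the training data with a two-sample Kolmogorov-Smirnov test, flagging features whose distributions have shifted. The sketch below assumes [[Python]] with pandas and SciPy, hypothetical file names, and matching schemas; the alert threshold is illustrative and should be tuned to the mission.

<pre>
# Minimal drift-check sketch (hypothetical file names; alert threshold is illustrative).
import pandas as pd
from scipy.stats import ks_2samp

train  = pd.read_csv("train_features.csv")    # data the model was trained on
recent = pd.read_csv("recent_features.csv")   # recent production inputs, same columns assumed

ALERT_P_VALUE = 0.01
for column in train.select_dtypes("number").columns:
    statistic, p_value = ks_2samp(train[column].dropna(), recent[column].dropna())
    if p_value < ALERT_P_VALUE:
        print(f"possible drift in '{column}': KS={statistic:.3f}, p={p_value:.4f}")
</pre>

Flagged features are a prompt for investigation (and possible retraining), not proof of model degradation on their own; pairing the check with monitored [[Evaluation - Measures|performance measures]] closes that loop.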
 
<hr><hr>
<center><b>References</b></center>
<hr>
* [[Strategy & Tactics]] ... [[Project Management]] ... [[Best Practices]] ... [[Checklists]] ... [[Project Check-in]] ... [[Evaluation]] ... [[Evaluation - Measures|Measures]]
** [[Evaluation - Measures#Accuracy|Accuracy]]
** [[Evaluation - Measures#Precision & Recall (Sensitivity)|Precision & Recall (Sensitivity)]]
** [[Evaluation - Measures#Specificity|Specificity]]
** [[Project Management#Return on Investment (ROI)|Return on Investment (ROI)]]
* [[Predictive Analytics]] ... [[Operations & Maintenance|Predictive Maintenance]] ... [[Forecasting]] ... [[Market Trading]] ... [[Sports Prediction]] ... [[Marketing]] ... [[Politics]] ... [[Excel#Excel - Forecasting|Excel]]
* [[AI Governance]] / [[Algorithm Administration]]
* [[AI Verification and Validation]]
* [[Generative AI for Business Analysis]]
* [[Leadership]]
* [[Procuring]]
* [[ML Test Score]]
* [[Cybersecurity: Evaluating & Selling]]
* [[Automated Scoring]]
* [[Risk, Compliance and Regulation]]
* [[Screening; Passenger, Luggage, & Cargo]]
* [https://ico.org.uk/media/about-the-ico/consultations/2617219/guidance-on-the-ai-auditing-framework-draft-for-consultation.pdf Guidance on the AI auditing framework | Information Commissioner's Office (ICO)]
* [https://www.gao.gov/products/GAO-20-48G Technology Readiness Assessments (TRA) Guide | US GAO] ...used to evaluate the maturity of technologies and whether they are developed enough to be incorporated into a system without too much risk.
* [https://dodcio.defense.gov/Portals/0/Documents/Cyber/2019%20Cybersecurity%20Resource%20and%20Reference%20Guide_DoD-CIO_Final_2020FEB07.pdf Cybersecurity Reference and Resource Guide |] [[Defense|DOD]]
* [[Joint Capabilities Integration and Development System (JCIDS)]] | [[Defense|DOD]]
* [https://recruitingdaily.com/five-ways-to-evaluate-ai-systems/ Five ways to evaluate AI systems | Felix Wetzel - Recruiting Daily]
* [https://github.com/cisagov/cset/releases Cyber Security Evaluation Tool (CSET®)] ...provides a systematic, disciplined, and repeatable approach for evaluating an organization’s security posture.
* [https://towardsdatascience.com/3-common-technical-debts-in-machine-learning-and-how-to-avoid-them-17f1d7e8a428 3 Common Technical Debts in Machine Learning and How to Avoid Them | Derek Chia - Towards Data Science]
* [https://www.oreilly.com/radar/why-you-should-care-about-debugging-machine-learning-models/ Why you should care about debugging machine learning models | Patrick Hall and Andrew Burt - O'Reilly]
* [https://emerj.com/ai-sector-overviews/how-to-assess-an-artificial-intelligence-product-or-solution-for-non-experts/ How to Assess an Artificial Intelligence Product or Solution (Even if You’re Not an AI Expert) | Daniel Faggella - Emerj]
  
Nature of risks inherent to AI applications: We believe that the challenge in governing AI is less about dealing with completely new types of risk and more about existing risks either being harder to identify in an effective and timely manner, given the complexity and speed of AI solutions, or manifesting themselves in unfamiliar ways. As such, firms do not require completely new processes for dealing with AI, but they will need to enhance existing ones to take AI into account and fill the necessary gaps. The likely impact on the level of resources required, as well as on roles and responsibilities, will also need to be addressed. [https://www2.deloitte.com/content/dam/Deloitte/nl/Documents/innovatie/deloitte-nl-innovate-lu-ai-and-risk-management.pdf AI and risk management: Innovating with confidence | Deloitte]
 
  
{|<!-- T -->
| valign="top" |
{| class="wikitable" style="width: 550px;"
||
<youtube>7CcSm0PAr-Y</youtube>
<b>How Should We Evaluate Machine Learning for AI?: Percy Liang
</b><br>Machine learning has undoubtedly been hugely successful in driving progress in AI, but it implicitly brings with it the train-test evaluation paradigm. This standard evaluation only encourages behavior that is good on average; it does not ensure robustness as demonstrated by adversarial examples, and it breaks down for tasks such as dialogue that are interactive or do not have a correct answer. In this talk, I will describe alternative evaluation paradigms with a focus on natural language understanding tasks, and discuss ramifications for guiding progress in AI in meaningful directions. Percy Liang is an Assistant Professor of Computer Science at Stanford University (B.S. from MIT, 2004; Ph.D. from UC Berkeley, 2011). His research spans machine learning and natural language processing, with the goal of developing trustworthy agents that can communicate effectively with people and improve over time through interaction. Specific topics include question answering, dialogue, program induction, interactive learning, and reliable machine learning. His awards include the IJCAI Computers and Thought Award (2016), an NSF CAREER Award (2016), a Sloan Research Fellowship (2015), and a Microsoft Research Faculty Fellowship (2014).
|}
|<!-- M -->
| valign="top" |
{| class="wikitable" style="width: 550px;"
||
<youtube>V18AsBIHlWs</youtube>
<b>Machine Learning, Technical Debt, and You - D. Sculley ([[Google]])
</b><br>Machine Learning offers a fantastically powerful toolkit for building useful complex prediction systems quickly. In this talk, we'll argue it is dangerous to think of these quick wins as coming for free. Using the software engineering framework of technical debt, we find it is common to incur massive ongoing maintenance costs in real-world ML systems. We explore several ML-specific risk factors to account for in system design. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, configuration issues, changes in the external world, and a variety of system-level anti-patterns. We then show how to pay down ML technical debt by following a set of recommended best practices for testing and monitoring needed for real world systems. D. Sculley is a Senior Staff Software Engineer at [[Google]].
|}
|}<!-- B -->
