Revision as of 18:52, 6 September 2020

Evaluation
Cybersecurity: Evaluating & Selling
Strategy & Tactics
AIOps / MLOps
Automated Scoring
Imbalanced Data
Risk, Compliance and Regulation
Guidance on the AI auditing framework | Information Commissioner's Office (ICO)
Technology Readiness Assessments (TRA) Guide | US GAO ...used to evaluate the maturity of technologies and whether they are developed enough to be incorporated into a system without too much risk.
Cybersecurity Reference and Resource Guide | DOD
Five ways to evaluate AI systems | Felix Wetzel - Recruiting Daily
Cyber Security Evaluation Tool (CSET®) ...provides a systematic, disciplined, and repeatable approach for evaluating an organization’s security posture.
3 Common Technical Debts in Machine Learning and How to Avoid Them | Derek Chia - Towards Data Science

Many products today leverage artificial intelligence for a wide range of industries, from healthcare to marketing. However, most business leaders who need to make strategic and procurement decisions about these technologies have no formal AI background or academic training in data science. The purpose of this article is to give business people with no AI expertise a general guideline on how to assess an AI-related product to help decide whether it is potentially relevant to their business. How to Assess an Artificial Intelligence Product or Solution (Even if You’re Not an AI Expert) | Daniel Faggella - Emerj

Assessment Questions - Artificial Intelligence (AI) / Machine Learning (ML) / Machine Intelligence (MI)

What challenge does the AI solve?
- Is the intent of the AI to increase performance (detection), reduce costs (predictive maintenance, reduce inventory), decrease response time, or achieve other outcome(s)?
- What analytics is the AI resolving? Descriptive (what happened?), Diagnostic (why did it happen?), Predictive/Preventive (what could happen?), Prescriptive (what should happen?), Cognitive (what steps should be taken?)
What is the Return on Investment (ROI)? Is the AI investment on track with original ROI target?
- What is the clear and realistic way of measuring the success of the AI investment?
Is the organization using the implementation to gain better capability in the future?
- Is the right Leadership in place?
- Is the organization positioned or positioning to scale its current state with AI?
Are Best Practices being followed? Is the team trained in the Best Practices?
What is the ML Test Score?
Does the AI reside in a procured item/application/solution or developed in house?
- If the AI is procured, e.g. embedded in sensor product, what items are included in the contract to future proof the solution?
- Contract items to protect organization reuse data rights?
What are the significant measures that indicate the AI investment is achieving success?
- What Evaluation - Measures are documented? Are the Measures being used correctly?
- How would you be able to tell if the AI investment was working properly?
- How perfect does AI have to be to trust it? What is the inference/prediction rate performance metric for the AI investment?
- What is the current inference/prediction/ True Positive Rate (TPR)?
- What is the False Positive Rate (FPR)? How does AI reduce false-positives without increasing false negatives?
- Is there a Receiver Operating Characteristic (ROC) curve; plotting the True Positive Rate (TPR) against the False Positive Rate (FPR)?
- When the AI model is updated, how is it determined that the performance was indeed increased for the better?
Is Master Data Management (MDM) in place? Data Plan?
- Has the data been identified for AI (current investment or for future use) investment(s)?
- Is the data labeled, or does it require manual labeling?
- Have the key features to be used in the AI model been identified? If needed, what are the algorithms used to combine AI features? What is the approximate number of features used?
- How are the dataset(s) used for AI training, testing and Validation managed? Are logs kept on which data is used for different executions/training so that the information used is traceable? How is the access to the information guaranteed?
- Are the dataset(s) for AI published (repo, marketplace) for reuse, if so where?
What AI Governance is in place?
- What are the AI architecture specifics, e.g. Ensemble Learning methods used, graph network, or Distributed learning?
- What AI model type(s) are used? Regression, K-Nearest Neighbors (KNN), Graph Neural Networks, Reinforcement Learning (RL), Association Rule Learning, etc.
- Is Transfer Learning used? If so, which AI models are used? What mission specific dataset(s) are used to tune the AI model?
- Are the AI models published (repo, marketplace) for reuse, if so where?
- Is the AI model reused from a repository (repo, marketplace)? If so, which one? How are you notified of updates? How often is the repository checked for updates?
- Are AI service(s) used for inference/prediction?
- What AI languages, Libraries & Frameworks, scripting, are implemented? Python, Javascript, PyTorch etc.
- What optimizers are used? Is augmented machine learning (AugML) or automated machine learning (AutoML) used?
- Against what benchmark standard(s) is the AI model compared/scored? e.g. Global Vectors for Word Representation (GloVe)
- How often is the deployed AI process monitored or measures re-evaluated?
- How is bias accounted for in the AI process? How is it assured that the dataset(s) used represent the problem space? What is the process for removing features/data believed not to be relevant? What assurance is provided that the model (algorithm) is not biased?
- Is the model (implemented or to be implemented) explainable? Interpretable? How so?
- Is role/job displacement due to automation and/or AI implementation being addressed?
- Are User and Entity Behavior Analytics (UEBA) and AI used to help create a baseline for trusted workload access?
Is AI being used for Cybersecurity?
- Is AI used to protect the AI investment against targeted attacks, often referred to as advanced targeted attacks (ATAs) or advanced persistent threats (APTs)?
If the AI investment is implementing AI, is the AI investment implementing an AIOps / MLOps pipeline/toolchain?
- What tools are used for the AIOps / MLOps? Please identify which are on-premises and which are online services.
- Are the AI languages, libraries, scripting, and AIOps / MLOps applications registered in the organization?
- Does the AI investment depict the AIOps / MLOps pipeline/toolchain applications in their tech stack?
- Has the AI investment identified where AI is used in the SecDevOps architecture, e.g. software testing?
- Is data management reflected in the AIOps / MLOps pipeline/toolchain processes/architecture?
- Are the end-to-end visibility and bottleneck risks for AIOps / MLOps pipeline/toolchain reflected in the risk register with mitigation strategy for each risk?
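
The TPR, FPR, and ROC questions above can be made concrete with a small calculation. The following is a minimal Python sketch, assuming binary labels (1 = positive, 0 = negative) and a model that emits one score per example; the function names and sample data are illustrative and do not come from any particular product.

```python
from typing import List, Sequence, Tuple


def tpr_fpr(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (True Positive Rate, False Positive Rate) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Sweep each distinct score as a threshold and collect (FPR, TPR) points."""
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= threshold else 0 for s in scores]
        tpr, fpr = tpr_fpr(y_true, y_pred)
        points.append((fpr, tpr))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Illustrative data only: two positives, two negatives.
    labels = [1, 1, 0, 0]
    model_scores = [0.9, 0.4, 0.6, 0.1]
    curve = roc_points(labels, model_scores)
    print(curve)
    print(auc(curve))
```

One way to approach the model-update question is to compute the same curve for the old and new model on an identical held-out dataset and compare the results, rather than comparing numbers obtained from different data.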

How Should We Evaluate Machine Learning for AI?: Percy Liang Machine learning has undoubtedly been hugely successful in driving progress in AI, but it implicitly brings with it the train-test evaluation paradigm. This standard evaluation only encourages behavior that is good on average; it does not ensure robustness as demonstrated by adversarial examples, and it breaks down for tasks such as dialogue that are interactive or do not have a correct answer. In this talk, I will describe alternative evaluation paradigms with a focus on natural language understanding tasks, and discuss ramifications for guiding progress in AI in meaningful directions. Percy Liang is an Assistant Professor of Computer Science at Stanford University (B.S. from MIT, 2004; Ph.D. from UC Berkeley, 2011). His research spans machine learning and natural language processing, with the goal of developing trustworthy agents that can communicate effectively with people and improve over time through interaction. Specific topics include question answering, dialogue, program induction, interactive learning, and reliable machine learning. His awards include the IJCAI Computers and Thought Award (2016), an NSF CAREER Award (2016), a Sloan Research Fellowship (2015), and a Microsoft Research Faculty Fellowship (2014).

Managing AI Innovation: How should a manager evaluate a new process technology? In this video Paolo Messina addresses the issue of whether or not to innovate in a specific area of your machine learning project. To most new managers in machine learning or managers that have to make decisions about investments, the importance of data processes is only partially visible. The truth is most of your team resources usually are invested around filtering, cleansing and labeling the data. In this video we discuss why this process is at the same time laborious and costly and why it affects the process lifetime of your innovation. We discuss how traditional supervised learning, a flavor of machine learning, can limit what we can achieve and the time to delivery. We discuss all the factors to consider to mitigate or evaluate how the data processing will affect a machine learning product or process pipeline. We suggest that active learning is emerging as an alternative to reduce costs for data pipeline and data labeling. However, before implementing such solutions, managers, product managers and decision-makers will have to consider various factors. We show a decision process framework that could be useful to new managers or established managers. Namely, we look into: • Cost of integration • Delivery time • Technology limitations such as cognitive biases • Scalability and extensibility of the new process

ML Test Score

Creating reliable, production-level machine learning systems brings on a host of concerns not found in small toy examples or even large offline research experiments. Testing and monitoring are key considerations for ensuring the production-readiness of an ML system, and for reducing technical debt of ML systems. But it can be difficult to formulate specific tests, given that the actual prediction behavior of any given model is difficult to specify a priori. In this paper, we present 28 specific tests and monitoring needs, drawn from experience with a wide range of production ML systems to help quantify these issues and present an easy to follow road-map to improve production readiness and pay down ML technical debt. The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction | E. Breck, S. Cai, E. Nielsen, M. Salib, and D. Sculley - Google Research Full Stack Deep Learning

ML Test Score (2) - Testing & Deployment - Full Stack Deep Learning How can you test your machine learning system? A Rubric for Production Readiness and Technical Debt Reduction is an exhaustive framework/checklist from practitioners at Google. - The paper presents a rubric as a set of 28 actionable tests and offers a scoring system to measure how ready for production a given machine learning system is. These are categorized into 4 sections: (1) data tests, (2) model tests, (3) ML infrastructure tests, and (4) monitoring tests. - The scoring system provides a vector for incentivizing ML system developers to achieve stable levels of reliability by providing a clear indicator of readiness and clear guidelines for how to improve.
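
The scoring system described above can be sketched in a few lines of Python. This sketch assumes the scoring scheme from the ML Test Score paper: half a point for a test that is run manually with documented results, a full point for a test that is automated and run repeatedly, a sum per section, and a final score equal to the minimum across the four sections. The test names and statuses below are hypothetical placeholders, not the paper's actual test list.

```python
from typing import Dict

# Points per test status, assuming the paper's scheme.
POINTS = {"none": 0.0, "manual": 0.5, "automated": 1.0}


def section_score(tests: Dict[str, str]) -> float:
    """Sum the points for every test in one section."""
    return sum(POINTS[status] for status in tests.values())


def ml_test_score(sections: Dict[str, Dict[str, str]]) -> float:
    """Final score is the minimum of the per-section totals."""
    return min(section_score(tests) for tests in sections.values())


if __name__ == "__main__":
    # Hypothetical assessment; test names are placeholders.
    assessment = {
        "data": {"schema_check": "automated", "privacy_review": "manual"},
        "model": {"baseline_comparison": "automated", "staleness_check": "automated"},
        "infrastructure": {"rollback_drill": "manual"},
        "monitoring": {"drift_alert": "automated"},
    }
    for name, tests in assessment.items():
        print(name, section_score(tests))
    print("ML Test Score:", ml_test_score(assessment))
```

Taking the minimum rather than the total means a system cannot compensate for a neglected section, for example monitoring, with a high score elsewhere.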

What is Your ML Score? - Tania Allard Developer Advocate at Microsoft Using machine learning in real-world applications and production systems is complex. Testing, monitoring, and logging are key considerations for assessing the decay, current status, and production-readiness of machine learning systems. Where do you get started? Who is responsible for testing and monitoring? I’ll discuss the most frequent issues encountered in real-life ML applications and how you can make systems more robust. I’ll also provide a rubric with actionable examples to ensure quality and adequacy of a model in production.

Procuring

Build or buy AI? You're asking the wrong question Evan Kohn, chief business officer and head of marketing at Pypestream, talks with Tonya Hall about why companies need to turn to staffing for AI and building data sets.

Why you should Buy Open-Source AI Considering an AI assistant in your home? Before you auto-buy that pretty picture in front of you, be sure to check out the open-source offerings as well.

Best Practices

Rules of Machine Learning: Best Practices for ML Engineering | Martin Zinkevich - Google

Rules of ML Google research scientist Martin Zinkevich

Best Practices of In-Platform AI/ML Webinar Productive use of machine learning and artificial intelligence technologies is impossible without a platform that allows autonomous functioning of AI/ML mechanisms. In-platform AI/ML has a number of advantages that can be obtained via best practices by InterSystems. On this webinar, we will present: • MLOps as the natural paradigm for in-platform AI/ML • A full cycle of AI/ML content development and in-platform deployment (including bidirectional integration of Jupyter with InterSystems IRIS) • New toolset added to ML Toolkit: integration and orchestration for Julia mathematical modeling environment • Automated AI/ML model selection and parameter determination via an SQL query • Cloud-enhanced ML • Featured use case demo: hospital readmission prediction (addresses running in InterSystems IRIS of the models trained outside the platform's control)

Model Deployment Scoring

ML Model Deployment and Scoring on the Edge with Automatic ML & DF / Flink2Kafka recorded on June 18, 2020. Machine Learning Model Deployment and Scoring on the Edge with Automatic Machine Learning and Data Flow Deploying Machine Learning models to the edge can present significant ML/IoT challenges centered around the need for low latency and accurate scoring on minimal resource environments. H2O.ai's Driverless AI AutoML and Cloudera Data Flow work nicely together to solve this challenge. Driverless AI automates the building of accurate Machine Learning models, which are deployed as light footprint and low latency Java or C++ artifacts, also known as a MOJO (Model Optimized). Cloudera Data Flow leverages Apache NiFi, which offers an innovative data flow framework to host MOJOs to make predictions on data moving on the edge. Speakers: James Medel (H2O.ai - Technical Community Maker) Greg Keys (H2O.ai - Solution Engineer) Kafka 2 Flink - An Apache Love Story This project has been heavily inspired by two existing efforts: Data In Motion's FLaNK Stack and Data Artisan's blog on stateful streaming applications. The goal of this project is to provide insight into connecting Apache Flink applications to Apache Kafka. Speaker: Ian R Brooks, PhD (Cloudera - Senior Solutions Engineer & Data)

Shawn Scully: Production and Beyond: Deploying and Managing Machine Learning Models PyData NYC 2015 Machine learning has become the key component in building intelligence-infused applications. However, as companies increase the number of such deployments, the number of machine learning models that need to be created, maintained, monitored, tracked, and improved grows at a tremendous pace. This growth has led to a huge (and well-documented) accumulation of technical debt. Developing a machine learning application is an iterative process that involves building multiple models over a dataset. The dataset itself evolves over time as new features and new data points are collected. Furthermore, once deployed, the models require updates over time. Changes in models and datasets become difficult to track over time, and one can quickly lose track of which version of the model used which data and why it was subsequently replaced. In this talk, we outline some of the key challenges in large-scale deployments of many interacting machine learning models. We then describe a methodology for management, monitoring, and optimization of such models in production, which helps mitigate the technical debt. In particular, we demonstrate how to: track models and versions, and visualize their quality over time; track the provenance of models and datasets, and quantify how changes in data impact the models being served; optimize model ensembles in real time, based on changing data, and provide alerts when such ensembles no longer provide the desired accuracy.
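
The idea of tracking which model version used which data can be illustrated with a small in-memory registry. This is a minimal sketch under stated assumptions, not the methodology presented in the talk: `ModelRegistry`, `ModelRecord`, `dataset_fingerprint`, and the accuracy threshold are all hypothetical names and values chosen for illustration, and the data is assumed to fit in memory as JSON-serializable rows.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List


def dataset_fingerprint(rows: List[dict]) -> str:
    """Hash the dataset contents so a record can be tied to exact data."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class ModelRecord:
    name: str
    version: int
    dataset_hash: str
    accuracy: float


@dataclass
class ModelRegistry:
    records: List[ModelRecord] = field(default_factory=list)

    def register(self, name: str, rows: List[dict], accuracy: float) -> ModelRecord:
        """Store a new version of a model with its data fingerprint and metric."""
        version = 1 + sum(1 for r in self.records if r.name == name)
        record = ModelRecord(name, version, dataset_fingerprint(rows), accuracy)
        self.records.append(record)
        return record

    def history(self, name: str) -> List[ModelRecord]:
        """All recorded versions of one model, oldest first."""
        return [r for r in self.records if r.name == name]

    def needs_alert(self, name: str, min_accuracy: float) -> bool:
        """True when the latest version falls below the desired accuracy."""
        versions = self.history(name)
        return bool(versions) and versions[-1].accuracy < min_accuracy


if __name__ == "__main__":
    registry = ModelRegistry()
    registry.register("churn", [{"x": 1}], accuracy=0.91)
    registry.register("churn", [{"x": 1}, {"x": 2}], accuracy=0.84)
    for rec in registry.history("churn"):
        print(rec.version, rec.dataset_hash[:12], rec.accuracy)
    print("alert:", registry.needs_alert("churn", min_accuracy=0.90))
```

A production system would persist these records and fingerprint data at its storage location rather than in memory; the point here is only that each version carries a verifiable link to the data it was built on.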

Leadership

Creatives

Artificial Intelligence: New Challenges for Leadership and Management The Future of Management in an Artificial Intelligence-Based World For more info about the conference: https://bit.ly/2J30TD3 -Dario Gil, Vice President of Science and Solutions, IBM Research -Tomo Noda, Founder and Chair, Shizenkan University Graduate School of Leadership and Innovation, Japan Moderator: Sandra Sieber, Professor, IESE

Herminia Ibarra: What Will Leadership Look Like In The Age of AI? Herminia Ibarra, the Charles Handy professor of organisational behaviour at the London Business School, delves into what talent looks like in the age of artificial intelligence. Leaders are people who move a company, organisation, or institution from its current state to – ideally – something better. In the age of artificial intelligence and smart technologies, this means being able to actually make use of the vast technological capability that is out there, but is wildly under-used.

Who Makes AI Projects Successful Business leaders often have high expectations of AI/ML projects, and are sorely disappointed when things don't work out. AI implementations are more than just solving the technology problem. There are many other aspects to consider, and you'll need someone who has strong knowledge and background in business, technology (especially AI/ML), and data to guide the business on projects to take on, strategic direction, updates, and many other aspects. In this video, I call out the need for such a role because the underlying paradigm of software development is shifting. Here's what I can do to help you. I speak on the topics of architecture and AI, help you integrate AI into your organization, educate your team on what AI can or cannot do, and make things simple enough that you can take action from your new knowledge. I work with your organization to understand the nuances and challenges that you face, and together we can understand, frame, analyze, and address challenges in a systematic way so you see improvement in your overall business, is aligned with your strategy, and most importantly, you and your organization can incrementally change to transform and thrive in the future. If any of this sounds like something you might need, please reach out to me at dr.raj.ramesh@topsigma.com, and we'll get back in touch within a day. Thanks for watching my videos and for subscribing. www.topsigma.com www.linkedin.com/in/rajramesh

Lecture 2.7 Working with an AI team — [AI For Everyone \| Andrew Ng] AI For Everyone lectures by Andrew Ng and our own Learning Notes.

Return on Investment (ROI)

How to compute the ROI on AI projects? Figuring out the ROI on AI implementations can be challenging. We offer some guidance on how to do that in this video. You can use this framework to make sure that you consider the many aspects of ROI that are especially required for AI projects. Contact the authors at: mehran.irdmousa@mziaviation.com, dr.raj.ramesh@gmail.com

Getting to AI ROI: Finding Value in Your Unstructured Content Artificial Intelligence is definitely having its moment, but if you’re like most companies, you haven’t yet been able to capture ROI from these exciting technologies. It seems complicated, expensive, requires specialized talent, crazy data requirements, and more. Your boss may have dropped a vague missive onto your desk asking you to “figure out how AI can help enhance our business.” You have piles and piles of unstructured content—contracts, documents, feedback, but you haven’t been able to drive value from your data. Where to even start?

@@ Line 49: / Line 49: @@
 ** Is the right [[Evaluation#Leadership| Leadership]] in place?
 ** Is the organization positioned or positioning to scale its current state with AI?
+* Are [[Evaluation#Best Practices| Best Practices]] being followed?  Is the team trained in the [[Evaluation#Best Practices| Best Practices]]?
 * What is the [[Evaluation#ML Test Score| ML Test Score?]]
 * Does the AI reside in a [[Evaluation#Procuring| procured item/application/solution or developed in house]]?
 ** If the AI is [[Evaluation#Buying| procured]], e.g. embedded in sensor product, what items are included in the contract to future proof the solution?
 ** Contract items to protect organization reuse data rights?
-* Are [[Evaluation#Best Practices| Best Practices]] being followed?
+* What are the significant [[Evaluation - Measures| measures]] that indicate the AI investment is achieving success?
-* What are the significant [Evaluation - Measures| measures]] that indicate the AI investment is achieving success?
 ** What [[Evaluation - Measures]] are documented?  Are the [[Evaluation - Measures|Measures]] being used correctly?
 ** How would you be able to tell if the AI investment was working properly?

Difference between revisions of "Evaluation"
