Evaluation
Revision as of 09:26, 6 September 2020
- Evaluation
- AIOps / MLOps
- Automated Scoring
- Imbalanced Data
- Five ways to evaluate AI systems | Felix Wetzel - Recruiting Daily
- Cyber Security Evaluation Tool (CSET®) ...provides a systematic, disciplined, and repeatable approach for evaluating an organization’s security posture.
- 3 Common Technical Debts in Machine Learning and How to Avoid Them | Derek Chia - Towards Data Science
Many products today leverage artificial intelligence for a wide range of industries, from healthcare to marketing. However, most business leaders who need to make strategic and procurement decisions about these technologies have no formal AI background or academic training in data science. The purpose of this article is to give business people with no AI expertise a general guideline on how to assess an AI-related product to help decide whether it is potentially relevant to their business. How to Assess an Artificial Intelligence Product or Solution (Even if You’re Not an AI Expert) | Daniel Faggella - Emerj
- What challenge does the Machine Learning (ML) solution solve?
- Is the intent of the Machine Learning (ML) to increase performance (detection), reduce costs (predictive maintenance, reduced inventory), decrease response time, or achieve other outcome(s)?
- Is there a clear and realistic way of measuring the success of the Machine Learning (ML) initiative?
- Does the Machine Learning (ML) reside in a procured item/application/solution, or is it developed in house?
- If the Machine Learning (ML) is procured, e.g. embedded in a sensor product, what items are included in the contract to future-proof the solution and let the organization use the implementation to gain better capability in the future?
- Are there contract items that protect the organization's data reuse rights?
- What type of analytics is the Machine Learning (ML) resolving? Descriptive (what happened?), Diagnostic (why did it happen?), Predictive/Preventive (what could happen?), Prescriptive (what should happen?), Cognitive (what steps should be taken?)
- What is the current inference/prediction/true positive rate (TPR)?
- How accurate does the Machine Learning (ML) have to be to be trusted? What is the inference/prediction rate performance metric for the Program?
- What is the false positive rate? How does the Machine Learning (ML) reduce false positives without increasing false negatives? What is the false positive rate performance metric for the Program? Is there a Receiver Operating Characteristic (ROC) curve plotting the true positive rate (TPR) against the false positive rate (FPR)?
- Has the data been identified for Machine Learning (ML) initiative(s) (current application or for future use)? Is the data labeled, or does it require manual labeling?
- Have the key features to be used in the Machine Learning (ML) model been identified? If needed, what algorithms are used to combine ML features? What is the approximate number of features used?
- How are the dataset(s) used for Machine Learning (ML) training, testing, and validation managed? Are logs kept on which data is used for different executions/training so that the information used is traceable? How is access to the information guaranteed?
- Are the dataset(s) for Machine Learning (ML) published (repo, marketplace) for reuse? If so, where?
- What Machine Learning (ML) model type(s) are used? e.g. regression, K-Nearest Neighbors (KNN), Graph Neural Networks, reinforcement learning, rule-based
- What are the Machine Learning (ML) architecture specifics, e.g. ensemble methods used, graph network, or distributed learning?
- Are the Machine Learning (ML) models published (repo, marketplace) for reuse? If so, where?
- Is the Machine Learning (ML) model reused from a repository (repo, marketplace)? If so, which one? How are you notified of updates? How often is the repository checked for updates?
- Is transfer learning used? If so, which Machine Learning (ML) models are used? What mission-specific dataset(s) are used to tune the ML model?
- Are Machine Learning (ML) service(s) used for inference/prediction?
- What Machine Learning (ML) languages, libraries, and scripting are implemented?
- What tools are used for AIOps? Which are on-premises tools and which are online services?
- Are the Machine Learning (ML) languages, libraries, scripting, and AIOps applications registered in the DHS Technical Reference Model (TRM)?
- What optimizers are used? Is augmented machine learning (AugML) or automated machine learning (AutoML) used?
- When the Machine Learning (ML) model is updated, how is it determined that performance actually improved?
- Against what benchmark standard(s) is the Machine Learning (ML) model compared/scored? e.g. General Language Understanding Evaluation (GLUE)
- How often is the deployed Machine Learning (ML) process monitored or its measures re-evaluated?
- How is bias accounted for in the Machine Learning (ML) process? How is it assured that the dataset(s) used represent the problem space? What is the process for removing features/data believed to be irrelevant? What assurance is provided that the model (algorithm) is not biased?
- Is the model (implemented or to be implemented) explainable? How so?
- Has role/job displacement due to automation and/or Machine Learning (ML) implementation been addressed?
- Are User and Entity Behavior Analytics (UEBA) and machine learning (ML) used to help create a baseline for trusted workload access?
- Is machine learning (ML) being used for abnormality detection? Security?
- Is machine learning (ML) used to protect the Program against targeted attacks, often referred to as advanced targeted attacks (ATAs) or advanced persistent threats (APTs)?
- If the Program is implementing machine learning (ML), is the Program implementing an AIOps pipeline/toolchain?
- Does the Program depict the AIOps pipeline/toolchain applications in their tech stack?
- Has the Program identified where AI is used in the SecDevOps architecture? e.g. software testing
- Is data management reflected in the AIOps pipeline/toolchain processes/architecture?
- Are the end-to-end visibility and bottleneck risks for the AIOps pipeline/toolchain reflected in the risk register, with a mitigation strategy for each risk?
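Several of the questions above ask about the true positive rate (TPR), the false positive rate (FPR), and the ROC curve. As a minimal sketch of what those metrics mean in practice (pure Python; the labels, scores, and thresholds are hypothetical illustration data, not from any Program):

```python
def tpr_fpr(labels, scores, threshold):
    """Return (TPR, FPR) for binary labels and model scores at a threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1          # correctly flagged positive
        elif pred == 1 and y == 0:
            fp += 1          # false alarm
        elif pred == 0 and y == 0:
            tn += 1          # correctly ignored negative
        else:
            fn += 1          # missed positive
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr

# Hypothetical ground-truth labels and model scores.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.35, 0.1, 0.6, 0.7, 0.2]

# Sweeping the threshold traces out the ROC curve: lowering the threshold
# raises TPR but also FPR -- the trade-off the questions above probe.
for t in (0.3, 0.5, 0.7):
    print(t, tpr_fpr(labels, scores, t))
```

A Program-level performance metric is then a point (or region) on this curve that the deployed model must stay within.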
ML Test Score
- Machine Learning: The High Interest Credit Card of Technical Debt | D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, and M. Young - Google Research
- Hidden Technical Debt in Machine Learning Systems | D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, J. Crespo, and D. Dennison - Google Research
Creating reliable, production-level machine learning systems brings on a host of concerns not found in small toy examples or even large offline research experiments. Testing and monitoring are key considerations for ensuring the production-readiness of an ML system, and for reducing the technical debt of ML systems. But it can be difficult to formulate specific tests, given that the actual prediction behavior of any given model is difficult to specify a priori. In this paper, we present 28 specific tests and monitoring needs, drawn from experience with a wide range of production ML systems, to help quantify these issues and present an easy-to-follow road map to improve production readiness and pay down ML technical debt. The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction | E. Breck, S. Cai, E. Nielsen, M. Salib, and D. Sculley - Google Research | Full Stack Deep Learning
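One monitoring concern of this kind can be made concrete in code. A minimal sketch (not from the rubric paper; the function name, accuracy values, and margin are all hypothetical) of an automated gate that only promotes an updated model when it beats the deployed baseline on the same held-out evaluation set:

```python
def should_promote(baseline_acc, candidate_acc, min_gain=0.01):
    """Promote the candidate model only if it beats the currently deployed
    baseline by at least `min_gain` on the same held-out evaluation set.
    Requiring a margin guards against promoting noise-level "improvements"."""
    return candidate_acc >= baseline_acc + min_gain

# Hypothetical accuracies from evaluating both models on the same data.
print(should_promote(0.91, 0.93))   # clear improvement over the baseline
print(should_promote(0.91, 0.915))  # gain is below the required margin
```

Running such a check in the deployment pipeline, and logging its outcome, is one way to answer "how is it determined that performance actually improved?" with evidence rather than assertion.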
Buying
Best Practices
Model Deployment Scoring