Difference between revisions of "Algorithm Administration"
|description=Helpful resources for your journey with artificial intelligence; videos, articles, techniques, courses, profiles, and tools
}}
[https://www.youtube.com/results?search_query=feature+element+attribute+store+data+lineage+catalog+management+deep+machine+learning YouTube search...]
[https://www.quora.com/search?q=feature+element+attribute+store+data+lineage+catalog+management Quora search...]
[https://www.google.com/search?q=feature+element+attribute+store+data+lineage+catalog+management+deep+machine+learning+ML ...Google search]
* [[AI Governance]] / [[Algorithm Administration]]
* [[Containers; Docker, Kubernetes & Microservices]]
* [[Platforms: AI/Machine Learning as a Service (AIaaS/MLaaS)]]
* [https://medium.com/georgian-impact-blog/automatic-machine-learning-aml-landscape-survey-f75c3ae3bbf2 Automatic Machine Learning (AutoML) Landscape Survey | Alexander Allen & Adithya Balaji - Georgian Partners]...
* [https://getmanta.com/?gclid=CjwKCAjwsfreBRB9EiwAikSUHSSOxld0nZNyLNXmiPM43x7jEAgeTxkXRH_s5XPJlfTekPdO8N1Y1xoCKwwQAvD_BwE Automate your data lineage]
* [https://www.information-age.com/benefiting-ai-data-management-123471564/ Benefiting from AI: A different approach to data management is needed]
* [[Git - GitHub and GitLab]] ...[[Publishing#Model Publishing|publishing your model]]
* [https://github.com/JonTupitza/Data-Science-Process/blob/master/10-Modeling-Pipeline.ipynb Use a Pipeline to Chain PCA with a RandomForest Classifier Jupyter Notebook] | [https://github.com/jontupitza Jon Tupitza]
* [https://devblogs.microsoft.com/cesardelatorre/ml-net-model-lifecycle-with-azure-devops-ci-cd-pipelines/ ML.NET Model Lifecycle with Azure DevOps CI/CD pipelines | Cesar de la Torre - Microsoft]
* [https://medium.com/data-ops/a-great-model-is-not-enough-deploying-ai-without-technical-debt-70e3d5fecfd3 A Great Model is Not Enough: Deploying AI Without Technical Debt | DataKitchen - Medium]
* [https://towardsdatascience.com/ml-infrastructure-tools-for-production-part-2-model-deployment-and-serving-fcfc75c4a362 ML Infrastructure Tools for Production | Aparna Dhinakaran - Towards Data Science] ...Model Deployment and Serving
* [https://www.camelot-mc.com/en/client-services/information-data-management/global-community-for-artificial-intelligence-in-mdm/ Global Community for Artificial Intelligence (AI) in Master Data Management (MDM) | Camelot Management Consultants]
* [https://ce.aut.ac.ir/~meybodi/paper/Dynamic%20environment/PSO%20in%20dynamic%20environment==/Particle%20Swarms%20for%20Dynamic%20Optimization%20Problems.2008.pdf Particle Swarms for Dynamic Optimization Problems | T. Blackwell, J. Branke, and X. Li]
* [[Telecommunications#5G_Security|5G_Security]]
* [[Google AutoML]] automatically builds and deploys state-of-the-art machine learning models
** [[TensorBoard]] | [[Google ]]
** [[Kubeflow Pipelines]] - a platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers. [https://cloud.google.com/blog/products/ai-machine-learning/introducing-ai-hub-and-kubeflow-pipelines-making-ai-simpler-faster-and-more-useful-for-businesses Introducing AI Hub and Kubeflow Pipelines: Making AI simpler, faster, and more useful for businesses] | [[Google ]]
* [[SageMaker]] | [[Amazon]]
* [https://docs.microsoft.com/en-us/azure/machine-learning/concept-model-management-and-deployment MLOps] | [[Microsoft]] ...model management, deployment, and monitoring with Azure
** [https://feedback.azure.com/forums/906052-data-catalog How can we improve Azure Data Catalog?]
** [https://docs.microsoft.com/en-us/azure/machine-learning/concept-automated-ml AutoML]
* [[Ludwig]] - a [[Python]] toolbox from Uber that allows you to train and test deep learning models
* TPOT - a [[Python]] library that automatically creates and optimizes full machine learning pipelines using genetic programming. Not suited to NLP; strings need to be encoded as numerics.
* [[H2O]] [https://www.h2o.ai/products/h2o-driverless-ai/ Driverless AI] for automated [[Visualization]], feature engineering, model training, [[Algorithm Administration#Hyperparameter|hyperparameter]] optimization, and explainability.
* [https://www.alteryx.com/ alteryx:] [https://www.featurelabs.com/ Feature Labs], [https://www.alteryx.com/innovation-labs Featuretools]
* [https://mlbox.readthedocs.io/en/latest/ MLBox] - fast reading and distributed data preprocessing/cleaning/formatting; highly robust feature selection and leak detection; accurate hyper-parameter optimization in high-dimensional space; state-of-the-art predictive models for classification and regression (Deep Learning, Stacking, [[LightGBM]], ...); prediction with model interpretation. Primarily Linux.
* [https://automl.github.io/auto-sklearn/master/ auto-sklearn] - algorithm selection and [[Algorithm Administration#Hyperparameter|hyperparameter]] tuning; a Bayesian [[Algorithm Administration#Hyperparameter|hyperparameter]] optimization layer on top of [[Python#scikit-learn|scikit-learn]] that leverages recent advances in Bayesian optimization, meta-learning, and ensemble construction. Not for large datasets.
* [[Auto Keras]] is an open-source [[Python]] package for neural architecture search.
* [https://github.com/HDI-Project/ATM ATM] (Auto Tune Models) - a multi-tenant, multi-data system for automated machine learning (model selection and tuning). ATM is an open source software library under the [https://github.com/HDI-Project Human Data Interaction project (HDI)] at MIT.
* [https://www.cs.ubc.ca/labs/beta/Projects/autoweka/ Auto-WEKA] is a Bayesian [[Algorithm Administration#Hyperparameter|hyperparameter]] optimization layer on top of [https://www.cs.waikato.ac.nz/ml/weka/ Weka], a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and [[visualization]].
* [https://github.com/salesforce/TransmogrifAI TransmogrifAI] - an AutoML library for building modular, reusable, strongly typed machine learning workflows. A Scala/SparkML library created by Salesforce for automated data cleansing, feature engineering, model selection, and [[Algorithm Administration#Hyperparameter|hyperparameter]] optimization.
* [https://github.com/laic-ufmg/Recipe RECIPE] - a framework based on grammar-based genetic programming that builds customized [[Python#scikit-learn|scikit-learn]] classification pipelines.
* [https://github.com/laic-ufmg/automlc AutoMLC] - Automated Multi-Label Classification. GA-Auto-MLC and Auto-MEKAGGP are freely-available methods that perform automated multi-label classification on the MEKA software.
* [https://databricks.com/mlflow Databricks MLflow] - an open source framework to manage the complete machine learning lifecycle, including experimentation, reproducibility, and deployment, using Managed MLflow as an integrated service with the Databricks Unified Analytics Platform
* [https://www.sas.com/en_us/software/viya/new-features.html SAS Viya] automates the process of data cleansing, data transformations, feature engineering, algorithm matching, model training, and ongoing governance.
* [https://www.comet.ml/site/ Comet ML] ...self-hosted and cloud-based meta machine learning platform allowing data scientists and teams to track, compare, explain and optimize experiments and models
* [https://www.dominodatalab.com/product/domino-model-monitor/ Domino Model Monitor (DMM) | Domino] ...monitor the performance of all models across your entire organization
* [https://www.wandb.com/ Weights and Biases] ...experiment tracking, model optimization, and dataset versioning
* [https://sigopt.com/ SigOpt] ...optimization platform and API designed to unlock the potential of modeling pipelines. This fully agnostic software solution accelerates, amplifies, and scales the model development process
* [https://dvc.org/ DVC] ...Open-source Version Control System for Machine Learning Projects
* [https://www.modelop.com/modelops-and-mlops/ ModelOp Center | ModelOp]
* [https://www.moogsoft.com/aiops-platform/ Moogsoft] and [https://www.ansible.com/ Red Hat Ansible Tower]
* [https://www.dataiku.com/product/ DSS | Dataiku]
* [https://www.sas.com/en_us/software/model-manager.html Model Manager | SAS]
* [https://www.datarobot.com/platform/mlops/ Machine Learning Operations (MLOps) | DataRobot] ...build highly accurate predictive models with full transparency
* [https://metaflow.org/ Metaflow], Netflix and AWS open source [[Python]] library
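As noted for TPOT above, string features must be encoded as numerics before such tools can use them. A minimal sketch of one common approach, mapping each distinct category to an integer code (plain Python with made-up data, not TPOT's own API):

```python
# Minimal sketch: map string categories to integer codes before handing
# data to a numeric-only AutoML tool such as TPOT. Made-up example data.

colors = ["red", "green", "blue", "green", "red"]

# build a stable integer code for each distinct category
codes = {c: i for i, c in enumerate(sorted(set(colors)))}
encoded = [codes[c] for c in colors]

print(codes)     # {'blue': 0, 'green': 1, 'red': 2}
print(encoded)   # [2, 1, 0, 1, 2]
```

In practice a library encoder (e.g. scikit-learn's LabelEncoder or one-hot encoding) does the same job while also remembering the mapping for new data.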
= Master Data Management (MDM) =
[https://www.youtube.com/results?search_query=Master+Data+Management+MDM+data+lineage+catalog+management+deep+machine+learning+ai YouTube search...]
[https://www.google.com/search?q=Master+Data+Management+MDM+data+lineage+catalog+management+deep+machine+learning+ai ...Google search]

Feature Store / Data Lineage / Data Catalog
<b>Top 10 Mistakes in Data Management
</b><br>Come learn about the mistakes we most often see organizations make in managing their data. Also learn more about Intricity's Data Management Health Check which you can download here: https://www.intricity.com/intricity101/ To Talk with a Specialist go to: https://www.intricity.com/intricity101/ www.intricity.com
|}
|}<!-- B -->
= <span id="Versioning"></span>Versioning =
[https://www.youtube.com/results?search_query=~version+versioning+ai YouTube search...]
[https://www.google.com/search?q=~version+versioning+ai ...Google search]
* [https://dvc.org/ DVC | DVC.org]
* [https://www.pachyderm.com/ Pachyderm] ...[https://medium.com/bigdatarepublic/pachyderm-for-data-scientists-d1d1dff3a2fa Pachyderm for data scientists | Gerben Oostra - bigdata - Medium]
* [https://www.dataiku.com/ Dataiku]
* [[Algorithm Administration#Continuous Machine Learning (CML)|Continuous Machine Learning (CML)]]
- Cool features in DVC, like metrics, pipelines, and plots
Check out the DVC open source project on GitHub: https://github.com/iterative/dvc
|}
|}<!-- B -->
<youtube>IH2gEtxIbqM</youtube>
<b>Alessia Marcolini: Version Control for Data Science | PyData Berlin 2019
</b><br>Track: PyData. Are you versioning your Machine Learning project as you would do in a traditional software project? How are you keeping track of changes in your datasets? Recorded at the PyConDE & PyData Berlin 2019 conference. https://pycon.de
|}
|}<!-- B -->
<b>E05 Pioneering version control for data science with Pachyderm co-founder and CEO Joe Doliner
</b><br>5 years ago, Joe Doliner and his co-founder Joey Zwicker decided to focus on the hard problems in data science, rather than building just another dashboard on top of the existing mess. It's been a long road, but it's really paid off. Last year, after an adventurous journey, they closed a $10m Series A led by Benchmark. In this episode, Erasmus Elsner is joined by Joe Doliner to explore what Pachyderm does and how it scaled from just an idea into a fast growing tech company. Listen to the podcast version: https://apple.co/2W2g0nV
|}
|}<!-- B -->
= <span id="Hyperparameter"></span>Hyperparameter =
[https://www.youtube.com/results?search_query=hyperparameter+deep+learning+tuning+optimization+ai YouTube search...]
[https://www.google.com/search?q=hyperparameter+optimization+deep+machine+learning+ML+ai ...Google search]
* [[Gradient Descent Optimization & Challenges]]
* [[Hypernetworks]]
* [https://cloud.google.com/ml-engine/docs/tensorflow/using-hyperparameter-tuning Using TensorFlow Tuning]
* [https://towardsdatascience.com/understanding-hyperparameters-and-its-optimisation-techniques-f0debba07568 Understanding Hyperparameters and its Optimisation techniques | Prabhu - Towards Data Science]
* [https://nanonets.com/blog/hyperparameter-optimization/ How To Make Deep Learning Models That Don’t Suck | Ajay Uppili Arasanipalai]
In machine learning, a hyperparameter is a parameter whose value is set before the learning process begins. By contrast, the values of other parameters are derived via training. Different model training algorithms require different hyperparameters, while some simple algorithms (such as ordinary least squares regression) require none. Given these hyperparameters, the training algorithm learns the parameters from the data. [https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning) Hyperparameter (machine learning) | Wikipedia]
Machine learning algorithms train on data to find the best set of weights for each independent variable that affects the predicted value or class. The algorithms themselves have variables, called hyperparameters. They’re called hyperparameters, as opposed to parameters, because they control the operation of the algorithm rather than the weights being determined. The most important hyperparameter is often the learning rate, which determines the step size used when finding the next set of weights to try when optimizing. If the learning rate is too high, the gradient descent may quickly converge on a plateau or suboptimal point. If the learning rate is too low, the gradient descent may stall and never completely converge.

Many other common hyperparameters depend on the algorithms used. Most algorithms have stopping parameters, such as the maximum number of epochs, or the maximum time to run, or the minimum improvement from epoch to epoch. Specific algorithms have hyperparameters that control the shape of their search. For example, a [[Random Forest (or) Random Decision Forest]] Classifier has hyperparameters for minimum samples per leaf, max depth, minimum samples at a split, minimum weight fraction for a leaf, and about 8 more. [https://www.infoworld.com/article/3394399/machine-learning-algorithms-explained.html Machine learning algorithms explained | Martin Heller - InfoWorld]
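The learning-rate behavior described above can be sketched in a few lines of Python, using a hypothetical one-dimensional loss rather than any particular library's API:

```python
# Minimal sketch: the learning rate is a hyperparameter (fixed before
# training); the weight w is a parameter (learned by the loop).
# Hypothetical objective: minimize f(w) = (w - 3)^2, whose gradient is 2*(w - 3).

def train(learning_rate, steps=100):
    w = 0.0                          # parameter, learned during training
    for _ in range(steps):
        grad = 2 * (w - 3.0)         # gradient of the loss at w
        w -= learning_rate * grad    # step size set by the hyperparameter
    return w

print(train(0.1))    # converges near the optimum w = 3
print(train(1.1))    # learning rate too high: the updates diverge
```

With a small learning rate each update shrinks the error; with a rate above the stability threshold each update overshoots further, which is the divergence the paragraph warns about.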
https://nanonets.com/blog/content/images/2019/03/HPO1.png
== Hyperparameter Tuning ==
Hyperparameters are the variables that govern the training process. Your model parameters are optimized (you could say "tuned") by the training process: you run data through the operations of the model, compare the resulting prediction with the actual value for each data instance, evaluate the accuracy, and adjust until you find the best combination to handle the problem. These algorithms automatically adjust (learn) their internal parameters based on data. However, there is a subset of parameters that is not learned and that has to be configured by an expert. Such parameters are often referred to as “hyperparameters” and they have a big impact. For example, the tree depth in a decision tree model and the number of layers in an artificial neural network are typical hyperparameters. The performance of a model can drastically depend on the choice of its hyperparameters. [https://thenextweb.com/podium/2019/11/11/machine-learning-algorithms-and-the-art-of-hyperparameter-selection/ Machine learning algorithms and the art of hyperparameter selection - A review of four optimization strategies | Mischa Lisovyi and Rosaria Silipo - TNW]
There are four commonly used optimization strategies for hyperparameters:
# Grid search
# Random search
# Bayesian optimization
# Hill climbing
Bayesian optimization tends to be the most efficient. You would think that tuning as many hyperparameters as possible would give you the best answer. However, unless you are running on your own personal hardware, that could be very expensive. There are diminishing returns, in any case. With experience, you’ll discover which hyperparameters matter the most for your data and choice of algorithms. [https://www.infoworld.com/article/3394399/machine-learning-algorithms-explained.html Machine learning algorithms explained | Martin Heller - InfoWorld]
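Hill climbing, one of the strategies above, is easy to sketch: evaluate the neighbors of the current hyperparameter value, move to a better one, and shrink the step when no neighbor improves. Here `score()` is a made-up stand-in for "train a model with this hyperparameter and evaluate it"; real runs would be far more expensive:

```python
# Minimal hill-climbing sketch over a single hyperparameter (learning rate).

def score(lr):
    # pretend validation accuracy, peaking at lr = 0.01 (made-up objective)
    return 1.0 - abs(lr - 0.01)

def hill_climb(lr=0.5, step=0.1, iters=50):
    best = score(lr)
    for _ in range(iters):
        for candidate in (lr - step, lr + step):
            s = score(candidate)
            if s > best:          # move to the better neighbor
                best, lr = s, candidate
                break
        else:
            step /= 2             # no neighbor is better: refine the step size
    return lr, best

lr, acc = hill_climb()            # converges close to the peak at 0.01
```

Like all local search, this can get stuck on a local optimum of a bumpy metric surface, which is one reason Bayesian optimization is often preferred.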
Hyperparameter Optimization libraries:
* [https://github.com/maxim5/hyper-engine hyper-engine - Gaussian Process Bayesian optimization and some other techniques, like learning curve prediction]
* [https://ray.readthedocs.io/en/latest/tune.html Ray Tune: Hyperparameter Optimization Framework]
* [https://sigopt.com/ SigOpt’s API tunes your model’s parameters through state-of-the-art Bayesian optimization]
* [https://github.com/hyperopt/hyperopt hyperopt; Distributed Asynchronous Hyperparameter Optimization in Python - random search and tree of parzen estimators optimization]
* [https://scikit-optimize.github.io/#skopt.Optimizer Scikit-Optimize, or skopt - Gaussian process Bayesian optimization]
* [https://github.com/polyaxon/polyaxon polyaxon]
* [https://github.com/SheffieldML/GPyOpt GPyOpt; Gaussian Process Optimization]
Tuning:
= Automated Learning =
[https://www.youtube.com/results?search_query=~Automated+~Learning+ai YouTube search...]
[https://www.google.com/search?q=~Automated+~Learning+ai ...Google search]
* [[Other codeless options, Code Generators, Drag n' Drop]]
* [[AdaNet]]
* [https://www.technologyreview.com/s/603381/ai-software-learns-to-make-ai-software/ AI Software Learns to Make AI Software]
* [https://www.nextgov.com/emerging-tech/2018/08/pentagon-wants-ai-take-over-scientific-process/150807/ The Pentagon Wants AI to Take Over the Scientific Process | Automating Scientific Knowledge Extraction (ASKE) | DARPA]
** [https://www.nextgov.com/emerging-tech/2018/09/inside-pentagons-plan-make-computers-collaborative-partners/151014/ Inside the Pentagon's Plan to Make Computers ‘Collaborative Partners’ | DARPA - Nextgov]
** [https://www.newscientist.com/article/dn28434-ai-tool-scours-all-the-science-on-the-web-to-find-new-knowledge/ AI tool scours all the science on the web to find new knowledge | Allen Institute for Artificial Intelligence (AI2)]
** [https://www.fbo.gov/index.php?s=opportunity&mode=form&id=f6149249b0f3c04c5b8994be1a492726&tab=core&tabmode=list&= Program Announcement for Artificial Intelligence Exploration (AIE) | DARPA - FedBizOpps.gov]
* [https://medium.com/applied-data-science/how-to-build-your-own-world-model-using-python-and-keras-64fb388ba459 Hallucinogenic Deep Reinforcement Learning Using Python and Keras | David Foster]
* [https://towardsdatascience.com/automated-feature-engineering-in-python-99baf11cc219 Automated Feature Engineering in Python - How to automatically create machine learning features | Will Koehrsen - Towards Data Science]
* [https://chatbotslife.com/why-meta-learning-is-crucial-for-further-advances-of-artificial-intelligence-c2df55959adf Why Meta-learning is Crucial for Further Advances of Artificial Intelligence? | Pavel Kordik]
* [https://www.darpa.mil/attachments/AssuredAutonomyProposersDay_Program%20Brief.pdf Assured Autonomy | Dr. Sandeep Neema, DARPA]
* [https://www.kdnuggets.com/2019/02/automatic-machine-learning-broken.html Automatic Machine Learning is Broken | Piotr Plonski - KDnuggets]
* [https://www.gigabitmagazine.com/ai/why-2020-will-be-year-automated-machine-learning Why 2020 will be the Year of Automated Machine Learning | Senthil Ravindran - Gigabit]
* [https://en.wikipedia.org/wiki/Meta_learning_(computer_science) Meta Learning | Wikipedia]
Several production machine-learning platforms now offer automatic hyperparameter tuning. Essentially, you tell the system what hyperparameters you want to vary, and possibly what metric you want to optimize, and the system sweeps those hyperparameters across as many runs as you allow. ([[Google Cloud]] hyperparameter tuning extracts the appropriate metric from the TensorFlow model, so you don’t have to specify it.)
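At its core, the sweep these platforms run is a loop over the declared hyperparameter values, keeping the run with the best metric. A minimal grid-sweep sketch, where `train_and_eval` is a made-up stand-in for a real training job:

```python
# Minimal sketch of a hyperparameter sweep: enumerate every combination of
# the declared values, run training once per combination, keep the best run.

from itertools import product

def train_and_eval(learning_rate, batch_size):
    # made-up validation metric, best at learning_rate=0.01 and batch_size=32
    return 1.0 - abs(learning_rate - 0.01) - abs(batch_size - 32) / 1000

search_space = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [16, 32, 64],
}

best_params = max(
    ({"learning_rate": lr, "batch_size": bs}
     for lr, bs in product(*search_space.values())),
    key=lambda p: train_and_eval(**p),
)
```

Hosted services parallelize these runs and usually replace the exhaustive grid with random or Bayesian search, since the number of combinations grows multiplicatively with each hyperparameter added.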
An emerging class of data science toolkit is finally making machine learning accessible to business subject matter experts. We anticipate that these innovations will mark a new era in data-driven decision support, where business analysts will be able to access and deploy machine learning on their own to analyze hundreds and thousands of dimensions simultaneously. Business analysts at highly competitive organizations will shift from using [[visualization]] tools as their only means of analysis, to using them in concert with AML. Data [[visualization]] tools will also be used more frequently to communicate model results, and to build task-oriented user interfaces that enable stakeholders to make both operational and strategic decisions based on the output of scoring engines. They will also continue to be a more effective means for analysts to perform inverse analysis when one is seeking to identify where relationships in the data do not exist. [https://www.ironsidegroup.com/2018/06/06/five-essential-capabilities-automated-machine-learning/ 'Five Essential Capabilities: Automated Machine Learning' | Gregory Bonnette]
[[H2O]] Driverless AI automatically performs feature engineering and [[Algorithm Administration#Hyperparameter|hyperparameter]] tuning, and claims to perform as well as Kaggle masters. [[AmazonML]] [[SageMaker]] supports [[Algorithm Administration#Hyperparameter|hyperparameter]] optimization. [[Microsoft]] Azure Machine Learning AutoML automatically sweeps through features, algorithms, and [[Algorithm Administration#Hyperparameter|hyperparameter]]s for basic machine learning algorithms; a separate [https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-automated-ml Azure Machine Learning hyperparameter tuning facility] allows you to sweep specific [[Algorithm Administration#Hyperparameter|hyperparameter]]s for an existing experiment. [https://cloud.google.com/automl/ Google Cloud AutoML] implements automatic deep [[transfer learning]] (meaning that it starts from an existing [[Deep Neural Network (DNN)]] trained on other data) and neural architecture search (meaning that it finds the right combination of extra network layers) for language pair translation, natural language classification, and image classification. [https://www.infoworld.com/article/3344596/review-google-cloud-automl-is-truly-automated-machine-learning.html Review: Google Cloud AutoML is truly automated machine learning | Martin Heller]
<img src="https://miro.medium.com/max/588/1*pgTLoLGw0PVaP7ViSyQabA.png" width="600">
{|<!-- T -->
== <span id="AutoML"></span>AutoML ==
[https://www.youtube.com/results?search_query=AutoML+ai YouTube search...]
[https://www.google.com/search?q=AutoML+ai ...Google search]
* [https://en.wikipedia.org/wiki/Automated_machine_learning Automated Machine Learning (AutoML) | Wikipedia]
* [https://www.automl.org/ AutoML.org] ...[https://ml.informatik.uni-freiburg.de/ ML Freiburg] ... [https://github.com/automl GitHub] and [https://www.tnt.uni-hannover.de/project/automl/ ML Hannover]
A cloud software suite of machine learning tools built on [[Google]]’s state-of-the-art image recognition research, [[Neural Architecture]] Search (NAS). NAS is an algorithm that, given your specific dataset, searches for the optimal neural network to perform a certain task on that dataset. AutoML is then a suite of machine learning tools that lets you train high-performance deep networks without requiring any knowledge of deep learning or AI; all you need is labelled data. [[Google]] then uses NAS to find the best network for your specific dataset and task. [https://www.kdnuggets.com/2018/08/autokeras-killer-google-automl.html AutoKeras: The Killer of Google’s AutoML | George Seif - KDnuggets]
* [https://www.androidauthority.com/google-cloud-automl-vision-guide-894671/ Cloud AutoML Vision: Train your own machine learning model | Jessica Thornsby]
<img src="https://cloud.google.com/images/products/natural-language/automl-nl-works.png" width="800">
== Automatic Machine Learning (AML) ==
* [https://www.forbes.com/sites/tomdavenport/2019/09/03/dotdata-and-the-explosion-of-automated-machine-learning/#4549ede92c3a dotData And The Explosion Of Automated Machine Learning | Tom Davenport - Forbes]
<img src="https://thumbor.forbes.com/thumbor/960x0/https%3A%2F%2Fblogs-images.forbes.com%2Ftomdavenport%2Ffiles%2F2019%2F09%2F0-1200x534.jpg" width="800">
<youtube>OR-IKyP4ZpI</youtube>
== DARTS: Differentiable Architecture Search ==
[https://www.youtube.com/results?search_query=DARTS+Differentiable+Architecture+Search YouTube search...]
[https://www.google.com/search?q=Differentiable+Architecture+Search ...Google search]
* [https://arxiv.org/pdf/1806.09055.pdf DARTS: Differentiable Architecture Search | H. Liu, K. Simonyan, and Y. Yang] addresses the scalability challenge of architecture search by formulating the task in a differentiable manner. Unlike conventional approaches of applying evolution or reinforcement learning over a discrete and non-differentiable search space, the method is based on the continuous relaxation of the architecture representation, allowing efficient search of the architecture using gradient descent.
* [https://www.microsoft.com/en-us/research/uploads/prod/2018/12/Neural-Architecture-Search-SLIDES.pdf Neural Architecture Search | Debadeepta Dey - Microsoft Research]
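The continuous relaxation at the heart of DARTS can be sketched in a few lines of NumPy. This is a toy illustration of the idea, not the authors' code; the candidate operations and architecture weights are made up:

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

# candidate operations on one edge of the cell (toy examples)
ops = [
    lambda x: x,                 # identity / skip connection
    lambda x: np.maximum(x, 0),  # ReLU
    lambda x: np.zeros_like(x),  # "zero" op, effectively pruning the edge
]

# architecture parameters: in DARTS these are learned jointly with the
# network weights by gradient descent, because the mixture below is
# differentiable in alpha (unlike a hard, discrete choice of one op)
alpha = np.array([0.2, 1.5, -0.7])
weights = softmax(alpha)

x = np.array([-1.0, 2.0, 0.5])
mixed = sum(w * op(x) for w, op in zip(weights, ops))  # soft choice of op

# after the search, the discrete architecture keeps the strongest op
chosen = int(np.argmax(weights))
```

Relaxing the discrete choice into a softmax-weighted mixture is what lets the architecture itself be optimized with ordinary gradient descent instead of evolution or reinforcement learning.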
<img src="https://ai2-s2-public.s3.amazonaws.com/figures/2017-08-08/c1f457e31b611da727f9aef76c283a18157dfa83/3-Figure1-1.png" width="600">
<youtube>wL-p5cjDG64</youtube>
<img src="https://www.ironsidegroup.com/wp-content/uploads/2018/05/aml.png" width="600">
= <span id="AIOps / MLOps"></span>AIOps / MLOps =
[https://www.youtube.com/results?search_query=~AIOps+~MLOps+~devops+~secdevops+~devsecops+pipeline+toolchain+CI+CD+machine+learning+ai YouTube search...]
[https://www.google.com/search?q=~AIOps+~MLOps+~devops+~secdevops+~devsecops+pipeline+toolchain+CI+CD+machine+learning+ai ...Google search]
* [https://devops.com/?s=ai DevOps.com]
* [https://www.forbes.com/sites/servicenow/2020/02/26/a-silver-bullet-for-cios/#53a1381e6870 A Silver Bullet For CIOs; Three ways AIOps can help IT leaders get strategic | Lisa Wolfe - Forbes]
* [https://www.forbes.com/sites/tomtaulli/2020/08/01/mlops-what-you-need-to-know/#37b536da1214 MLOps: What You Need To Know | Tom Taulli - Forbes]
* [https://devops.com/what-is-so-special-about-aiops-for-mission-critical-workloads/ What is so Special About AIOps for Mission Critical Workloads? | Rebecca James - DevOps]
* [https://www.bmc.com/blogs/what-is-aiops/ What is AIOps? Artificial Intelligence for IT Operations Explained | BMC]
* [https://www.splunk.com/en_us/it-operations/artificial-intelligence-aiops.html AIOps: Artificial Intelligence for IT Operations | Splunk]
* [https://www.gartner.com/smarterwithgartner/how-to-get-started-with-aiops/ How to Get Started With AIOps | Susan Moore - Gartner]
* [https://hackernoon.com/why-ai-ml-will-shake-software-testing-up-in-2019-b3f86a30bcfa Why AI & ML Will Shake Software Testing up in 2019 | Oleksii Kharkovyna - Medium]
* [https://martinfowler.com/articles/cd4ml.html Continuous Delivery for Machine Learning | D. Sato, A. Wider and C. Windheuser - MartinFowler]
* [[Defense]]: [[Joint Capabilities Integration and Development System (JCIDS)#Adaptive Acquisition Framework (AAF)|Adaptive Acquisition Framework (AAF)]]
Machine learning capabilities give IT operations teams contextual, actionable insights to make better decisions on the job. More importantly, AIOps is an approach that transforms how systems are automated, detecting important signals from vast amounts of data and relieving the operator from the headaches of managing according to tired, outdated runbooks or policies. In the AIOps future, the environment is continually improving. The administrator can get out of the impossible business of refactoring rules and policies that are immediately outdated in today’s modern IT environment. Now that we have AI and machine learning technologies embedded into IT operations systems, the game changes drastically. AI and machine learning-enhanced automation will bridge the gap between DevOps and IT Ops teams: helping the latter solve issues faster and more accurately to keep pace with business goals and user needs. [https://it.toolbox.com/guest-article/how-aiops-helps-it-operators-on-the-job How AIOps Helps IT Operators on the Job | Ciaran Byrne - Toolbox]
<img src="https://martinfowler.com/articles/cd4ml/cd4ml-end-to-end.png" width="1000" height="500">
<youtube>P5wcE4IwKgQ</youtube>
<b>Machine Learning on Kubernetes with Kubeflow
</b><br>[[Google]] Cloud Platform: Join Fei and Ivan as they talk about the benefits of running your [[TensorFlow]] models in Kubernetes using Kubeflow. Working on a cool project and want to get in contact with us? Fill out this form → https://take5.page.link/csf1 Don't forget to subscribe to the channel! → https://goo.gl/UzeAiN Watch more Take5 episodes here → https://bit.ly/2MgTllk
|}
|<!-- M -->
<youtube>gWgy3EdDObQ</youtube>
<b>PipelineAI End-to-End [[TensorFlow]] Model Training + Deploying + Monitoring + Predicting (Demo)
</b><br>100% open source and reproducible on your laptop through Docker - or in production through Kubernetes! Details at https://pipeline.io and https://github.com/fluxcapacitor/pipeline End-to-end pipeline demo: Train and deploy a [[TensorFlow]] model from research to live production. Includes full metrics and insight into the offline training and online predicting phases.
|}
|}<!-- B -->
<youtube>PHVtFQpAbsY</youtube>
<b>Deep Learning Pipelines: Enabling AI in Production
</b><br>Deep learning has shown tremendous successes, yet it often requires a lot of effort to leverage its power. Existing deep learning frameworks require writing a lot of code to run a model, let alone in a distributed manner. Deep Learning Pipelines is an Apache Spark Package library that makes practical deep learning simple based on the Spark MLlib Pipelines API. Leveraging Spark, Deep Learning Pipelines scales out many compute-intensive deep learning tasks. In this talk, we discuss the philosophy behind Deep Learning Pipelines, as well as the main tools it provides, how they fit into the deep learning ecosystem, and how they demonstrate Spark's role in deep learning. About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business. Website: https://databricks.com
|}
|<!-- M -->
<b>PipelineAI: High Performance Distributed [[TensorFlow]] AI + GPU + Model Optimizing Predictions
</b><br>We will each build an end-to-end, continuous [[TensorFlow]] AI model training and deployment pipeline on our own GPU-based cloud instance. At the end, we will combine our cloud instances to create the LARGEST Distributed [[TensorFlow]] AI Training and Serving Cluster in the WORLD! Pre-requisites: Just a modern browser and an internet connection. We'll provide the rest! Agenda: Spark ML; [[TensorFlow]] AI; Storing and Serving Models with HDFS; Trade-offs of CPU vs. GPU, Scale Up vs. Scale Out;
CUDA + cuDNN GPU Development Overview; [[TensorFlow]] Model Checkpointing, Saving, Exporting, and Importing; Distributed [[TensorFlow]] AI Model Training (Distributed [[TensorFlow]]); [[TensorFlow]]'s Accelerated Linear Algebra Framework (XLA); [[TensorFlow]]'s Just-in-Time (JIT) Compiler, Ahead of Time (AOT) Compiler; Centralized Logging and Visualizing of Distributed [[TensorFlow]] Training (Tensorboard); Distributed [[TensorFlow]] AI Model Serving/Predicting ([[TensorFlow]] Serving); Centralized Logging and Metrics Collection (Prometheus, Grafana); Continuous [[TensorFlow]] AI Model Deployment ([[TensorFlow]], Airflow); Hybrid Cross-Cloud and On-Premise Deployments (Kubernetes); High-Performance and Fault-Tolerant Micro-services (NetflixOSS). https://pipeline.ai
|}
|}<!-- B -->
<youtube>f_-3rQoudnc</youtube>
<b>Bringing Your Data Pipeline into The Machine Learning Era - Chris Gaun & Jörg Schad, Mesosphere
</b><br>Want to view more sessions and keep the conversations going? Join us for KubeCon + CloudNativeCon North America in Seattle, December 11 - 13, 2018 (https://bit.ly/KCCNCNA18) or in Shanghai, November 14-15 (https://bit.ly/kccncchina18). Bringing Your Data Pipeline into The Machine Learning Era - Chris Gaun, Mesosphere (Intermediate Skill Level). Kubeflow is a new tool that makes it easy to run distributed machine learning solutions (e.g. Tensorflow) on Kubernetes. However, much of the data that can feed machine learning algorithms is already in existing distributed data stores. This presentation shows how to connect existing distributed data services running on Apache Mesos to Tensorflow on Kubernetes using the Kubeflow tool. These lessons can be extrapolated to any local distributed data. Chris Gaun is a CNCF ambassador and product marketing manager at Mesosphere. He has presented at KubeCon in 2016 and has put on over 40 free Kubernetes workshops across the US and EU in 2017. Jörg Schad is a technical lead at Mesosphere. https://kubecon.io The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy and all of the other CNCF-hosted projects. Learn more at https://bit.ly/2XTN3ho.
|}
|<!-- M -->
<b>Deep Dive into Deep Learning Pipelines - Sue Ann Hong & Tim Hunter
</b><br>"Deep learning has shown tremendous successes, yet it often requires a lot of effort to leverage its power. Existing deep learning frameworks require writing a lot of code to run a model, let alone in a distributed manner. Deep Learning Pipelines is a Spark Package library that makes practical deep learning simple based on the Spark MLlib Pipelines API. Leveraging Spark, Deep Learning Pipelines scales out many compute-intensive deep learning tasks. In this talk we dive into - the various use cases of Deep Learning Pipelines such as prediction at massive scale, transfer learning, and hyperparameter tuning, many of which can be done in just a few lines of code. - how to work with complex data such as images in Spark and Deep Learning Pipelines. - how to deploy deep learning models through familiar Spark APIs such as MLlib and Spark SQL to empower everyone from machine learning practitioners to business analysts. Finally, we discuss integration with popular deep learning frameworks. About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business. Connect with us:
Website: https://databricks.com
|}
|}<!-- B -->
<youtube>UmCB9ycz55Q</youtube>
<b>End to End Streaming ML Recommendation Pipeline Spark 2.0, Kafka, [[TensorFlow]] Workshop
</b><br>End to End Streaming ML Recommendation Pipeline Spark 2.0, Kafka, TensorFlow Workshop. Presented at Bangalore Apache Spark Meetup by Chris Fregly on 10/12/2016. Connect with Chris Fregly at https://www.linkedin.com/in/cfregly https://twitter.com/cfregly https://www.slideshare.net/cfregly
|}
|}<!-- B -->
<youtube>84gqSbLcBFE</youtube>
<b>Let’s Write a Pipeline - Machine Learning Recipes #4
</b><br>[[Google]] Developer In this episode, we’ll write a basic pipeline for supervised learning with just 12 lines of code. Along the way, we'll talk about training and testing data. Then, we’ll work on our intuition for what it means to “learn” from data. Check out [[TensorFlow]] Playground: https://goo.gl/cv7Dq5
|}
|}<!-- B -->
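A basic supervised-learning pipeline like the one described in the video above - chaining PCA with a random-forest classifier, as in the Jupyter Notebook linked earlier on this page - can be sketched in scikit-learn. The dataset and step choices here are illustrative, not taken from the video:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# chain preprocessing and model so training and test data pass
# through the exact same sequence of steps
pipe = Pipeline([("pca", PCA(n_components=2)),
                 ("clf", RandomForestClassifier(random_state=0))])
pipe.fit(X_train, y_train)
accuracy = pipe.score(X_test, y_test)
```

Wrapping the steps in a single Pipeline object is what makes the whole workflow easy to version, publish, and reuse - the same estimator interface (fit/predict) applies end to end.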
<youtube>KidlhiqSNmM</youtube>
<b>Commit Virtual 2020: MLOps DevOps for Machine Learning
</b><br>GitLab Speaker: Monmayuri Ray. The practice of DevOps - developing software and operationalizing the development cycle - has been evolving for over a decade. Now, a new addition has joined this holistic development cycle: machine predictions. The emerging art and science of machine learning algorithms integrated into current operational systems is opening new possibilities for engineers, scientists, and architects in the tech world. This presentation will take the audience on a journey in understanding the fundamentals of orchestrating machine predictions using MLOps in this ever-changing, agile world of software development. You’ll learn how to excel at the craft of DevOps for Machine Learning (ML). Monmayuri will unpack the theoretical constructs and show how they apply to real-world scenarios. Get in touch with Sales: https://bit.ly/2IygR7z
|}
|}<!-- B -->
<youtube>_Ni6JWdeCew</youtube>
<b>AIDevFest20: Machine Learning Design Patterns for MLOps
</b><br>Speaker: Valliappa Lakshmanan (LAK), [[Google]]. Design patterns are formalized best practices to solve common problems when designing a software system. As machine learning moves from being a research discipline to a software one, it is useful to catalog tried-and-proven methods to help engineers tackle frequently occurring problems that crop up during the ML process. In this talk, I will cover five patterns (Workflow Pipelines, Transform, Multimodal Input, Feature Store, Cascade) that are useful in the context of adding flexibility, resilience and reproducibility to ML in production. For data scientists and ML engineers, these patterns provide a way to apply hard-won knowledge from hundreds of ML experts to your own projects. Anyone designing infrastructure for machine learning will have to be able to provide easy ways for the data engineers, data scientists, and ML engineers to implement these, and other, design patterns. Website: https://gdgdevfest20.xnextcon.com
|}
|<!-- M -->
== <span id="Continuous Machine Learning (CML)"></span>Continuous Machine Learning (CML) ==
* [https://cml.dev/ Continuous Machine Learning (CML)] ...is Continuous Integration/Continuous Deployment (CI/CD) for Machine Learning projects
* [https://dvc.org/ DVC | DVC.org]
{|<!-- T -->
<youtube>9BgIDqAzfuA</youtube>
<b>MLOps Tutorial #1: Intro to Continuous Integration for ML
</b><br>DVCorg Learn how to use one of the most powerful ideas from the DevOps revolution, continuous integration, in your data science and machine learning projects. This hands-on tutorial shows you how to create an automatic model training & testing setup using GitHub Actions and Continuous Machine Learning (CML), two free and open-source tools in the Git ecosystem. Designed for total beginners! We'll be using: GitHub Actions: https://github.com/features/actions CML: https://github.com/iterative/cml
Resources: Code: https://github.com/andronovhopf/wine GitLab support: https://github.com/iterative/cml/wiki
|}
|<!-- M -->
- How to display a table comparing model performance to the main branch in a GitHub Pull Request
** Need an intro to GitHub Actions and continuous integration? Check out the first video in this series! https://youtu.be/9BgIDqAzfuA **
Helpful links:
Dataset: Data on farmers’ adoption of climate change mitigation measures, individual characteristics, risk attitudes and social influences in a region of Switzerland https://www.sciencedirect.com/science/article/pii/S2352340920303048
Code: https://github.com/elleobrien/farmer
DVC pipelines & metrics documentation: https://dvc.org/doc/start/data-pipelines#data-pipelines
CML project repo: https://github.com/iterative/cml
DVC Discord channel: https://discord.gg/bzA6uY7
|}
|}<!-- B -->
== <span id="DevSecOps"></span>DevSecOps ==
[https://www.youtube.com/results?search_query=~AIOps+~MLOps+~devops+~secdevops+~devsecops+pipeline+toolchain+CI+CD+machine+learning+ai YouTube search...]
[https://www.google.com/search?q=~AIOps+~MLOps+~devops+~secdevops+~devsecops+pipeline+toolchain+CI+CD+machine+learning+ai ...Google search]
* [[Cybersecurity]]
* [[Containers; Docker, Kubernetes & Microservices]]
* [https://safecode.org/ SafeCode] ...a nonprofit organization that brings business leaders and technical experts together to exchange insights and ideas on creating, improving and promoting scalable and effective software security programs.
* [https://techbeacon.com/devops/3-ways-ai-will-advance-devsecops 3 ways AI will advance DevSecOps | Joseph Feiman - TechBeacon]
* [https://dzone.com/articles/leveraging-ai-and-automation-for-successful-devsec Leveraging AI and Automation for Successful DevSecOps | Vishnu Nallani - DZone]
DevSecOps (also known as SecDevOps and DevOpsSec) is the process of integrating secure development best practices and methodologies into continuous design, development, deployment and integration processes.
<youtube>DzX9Vi_UQ8o</youtube> | <youtube>DzX9Vi_UQ8o</youtube> | ||
<b>DevSecOps : What, Why and How | <b>DevSecOps : What, Why and How | ||
| − | </b><br>Black Hat In this talk, we shall focus on how a DevOps pipeline can easily be metamorphosed into a DevSecOps and the benefits which can be achieved with this transformation. The talk (assisted with various demos) will focus on developing a DevSecOps pipeline using free/open-source tools in various deployment platforms, i.e. on-premise, cloud native and hybrid scenarios. By Anant Shrivastava [ | + | </b><br>Black Hat In this talk, we shall focus on how a DevOps pipeline can easily be metamorphosed into a DevSecOps and the benefits which can be achieved with this transformation. The talk (assisted with various demos) will focus on developing a DevSecOps pipeline using free/open-source tools in various deployment platforms, i.e. on-premise, cloud native and hybrid scenarios. By Anant Shrivastava [https://www.blackhat.com/us-19/briefings/schedule/#devsecops--what-why-and-how-17058 Full Abstract & Presentation Material] |
|}
|<!-- M -->
Keynote: How to Use Hacker Personas to Successfully Build DevSecOps Pipeline
Robin Yeman 07:29:54 – 08:00:45
https://resources.sei.cmu.edu/library/...
Outbrief and Closing Remarks
=== <span id="DevSecOps in Government"></span>DevSecOps in Government ===
* [[Evaluation]]: [https://tech.gsa.gov/guides/dev_sec_ops_guide/ DevSecOps Guide | General Services Administration (GSA)]
* [[Joint_Capabilities_Integration_and_Development_System_(JCIDS)#Cybersecurity & Acquisition Lifecycle Integration|Cybersecurity & Acquisition Lifecycle Integration]]
* [https://dodcio.defense.gov/Portals/0/Documents/DoD%20Enterprise%20DevSecOps%20Reference%20Design%20v1.0_Public%20Release.pdf?ver=2019-09-26-115824-583 DOD Enterprise DevSecOps Reference Design Version 1.0, 12 August 2019 | Department of Defense (DOD) Chief Information Officer (CIO)]
* [https://tech.gsa.gov/guides/understanding_differences_agile_devsecops/ Understanding the Differences Between Agile & DevSecOps - from a Business Perspective | General Services Administration (GSA)]
{|<!-- T -->
<youtube>2STTK52eAbM</youtube>
<b>[[Defense|DOD]] Enterprise DevSecOps Initiative
</b><br>CSIAC - The current [[Defense|Department of Defense (DOD)]] software acquisition process is not responsive to the needs of our warfighters, making it difficult for the DOD to keep pace with potential adversaries. To address this situation, the [[Defense|DOD]] is pursuing a new software development activity called the [[Defense|DOD]] Enterprise DevSecOps Initiative. This webinar presents the vision for transforming [[Defense|DOD]] software acquisition into secure, responsive software factories. It examines the use of modern software development processes and tools to revolutionize the Department's ability to provide responsive, timely, and secure software capabilities for our warfighters. The focus of the effort is exploiting automated software tools, services, and standards so warfighters can rapidly create, deploy, and operate software applications in a secure, flexible, and interoperable manner.
[https://www.csiac.org/podcast/dod-enterprise-devsecops-initiative/ Slides]
|}
|<!-- M -->
=== Strangler Fig / Strangler Pattern ===
* [https://martinfowler.com/bliki/StranglerFigApplication.html Strangler Fig Application | Martin Fowler]
* [https://www.michielrook.nl/2016/11/strangler-pattern-practice/ The Strangler pattern in practice | Michiel Rook]
* [https://docs.microsoft.com/en-us/azure/architecture/patterns/strangler Strangler pattern] ...[https://docs.microsoft.com/en-us/azure/architecture/patterns/ Cloud Design Patterns | ][[Microsoft]]
* [https://www.castsoftware.com/blog/how-to-use-strangler-pattern-for-microservices-modernization How to use strangler pattern for microservices modernization | N. Natean - Software Intelligence Plus]
* [https://medium.com/@rmmeans/serverless-strangler-pattern-on-aws-31c88191268d Serverless Strangler Pattern on AWS | Ryan Means - Medium]
Strangulation of a legacy or undesirable solution is a safe way to phase one thing out for something better, cheaper, or more expandable. You build something new that obsoletes a small percentage of the old system, and put the two live together. You then do some more work in the same style and go live again (rinse, repeat).
[https://paulhammant.com/2013/07/14/legacy-application-strangulation-case-studies/ Strangler Applications | Paul Hammant] ...case studies
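The incremental cut-over described above is usually implemented as a routing facade in front of the legacy system. A minimal sketch in Python; the service and path names here are hypothetical, not from any of the cited articles:

```python
# Strangler pattern as a routing facade (hypothetical backend names).
# Requests for migrated paths go to the new service; everything else
# still reaches the legacy monolith. The prefix list grows over time
# until the monolith handles nothing and can be retired.

MIGRATED_PREFIXES = ["/billing", "/reports"]  # grows as strangulation proceeds

def route(path):
    """Return which backend should handle a request path."""
    if any(path.startswith(prefix) for prefix in MIGRATED_PREFIXES):
        return "new-service"
    return "legacy-monolith"

print(route("/billing/invoice/42"))  # handled by the new code
print(route("/orders/7"))            # still handled by the legacy system
```

In production the same idea lives in a reverse proxy, API gateway, or load-balancer rule set rather than application code, but the routing decision is identical.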
{|<!-- T -->
<b>"Choking the Monolith − The Strangler Pattern Applied" by Tobias Goeschel (@w3ltraumpirat)
</b><br>The so-called "Strangler Fig" (aka "Strangler") pattern is a much-cited strategy for replacing legacy systems with new, often microservice-based architectures. However, it is not actually a microservice pattern, and there are several - quite different - ways to implement it. Are you confused yet? Fear not: we will have a look at the theory, and then explore together how Strangler Fig could be used to improve and replace a project worthy of being called "The Most Horrible Piece Of Code in the World".
As a principal consultant at codecentric, Tobias has seen, improved, and survived enough of "the code that earns our money" to enjoy it. [https://www.slideshare.net/TobiasGoeschel/choking-the-monolith-the-strangler-fig-pattern-applied Slides]
|}
|<!-- M -->
<youtube>E2dnSg-IHdo</youtube>
<b>AWS New York Summit 2019: Migrating Monolithic Applications with the Strangler Pattern (FSV303)
</b><br>Learn more about [[Amazon]] AWS Global Summits at [https://amzn.to/2Obv2Hs amzn.to/2Obv2Hs]. “Lifting and shifting” an enterprise-scale application will yield some of the benefits of the cloud, but elasticity and agility may still be limited. Conversely, rewriting that application to be cloud-native can be costly in both time and money and could cause you to miss market opportunities. This session explores the challenges financial institutions face when migrating their existing portfolio of applications to the cloud. Then, we share practical tips for migrating applications to realize cloud-native architectural benefits incrementally using the strangler pattern.
|}
|}<!-- B -->
= <span id="Model Monitoring"></span>Model Monitoring =
[https://www.youtube.com/results?search_query=model+monitoring+machine+learning+ai YouTube search...]
[https://www.google.com/search?q=model+monitoring+machine+learning+ML+artificial+intelligence+ai ...Google search]
* [https://www.quora.com/How-do-you-evaluate-the-performance-of-a-machine-learning-model-thats-deployed-into-production How do you evaluate the performance of a machine learning model that's deployed into production? | Quora]
* [https://towardsdatascience.com/why-your-models-need-maintenance-faff545b38a2 Why your Models need Maintenance | Martin Schmitz - Towards Data Science] ...change of concept & drift of concept
* [https://www.analyticsvidhya.com/blog/2019/10/deployed-machine-learning-model-post-production-monitoring/ Deployed your Machine Learning Model? Here’s What you Need to Know About Post-Production Monitoring | Om Deshmukh - Analytics Vidhya] ...proactive & reactive model monitoring
Monitoring production systems is essential to keeping them running well. For ML systems, monitoring becomes even more important, because their performance depends not just on factors we have some control over, like infrastructure and our own software, but also on data, over which we have much less control. Therefore, in addition to monitoring standard metrics like latency, traffic, errors, and saturation, we also need to monitor model prediction performance. An obvious challenge with monitoring model performance is that we usually don't have a verified label to compare the model's predictions to, since the model works on new data.
In some cases we might have an indirect way of assessing the model's effectiveness, for example by measuring click rate for a recommendation model. In other cases, we might have to rely on comparisons between time periods, for example by calculating the percentage of positive classifications hourly and alerting if it deviates by more than a few percent from the average for that time. Just as when validating the model, it is also important to monitor metrics across slices, and not just globally, to be able to detect problems affecting specific segments. [https://towardsdatascience.com/ml-ops-machine-learning-as-an-engineering-discipline-b86ca4874a3f ML Ops: Machine Learning as an Engineering Discipline | Cristiano Breuel - Towards Data Science]
<img src="https://miro.medium.com/max/1000/1*U7Efc4rSPsXDTeKRzM86eg.png" width="800">
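The time-period comparison described above can be sketched in a few lines of Python; the window size and tolerance are purely illustrative, not a recommendation:

```python
# Sketch of reactive model monitoring: compare this hour's rate of positive
# classifications against the trailing average and alert on large deviations.

def positive_rate(predictions):
    """Fraction of predictions that are the positive class (label 1)."""
    return sum(1 for p in predictions if p == 1) / len(predictions)

def drift_alert(hourly_rates, current_rate, tolerance=0.05):
    """True if the current rate deviates from the trailing average
    of hourly_rates by more than `tolerance` (absolute)."""
    baseline = sum(hourly_rates) / len(hourly_rates)
    return abs(current_rate - baseline) > tolerance

history = [0.21, 0.19, 0.20, 0.22]                       # last four hours
current = positive_rate([1, 0, 0, 0, 1, 0, 0, 0, 0, 0])  # this hour: 0.10
print(drift_alert(history, current))                     # deviates by ~0.10 -> True
```

A real pipeline would compute the same statistic per slice (region, device, customer segment) as the paragraph above suggests, so that a drift confined to one segment is not averaged away globally.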
== <span id="A/B Testing"></span>A/B Testing ==
[https://www.youtube.com/results?search_query=AB+A/B+~test+~Scoring+~score+~ai YouTube search...]
[https://www.google.com/search?q=AB+A/B+~test+~Scoring+~score+~ai ...Google search]
* [https://www.optimizely.com/ Optimizely]
* [[Math for Intelligence#P-Value|P-Value]]
* [[Math for Intelligence#Confidence Interval|Confidence Interval]]
<b>A/B testing (also known as bucket testing or split-run testing)</b> is a user-experience research methodology. A/B tests consist of a randomized experiment with two variants, A and B, and apply statistical hypothesis testing or "two-sample hypothesis testing" as used in the field of statistics. A/B testing is a way to compare two versions of a single variable, typically by testing a subject's response to variant A against variant B and determining which of the two variants is more effective. [https://en.wikipedia.org/wiki/A/B_testing A/B testing | Wikipedia]
A <b>randomized controlled trial (or randomized control trial; RCT)</b> is a type of scientific (often medical) experiment that aims to reduce certain sources of bias when testing the effectiveness of new treatments; this is accomplished by randomly allocating subjects to two or more groups, treating them differently, and then comparing them with respect to a measured response. One group - the experimental group - receives the intervention being assessed, while the other - usually called the control group - receives an alternative treatment, such as a placebo or no intervention. The groups are monitored under the conditions of the trial design to determine the effectiveness of the experimental intervention, and efficacy is assessed in comparison to the control. There may be more than one treatment group or more than one control group. [https://en.wikipedia.org/wiki/Randomized_controlled_trial Randomized controlled trial | Wikipedia]
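The "two-sample hypothesis testing" behind an A/B experiment can be sketched with a two-proportion z-test; the visitor and conversion counts below are made up for illustration:

```python
import math

# Two-sample proportion z-test for an A/B experiment: did variant B's
# conversion rate differ from variant A's by more than chance would allow?

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Return the z statistic and two-sided p-value for the difference
    between two conversion rates (normal approximation)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical experiment: 200/2000 conversions on A, 260/2000 on B.
z, p = two_proportion_z(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(f"z = {z:.2f}, p = {p:.4f}")  # reject H0 at alpha = 0.05 if p < 0.05
```

The p-value and confidence-interval links above cover how to interpret the result; in practice you would also fix the sample size and significance level before the experiment starts, not after peeking at the data.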
{|<!-- T -->
== Scoring Deployed Models ==
[https://www.youtube.com/results?search_query=~Scoring+~score+~rate+~Deploy+~production+~installed+~Model+~ai YouTube search...]
[https://www.google.com/search?q=Scoring+~score+~rate+~Deploy+~production+~installed+~Model+~ai ...Google search]
{|<!-- T -->
<hr>
https://miro.medium.com/max/1000/1*ldWxdWDzEYnSbvchuL5k1w.png
<img src="https://images.contentful.com/pqts2v0qq7kz/4Mcjw0xAi4auqweOQQyWCu/80236c975b5026ec67e61f767a646b45/machine_learning_flow--4j88rajonr_s600x0_q80_noupscale.png" width="500" height="500">
<img src="https://miro.medium.com/max/1327/1*9xjlXSJ9i2DBB-BJIrY26w.png" width="800" height="500">
Revision as of 20:47, 28 January 2023
YouTube search... Quora search... ...Google search
- AI Governance / Algorithm Administration
- Visualization
- Graphical Tools for Modeling AI Components
- Hyperparameters
- Evaluation
- Train, Validate, and Test
- NLP Workbench / Pipeline
- Development
- Building Your Environment
- Service Capabilities
- AI Marketplace & Toolkit/Model Interoperability
- Software Development
- Directed Acyclic Graph (DAG) - programming pipelines
- Containers; Docker, Kubernetes & Microservices
- Platforms: AI/Machine Learning as a Service (AIaaS/MLaaS)
- Automatic Machine Learning (AutoML) Landscape Survey | Alexander Allen & Adithya Balaji - Georgian Partners...
- Automate your data lineage
- Benefiting from AI: A different approach to data management is needed
- Git - GitHub and GitLab ...publishing your model
- Use a Pipeline to Chain PCA with a RandomForest Classifier Jupyter Notebook | Jon Tupitza
- ML.NET Model Lifecycle with Azure DevOps CI/CD pipelines | Cesar de la Torre - Microsoft
- A Great Model is Not Enough: Deploying AI Without Technical Debt | DataKitchen - Medium
- Infrastructure Tools for Production | Aparna Dhinakaran - Towards Data Science ...Model Deployment and Serving
- Global Community for Artificial Intelligence (AI) in Master Data Management (MDM) | Camelot Management Consultants
- Particle Swarms for Dynamic Optimization Problems | T. Blackwell, J. Branke, and X. Li
- 5G_Security
Tools
- Google AutoML automatically build and deploy state-of-the-art machine learning models
- TensorBoard | Google
- Kubeflow Pipelines - a platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers. Introducing AI Hub and Kubeflow Pipelines: Making AI simpler, faster, and more useful for businesses | Google
- SageMaker | Amazon
- MLOps | Microsoft ...model management, deployment, and monitoring with Azure
- Ludwig - a Python toolbox from Uber that allows to train and test deep learning models
- TPOT a Python library that automatically creates and optimizes full machine learning pipelines using genetic programming. Not for NLP, strings need to be coded to numerics.
- H2O Driverless AI for automated Visualization, feature engineering, model training, hyperparameter optimization, and explainability.
- alteryx: Feature Labs, Featuretools
- MLBox Fast reading and distributed data preprocessing/cleaning/formatting. Highly robust feature selection and leak detection. Accurate hyper-parameter optimization in high-dimensional space. State-of-the art predictive models for classification and regression (Deep Learning, Stacking, LightGBM,…). Prediction with models interpretation. Primarily Linux.
- auto-sklearn - algorithm selection and hyperparameter tuning. It leverages recent advances in Bayesian optimization, meta-learning, and ensemble construction, and is a Bayesian hyperparameter optimization layer on top of scikit-learn. Not for large datasets.
- Auto Keras is an open-source Python package for neural architecture search.
- ATM (Auto Tune Models) - a multi-tenant, multi-data system for automated machine learning (model selection and tuning). ATM is an open source software library under the Human Data Interaction (HDI) project at MIT.
- Auto-WEKA is a Bayesian hyperparameter optimization layer on top of Weka. Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.
- TransmogrifAI - an AutoML library for building modular, reusable, strongly typed machine learning workflows. A Scala/SparkML library created by Salesforce for automated data cleansing, feature engineering, model selection, and hyperparameter optimization
- RECIPE - a framework based on grammar-based genetic programming that builds customized scikit-learn classification pipelines.
- AutoMLC Automated Multi-Label Classification. GA-Auto-MLC and Auto-MEKAGGP are freely-available methods that perform automated multi-label classification on the MEKA software.
- Databricks MLflow - an open source framework to manage the complete machine learning lifecycle, including experimentation, reproducibility, and deployment; Managed MLflow is an integrated service with the Databricks Unified Analytics Platform
- SAS Viya automates the process of data cleansing, data transformations, feature engineering, algorithm matching, model training and ongoing governance.
- Comet ML ...self-hosted and cloud-based meta machine learning platform allowing data scientists and teams to track, compare, explain and optimize experiments and models
- Domino Model Monitor (DMM) | Domino ...monitor the performance of all models across your entire organization
- Weights and Biases ...experiment tracking, model optimization, and dataset versioning
- SigOpt ...optimization platform and API designed to unlock the potential of modeling pipelines. This fully agnostic software solution accelerates, amplifies, and scales the model development process
- DVC ...Open-source Version Control System for Machine Learning Projects
- ModelOp Center | ModelOp
- Moogsoft and Red Hat Ansible Tower
- DSS | Dataiku
- Model Manager | SAS
- Machine Learning Operations (MLOps) | DataRobot ...build highly accurate predictive models with full transparency
- Metaflow, Netflix and AWS open source Python library
Master Data Management (MDM)
YouTube search... ...Google search
Feature Store / Data Lineage / Data Catalog
Versioning
YouTube search... ...Google search
- DVC | DVC.org
- Pachyderm …Pachyderm for data scientists | Gerben Oostra - bigdata - Medium
- Dataiku
- Continuous Machine Learning (CML)
Model Versioning - ModelDB
- ModelDB: An open-source system for Machine Learning model versioning, metadata, and experiment management
Hyperparameter
YouTube search... ...Google search
- Gradient Descent Optimization & Challenges
- Hypernetworks
- Using TensorFlow Tuning
- Understanding Hyperparameters and its Optimisation techniques | Prabhu - Towards Data Science
- How To Make Deep Learning Models That Don’t Suck | Ajay Uppili Arasanipalai
In machine learning, a hyperparameter is a parameter whose value is set before the learning process begins. By contrast, the values of other parameters are derived via training. Different model training algorithms require different hyperparameters; some simple algorithms (such as ordinary least squares regression) require none. Given these hyperparameters, the training algorithm learns the parameters from the data. Hyperparameter (machine learning) | Wikipedia
Machine learning algorithms train on data to find the best set of weights for each independent variable that affects the predicted value or class. The algorithms themselves have variables, called hyperparameters. They're called hyperparameters, as opposed to parameters, because they control the operation of the algorithm rather than the weights being determined. The most important hyperparameter is often the learning rate, which determines the step size used when finding the next set of weights to try when optimizing. If the learning rate is too high, gradient descent may quickly converge on a plateau or suboptimal point. If the learning rate is too low, gradient descent may stall and never completely converge. Many other common hyperparameters depend on the algorithms used. Most algorithms have stopping parameters, such as the maximum number of epochs, the maximum time to run, or the minimum improvement from epoch to epoch. Specific algorithms have hyperparameters that control the shape of their search. For example, a Random Forest (or Random Decision Forest) classifier has hyperparameters for minimum samples per leaf, max depth, minimum samples at a split, minimum weight fraction for a leaf, and about 8 more. Machine learning algorithms explained | Martin Heller - InfoWorld
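The effect of the learning rate described above can be seen on a toy objective; the function and step counts here are purely illustrative:

```python
# Gradient descent on f(w) = (w - 3)^2, whose gradient is 2*(w - 3).
# The only hyperparameter here is the learning rate (step size).

def gradient_descent(lr, steps=50, w=0.0):
    """Run `steps` gradient-descent updates from w and return the result."""
    for _ in range(steps):
        w -= lr * 2 * (w - 3)   # step against the gradient
    return w

print(gradient_descent(lr=0.1))   # converges close to the optimum w = 3
print(gradient_descent(lr=1.05))  # too large a step: the iterates diverge
```

With lr = 0.1 each update shrinks the distance to the optimum by a constant factor; with lr = 1.05 each update overshoots and grows it, which is the "diverge or stall" trade-off the paragraph describes.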
Hyperparameter Tuning
Hyperparameters are the variables that govern the training process. Your model parameters are optimized (you could say "tuned") by the training process: you run data through the operations of the model, compare the resulting prediction with the actual value for each data instance, evaluate the accuracy, and adjust until you find the best combination to handle the problem. These algorithms automatically adjust (learn) their internal parameters based on data. However, there is a subset of parameters that is not learned and that has to be configured by an expert. Such parameters are often referred to as "hyperparameters" - and they have a big impact. For example, the tree depth in a decision tree model and the number of layers in an artificial neural network are typical hyperparameters. The performance of a model can drastically depend on the choice of its hyperparameters. Machine learning algorithms and the art of hyperparameter selection - A review of four optimization strategies | Mischa Lisovyi and Rosaria Silipo - TNW
There are four commonly used optimization strategies for hyperparameters:
- Bayesian optimization
- Grid search
- Random search
- Hill climbing
Bayesian optimization tends to be the most efficient. You would think that tuning as many hyperparameters as possible would give you the best answer. However, unless you are running on your own personal hardware, that could be very expensive. There are diminishing returns, in any case. With experience, you’ll discover which hyperparameters matter the most for your data and choice of algorithms. Machine learning algorithms explained | Martin Heller - InfoWorld
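Two of the four strategies listed above are easy to sketch side by side; the "validation score" below is a stand-in for a real cross-validated model evaluation, and the numbers are illustrative:

```python
import random

# Toy comparison of grid search and random search over a single
# hyperparameter (the learning rate), maximizing a made-up validation score.

def score(lr):
    """Pretend validation score, peaking at lr = 0.07."""
    return -(lr - 0.07) ** 2

# Grid search: evaluate a fixed, coarse set of candidate values.
grid = [0.001, 0.01, 0.1, 1.0]
best_grid = max(grid, key=score)

# Random search: draw candidates log-uniformly from the same range,
# which samples the scale of the parameter more evenly.
random.seed(0)
samples = [10 ** random.uniform(-3, 0) for _ in range(20)]
best_random = max(samples, key=score)

print("grid best:", best_grid, "random best:", best_random)
```

Bayesian optimization and hill climbing refine this further by using past evaluations to choose where to sample next instead of drawing blindly, which is why the paragraph above calls Bayesian optimization the most efficient of the four.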
Hyperparameter Optimization libraries:
- hyper-engine - Gaussian Process Bayesian optimization and some other techniques, like learning curve prediction
- Ray Tune: Hyperparameter Optimization Framework
- SigOpt’s API tunes your model’s parameters through state-of-the-art Bayesian optimization
- hyperopt; Distributed Asynchronous Hyperparameter Optimization in Python - random search and tree of parzen estimators optimization.
- Scikit-Optimize, or skopt - Gaussian process Bayesian optimization
- polyaxon
- GPyOpt; Gaussian Process Optimization
Tuning:
- Optimizer type
- Learning rate (fixed or not)
- Epochs
- Regularization rate (or not)
- Type of Regularization - L1, L2, ElasticNet
- Search type for local minima
- Gradient descent
- Simulated Annealing
- Evolutionary
- Decay rate (or not)
- Momentum (fixed or not)
- Nesterov Accelerated Gradient momentum (or not)
- Batch size
- Fitness measurement type
- MSE, accuracy, MAE, Cross-Entropy Loss
- Precision, recall
- Stop criteria
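The knobs above can be gathered into a single search-space definition. The names and candidate values below are illustrative, not tied to any particular library; even this modest grid shows why exhaustive search gets expensive:

```python
# Illustrative hyperparameter search space covering the knobs listed above.

search_space = {
    "optimizer": ["sgd", "adam"],
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "epochs": [10, 50, 100],
    "regularization": ["l1", "l2", "elasticnet", None],
    "momentum": [0.0, 0.9],
    "nesterov": [True, False],
    "batch_size": [32, 64, 128],
    "loss": ["mse", "cross_entropy"],
}

# Number of distinct configurations an exhaustive grid search would train:
n_configs = 1
for values in search_space.values():
    n_configs *= len(values)
print(n_configs)  # 1728 full training runs for just these eight knobs
```

This combinatorial explosion is exactly what the automated approaches in the next section (and the random/Bayesian strategies above) are designed to sidestep.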
Automated Learning
YouTube search... ...Google search
- Other codeless options, Code Generators, Drag n' Drop
- AdaNet
- AI Software Learns to Make AI Software
- The Pentagon Wants AI to Take Over the Scientific Process | Automating Scientific Knowledge Extraction (ASKE) | DARPA
- Hallucinogenic Deep Reinforcement Learning Using Python and Keras | David Foster
- Automated Feature Engineering in Python - How to automatically create machine learning features | Will Koehrsen - Towards Data Science
- Why Meta-learning is Crucial for Further Advances of Artificial Intelligence? | Pavel Kordik
- Assured Autonomy | Dr. Sandeep Neema, DARPA
- Automatic Machine Learning is Broken | Piotr Plonski - KDnuggets
- Why 2020 will be the Year of Automated Machine Learning | Senthil Ravindran - Gigabit
- Meta Learning | Wikipedia
Several production machine-learning platforms now offer automatic hyperparameter tuning. Essentially, you tell the system what hyperparameters you want to vary, and possibly what metric you want to optimize, and the system sweeps those hyperparameters across as many runs as you allow. (Google Cloud hyperparameter tuning extracts the appropriate metric from the TensorFlow model, so you don’t have to specify it.)
An emerging class of data science toolkit that is finally making machine learning accessible to business subject matter experts. We anticipate that these innovations will mark a new era in data-driven decision support, where business analysts will be able to access and deploy machine learning on their own to analyze hundreds and thousands of dimensions simultaneously. Business analysts at highly competitive organizations will shift from using visualization tools as their only means of analysis, to using them in concert with AML. Data visualization tools will also be used more frequently to communicate model results, and to build task-oriented user interfaces that enable stakeholders to make both operational and strategic decisions based on output of scoring engines. They will also continue to be a more effective means for analysts to perform inverse analysis when one is seeking to identify where relationships in the data do not exist. 'Five Essential Capabilities: Automated Machine Learning' | Gregory Bonnette
H2O Driverless AI automatically performs feature engineering and hyperparameter tuning, and claims to perform as well as Kaggle masters. Amazon SageMaker supports hyperparameter optimization. Microsoft Azure Machine Learning AutoML automatically sweeps through features, algorithms, and hyperparameters for basic machine learning algorithms; a separate Azure Machine Learning hyperparameter tuning facility allows you to sweep specific hyperparameters for an existing experiment. Google Cloud AutoML implements automatic deep transfer learning (meaning that it starts from an existing Deep Neural Network (DNN) trained on other data) and neural architecture search (meaning that it finds the right combination of extra network layers) for language pair translation, natural language classification, and image classification. Review: Google Cloud AutoML is truly automated machine learning | Martin Heller
AutoML
YouTube search... ...Google search
- Automated Machine Learning (AutoML) | Wikipedia
- AutoML.org ...ML Freiburg ... GitHub and ML Hannover
Google Cloud AutoML is a cloud software suite of machine learning tools. It's based on Google's state-of-the-art research in image recognition called Neural Architecture Search (NAS). NAS is basically an algorithm that, given your specific dataset, searches for the most optimal neural network to perform a certain task on that dataset. AutoML is then a suite of machine learning tools that allows one to easily train high-performance deep networks without requiring the user to have any knowledge of deep learning or AI; all you need is labelled data! Google will use NAS to find the best network for your specific dataset and task. AutoKeras: The Killer of Google’s AutoML | George Seif - KDnuggets
Automatic Machine Learning (AML)
Self-Learning
DARTS: Differentiable Architecture Search
YouTube search... ...Google search
- DARTS: Differentiable Architecture Search | H. Liu, K. Simonyan, and Y. Yang addresses the scalability challenge of architecture search by formulating the task in a differentiable manner. Unlike conventional approaches of applying evolution or reinforcement learning over a discrete and non-differentiable search space, the method is based on the continuous relaxation of the architecture representation, allowing efficient search of the architecture using gradient descent.
- Neural Architecture Search | Debadeepta Dey - Microsoft Research
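The continuous relaxation at the heart of DARTS can be sketched in a few lines: instead of committing to one operation per edge of the cell, the edge outputs a softmax-weighted mixture of all candidate operations, so the architecture weights become ordinary continuous parameters that gradient descent can optimize. The toy scalar operations below are assumptions for illustration; a real implementation mixes convolution and pooling ops on tensors.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Candidate operations on one edge of the cell (toy stand-ins).
ops = [
    lambda x: x,      # identity / skip-connect
    lambda x: 0.0,    # zero op
    lambda x: 2 * x,  # stand-in for a learned transform
]

def mixed_op(x, alphas):
    """DARTS-style continuous relaxation: the edge's output is a
    softmax-weighted sum over every candidate operation."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, ops))

def discretize(alphas):
    """After the search, keep only the op with the largest weight."""
    return max(range(len(alphas)), key=lambda i: alphas[i])
```

During search, the `alphas` are trained jointly with the network weights; afterwards, `discretize` recovers a conventional discrete architecture from the relaxed one.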
AIOps / MLOps
YouTube search... ...Google search
- DevOps.com
- A Silver Bullet For CIOs; Three ways AIOps can help IT leaders get strategic - Lisa Wolfe - Forbes
- MLOps: What You Need To Know | Tom Taulli - Forbes
- What is so Special About AIOps for Mission Critical Workloads? | Rebecca James - DevOps
- What is AIOps? Artificial Intelligence for IT Operations Explained | BMC
- AIOps: Artificial Intelligence for IT Operations, Modernize and transform IT Operations with solutions built on the only Data-to-Everything platform | splunk>
- How to Get Started With AIOps | Susan Moore - Gartner
- Why AI & ML Will Shake Software Testing up in 2019 | Oleksii Kharkovyna - Medium
- Continuous Delivery for Machine Learning | D. Sato, A. Wider and C. Windheuser - MartinFowler
- Defense: Adaptive Acquisition Framework (AAF)
Machine learning capabilities give IT operations teams contextual, actionable insights to make better decisions on the job. More importantly, AIOps is an approach that transforms how systems are automated, detecting important signals from vast amounts of data and relieving the operator from the headaches of managing according to tired, outdated runbooks or policies. In the AIOps future, the environment is continually improving. The administrator can get out of the impossible business of refactoring rules and policies that are immediately outdated in today’s modern IT environment. Now that we have AI and machine learning technologies embedded into IT operations systems, the game changes drastically. AI and machine learning-enhanced automation will bridge the gap between DevOps and IT Ops teams: helping the latter solve issues faster and more accurately to keep pace with business goals and user needs. How AIOps Helps IT Operators on the Job | Ciaran Byrne - Toolbox
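A minimal sketch of the kind of statistical signal detection described above, assuming a simple z-score rule over a single metric stream; real AIOps platforms apply far richer models across many correlated signals, but the idea of flagging points that deviate sharply from the baseline is the same.

```python
import statistics

def detect_anomalies(values, threshold=2.5):
    """Flag indices whose z-score against the series mean exceeds
    `threshold` standard deviations -- a toy stand-in for the signal
    detection an AIOps platform runs over operational metrics."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # perfectly flat series: nothing to flag
    return [i for i, v in enumerate(values)
            if abs(v - mean) / stdev > threshold]

latency_ms = [20, 21, 19, 22, 20, 21, 250, 20, 19]
print(detect_anomalies(latency_ms))  # the 250 ms spike stands out
```

An operator reviewing such alerts no longer has to encode every failure mode as a hand-written runbook rule; the baseline adapts as the data changes.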
Continuous Machine Learning (CML)
- Continuous Machine Learning (CML) ...is Continuous Integration/Continuous Deployment (CI/CD) for Machine Learning Projects
- DVC | DVC.org
DevSecOps
YouTube search... ...Google search
- Cybersecurity
- Containers; Docker, Kubernetes & Microservices
- SafeCode ...nonprofit organization that brings business leaders and technical experts together to exchange insights and ideas on creating, improving and promoting scalable and effective software security programs.
- 3 ways AI will advance DevSecOps | Joseph Feiman - TechBeacon
- Leveraging AI and Automation for Successful DevSecOps | Vishnu Nallani - DZone
DevSecOps (also known as SecDevOps and DevOpsSec) is the process of integrating secure development best practices and methodologies into continuous design, development, deployment, and integration processes.
DevSecOps in Government
- Evaluation: DevSecOps Guide | General Services Administration (GSA)
- Cybersecurity & Acquisition Lifecycle Integration
- Defense|DOD Enterprise DevSecOps Reference Design Version 1.0, 12 August 2019 | Department of Defense (DOD) Chief Information Officer (CIO)
- Understanding the Differences Between Agile & DevSecOps - from a Business Perspective | General Services Administration (GSA)
Strangler Fig / Strangler Pattern
- Strangler Fig Application | Martin Fowler
- The Strangler pattern in practice | Michiel Rook
- Strangler pattern ...Cloud Design Patterns | Microsoft
- How to use strangler pattern for microservices modernization | N. Natean - Software Intelligence Plus
- Serverless Strangler Pattern on AWS | Ryan Means - Medium
Strangulation of a legacy or undesirable solution is a safe way to phase one thing out for something better, cheaper, or more expandable. You make something new that obsoletes a small percentage of something old, and put them live together. You do some more work in the same style, and go live again (rinse, repeat). Strangler Applications | Paul Hammant ...case studies
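The rinse-and-repeat migration described above can be sketched as a facade that fronts both systems: each cycle moves another route into the new code, while everything else still falls through to the legacy implementation. The handlers, paths, and `MIGRATED` set here are hypothetical.

```python
def legacy_handler(path):
    """Stand-in for the legacy system still serving most routes."""
    return f"legacy:{path}"

def new_handler(path):
    """Stand-in for the replacement system serving migrated routes."""
    return f"new:{path}"

# Routes migrated so far; this set grows with each release cycle
# until the legacy system handles nothing and can be retired.
MIGRATED = {"/orders", "/invoices"}

def facade(path):
    """Strangler facade: route to the new code when the path has been
    migrated, otherwise fall back to the legacy implementation."""
    if path in MIGRATED:
        return new_handler(path)
    return legacy_handler(path)
```

Because both systems are live behind one entry point, each migration step can go to production on its own, and rolling back means removing a path from the set rather than redeploying the old system.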
Model Monitoring
YouTube search... ...Google search
- How do you evaluate the performance of a machine learning model that's deployed into production? | Quora
- Why your Models need Maintenance | Martin Schmitz - Towards Data Science ...Change of concept & drift of Concept
- Deployed your Machine Learning Model? Here’s What you Need to Know About Post-Production Monitoring | Om Deshmukh - Analytics Vidhya ...proactive & reactive model monitoring
Monitoring production systems is essential to keeping them running well. For ML systems, monitoring becomes even more important, because their performance depends not just on factors that we have some control over, like infrastructure and our own software, but also on data, which we have much less control over. Therefore, in addition to monitoring standard metrics like latency, traffic, errors and saturation, we also need to monitor model prediction performance. An obvious challenge with monitoring model performance is that we usually don’t have a verified label to compare our model’s predictions to, since the model works on new data. In some cases we might have some indirect way of assessing the model’s effectiveness, for example by measuring click rate for a recommendation model. In other cases, we might have to rely on comparisons between time periods, for example by calculating a percentage of positive classifications hourly and alerting if it deviates by more than a few percent from the average for that time. Just like when validating the model, it’s also important to monitor metrics across slices, and not just globally, to be able to detect problems affecting specific segments. ML Ops: Machine Learning as an Engineering Discipline | Cristiano Breuel - Towards Data Science
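The hourly positive-rate check described above can be sketched as follows; the historical rates and the tolerance are invented for the example, and a production monitor would also slice these rates by segment rather than only globally.

```python
import statistics

def positive_rate(predictions):
    """Fraction of positive classifications in one time window."""
    return sum(predictions) / len(predictions)

def drift_alert(hourly_rates, current_rate, tolerance=0.05):
    """Alert when the current window's positive-classification rate
    deviates from the historical average by more than `tolerance`
    (absolute), since verified labels are usually unavailable."""
    baseline = statistics.fmean(hourly_rates)
    return abs(current_rate - baseline) > tolerance

history = [0.31, 0.29, 0.30, 0.32, 0.30]  # past hourly positive rates
assert not drift_alert(history, 0.33)     # within a few percent: fine
assert drift_alert(history, 0.45)         # sudden jump: raise an alert
```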
A/B Testing
YouTube search... ...Google search
A/B testing (also known as bucket testing or split-run testing) is a user experience research methodology. A/B tests consist of a randomized experiment with two variants, A and B. It includes application of statistical hypothesis testing or "two-sample hypothesis testing" as used in the field of statistics. A/B testing is a way to compare two versions of a single variable, typically by testing a subject's response to variant A against variant B, and determining which of the two variants is more effective. A/B testing | Wikipedia
A randomized controlled trial (or randomized control trial; RCT) is a type of scientific (often medical) experiment that aims to reduce certain sources of bias when testing the effectiveness of new treatments; this is accomplished by randomly allocating subjects to two or more groups, treating them differently, and then comparing them with respect to a measured response. One group—the experimental group—receives the intervention being assessed, while the other—usually called the control group—receives an alternative treatment, such as a placebo or no intervention. The groups are monitored under conditions of the trial design to determine the effectiveness of the experimental intervention, and efficacy is assessed in comparison to the control. There may be more than one treatment group or more than one control group. Randomized controlled trial | Wikipedia
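A minimal sketch of the two-sample hypothesis test behind an A/B comparison, here a pooled two-proportion z-test on invented conversion counts; |z| greater than 1.96 is significant at the usual 5% level.

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sample test for whether variants A and B differ in
    conversion rate, using the pooled-proportion standard error.
    Returns the z statistic (positive when B converts better)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical experiment: 2,000 users per bucket.
z = two_proportion_ztest(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
print(z)
```

The same calculation applies whether the buckets come from an A/B test or the arms of a randomized controlled trial; what differs is the rigor of the allocation and monitoring around it.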
Scoring Deployed Models
YouTube search... ...Google search