Algorithm Administration
 
** [http://www.wandb.com/ Weights and Biases] ...experiment tracking, model optimization, and dataset versioning
 
 
** [http://feedback.azure.com/forums/906052-data-catalog How can we improve Azure Data Catalog?]
 
** [http://sigopt.com/ SigOpt] ...optimization platform and API designed to unlock the potential of modeling pipelines. This fully agnostic software solution accelerates, amplifies, and scales the model development process
 
* [http://getmanta.com/?gclid=CjwKCAjwsfreBRB9EiwAikSUHSSOxld0nZNyLNXmiPM43x7jEAgeTxkXRH_s5XPJlfTekPdO8N1Y1xoCKwwQAvD_BwE Automate your data lineage]
 
 
* [http://www.information-age.com/benefiting-ai-data-management-123471564/ Benefiting from AI: A different approach to data management is needed]
 
 
|}
 
 
|}<!-- B -->
 
= <span id="Hyperparameter"></span>Hyperparameter =

[http://www.youtube.com/results?search_query=hyperparameters+deep+learning+tuning+optimization YouTube search...]

[http://www.google.com/search?q=hyperparameters+optimization+deep+machine+learning+ML ...Google search]

* [[Gradient Descent Optimization & Challenges]]
* [http://cloud.google.com/ml-engine/docs/tensorflow/using-hyperparameter-tuning Using TensorFlow Tuning]
* [http://towardsdatascience.com/understanding-hyperparameters-and-its-optimisation-techniques-f0debba07568 Understanding Hyperparameters and its Optimisation techniques | Prabhu - Towards Data Science]
* [http://nanonets.com/blog/hyperparameter-optimization/ How To Make Deep Learning Models That Don’t Suck | Ajay Uppili Arasanipalai]

In machine learning, a hyperparameter is a parameter whose value is set before the learning process begins. By contrast, the values of other parameters are derived via training. Different model training algorithms require different hyperparameters, while some simple algorithms (such as ordinary least squares regression) require none. Given these hyperparameters, the training algorithm learns the parameters from the data. [http://en.wikipedia.org/wiki/Hyperparameter_(machine_learning) Hyperparameter (machine learning) | Wikipedia]

Machine learning algorithms train on data to find the best set of weights for each independent variable that affects the predicted value or class. The algorithms themselves have variables, called hyperparameters. They’re called hyperparameters, as opposed to parameters, because they control the operation of the algorithm rather than the weights being determined. The most important hyperparameter is often the learning rate, which determines the step size used when finding the next set of weights to try when optimizing. If the learning rate is too high, the gradient descent may quickly converge on a plateau or suboptimal point. If the learning rate is too low, the gradient descent may stall and never completely converge. Many other common hyperparameters depend on the algorithms used. Most algorithms have stopping parameters, such as the maximum number of epochs, or the maximum time to run, or the minimum improvement from epoch to epoch. Specific algorithms have hyperparameters that control the shape of their search. For example, a [[Random Forest (or) Random Decision Forest]] Classifier has hyperparameters for minimum samples per leaf, max depth, minimum samples at a split, minimum weight fraction for a leaf, and about 8 more. [http://www.infoworld.com/article/3394399/machine-learning-algorithms-explained.html Machine learning algorithms explained | Martin Heller - InfoWorld]
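The effect of the learning rate is easy to see on a toy problem. Below is a minimal sketch in plain Python (the objective and values are illustrative, not from any library) of gradient descent on f(w) = (w - 3)^2: the learning rate and epoch count are hyperparameters fixed before training, while the weight w is the parameter being learned.

```python
# Gradient descent on f(w) = (w - 3)^2, whose minimum is at w = 3.
# learning_rate and epochs are hyperparameters (set before training);
# the weight w is the parameter the algorithm learns.

def train(learning_rate, epochs, w=0.0):
    for _ in range(epochs):
        grad = 2 * (w - 3)          # derivative of (w - 3)^2
        w -= learning_rate * grad   # step toward lower loss
    return w

print(train(learning_rate=0.1, epochs=50))   # converges close to 3
print(train(learning_rate=1.1, epochs=50))   # too high: overshoots and diverges
```

With the small learning rate each step shrinks the error by a constant factor; with the large one each step overshoots the minimum by more than it corrected, so the error grows without bound.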

http://nanonets.com/blog/content/images/2019/03/HPO1.png

==== Hyperparameter tuning ====

Hyperparameters are the variables that govern the training process. Your model parameters are optimized (you could say "tuned") by the training process: you run data through the operations of the model, compare the resulting prediction with the actual value for each data instance, evaluate the accuracy, and adjust until you find the best combination to handle the problem.

These algorithms automatically adjust (learn) their internal parameters based on data. However, there is a subset of parameters that are not learned and have to be configured by an expert. Such parameters are often referred to as “hyperparameters”, and they have a big impact. For example, the tree depth in a decision tree model and the number of layers in an artificial neural network are typical hyperparameters. The performance of a model can drastically depend on the choice of its hyperparameters. [http://thenextweb.com/podium/2019/11/11/machine-learning-algorithms-and-the-art-of-hyperparameter-selection/ Machine learning algorithms and the art of hyperparameter selection - A review of four optimization strategies | Mischa Lisovyi and Rosaria Silipo - TNW]

There are four commonly used optimization strategies for hyperparameters:
# Bayesian optimization
# Grid search
# Random search
# Hill climbing

Bayesian optimization tends to be the most efficient. You would think that tuning as many hyperparameters as possible would give you the best answer. However, unless you are running on your own personal hardware, that could be very expensive. There are diminishing returns, in any case. With experience, you’ll discover which hyperparameters matter the most for your data and choice of algorithms.  [http://www.infoworld.com/article/3394399/machine-learning-algorithms-explained.html Machine learning algorithms explained | Martin Heller - InfoWorld]
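Grid search and random search can be sketched in a few lines of plain Python. The `score` function below is a hypothetical stand-in for a model's validation score (everything in this snippet is invented for illustration):

```python
import itertools
import random

def score(lr, depth):
    """Toy stand-in for a model's validation score; higher is better."""
    return -(lr - 0.1) ** 2 - 0.01 * (depth - 6) ** 2

# Grid search: evaluate every combination of a fixed set of values.
lrs = [0.001, 0.01, 0.1, 1.0]
depths = [2, 4, 6, 8]
best_grid = max(itertools.product(lrs, depths), key=lambda p: score(*p))

# Random search: sample the same budget of points from continuous ranges.
random.seed(0)
candidates = [(10 ** random.uniform(-3, 0), random.randint(2, 8))
              for _ in range(16)]
best_rand = max(candidates, key=lambda p: score(*p))

print("grid:", best_grid)     # (0.1, 6) -- the optimum happens to sit on the grid
print("random:", best_rand)
```

Note that grid search grows exponentially with the number of hyperparameters, which is one reason random search and Bayesian optimization are usually preferred once more than a few are tuned.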

Hyperparameter Optimization libraries:
* [https://github.com/maxim5/hyper-engine hyper-engine - Gaussian Process Bayesian optimization and some other techniques, like learning curve prediction]
* [http://ray.readthedocs.io/en/latest/tune.html Ray Tune: Hyperparameter Optimization Framework]
* [http://sigopt.com/ SigOpt’s API tunes your model’s parameters through state-of-the-art Bayesian optimization]
* [http://github.com/hyperopt/hyperopt hyperopt: Distributed Asynchronous Hyperparameter Optimization in Python - random search and tree of Parzen estimators optimization]
* [http://scikit-optimize.github.io/#skopt.Optimizer Scikit-Optimize, or skopt - Gaussian process Bayesian optimization]
* [http://github.com/polyaxon/polyaxon polyaxon]
* [http://github.com/SheffieldML/GPyOpt GPyOpt: Gaussian Process Optimization]

Tuning:
* Optimizer type
* Learning rate (fixed or not)
* Epochs
* Regularization rate (or not)
* Type of regularization - L1, L2, ElasticNet
* Search type for local minima
** Gradient descent
** Simulated annealing
** Evolutionary
* Decay rate (or not)
* Momentum (fixed or not)
* Nesterov Accelerated Gradient momentum (or not)
* Batch size
* Fitness measurement type
** MSE, accuracy, MAE, [[Cross-Entropy Loss]]
** Precision, recall
* Stop criteria
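Several of the knobs in this checklist can be seen together in one training loop. The plain-Python sketch below (data and constant values are invented for illustration) exercises the learning rate, decay, momentum, batch size, an MSE fitness measure, and two stop criteria:

```python
import random

# Fit y = 2x with mini-batch SGD. The ALL_CAPS names are hyperparameters
# from the checklist above; only the weight w is learned from the data.
LEARNING_RATE = 0.1     # learning rate (fixed base value)
DECAY = 0.01            # learning-rate decay per epoch
MOMENTUM = 0.9          # classical momentum
BATCH_SIZE = 4          # batch size
MAX_EPOCHS = 200        # stop criterion: hard cap on epochs
MIN_IMPROVEMENT = 1e-9  # stop criterion: minimum loss improvement
PATIENCE = 10           # stop criterion: epochs allowed without improvement

random.seed(0)
data = [(i / 20, 2 * i / 20) for i in range(1, 21)]  # noiseless y = 2x
w, velocity = 0.0, 0.0
best_loss, stale = float("inf"), 0

for epoch in range(MAX_EPOCHS):
    random.shuffle(data)
    lr = LEARNING_RATE / (1 + DECAY * epoch)  # decayed learning rate
    for i in range(0, len(data), BATCH_SIZE):
        batch = data[i:i + BATCH_SIZE]
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        velocity = MOMENTUM * velocity - lr * grad  # momentum update
        w += velocity
    # fitness measurement: mean squared error over the whole dataset
    loss = sum((w * x - y) ** 2 for x, y in data) / len(data)
    if best_loss - loss > MIN_IMPROVEMENT:
        best_loss, stale = loss, 0
    else:
        stale += 1
        if stale >= PATIENCE:
            break  # converged: no meaningful improvement lately

print(round(w, 4))  # close to 2.0
```

Every one of these constants changes how fast (or whether) the loop converges, which is exactly why they are worth tuning.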

<youtube>oaxf3rk0KGM</youtube>
<youtube>wKkcBPp3F1Y</youtube>
<youtube>giBAxWeuysM</youtube>
<youtube>WYLoNEcVeZo</youtube>
<youtube>ttE0F7fghfk</youtube>

=== Automatic Hyperparameter Tuning ===
* [[Automated Machine Learning (AML) - AutoML]]

Several production machine-learning platforms now offer automatic hyperparameter tuning. Essentially, you tell the system what hyperparameters you want to vary, and possibly what metric you want to optimize, and the system sweeps those hyperparameters across as many runs as you allow. ([[Google Cloud]] hyperparameter tuning extracts the appropriate metric from the TensorFlow model, so you don’t have to specify it.) 
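As an illustration, a sweep on Google Cloud AI Platform is described declaratively in a YAML spec. The fragment below sketches the general shape; the parameter names and ranges are invented for this example and are not a verbatim production config.

```yaml
# Illustrative AI Platform hyperparameter-tuning spec (values are examples).
trainingInput:
  hyperparameters:
    goal: MAXIMIZE          # optimize the metric upward
    maxTrials: 20           # total runs in the sweep
    maxParallelTrials: 4    # runs executed concurrently
    params:
      - parameterName: learning_rate
        type: DOUBLE
        minValue: 0.0001
        maxValue: 0.1
        scaleType: UNIT_LOG_SCALE   # search the exponent, not the raw value
      - parameterName: batch_size
        type: INTEGER
        minValue: 16
        maxValue: 256
```

The service then launches trials with different sampled values and reports the best combination found.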

<youtube>ynYnZywayC4</youtube>
<youtube>mSvw0TfxqDo</youtube>

Revision as of 15:15, 27 September 2020

YouTube search... Quora search... ...Google search

Master Data Management (MDM)

Feature Store / Data Lineage / Data Catalog

How is AI changing the game for Master Data Management?
Tony Brownlee talks about the ability to inspect and find data quality issues as one of several ways cognitive computing technology is influencing master data management.

Introducing Roxie. Data Management Meets Artificial Intelligence.
Introducing Roxie, Rubrik's Intelligent Personal Assistant. A hackathon project by Manjunath Chinni. Created in 10 hours with the power of Rubrik APIs.

DAS Webinar: Master Data Management – Aligning Data, Process, and Governance
Getting MDM “right” requires a strategic mix of Data Architecture, business process, and Data Governance.

IBM MDM Feature Spotlight: Machine learning-assisted Data Stewardship
This three minute overview shows the benefits of using machine learning models trained by a clients' own data stewards to facilitate faster resolution of pending clerical tasks in IBM Master Data Management Standard Edition.

Better Machine Learning Outcomes rely on Modern Data Management
Tarun Batra, CEO, LumenData, talks about how the movement towards artificial intelligence and machine learning relies on a Modern Data Management platform that is able to correlate large amounts of data, and provide a reliable data foundation for machine learning algorithms to deliver better business outcomes. In this video, Tarun discusses: Key industry trends driving Modern Data Management, Data management best practices, Creating joint value for customers "There is a lot of movement towards artificial intelligence and machine learning as being the next big domain that organizations are focusing on. With data volumes continuing to increase, and the velocity of change of data, decisions have to be made in an automated, data-driven fashion for organizations to remain competitive. Machine learning can predict and recommend actions, but a reliable data foundation through MDM that continuously manages and ensures data quality is essential for machine learning algorithms to create accurate, meaningful insight." - Tarun Batra

How to manage Artificial Intelligence Data Collection [Enterprise AI Governance Data Management ]
Mind Data AI. AI researcher Brian Ka Chan's AI/ML/DL introduction series. Collecting data is an important step in the success of an artificial intelligence program in the Fourth Industrial Revolution. Amid the current advancement of artificial intelligence technologies, machine learning has always been associated with AI, and in many cases machine learning is considered equivalent to artificial intelligence. Machine learning is actually a subset of artificial intelligence, and it relies on data to perform AI training, supervised or unsupervised. On average, 80% of the time that my team spent on AI or data science projects was about preparing data. Preparing data includes, but is not limited to: identifying the data required, identifying the availability and location of the data, profiling the data, sourcing the data, integrating the data, cleansing the data, and preparing the data for learning.

What is Data Governance?
Understand what problems a Data Governance program is intended to solve and why the Business Users must own it. Also learn some sample roles that each group might need to play.

Top 10 Mistakes in Data Management
Come learn about the mistakes we most often see organizations make in managing their data. Also learn more about Intricity's Data Management Health Check which you can download here: http://www.intricity.com/intricity101/ To Talk with a Specialist go to: http://www.intricity.com/intricity101/ www.intricity.com


Versioning

How to manage model and data versions
Raj Ramesh. Managing data versions and model versions is critical in deploying machine learning models, because if you want to re-create the models or go back to fix them, you will need both the data that went into training the model and the model hyperparameters themselves. In this video I explain that concept. Here's what I can do to help you: I speak on the topics of architecture and AI, help you integrate AI into your organization, educate your team on what AI can or cannot do, and make things simple enough that you can take action from your new knowledge. I work with your organization to understand the nuances and challenges that you face, so that together we can understand, frame, analyze, and address challenges in a systematic way, you see improvement in your overall business that is aligned with your strategy, and, most importantly, you and your organization can incrementally change to transform and thrive in the future. If any of this sounds like something you might need, please reach out to me at dr.raj.ramesh@topsigma.com, and we'll get back in touch within a day. Thanks for watching my videos and for subscribing. www.topsigma.com www.linkedin.com/in/rajramesh

Version Control for Data Science Explained in 5 Minutes (No Code!)
In this code-free, five-minute explainer for complete beginners, we'll teach you about Data Version Control (DVC), a tool for adapting Git version control to machine learning projects.

* Why data science and machine learning badly need tools for versioning
* Why Git version control alone will fall short
* How DVC helps you use Git with big datasets and models
* Cool features in DVC, like metrics, pipelines, and plots

Check out the DVC open source project on GitHub: http://github.com/iterative/dvc
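As a sketch of that workflow (assuming `git` and `dvc` are installed; the file and remote names here are invented for illustration), a first DVC session looks roughly like:

```shell
# Illustrative DVC session; file and remote names are made up.
git init && dvc init                  # set up Git and DVC in the project
dvc add data/training_set.csv         # track the dataset; writes a small .dvc pointer file
git add data/training_set.csv.dvc data/.gitignore
git commit -m "Track training data with DVC"
dvc remote add -d storage s3://my-bucket/dvc-store   # register default remote storage
dvc push                              # upload the tracked data to the remote
```

Git versions only the tiny `.dvc` pointer files, while the large data itself lives in the remote store, which is how DVC keeps repositories small.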

How to easily set up and version your Machine Learning pipelines, using Data Version Control (DVC) and Machine Learning Versioning (MLV)-tools | PyData Amsterdam 2019
Stephanie Bracaloni, Sarah Diot-Girard. Have you ever heard about machine learning versioning solutions? Have you ever tried one of them? And what about automation? Come with us and learn how to easily build versionable pipelines! This tutorial explains through small exercises how to set up a project using DVC and MLV-tools. www.pydata.org

Alessia Marcolini: Version Control for Data Science | PyData Berlin 2019
Track: PyData. Are you versioning your machine learning project as you would a traditional software project? How are you keeping track of changes in your datasets? Recorded at the PyConDE & PyData Berlin 2019 conference. http://pycon.de

Introduction to Pachyderm
Joey Zwicker A high-level introduction to the core concepts and features of Pachyderm as well as a quick demo. Learn more at: pachyderm.io github.com/pachyderm/pachyderm docs.pachyderm.io

E05 Pioneering version control for data science with Pachyderm co-founder and CEO Joe Doliner
Five years ago, Joe Doliner and his co-founder Joey Zwicker decided to focus on the hard problems in data science, rather than building just another dashboard on top of the existing mess. It's been a long road, but it's really paid off. Last year, after an adventurous journey, they closed a $10M Series A led by Benchmark. In this episode, Erasmus Elsner is joined by Joe Doliner to explore what Pachyderm does and how it scaled from just an idea into a fast-growing tech company. Listen to the podcast version: http://apple.co/2W2g0nV
