</b><br>Five years ago, Joe Doliner and his co-founder Joey Zwicker decided to focus on the hard problems in data science, rather than building just another dashboard on top of the existing mess. It's been a long road, but it has really paid off. Last year, after an adventurous journey, they closed a $10M Series A led by Benchmark. In this episode, Erasmus Elsner is joined by Joe Doliner to explore what Pachyderm does and how it scaled from just an idea into a fast-growing tech company. Listen to the podcast version:
http://apple.co/2W2g0nV
|}
|}<!-- B -->

= <span id="Data Quality"></span>Data Quality =

* [http://greatexpectations.io/ Great Expectations] ...helps data teams eliminate pipeline debt, through data testing, documentation, and profiling.

{|<!-- T -->
| valign="top" |
{| class="wikitable" style="width: 550px;"
||
<youtube>aUGCxTgvFf0</youtube>
<b>Testing and Documenting Your Data Doesn't Have to Suck | Superconductive
</b><br>Data teams everywhere struggle with pipeline debt: untested, undocumented assumptions that drain productivity, erode trust in data, and kill team morale. Unfortunately, rolling your own data validation tooling usually takes weeks or months. In addition, most teams suffer from “documentation rot,” where data documentation is hard to maintain, and therefore chronically outdated, incomplete, and only semi-trusted. Great Expectations (http://bit.ly/2OtmY1W), the leading open source project for fighting pipeline debt, can solve these problems for you. We're excited to share new features and under-the-hood architecture with the data community.

ABOUT THE SPEAKER: Abe Gong is a core contributor to the Great Expectations open source library, and CEO and co-founder at Superconductive. Prior to Superconductive, Abe was Chief Data Officer at Aspire Health, a founding member of the Jawbone data science team, and lead data scientist at Massive Health. Abe has been leading teams using data and technology to solve problems in health care, consumer wellness, and public policy for over a decade. Abe earned his PhD at the University of Michigan in Public Policy, Political Science, and Complex Systems. He speaks and writes regularly on data, healthcare, and data ethics.
|}
|<!-- M -->
| valign="top" |
{| class="wikitable" style="width: 550px;"
||
<youtube>DRGajth6OO4</youtube>
<b>"Data Quality Check In Machine Learning"
</b><br>The world of data quality checking in machine learning is expanding at an unimaginable pace. Researchers estimate that by 2020, every human would create 1.7 MB of information each second. The true power of data is unlocked when it is refined and transformed into a high-quality state. Given the pace of data growth, many businesses and researchers consider data quality one of the primary concerns for data-driven enterprises and their processes, since most operational processes and analytics rely on good-quality data to produce efficient, consistent output. Data quality processes have grown in capability, but the demand for speed and efficiency keeps proliferating, and data management experts see data quality as a recurring bottleneck for both the data management and business communities, driven by growing data volumes and the complexity of deriving quality insights. ML algorithms can learn from human decision labels in training datasets and replicate those scenarios in real time. However, ML algorithms are also prone to biases that may be reflected in these datasets and learned from fresh data, and such biases can erode data quality. Regular external validity testing and audits help avoid these situations.
|}
|}<!-- B -->

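The two talks above both center on expectation-style data tests: declarative assertions about nulls, ranges, and other properties a dataset must satisfy. As a minimal sketch of the idea in plain Python (the function and field names here are illustrative, not Great Expectations' actual API):

```python
# Expectation-style data tests: each check returns a result dict with a
# "success" flag and the indices of failing rows, so audits can report
# exactly where an assumption about the data broke.

def expect_no_nulls(rows, column):
    """Every row must have a non-None value for `column`."""
    failures = [i for i, r in enumerate(rows) if r.get(column) is None]
    return {"success": not failures, "failed_rows": failures}

def expect_values_between(rows, column, low, high):
    """All non-null values in `column` must fall within [low, high]."""
    failures = [i for i, r in enumerate(rows)
                if r.get(column) is not None and not (low <= r[column] <= high)]
    return {"success": not failures, "failed_rows": failures}

rows = [
    {"user_id": 1, "age": 34},
    {"user_id": 2, "age": None},   # violates the null check
    {"user_id": 3, "age": 140},    # violates the range check
]

results = [
    expect_no_nulls(rows, "age"),
    expect_values_between(rows, "age", 0, 120),
]
print(all(r["success"] for r in results))  # False: one null, one out-of-range
```

Running such a suite on every pipeline run, and rendering the results as documentation, is the core mechanism these tools use against pipeline debt.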
{|<!-- T -->
| valign="top" |
{| class="wikitable" style="width: 550px;"
||
<youtube>t7vHpA39TXM</youtube>
<b>An Approach to Data Quality for Netflix Personalization Systems
</b><br>Personalization is one of the key pillars of Netflix, as it enables each member to experience the vast collection of content tailored to their interests. Our personalization system is powered by several machine learning models, and these models are only as good as the data that is fed to them. They are trained on hundreds of terabytes of data every day, which makes it a non-trivial challenge to track and maintain data quality. To ensure high data quality, we require three things: automated monitoring of data; visualization to observe changes in the metrics over time; and mechanisms to control data-related regressions, where a data regression is defined as data loss or distributional shift over a given period of time. In this talk, we will describe the infrastructure and methods we used to achieve the above:
* "Swimlanes" that help us define data boundaries for the different environments used to develop, evaluate, and deploy ML models
* Pipelines that aggregate data metrics from various sources within each swimlane
* Time series and dashboard visualization tools across an atypically large period of time
* Automated audits that periodically monitor these metrics to detect data regressions
We will explain how we use Spark to run aggregation jobs that optimize metric computations, SQL queries to quickly define and test individual metrics, and other ETL jobs to power the visualization and audit tools. About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business. Connect with us: Website: http://databricks.com [[Facebook]]: http://www.facebook.com/databricksinc
|}
|<!-- M -->
| valign="top" |
{| class="wikitable" style="width: 550px;"
||
<youtube>ID2</youtube>
<b>HH2
</b><br>BB2
| | |} | | |} |
| | |}<!-- B --> | | |}<!-- B --> |
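The automated audits described in the Netflix talk boil down to comparing a metric's current distribution against a baseline window and flagging large shifts. One common way to quantify such a distributional shift is the population stability index (PSI); the talk does not name a specific metric, so this is a hedged sketch of the general technique:

```python
import math

def population_stability_index(expected, actual, bins=10):
    """Compare two samples' distributions; higher PSI = larger shift.
    A common rule of thumb (not from the talk): PSI > 0.2 is a significant shift."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against a zero-width range

    def fractions(sample):
        counts = [0] * bins
        for x in sample:
            idx = min(int((x - lo) / width), bins - 1)
            counts[idx] += 1
        # Smooth empty bins so the logarithm below stays defined.
        return [max(c / len(sample), 1e-6) for c in counts]

    p, q = fractions(expected), fractions(actual)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

baseline = [0.1 * i for i in range(100)]        # yesterday's feature values
shifted = [0.1 * i + 3.0 for i in range(100)]   # today's, shifted upward
print(population_stability_index(baseline, shifted) > 0.2)  # True: flags a regression
```

An audit job would compute such a score per metric per swimlane on a schedule and alert when it crosses a threshold, which matches the monitor/visualize/control loop the talk outlines.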