Difference between revisions of "Algorithm Administration"
m (→Versioning) |
m (→Versioning) |
||
| Line 172: | Line 172: | ||
</b><br>5 years ago, Joe Doliner and his co-founder Joey Zwicker decided to focus on the hard problems in data science, rather than building just another dashboard on top of the existing mess. It's been a long road, but it's really paid off. Last year, after an adventurous journey, they closed a $10m Series A led by Benchmark. In this episode, Erasmus Elsner is joined by Joe Doliner to explore what Pachyderm does and how it scaled from just an idea into a fast-growing tech company. Listen to the podcast version | </b><br>5 years ago, Joe Doliner and his co-founder Joey Zwicker decided to focus on the hard problems in data science, rather than building just another dashboard on top of the existing mess. It's been a long road, but it's really paid off. Last year, after an adventurous journey, they closed a $10m Series A led by Benchmark. In this episode, Erasmus Elsner is joined by Joe Doliner to explore what Pachyderm does and how it scaled from just an idea into a fast-growing tech company. Listen to the podcast version | ||
http://apple.co/2W2g0nV | http://apple.co/2W2g0nV | ||
| + | |} | ||
| + | |}<!-- B --> | ||
| + | |||
| + | = <span id="Data Quality"></span>Data Quality = | ||
| + | * [http://greatexpectations.io/ Great Expectations] ...helps data teams eliminate pipeline debt, through data testing, documentation, and profiling. | ||
| + | {|<!-- T --> | ||
| + | | valign="top" | | ||
| + | {| class="wikitable" style="width: 550px;" | ||
| + | || | ||
| + | <youtube>aUGCxTgvFf0</youtube> | ||
| + | <b>Testing and Documenting Your Data Doesn't Have to Suck | Superconductive | ||
| + | </b><br>Data teams everywhere struggle with pipeline debt: untested, undocumented assumptions that drain productivity, erode trust in data and kill team morale. Unfortunately, rolling your own data validation tooling usually takes weeks or months. In addition, most teams suffer from “documentation rot,” where data documentation is hard to maintain, and therefore chronically outdated, incomplete, and only semi-trusted. Great Expectations - http://bit.ly/2OtmY1W, the leading open source project for fighting pipeline debt, can solve these problems for you. We're excited to share new features and under-the-hood architecture with the data community. ABOUT THE SPEAKER | ||
| + | Abe Gong is a core contributor to the Great Expectations open source library, and CEO and Co-founder at Superconductive. Prior to Superconductive, Abe was Chief Data Officer at Aspire Health, the founding member of the Jawbone data science team, and lead data scientist at Massive Health. Abe has been leading teams using data and technology to solve problems in health care, consumer wellness, and public policy for over a decade. Abe earned his PhD at the University of Michigan in Public Policy, Political Science, and Complex Systems. He speaks and writes regularly on data, healthcare, and data ethics. | ||
| + | |} | ||
| + | |<!-- M --> | ||
| + | | valign="top" | | ||
| + | {| class="wikitable" style="width: 550px;" | ||
| + | || | ||
| + | <youtube>DRGajth6OO4</youtube> | ||
| + | <b>"Data Quality Check In Machine Learning" | ||
| + | </b><br>The world of data quality check in Machine Learning is expanding at an unimaginable pace. Researchers estimate that by 2020, every human would create 1.7MB of information each second. The true power of data can be unlocked when it is refined and transformed into a high quality state where we can realize its true potential. Many businesses and researchers believe that data quality is one of the primary concerns for data-driven enterprises and associated processes considering the pace of data growth. Most of the operational processes and analytics rely on good quality data for being efficient and consistent in output. Data quality process has evolved in its capacity but the demand for pace and efficiency has been proliferating extensively. Data management experts believe that data quality remains a bottleneck that creeps repeatedly to bother the data management and business fraternity due to proliferating data volumes and the complexity involved to derive quality insights. Innovative technologies such as Big Data, AI, ML etc. ML algorithms can learn from human decision labels in the training datasets and replicate the scenarios in real-time. However, ML algorithms are also prone to biases that may reflect in these data sets and are learnt through fresh data sets. These biases could lead to erosion of data quality. External validity testing and audits on a regular basis will help in avoiding such situations. | ||
|} | |} | ||
|}<!-- B --> | |}<!-- B --> | ||
Revision as of 03:16, 19 September 2020
- AI Governance
- Data Science
- Managed Vocabularies
- Datasets
- Benchmarks
- Batch Norm(alization) & Standardization
- Data Preprocessing
- Data Encoding
- Data Cleaning
- Feature Exploration/Learning
- Data Interoperability
- Data Augmentation, Data Labeling, and Auto-Tagging
- Imbalanced Data
- Privacy in Data Science
- Bias and Variances
- Excel - Data Analysis
- Hyperparameters
- Automated Machine Learning (AML) - AutoML
- Visualization
- Evaluation
- alteryx: Feature Labs, Featuretools
- How can we improve Azure Data Catalog?
- Automate your data lineage
- Benefiting from AI: A different approach to data management is needed
- Git - GitHub and GitLab
- Global Community for Artificial Intelligence (AI) in Master Data Management (MDM) | Camelot Management Consultants
Versioning
Data Quality
- Great Expectations ...helps data teams eliminate pipeline debt, through data testing, documentation, and profiling.
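To make "data testing" concrete, here is a minimal sketch of expectation-style checks in plain Python. It does not use the Great Expectations API; the function names (`expect_not_null`, `expect_between`, `run_checks`), the column names, and the sample rows are all hypothetical, chosen only to illustrate the idea of declaring assumptions about a dataset and reporting which rows violate them.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Result = Dict[str, Any]


def expect_not_null(rows: List[Row], column: str) -> Result:
    """Hypothetical check: every row has a non-null value in `column`."""
    failed = [i for i, row in enumerate(rows) if row.get(column) is None]
    return {"check": f"{column} is not null", "success": not failed, "failed_rows": failed}


def expect_between(rows: List[Row], column: str, low: float, high: float) -> Result:
    """Hypothetical check: non-null values in `column` lie within [low, high].

    Null values are skipped here; they are the job of expect_not_null.
    """
    failed = [
        i
        for i, row in enumerate(rows)
        if row.get(column) is not None and not (low <= row[column] <= high)
    ]
    return {
        "check": f"{column} between {low} and {high}",
        "success": not failed,
        "failed_rows": failed,
    }


def run_checks(rows: List[Row], checks: List[Callable[[List[Row]], Result]]) -> List[Result]:
    """Run each check against the same rows and collect the results."""
    return [check(rows) for check in checks]


# Illustrative sample data (made up for this sketch).
rows = [
    {"user_id": 1, "age": 34},
    {"user_id": 2, "age": None},
    {"user_id": 3, "age": 212},
]

report = run_checks(
    rows,
    [
        lambda r: expect_not_null(r, "user_id"),
        lambda r: expect_not_null(r, "age"),
        lambda r: expect_between(r, "age", 0, 120),
    ],
)

for result in report:
    print(result)
```

Keeping each check as a small function that returns a structured result, rather than raising on the first failure, means one run can report every violated assumption at once, and the same report can double as lightweight documentation of what the pipeline expects.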