Difference between revisions of "Data Preprocessing"
Revision as of 07:01, 2 June 2020
- Data Cleaning
- Sklearn.preprocessing
- The 1st place solution in the Passenger Screening Kaggle challenge won in part due to data preparation/generation.
- Data Pre Processing Techniques You Should Know | Maneesha Rajaratne - Towards Data Science
- Machine Learning(ML) — Data Preprocessing | Raji Adam Bifola
- Most Influential Data Preprocessing Algorithms | S. García, J. Luengo, F. Herrera
- How to fix an Unbalanced Dataset | Will Badr - Amazon Web Services
- Creating and Using Datasources | AWS
- Datasets
- Imbalanced Data
- Data Encoding
- Batch Norm(alization) & Standardization
- Feature Exploration/Learning
- Hyperparameters
- Data Augmentation, Data Labeling, and Auto-Tagging
- Visualization
- Python
- Master Data Management (MDM) / Feature Store / Data Lineage / Data Catalog
- Jon Tupitza Famous Jupyter Notebooks:
- The COVID Tracking Project - software used
Splitting Data - training and testing sets
Time-Series Data
- Time-based Algorithms
- A Comparison of Time Series Databases and Netsil’s Use of Druid | Netsil
- Microsoft announces the general availability of Azure Time Series Insights | Ryan Waite - Microsoft
- Top 10 Time Series Databases | Outlyer
SQL Database Optimization