Difference between revisions of "Datasets"
Revision as of 10:18, 9 January 2019
Datasets, often in combination with algorithms, are becoming more important in their own right and can sometimes be seen as the primary intellectual output of the research. The revelations about Cambridge Analytica highlight the importance of datasets and data collection. See also: Privacy in Data Science
Sources
- Kaggle Datasets
- UC Irvine Machine Learning Repository
- MNIST database
- Registry of Open Data | on AWS
- Open Images | Google
- The Open Machine Learning project | OpenML.org
- Datasets | Wikipedia
- Neural Net Repository | Wolfram
- Open Data for Deep Learning & Machine Learning | 4j
- Wind Turbine Map and Database | USGS & DOE
- Autosomal DNA
- EMBER; benign and malicious Windows-portable executable files | Endgame
- Pascal Visual Object Classes Challenge (VOC)
- OpenNASA
- Human in the Loop...
  - Amazon Mechanical Turk (MTurk) - Using MTurk with Amazon SageMaker for Supervised Learning (ML)
  - Gengo.ai - high-quality multilingual data with a human touch for machine learning
  - Figure Eight CrowdFlower AI - build a state-of-the-art machine learning model trained with human-labeled data