Difference between revisions of "Datasets"

Revision as of 10:18, 9 January 2019

Datasets (often in combination with algorithms) are becoming increasingly important in their own right and can sometimes be seen as the primary intellectual output of the research. The revelations about Cambridge Analytica highlight the importance of datasets and data collection. See also: Privacy in Data Science

Sources

Human in the Loop...
- Amazon Mechanical Turk (MTurk) - Using MTurk with Amazon SageMaker for Supervised Learning (ML)
- Gengo.ai - high-quality multilingual data with a human touch for machine learning
- Figure Eight CrowdFlower AI - build a state-of-the-art machine learning model trained with human labeled data

@@ Line 4: / Line 4: @@
 * [[Data Preprocessing & Feature Exploration]]
 * [[Hyperparameters]]
-* [[Privacy in Data Science]]
-* [http://www.quora.com/What-are-the-alternatives-to-CrowdFlower Human in the Loop...]
+Datasets (often in combination with algorithms) are becoming more important themselves and can sometimes be seen as the primary intellectual output of the research. The revelations about [http://news.google.com/topics/CAAqKAgKIiJDQkFTRXdvTkwyY3ZNVEZqYkd4cWMyMDNOQklDWlc0b0FBUAE Cambridge Analytica] highlight the importance of datasets and data collection. Reference also: [[Privacy in Data Science]]
-** [http://www.mturk.com/ Amazon Mechanical Turk (MTurk)]  - [http://blog.mturk.com/using-mturk-with-amazon-sagemaker-for-supervised-learning-ml-bc30f94e1c0d Using MTurk with Amazon SageMaker for Supervised Learning (ML)]
-** [http://gengo.ai/ Gengo.ai] - high-quality multilingual data with a human touch for machine learning
-** [http://visit.figure-eight.com/crowdflower-ai-info-old.html Figure Eight CrowdFlower AI] - build a state-of-the-art machine learning model trained with human labeled data
-_________________________________________________________
+== Sources ==
+* [http://www.kaggle.com/datasets Kaggle Datasets]
+* [http://mlr.cs.umass.edu/ml/ UC Irvine Machine Learning Repository]
 * [http://yann.lecun.com/exdb/mnist/ MNIST database]
-* [http://www.kaggle.com/datasets Kaggle Datasets]
 * [http://registry.opendata.aws/ Registry of Open Data | on AWS]
 * [http://storage.googleapis.com/openimages/web/index.html Open Images | Google]
@@ Line 39: / Line 39: @@
 <youtube>KoA1lVRwHrc</youtube>
 <youtube>M1WQxTofGe8</youtube>
+* [http://www.quora.com/What-are-the-alternatives-to-CrowdFlower Human in the Loop...]
+** [http://www.mturk.com/ Amazon Mechanical Turk (MTurk)]  - [http://blog.mturk.com/using-mturk-with-amazon-sagemaker-for-supervised-learning-ml-bc30f94e1c0d Using MTurk with Amazon SageMaker for Supervised Learning (ML)]
+** [http://gengo.ai/ Gengo.ai] - high-quality multilingual data with a human touch for machine learning
+** [http://visit.figure-eight.com/crowdflower-ai-info-old.html Figure Eight CrowdFlower AI] - build a state-of-the-art machine learning model trained with human labeled data
