Difference between revisions of "Datasets"

Line 5: Line 5:

== Sources ==
+
* [http://www.kaggle.com/datasets Kaggle Datasets]
* [http://mlr.cs.umass.edu/ml/ UC Irvine Machine Learning Repository]
+ ** [http://archive.ics.uci.edu/ml/datasets.html Archive | UC Irvine Machine Learning Repository]
* [http://yann.lecun.com/exdb/mnist/ MNIST database]
− * [http://public.enigma.com/ Enigma Public]
+ * [http://datahub.io/collections Collections | DataHub]
* [http://registry.opendata.aws/ Registry of Open Data on AWS | Amazon]
* [http://www.google.com/publicdata/directory Public Data | Google]
Line 14: Line 17:
* [http://www.microsoft.com/en-us/research/academic-program/data-science-microsoft-research/ Data Science for Research | Microsoft]
* [http://www.kdnuggets.com/datasets/index.html Datasets for Data Mining and Data Science | KDnuggets]
+ * [http://public.enigma.com/ Enigma Public]
+ * [http://dataportals.org/  A Comprehensive List of Open Data Portals from Around the World | DataPortals.org]
+ * [http://www.opendatasoft.com/a-comprehensive-list-of-all-open-data-portals-around-the-world/ OpenDataSoft]
+ * [http://knoema.com/atlas/sources World Data Atlas | Knoema]
* [http://www.openml.org/search?type=data The Open Machine Learning project | OpenML.org]
− * [http://en.wikipedia.org/wiki/List_of_datasets_for_machine_learning_research Datasets | Wikipedia]
+ * [http://www.researchpipeline.com/mediawiki/index.php?title=Main_Page World's Free Online Data | Research Pipeline]
+ * [http://en.wikipedia.org/wiki/List_of_datasets_for_machine_learning_research List of datasets for machine learning research | Wikipedia]
* [http://resources.wolframcloud.com/NeuralNetRepository Neural Net Repository | Wolfram]
* [http://deeplearning4j.org/opendata Open Data for Deep Learning & Machine Learning | 4j]
+ * [http://catalog.data.gov/dataset Data Catalog | Data.gov]
* [http://www.usgs.gov/news/us-geological-survey-and-us-department-energy-release-online-public-dataset-and-viewer-us-wind Wind Turbine Map and Database | USGS & DOE]
* [http://isogg.org/wiki/Autosomal_DNA_testing_comparison_chart Autosomal DNA]
− * [http://github.com/awesomedata/awesome-public-datasets#publicdomains PublicDomains - GitHub]
− * [http://github.com/endgameinc/ember EMBER; benign and malicious Windows-portable executable files | Endgame]
* [http://host.robots.ox.ac.uk/pascal/VOC Pascal Visual Object Classes Challenge (VOC)]
* [http://open.nasa.gov/ OpenNASA]
Line 29: Line 36:
* [http://archive.org/details/datasets The Dataset Collection | Archive.org]
* [http://www.archive-it.org/explore?show=Collections Collections |Archive-it.org]
+ * [http://ec.europa.eu/eurostat/data/database Eurostat | EU statistical office]
+ * [http://www.re3data.org/ Re3data]
+ * [http://fairsharing.org/ Resource on data and metadata standards - open research data | FAIRsharing]
+ * [http://www.quandl.com/ Financial and economic  | Quandl]
+ ** [http://www.quandl.com/alternative-data Alternative data | Quandl]
+ * [http://github.com/awesomedata/awesome-public-datasets#publicdomains PublicDomains | GitHub]
+ * [http://github.com/BuzzFeedNews/everything datasets and related content | BuzzFeed - GitHub]
+ * [http://data.fivethirtyeight.com/ Sports, politics, economics, and other spheres of life | FiveThirtyEight]
+ * [http://github.com/endgameinc/ember EMBER; benign and malicious Windows-portable executable files | Endgame - GitHub]
+ * [http://www.reddit.com/r/datasets/ r/datasets | reddit]

== Articles ==
* [http://gengo.ai/datasets/the-50-best-free-datasets-for-machine-learning/  The 50 Best Free Datasets for Machine Learning | Meiryum Ali - Gengo AI]
* [http://medium.com/datadriveninvestor/the-50-best-public-datasets-for-machine-learning-d80e9f030279 The 50 Best Public Datasets for Machine Learning | Stacy Stanford - Medium]
+ * [http://www.altexsoft.com/blog/datascience/best-public-machine-learning-datasets/ Best Public Datasets for Machine Learning and Data Science: Sources and Advice on the Choice | Altexsoft]

Revision as of 11:15, 9 January 2019


Datasets (often in combination with algorithms) are becoming more important in their own right and can sometimes be seen as the primary intellectual output of research. The revelations about Cambridge Analytica highlight the importance of datasets and data collection. See also: Privacy in Data Science

Sources

Articles