Difference between revisions of "Datasets"

(44 intermediate revisions by the same user not shown)
Line 1: Line 1:
 +
{{#seo:
 +
|title=PRIMO.ai
 +
|titlemode=append
 +
|keywords=artificial, intelligence, machine, learning, models, algorithms, data, singularity, moonshot, Tensorflow, Google, Nvidia, Microsoft, Azure, Amazon, AWS
 +
|description=Helpful resources for your journey with artificial intelligence; videos, articles, techniques, courses, profiles, and tools
 +
}}
 
[http://www.youtube.com/results?search_query=training+datasets YouTube search...]
 
[http://www.youtube.com/results?search_query=training+datasets YouTube search...]
[http://www.google.com/search?q=datasets+training+deep+learning+artificial+intelligence+&oq=datasets+training+deep+learning+artificial+intelligence+ ...Google search]
+
[http://www.google.com/search?q=datasets+training+deep+machine+learning+artificial+intelligence+ML+AI ...Google search]
  
* [[Data Preprocessing & Feature Exploration]]
+
* [[Benchmarks]]
* [[Hyperparameters]]
+
* [[Batch Norm(alization) & Standardization]]
 +
* [[Data Preprocessing]]
 +
* [[Feature Exploration/Learning]]
 +
* [[Hyperparameter]]s
 +
* [[Data Augmentation]], Data Labeling, and Auto-Tagging
 +
* [[Visualization]]
 +
* [[Master Data Management  (MDM) / Feature Store / Data Lineage / Data Catalog]]
 +
* [[Natural Language Processing (NLP)#Managed Vocabularies |Managed Vocabularies]]
 +
* [http://www.openml.org/search?type=data OpenML datasets]
 +
* [http://pathmind.com/wiki/datasets-ml Datasets and Machine Learning | Chris Nicholson - A.I. Wiki pathmind]
 +
 
 +
Datasets (often in combination with algorithms) are becoming more important in their own right and can sometimes be seen as the primary intellectual output of research. The revelations about [http://news.google.com/topics/CAAqKAgKIiJDQkFTRXdvTkwyY3ZNVEZqYkd4cWMyMDNOQklDWlc0b0FBUAE Cambridge Analytica] highlight the importance of datasets and data collection. See also: [[Privacy in Data Science]]
 +
 +
== Sources ==
 +
* [http://tatoeba.org/eng Tatoeba] a collection of sentences and translations - [http://www.manythings.org/anki/ Tab-delimited Bilingual Sentence Pairs]
 
* [http://www.kaggle.com/datasets Kaggle Datasets]
 
* [http://www.kaggle.com/datasets Kaggle Datasets]
* [http://registry.opendata.aws/ Registry of Open Data | on AWS]
+
* [http://pages.semanticscholar.org/coronavirus-research COVID-19 Open Research Dataset (CORD-19)]  ...[[COVID-19]]
 +
* [http://mlr.cs.umass.edu/ml/ UC Irvine Machine Learning Repository]
 +
** [http://archive.ics.uci.edu/ml/datasets.html Archive | UC Irvine Machine Learning Repository]
 +
* [http://yann.lecun.com/exdb/mnist/ MNIST database]
 +
* [http://datahub.io/collections Collections | DataHub]
 +
* [http://registry.opendata.aws/ Registry of Open Data on AWS | Amazon]
 +
* [http://www.google.com/publicdata/directory Public Data | Google]
 +
* [http://cloud.google.com/bigquery/public-data/ BigQuery public datasets | Google]
 
* [http://storage.googleapis.com/openimages/web/index.html Open Images | Google]
 
* [http://storage.googleapis.com/openimages/web/index.html Open Images | Google]
 +
* [http://www.microsoft.com/en-us/research/academic-program/data-science-microsoft-research/ Data Science for Research | Microsoft]
 +
* [http://www.kdnuggets.com/datasets/index.html Datasets for Data Mining and Data Science | KDnuggets]
 +
* [http://public.enigma.com/ Enigma Public]
 +
* [http://dataportals.org/  A Comprehensive List of Open Data Portals from Around the World | DataPortals.org]
 +
* [http://www.opendatasoft.com/a-comprehensive-list-of-all-open-data-portals-around-the-world/ OpenDataSoft]
 +
* [http://knoema.com/atlas/sources World Data Atlas | Knoema]
 
* [http://www.openml.org/search?type=data The Open Machine Learning project | OpenML.org]
 
* [http://www.openml.org/search?type=data The Open Machine Learning project | OpenML.org]
* [http://en.wikipedia.org/wiki/List_of_datasets_for_machine_learning_research Datasets | Wikipedia]
+
* [http://www.researchpipeline.com/mediawiki/index.php?title=Main_Page World's Free Online Data | Research Pipeline]
 +
* [http://en.wikipedia.org/wiki/List_of_datasets_for_machine_learning_research List of datasets for machine learning research | Wikipedia]
 
* [http://resources.wolframcloud.com/NeuralNetRepository Neural Net Repository | Wolfram]
 
* [http://resources.wolframcloud.com/NeuralNetRepository Neural Net Repository | Wolfram]
 
* [http://deeplearning4j.org/opendata Open Data for Deep Learning & Machine Learning | 4j]
 
* [http://deeplearning4j.org/opendata Open Data for Deep Learning & Machine Learning | 4j]
 +
* [http://catalog.data.gov/dataset Data Catalog | Data.gov]
 +
* [http://github.com/timzhang642/3D-Machine-Learning#datasets 3D-Machine-Learning | GitHub]
 +
** [http://github.com/timzhang642/3D-Machine-Learning#3d_models 3D Models]
 +
** [http://github.com/timzhang642/3D-Machine-Learning#3d_scenes 3D Scenes]
 
* [http://www.usgs.gov/news/us-geological-survey-and-us-department-energy-release-online-public-dataset-and-viewer-us-wind Wind Turbine Map and Database | USGS & DOE]
 
* [http://www.usgs.gov/news/us-geological-survey-and-us-department-energy-release-online-public-dataset-and-viewer-us-wind Wind Turbine Map and Database | USGS & DOE]
 
* [http://isogg.org/wiki/Autosomal_DNA_testing_comparison_chart Autosomal DNA]
 
* [http://isogg.org/wiki/Autosomal_DNA_testing_comparison_chart Autosomal DNA]
* [http://github.com/endgameinc/ember EMBER; benign and malicious Windows-portable executable files | Endgame]
 
 
* [http://host.robots.ox.ac.uk/pascal/VOC Pascal Visual Object Classes Challenge (VOC)]
 
* [http://host.robots.ox.ac.uk/pascal/VOC Pascal Visual Object Classes Challenge (VOC)]
 +
* [http://open.nasa.gov/ OpenNASA]
 +
* [http://lib.stat.cmu.edu/jasadata/  JASA Data Archive | Journal of the American Statistical Association]
 +
* [http://lib.stat.cmu.edu/datasets/ Datasets Archive | Journal of the American Statistical Association]
 +
* [http://data.world/ Data.World]
 +
* [http://archive.org/details/datasets The Dataset Collection | Archive.org]
 +
* [http://www.archive-it.org/explore?show=Collections Collections | Archive-it.org]
 +
* [http://ec.europa.eu/eurostat/data/database Eurostat | EU statistical office]
 +
* [http://www.re3data.org/ Re3data]
 +
* [http://fairsharing.org/ Resource on data and metadata standards - open research data | FAIRsharing]
 +
* [http://blog.bigml.com/list-of-public-data-sources-fit-for-machine-learning/ List of Public Data Sources Fit for Machine Learning | bigml]
 +
* [http://skymind.ai/wiki/open-datasets Open Datasets | Skymind]
 +
* [http://apps.who.int/gho/data/node.resources Global Health Observatory resources | World Health Organization (WHO)]
 +
* [http://wonder.cdc.gov/Welcome.html CDC WONDER | Centers for Disease Control and Prevention (CDC)]
 +
* [http://data.medicare.gov/ US health insurance program | Medicare]
 +
* [http://data.imf.org International economy | International Monetary Fund (IMF)]
 +
* [http://datacatalog.worldbank.org/search/datasets Data Catalog | The World Bank]
 +
* [http://www.quandl.com/ Financial and economic data | Quandl]
 +
** [http://www.quandl.com/alternative-data Alternative data | Quandl]
 +
* [http://github.com/awesomedata/awesome-public-datasets#publicdomains PublicDomains | GitHub]
 +
* [http://github.com/BuzzFeedNews/everything datasets and related content | BuzzFeed - GitHub]
 +
* [http://data.fivethirtyeight.com/ Sports, politics, economics, and other spheres of life | FiveThirtyEight]
 +
* [http://github.com/endgameinc/ember EMBER; benign and malicious Windows-portable executable files | Endgame - GitHub]
 +
* [http://www.reddit.com/r/datasets/ r/datasets | reddit]
 +
* [http://www.microsoft.com/en-us/download/details.aspx?id=55594&WT.mc_id=rss_alldownloads_all Microsoft Information-Seeking Conversation (MISC)] - audio and video signals; transcripts of conversation
 +
* [http://www.clips.uantwerpen.be/conll2003/ner/ Language-Independent Named Entity Recognition (II)]
 +
* [http://www.robots.ox.ac.uk/~vgg/data/vgg_face/ VGG | Oxford]
 +
* [http://challenge2019.perfectcorp.com/ Perfect-500K] beauty and personal care
 +
* [http://voice.mozilla.org/en Mozilla’s Common Voice project] collects human voices
 +
* [http://pages.semanticscholar.org/coronavirus-research COVID-19 Open Research Dataset (CORD-19)] in response to the COVID-19 pandemic
 +
 +
== Networks ==
 +
* [[Bidirectional Encoder Representations from Transformers (BERT)]]
 +
* [[ResNet-50]]
 +
* [http://en.wikipedia.org/wiki/ImageNet ImageNet | Wikipedia]
 +
* [http://en.wikipedia.org/wiki/AlexNet AlexNet | Wikipedia]
 +
* [http://wordnet.princeton.edu/ WordNet]
 +
 +
== Articles ==
 +
* [http://www.forbes.com/sites/korihale/2019/06/25/microsoft-scraps-10-million-facial-recognition-photos-on-the-low/#6672d61949f2 Microsoft Scraps 10 Million Facial Recognition Photos On The Low | Kori Hale - Forbes]
 +
* [http://gengo.ai/datasets/the-50-best-free-datasets-for-machine-learning/  The 50 Best Free Datasets for Machine Learning | Meiryum Ali - Gengo AI]
 +
* [http://medium.com/datadriveninvestor/the-50-best-public-datasets-for-machine-learning-d80e9f030279 The 50 Best Public Datasets for Machine Learning | Stacy Stanford - Medium] 
 +
* [http://www.altexsoft.com/blog/datascience/best-public-machine-learning-datasets/ Best Public Datasets for Machine Learning and Data Science: Sources and Advice on the Choice | Altexsoft]
 +
* [http://www.analyticsvidhya.com/blog/2018/03/comprehensive-collection-deep-learning-datasets/ 25 Open Datasets for Deep Learning Every Data Scientist Must Work With | PRANAV DAR - Analytics Vidhya]
  
<youtube>koiTTim4M-s</youtube>
+
<youtube>tChcZpBbTTA</youtube>
+
<youtube>jYvBmJo7qjc</youtube>
 
<youtube>dGM1mgkIayY</youtube>
 
<youtube>dGM1mgkIayY</youtube>
 
<youtube>tTjZqz6qk1s</youtube>
 
<youtube>tTjZqz6qk1s</youtube>
Line 26: Line 106:
 
<youtube>av8zkywXHSM</youtube>
 
<youtube>av8zkywXHSM</youtube>
 
<youtube>WhYrLA-fOK0</youtube>
 
<youtube>WhYrLA-fOK0</youtube>
 +
<youtube>KoA1lVRwHrc</youtube>
 +
<youtube>M1WQxTofGe8</youtube>
 +
<youtube>koiTTim4M-s</youtube>
 +
<youtube>tChcZpBbTTA</youtube>
 +
 +
* [http://www.quora.com/What-are-the-alternatives-to-CrowdFlower Human in the Loop...]
 +
** [http://www.mturk.com/ Amazon Mechanical Turk (MTurk)]  - [http://blog.mturk.com/using-mturk-with-amazon-sagemaker-for-supervised-learning-ml-bc30f94e1c0d Using MTurk with Amazon SageMaker for Supervised Learning (ML)]
 +
** [http://gengo.ai/ Gengo.ai] - high-quality multilingual data with a human touch for machine learning
 +
** [http://visit.figure-eight.com/crowdflower-ai-info-old.html Figure Eight CrowdFlower AI] - build a state-of-the-art machine learning model trained with human labeled data

Revision as of 16:42, 26 April 2020

YouTube search... ...Google search

Datasets (often in combination with algorithms) are becoming more important in their own right and can sometimes be seen as the primary intellectual output of research. The revelations about Cambridge Analytica highlight the importance of datasets and data collection. See also: Privacy in Data Science

Sources

Networks

Articles