Datasets

[http://www.google.com/search?q=datasets+training+deep+machine+learning+artificial+intelligence+ML+AI ...Google search]

* [[Benchmarks]]
* [[Batch Norm(alization) & Standardization]]
* [[Data Preprocessing]]
* [[Feature Exploration/Learning]]
* [[Hyperparameter]]s
* [[Data Augmentation]], Data Labeling, and Auto-Tagging
* [[Visualization]]
* [[Master Data Management  (MDM) / Feature Store / Data Lineage / Data Catalog]]
* [[Natural Language Processing (NLP)#Managed Vocabularies |Managed Vocabularies]]
* [http://www.openml.org/search?type=data OpenML datasets]
* [http://pathmind.com/wiki/datasets-ml Datasets and Machine Learning | Chris Nicholson - A.I. Wiki pathmind]
  
 
Datasets (often in combination with algorithms) are becoming important in their own right and can sometimes be seen as the primary intellectual output of the research. The revelations about [http://news.google.com/topics/CAAqKAgKIiJDQkFTRXdvTkwyY3ZNVEZqYkd4cWMyMDNOQklDWlc0b0FBUAE Cambridge Analytica] highlight the importance of datasets and data collection. See also: [[Privacy in Data Science]]
 
== Sources ==
 
* [http://tatoeba.org/eng Tatoeba] a collection of sentences and translations - [http://www.manythings.org/anki/ Tab-delimited Bilingual Sentence Pairs]
* [http://www.kaggle.com/datasets Kaggle Datasets]
* [http://pages.semanticscholar.org/coronavirus-research COVID-19 Open Research Dataset (CORD-19)]  ...[[COVID-19]]
* [http://mlr.cs.umass.edu/ml/ UC Irvine Machine Learning Repository]
** [http://archive.ics.uci.edu/ml/datasets.html Archive | UC Irvine Machine Learning Repository]
* [http://registry.opendata.aws/ Registry of Open Data on AWS | Amazon]
* [http://www.google.com/publicdata/directory Public Data | Google]
* [http://cloud.google.com/bigquery/public-data/ BigQuery public datasets | Google]
* [http://storage.googleapis.com/openimages/web/index.html Open Images | Google]
* [http://www.microsoft.com/en-us/research/academic-program/data-science-microsoft-research/ Data Science for Research | Microsoft]
* [http://deeplearning4j.org/opendata Open Data for Deep Learning & Machine Learning | 4j]
* [http://catalog.data.gov/dataset Data Catalog | Data.gov]
* [http://github.com/timzhang642/3D-Machine-Learning#datasets 3D-Machine-Learning | GitHub]
** [http://github.com/timzhang642/3D-Machine-Learning#3d_models 3D Models]
** [http://github.com/timzhang642/3D-Machine-Learning#3d_scenes 3D Scenes]
* [http://www.usgs.gov/news/us-geological-survey-and-us-department-energy-release-online-public-dataset-and-viewer-us-wind Wind Turbine Map and Database | USGS & DOE]
* [http://isogg.org/wiki/Autosomal_DNA_testing_comparison_chart Autosomal DNA]
* [http://github.com/endgameinc/ember EMBER: benign and malicious Windows Portable Executable (PE) files | Endgame - GitHub]
* [http://www.reddit.com/r/datasets/ r/datasets | reddit]
* [http://www.microsoft.com/en-us/download/details.aspx?id=55594&WT.mc_id=rss_alldownloads_all Microsoft Information-Seeking Conversation (MISC)] - audio and video signals; conversation transcripts
* [http://www.clips.uantwerpen.be/conll2003/ner/ Language-Independent Named Entity Recognition (II)]
* [http://www.robots.ox.ac.uk/~vgg/data/vgg_face/ VGG | Oxford]
* [http://challenge2019.perfectcorp.com/ Perfect-500K] beauty and personal care
* [http://voice.mozilla.org/en Mozilla’s Common Voice project] collects human voices

== Networks ==
* [[Bidirectional Encoder Representations from Transformers (BERT)]]
* [[ResNet-50]]
* [http://en.wikipedia.org/wiki/ImageNet ImageNet | Wikipedia]
* [http://en.wikipedia.org/wiki/AlexNet AlexNet | Wikipedia]
* [http://wordnet.princeton.edu/ WordNet]
  
 
== Articles ==
 
* [http://www.forbes.com/sites/korihale/2019/06/25/microsoft-scraps-10-million-facial-recognition-photos-on-the-low/#6672d61949f2 Microsoft Scraps 10 Million Facial Recognition Photos On The Low | Kori Hale - Forbes]
* [http://gengo.ai/datasets/the-50-best-free-datasets-for-machine-learning/  The 50 Best Free Datasets for Machine Learning | Meiryum Ali - Gengo AI]
* [http://medium.com/datadriveninvestor/the-50-best-public-datasets-for-machine-learning-d80e9f030279 The 50 Best Public Datasets for Machine Learning | Stacy Stanford - Medium]
* [http://www.altexsoft.com/blog/datascience/best-public-machine-learning-datasets/ Best Public Datasets for Machine Learning and Data Science: Sources and Advice on the Choice | Altexsoft]
* [http://www.analyticsvidhya.com/blog/2018/03/comprehensive-collection-deep-learning-datasets/ 25 Open Datasets for Deep Learning Every Data Scientist Must Work With | PRANAV DAR - Analytics Vidhya]
 
<youtube>jYvBmJo7qjc</youtube>
<youtube>dGM1mgkIayY</youtube>
<youtube>tTjZqz6qk1s</youtube>
<youtube>KoA1lVRwHrc</youtube>
<youtube>M1WQxTofGe8</youtube>
<youtube>koiTTim4M-s</youtube>
<youtube>tChcZpBbTTA</youtube>
  
 
* [http://www.quora.com/What-are-the-alternatives-to-CrowdFlower Human in the Loop...]
 

Revision as of 16:42, 26 April 2020
