Difference between revisions of "Datasets"
<youtube>KoA1lVRwHrc</youtube>
<b>ML #8 - Open Healthcare Datasets</b><br>Many people want healthcare data to play with, but don't know where to find it. In this chat we'll provide you the data resources you need to start doing machine learning.
<youtube>M1WQxTofGe8</youtube>
<b>Open Data Innovation: Building on Open Data Sets for Innovative Applications</b><br>An overarching conversation on open data innovation. The session highlights how democratizing access to information drives innovation and greater impact. Learn how organizations are using the cloud to gather data and discover insights to foster innovation, improve service delivery and address big societal problems. As data becomes more widely available (GIS, weather, research), having access to scalable technology and the multiple data sources that can feed into the technology solution can help create solutions for significant problems in the world. This session highlights real-world examples of how open data is enabling transformative innovation. Explore how the new Landsat open data set on AWS is spurring innovation among public and private entities or delivering applications to citizens and users.
Revision as of 12:52, 7 September 2020
- AI Governance
- Hyperparameters
- Visualization
- Facets | Google ... contains two robust visualizations to aid in understanding and analyzing machine learning datasets.
- OpenML datasets
- Datasets and Machine Learning | Chris Nicholson - A.I. Wiki pathmind
- Datasets used in deep learning applications within X-ray security imaging | Towards Automatic Threat Detection: A Survey of Advances of Deep Learning within X-ray Security Imaging | Samet Akcay and Toby P. Breckon - Durham University, UK
Datasets (often in combination with algorithms) are becoming important in their own right and can sometimes be seen as the primary intellectual output of research. The revelations about Cambridge Analytica highlight the importance of datasets and data collection. See also: Privacy in Data Science
Sources
- Tatoeba a collection of sentences and translations - Tab-delimited Bilingual Sentence Pairs
- Kaggle Datasets
- COVID-19 Open Research Dataset (CORD-19) ...COVID-19
- UC Irvine Machine Learning Repository
- MNIST database
- Collections | DataHub
- Registry of Open Data on AWS | Amazon
- Public Data | Google
- BigQuery public datasets | Google
- Open Images | Google
- Data Science for Research | Microsoft
- Datasets for Data Mining and Data Science | KDnuggets
- Enigma Public
- A Comprehensive List of Open Data Portals from Around the World | DataPortals.org
- OpenDataSoft
- World Data Atlas | Knoema
- The Open Machine Learning project | OpenML.org
- World's Free Online Data | Research Pipeline
- List of datasets for machine learning research | Wikipedia
- Neural Net Repository | Wolfram
- Open Data for Deep Learning & Machine Learning | 4j
- Data Catalog | Data.gov
- 3D-Machine-Learning | GitHub
- Wind Turbine Map and Database | USGS & DOE
- Autosomal DNA
- Pascal Visual Object Classes Challenge (VOC)
- OpenNASA
- Data: Close encounters between two objects | European Space Agency (ESA)
- JASA Data Archive | Journal of the American Statistical Association
- Datasets Archive | Journal of the American Statistical Association
- Data.World
- The Dataset Collection | Archive.org
- Collections | Archive-it.org
- Eurostat | EU statistical office
- Re3data
- Resource on data and metadata standards - open research data | FAIRsharing
- List of Public Data Sources Fit for Machine Learning | bigml
- Open Datasets | Skymind
- Global Health Observatory resources | World Health Organization (WHO)
- CDC WONDER | Centers for Disease Control and Prevention (CDC)
- US health insurance program | Medicare
- International economy | International Monetary Fund (IMF)
- Data Catalog | The World Bank
- Financial and economic | Quandl
- PublicDomains | GitHub
- datasets and related content | BuzzFeed - GitHub
- Sports, politics, economics, and other spheres of life | FiveThirtyEight
- EMBER; benign and malicious Windows-portable executable files | Endgame - GitHub
- r/datasets | reddit
- Microsoft Information-Seeking Conversation (MISC) - audio and video signals; transcripts of conversation
- Language-Independent Named Entity Recognition (II)
- VGG | Oxford
- Perfect-500K beauty and personal care
- Mozilla’s Common Voice project collects human voices
- CIFAR-10 and CIFAR-100, labeled subsets of the 80 million tiny images dataset | A. Krizhevsky, V. Nair, and G. Hinton - Canadian Institute For Advanced Research
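Several of the sources above distribute plain delimited text; the Tatoeba entry, for example, is described as tab-delimited bilingual sentence pairs. The following is a minimal sketch of reading such a file using only the Python standard library. It assumes one pair per line with the source sentence in the first column and the translation in the second; the function name `read_sentence_pairs` and the sample text are hypothetical, and a real download may carry extra columns or a different layout.

```python
import csv
import io


def read_sentence_pairs(handle):
    """Yield (source, target) tuples from a tab-delimited text handle.

    Assumption: column 0 is the source sentence and column 1 is the
    translation. Any further columns are ignored.
    """
    # QUOTE_NONE keeps quotation marks inside sentences as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        yield row[0].strip(), row[1].strip()


# Hypothetical sample standing in for a downloaded file.
SAMPLE = (
    "Hello.\tBonjour.\n"
    "Thank you.\tMerci.\n"
    "\n"
    "Good night.\tBonne nuit.\textra column\n"
)

pairs = list(read_sentence_pairs(io.StringIO(SAMPLE)))
for source, target in pairs:
    print(f"{source} -> {target}")
```

For a file on disk, the same function can be given `open(path, encoding="utf-8", newline="")` in place of the in-memory sample.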
Networks
- Bidirectional Encoder Representations from Transformers (BERT)
- ResNet-50
- ImageNet | Wikipedia
- AlexNet | Wikipedia
- WordNet
Articles
- Microsoft Scraps 10 Million Facial Recognition Photos On The Low | Kori Hale - Forbes
- The 50 Best Free Datasets for Machine Learning | Meiryum Ali - Gengo AI
- The 50 Best Public Datasets for Machine Learning | Stacy Stanford - Medium
- Best Public Datasets for Machine Learning and Data Science: Sources and Advice on the Choice | Altexsoft
- 25 Open Datasets for Deep Learning Every Data Scientist Must Work With | Pranav Dar - Analytics Vidhya
- Human in the Loop...
- Amazon Mechanical Turk (MTurk) - Using MTurk with Amazon SageMaker for Supervised Learning (ML)
- Gengo.ai - high-quality multilingual data with a human touch for machine learning
- Figure Eight CrowdFlower AI - build a state-of-the-art machine learning model trained with human labeled data