Difference between revisions of "Datasets"
Revision as of 14:48, 3 July 2023
- Data Science ... Governance ... Preprocessing ... Exploration ... Interoperability ... Master Data Management (MDM) ... Bias and Variances ... Benchmarks ... Datasets
- Excel ... Documents ... Database ... Graph ... LlamaIndex
- Data Quality ...validity, accuracy, cleaning, completeness, consistency, encoding, padding, augmentation, labeling, auto-tagging, normalization, standardization, and imbalanced data
- AI Governance / Algorithm Administration
- Managed Vocabularies
- Analytics ... Bayes ... Loop ... Visualization ... Diagrams & Generative AI for Business Analysis ... Network Pattern
- Development ... Notebooks ... AI Pair Programming Tools ... AIOps/MLOps ... AIaaS/MLaaS
- Facets | Google...contains two robust Visualizations to aid in understanding and analyzing machine learning datasets.
- Hyperparameters
- Evaluation ... Prompts for assessing AI projects
- Train, Validate, and Test
- OpenML datasets
- Datasets and Machine Learning | Chris Nicholson - A.I. Wiki pathmind
- Datasets used in deep learning applications within X-ray security imaging | Towards Automatic Threat Detection: A Survey of Advances of Deep Learning within X-ray Security Imaging | Samet Akcay and Toby P. Breckon - Durham University, UK
Datasets (often in combination with algorithms) are becoming more important in their own right and can sometimes be seen as the primary intellectual output of research. The revelations about Cambridge Analytica highlight the importance of datasets and data collection. Reference also: Privacy
Sources
- MLCommons ...MLCommons debuts with public 86,000-hour speech data set for AI researchers | Devin Coldewey - TechCrunch
- Question Answering in Context (QuAC) ...Question Answering in context for modeling, understanding, and participating in information seeking dialog.
- Tatoeba a collection of sentences and translations - Tab-delimited Bilingual Sentence Pairs
- Kaggle Datasets
- COVID-19 Open Research Dataset (CORD-19) ...COVID-19
- UC Irvine Machine Learning Repository
- MNIST database
- Collections | DataHub
- Registry of Open Data on AWS | Amazon
- Public Data | Google
- BigQuery public datasets | Google
- Open Images | Google
- Data Science for Research | Microsoft
- Datasets for Data Mining and Data Science | KDnuggets
- Enigma Public
- A Comprehensive List of Open Data Portals from Around the World | DataPortals.org
- OpenDataSoft
- World Data Atlas | Knoema
- The Open Machine Learning project | OpenML.org
- World's Free Online Data | Research Pipeline
- List of datasets for machine learning research | Wikipedia
- Neural Net Repository | Wolfram
- Open Data for Deep Learning & Machine Learning | 4j
- Data Catalog | Data.gov
- 3D-Machine-Learning | GitHub
- Wind Turbine Map and Database | USGS & DOE
- Autosomal DNA
- Pascal Visual Object Classes Challenge (VOC)
- OpenNASA
- Data: Close encounters between two objects | European Space Agency (ESA)
- JASA Data Archive | Journal of the American Statistical Association
- Datasets Archive | Journal of the American Statistical Association
- Data.World
- The Dataset Collection | Archive.org
- Collections | Archive-it.org
- Eurostat | EU statistical office
- Re3data
- Resource on data and metadata standards - open research data | FAIRsharing
- List of Public Data Sources Fit for Machine Learning | bigml
- Open Datasets | Skymind
- Global Health Observatory resources | World Health Organization (WHO)
- CDC WONDER | Centers for Disease Control and Prevention (CDC)
- US health insurance program | Medicare
- International economy | International Monetary Fund (IMF)
- Data Catalog | The World Bank
- Financial and economic | Quandl
- PublicDomains | GitHub
- datasets and related content | BuzzFeed - GitHub
- Sports, politics, economics, and other spheres of life | FiveThirtyEight
- EMBER; benign and malicious Windows-portable executable files | Endgame - GitHub
- r/datasets | reddit
- Microsoft Information-Seeking Conversation (MISC) - audio and video signals; transcripts of conversation
- Language-Independent Named Entity Recognition (II)
- VGG | Oxford
- Perfect-500K beauty and personal care
- Mozilla’s Common Voice project collects human voices
- CIFAR-10 and CIFAR-100 are labeled subsets of the 80 million tiny images dataset. | A. Krizhevsky, V. Nair, and G. Hinton - Canadian Institute For Advanced Research
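Several of the sources above distribute plain tab-delimited text, such as the bilingual sentence pairs listed for Tatoeba. A minimal sketch of reading such data with Python's standard library follows. The column layout (sentence, translation, optional extra columns) is an assumption and should be checked against the actual download; the function name read_sentence_pairs and the sample rows are made up for illustration.

```python
import csv
import io


def read_sentence_pairs(lines):
    """Parse tab-delimited lines into (source, target) tuples.

    Assumption: the first two columns hold a sentence and its translation.
    Any further columns (for example attribution text) are ignored.
    """
    # QUOTE_NONE keeps quotation marks inside sentences as ordinary text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    pairs = []
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs


# Made-up sample rows standing in for a downloaded file.
sample = io.StringIO("Hello.\tBonjour.\nThank you.\tMerci.\tattribution text\n\n")
pairs = read_sentence_pairs(sample)
```

For a real file, passing an open file object (opened with encoding="utf-8" and newline="") in place of the in-memory sample should work the same way.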
Networks
- Bidirectional Encoder Representations from Transformers (BERT)
- ResNet-50
- ImageNet | Wikipedia
- AlexNet | Wikipedia
- WordNet
Articles
- Microsoft Scraps 10 Million Facial Recognition Photos On The Low | Kori Hale - Forbes
- The 50 Best Free Datasets for Machine Learning | Meiryum Ali - Gengo AI
- The 50 Best Public Datasets for Machine Learning | Stacy Stanford - Medium
- Best Public Datasets for Machine Learning and Data Science: Sources and Advice on the Choice | Altexsoft
- 25 Open Datasets for Deep Learning Every Data Scientist Must Work With | Pranav Dar - Analytics Vidhya
- Human in the Loop...
- Amazon Mechanical Turk (MTurk) - Using MTurk with Amazon SageMaker for Supervised Learning (ML)
- Gengo.ai - high-quality multilingual data with a human touch for machine learning
- Figure Eight CrowdFlower AI - build a state-of-the-art machine learning model trained with human labeled data