Difference between revisions of "Datasets"
<youtube>jYvBmJo7qjc</youtube>
"ImageNet: Where Have We Been? Where Are We Going?" with Fei-Fei Li
Date: 9/21/2017 It took nature and evolution more than 500 million years to develop a powerful visual system in humans. The journey for AI and computer vision is about half of a century. In this talk, Dr. Li will briefly discuss the key ideas and the cutting edge advances in the quest for visual intelligences in computers, focusing on work done to develop ImageNet over the years. Fei-Fei Li is currently on sabbatical as the Chief Scientist of AI/ML at Google Cloud. She is an Associate Professor in the Computer Science Department at Stanford, and the Director of the Stanford Artificial Intelligence Lab. Her main research areas are in machine learning, deep learning, computer vision, and cognitive and computational neuroscience. She has published more than 150 scientific articles in top-tier journals and conferences, including Nature, PNAS, Journal of Neuroscience, CVPR, ICCV, NIPS, ECCV, IJCV, IEEE-PAMI, etc. Li obtained her B.A. degree in physics from Princeton with High Honors, and her Ph.D. degree in electrical engineering from the California Institute of Technology (Caltech). She joined Stanford in 2009 as an assistant professor, and was promoted to associate professor with tenure in 2012.
Revision as of 20:46, 6 February 2023
- AI Governance / Algorithm Administration
- Visualization
- Facets | Google...contains two robust Visualizations to aid in understanding and analyzing machine learning datasets.
- Hyperparameters
- Evaluation
- Train, Validate, and Test
- OpenML datasets
- Datasets and Machine Learning | Chris Nicholson - A.I. Wiki pathmind
- Datasets used in deep learning applications within X-ray security imaging | Towards Automatic Threat Detection: A Survey of Advances of Deep Learning within X-ray Security Imaging | Samet Akcay and Toby P. Breckon - Durham University, UK
Datasets (often in combination with algorithms) are becoming more important in their own right and can sometimes be seen as the primary intellectual output of the research. The revelations about Cambridge Analytica highlight the importance of datasets and data collection. See also: Privacy
Sources
- MLCommons ...MLCommons debuts with public 86,000-hour speech data set for AI researchers | Devin Coldewey - TechCrunch
- Question Answering in Context (QuAC) ...Question Answering in Context for modeling, understanding, and participating in information seeking dialog.
- Tatoeba a collection of sentences and translations - Tab-delimited Bilingual Sentence Pairs
- Kaggle Datasets
- COVID-19 Open Research Dataset (CORD-19) ...COVID-19
- UC Irvine Machine Learning Repository
- MNIST database
- Collections | DataHub
- Registry of Open Data on AWS | Amazon
- Public Data | Google
- BigQuery public datasets | Google
- Open Images | Google
- Data Science for Research | Microsoft
- Datasets for Data Mining and Data Science | KDnuggets
- Enigma Public
- A Comprehensive List of Open Data Portals from Around the World | DataPortals.org
- OpenDataSoft
- World Data Atlas | Knoema
- The Open Machine Learning project | OpenML.org
- World's Free Online Data | Research Pipeline
- List of datasets for machine learning research | Wikipedia
- Neural Net Repository | Wolfram
- Open Data for Deep Learning & Machine Learning | 4j
- Data Catalog | Data.gov
- 3D-Machine-Learning | GitHub
- Wind Turbine Map and Database | USGS & DOE
- Autosomal DNA
- Pascal Visual Object Classes Challenge (VOC)
- OpenNASA
- Data: Close encounters between two objects | European Space Agency (ESA)
- JASA Data Archive | Journal of the American Statistical Association
- Datasets Archive | Journal of the American Statistical Association
- Data.World
- The Dataset Collection | Archive.org
- Collections | Archive-it.org
- Eurostat | EU statistical office
- Re3data
- Resource on data and metadata standards - open research data | FAIRsharing
- List of Public Data Sources Fit for Machine Learning | bigml
- Open Datasets | Skymind
- Global Health Observatory resources | World Health Organization (WHO)
- CDC WONDER | Centers for Disease Control and Prevention (CDC)
- US health insurance program | Medicare
- International economy | International Monetary Fund (IMF)
- Data Catalog | The World Bank
- Financial and economic | Quandl
- PublicDomains | GitHub
- datasets and related content | BuzzFeed - GitHub
- Sports, politics, economics, and other spheres of life | FiveThirtyEight
- EMBER; benign and malicious Windows-portable executable files | Endgame - GitHub
- r/datasets | reddit
- Microsoft Information-Seeking Conversation (MISC) - audio and video signals; transcripts of conversation
- Language-Independent Named Entity Recognition (II)
- VGG | Oxford
- Perfect-500K beauty and personal care
- Mozilla’s Common Voice project collects human voices
- CIFAR-10 and CIFAR-100 are labeled subsets of the 80 million tiny images dataset. | A. Krizhevsky, V. Nair, and G. Hinton - Canadian Institute For Advanced Research
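Several of the sources above distribute plain-text files; the Tatoeba entry, for example, is described as tab-delimited bilingual sentence pairs. The following is a minimal sketch of reading such a file with the Python standard library. It assumes a layout of one pair per line, with the source sentence, a tab, then the translation; the sample data and the function name `read_sentence_pairs` are hypothetical, and a real download may carry extra columns.

```python
import csv
import io

# Hypothetical sample in the assumed layout: source sentence, TAB, translation.
SAMPLE = "Hello.\tBonjour.\nThank you.\tMerci.\n"


def read_sentence_pairs(handle):
    """Return (source, target) tuples from a tab-delimited file-like object."""
    # QUOTE_NONE keeps quotation marks inside sentences as literal characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    pairs = []
    for row in reader:
        if len(row) < 2:
            continue  # skip blank lines and lines without a translation
        # Only the first two columns are used; any extra columns are ignored.
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs


pairs = read_sentence_pairs(io.StringIO(SAMPLE))
```

To read a downloaded file instead of the in-memory sample, pass an open file handle, for example one created with `open(path, encoding="utf-8", newline="")`, where `path` is whatever location the file was saved to.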
Networks
- Bidirectional Encoder Representations from Transformers (BERT)
- ResNet-50
- ImageNet | Wikipedia
- AlexNet | Wikipedia
- WordNet
Articles
- Microsoft Scraps 10 Million Facial Recognition Photos On The Low | Kori Hale - Forbes
- The 50 Best Free Datasets for Machine Learning | Meiryum Ali - Gengo AI
- The 50 Best Public Datasets for Machine Learning | Stacy Stanford - Medium
- Best Public Datasets for Machine Learning and Data Science: Sources and Advice on the Choice | Altexsoft
- 25 Open Datasets for Deep Learning Every Data Scientist Must Work With | Pranav Dar - Analytics Vidhya
- Human in the Loop...
- Amazon Mechanical Turk (MTurk) - Using MTurk with Amazon SageMaker for Supervised Learning (ML)
- Gengo.ai - high-quality multilingual data with a human touch for machine learning
- Figure Eight CrowdFlower AI - build a state-of-the-art machine learning model trained with human labeled data