Datasets



Datasets (often in combination with algorithms) are becoming more important in their own right and can sometimes be seen as the primary intellectual output of the research. The revelations about Cambridge Analytica highlight the importance of datasets and data collection. See also: Privacy

Sources

Networks

Articles

"ImageNet: Where Have We Been? Where Are We Going?" with Fei-Fei Li
Date: 9/21/2017. It took nature and evolution more than 500 million years to develop a powerful visual system in humans; the journey for AI and computer vision has lasted about half a century. In this talk, Dr. Li will briefly discuss the key ideas and cutting-edge advances in the quest for visual intelligence in computers, focusing on the work done to develop ImageNet over the years. Fei-Fei Li is currently on sabbatical as the Chief Scientist of AI/ML at Google Cloud. She is an Associate Professor in the Computer Science Department at Stanford, and the Director of the Stanford Artificial Intelligence Lab. Her main research areas are machine learning, deep learning, computer vision, and cognitive and computational neuroscience. She has published more than 150 scientific articles in top-tier journals and conferences, including Nature, PNAS, Journal of Neuroscience, CVPR, ICCV, NIPS, ECCV, IJCV, IEEE-PAMI, etc. Li obtained her B.A. degree in physics from Princeton with High Honors, and her Ph.D. degree in electrical engineering from the California Institute of Technology (Caltech). She joined Stanford in 2009 as an assistant professor, and was promoted to associate professor with tenure in 2012.

GCP Public Datasets Program: Share and analyze large-scale global datasets (Google Cloud Next '17)
Publicly available large datasets hold great potential to better the world. In this video, Felipe Hoffa introduces the Public Datasets Program by Google Cloud Platform. The program gives dataset owners a terrific platform to share their data, so that users across the world can easily leverage these datasets for large-scale analytics. You'll learn how you can participate in this program, whether you want to broadly share your data or hope to glean insights from large public datasets.

P2 How to download a Kaggle dataset & Install Numpy, Pandas, and more - Multiple Linear Regression
What’s up y'all! We are back again. How was your weekend? After yesterday's introductory episode, we are jumping straight into the nitty-gritty of multiple linear regression. But first, let's do some preparation.
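Once a Kaggle CSV has been downloaded and loaded (typically with pandas), fitting a multiple linear regression comes down to ordinary least squares. The sketch below uses NumPy, one of the libraries the episode installs; the toy data (generated from y = 3 + 2·x1 - x2 without noise) and the helper names `fit_multiple_linear_regression` and `predict` are illustrative assumptions, not the video's code.

```python
import numpy as np


def fit_multiple_linear_regression(X, y):
    """Fit y ≈ b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    X has shape (n_samples, n_features), y has shape (n_samples,).
    Returns (intercept, coefficients).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the intercept is estimated alongside the slopes.
    design = np.column_stack([np.ones(len(X)), X])
    # lstsq solves the least-squares problem without explicitly inverting X^T X.
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0], beta[1:]


def predict(intercept, coefs, X):
    """Apply a fitted model to new rows of features."""
    return intercept + np.asarray(X, dtype=float) @ coefs


if __name__ == "__main__":
    # Toy stand-in for a downloaded dataset: two features, noise-free target.
    X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]])
    y = 3 + 2 * X[:, 0] - X[:, 1]
    b0, b = fit_multiple_linear_regression(X, y)
    print(round(float(b0), 6), np.round(b, 6))
```

With real data you would pass the feature columns of a DataFrame (for example `df[["col_a", "col_b"]].to_numpy()`, column names hypothetical) as `X`.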

Accessing public datasets on Amazon S3 using Globus
Demonstrates how you can easily access and download big datasets from public repositories using Globus for Amazon S3.

AWS re:Invent 2017: Migrating Databases and Data Warehouses to the Cloud: Getting Started (DAT317)
In this introductory session, we look at how to convert and migrate your commercial databases and data warehouses to the cloud and gain your database freedom. AWS Database Migration Service (AWS DMS) and AWS Schema Conversion Tool (AWS SCT) have been used to migrate tens of thousands of databases. These migrations include Oracle and SQL Server to Amazon Aurora, Teradata and Netezza to Amazon Redshift, MongoDB to Amazon DynamoDB, and many other source and target combinations. Learn how to easily and securely migrate your data and procedural code, enjoy flexibility and cost savings, and gain new opportunities.

Joining Datasets | Intro to Azure ML Part 6
Last time we prepared our dataset for a join. In this video we’ll use the Join Data module inside Azure ML to cross-reference each airport ID with the airport table to find the airport city, airport state, and airport name. We will briefly go over the different types of joins, then combine the three tables together. Each join adds three columns to our dataset.
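The Azure ML module performs this join through the GUI; the same lookup can be sketched in plain Python as a hash-based left join. The column names (`origin_id`, `dest_id`, `airport_id`), the toy rows, and the assumption that the "three tables" are the flight table joined twice against the airport table (once for origin, once for destination) are all hypothetical.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def left_join(left: List[Row], right: List[Row], left_key: str, right_key: str,
              columns: List[str], prefix: str) -> List[Row]:
    """Left-join `right` onto `left`, copying `columns` under prefixed names.

    Assumes right_key is unique in `right` (a dict index keeps only the last
    duplicate). Left rows with no match get None in every new column.
    """
    index = {row[right_key]: row for row in right}
    joined: List[Row] = []
    for row in left:
        match: Optional[Row] = index.get(row[left_key])
        new_row = dict(row)  # copy so the input table is not mutated
        for col in columns:
            new_row[prefix + col] = match[col] if match is not None else None
        joined.append(new_row)
    return joined


# Toy tables with hypothetical IDs and column names.
airports = [
    {"airport_id": 1, "city": "Seattle", "state": "WA", "name": "Seattle-Tacoma International"},
    {"airport_id": 2, "city": "Boston", "state": "MA", "name": "Logan International"},
]
flights = [
    {"flight": "F1", "origin_id": 1, "dest_id": 2},
    {"flight": "F2", "origin_id": 2, "dest_id": 99},  # 99 has no airport row
]

airport_cols = ["city", "state", "name"]
# Each join adds three columns, as in the video.
with_origin = left_join(flights, airports, "origin_id", "airport_id", airport_cols, "origin_")
with_both = left_join(with_origin, airports, "dest_id", "airport_id", airport_cols, "dest_")
for row in with_both:
    print(row)
```

A left join keeps every flight even when an airport ID is missing from the lookup table; an inner join would silently drop flight F2 instead.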

Deep learning idea for creating datasets
An idea for easily taking snapshots or crops of larger images, breaking them into neatly labeled images for a database.
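One way to realize this idea is to slide a fixed-size window over each large image and save every crop under a class-labeled path. The sketch below only computes the crop boxes and output paths, so it needs no imaging library; the `<label>/<stem>_<index>.png` naming scheme and the helper names are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower), Pillow's crop-box order


def tile_boxes(width: int, height: int, tile: int, stride: Optional[int] = None) -> List[Box]:
    """Return square tile x tile crop boxes covering a width x height image.

    stride defaults to tile (non-overlapping crops); a smaller stride overlaps them.
    Positions where a tile would run past the right or bottom edge are skipped,
    so every crop has exactly the same size.
    """
    stride = tile if stride is None else stride
    if tile <= 0 or stride <= 0:
        raise ValueError("tile and stride must be positive")
    boxes: List[Box] = []
    for top in range(0, height - tile + 1, stride):
        for left in range(0, width - tile + 1, stride):
            boxes.append((left, top, left + tile, top + tile))
    return boxes


def labeled_crops(image_stem: str, label: str, width: int, height: int,
                  tile: int, stride: Optional[int] = None) -> List[Tuple[str, Box]]:
    """Pair every crop box with a hypothetical output path <label>/<stem>_<index>.png."""
    return [
        (f"{label}/{image_stem}_{i:04d}.png", box)
        for i, box in enumerate(tile_boxes(width, height, tile, stride))
    ]


for path, box in labeled_crops("scan01", "cat", width=512, height=256, tile=128):
    print(path, box)
```

With Pillow installed, each pair could be written out as `Image.open(src).crop(box).save(path)`, since `Image.crop` takes the same (left, upper, right, lower) tuple; the label directory must already exist.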

ML #8 - Open Healthcare Datasets
Many people want healthcare data to play with, but don't know where to find it. In this chat we'll provide you with the data resources you need to start doing machine learning.

Open Data Innovation: Building on Open Data Sets for Innovative Applications
An overarching conversation on open data innovation. The session highlights how democratizing access to information drives innovation and greater impact. Learn how organizations are using the cloud to gather data and discover insights to foster innovation, improve service delivery and address big societal problems. As data becomes more widely available (GIS, weather, research), having access to scalable technology and the multiple data sources that can feed into the technology solution can help create solutions for significant problems in the world. This session highlights real-world examples of how open data is enabling transformative innovation. Explore how the new Landsat open data set on AWS is spurring innovation among public and private entities and helping deliver applications to citizens and users.

VRmeta: Generating AI datasets one precise meta-tag at a time
VRmeta is the world's most precise means of adding time-based descriptive metadata to both digital and immersive video. Whether the goal is to create unmatched discoverability for your entire video library, leverage metrics from all that amazing content, or license those clips to increase inbound revenue, VRmeta makes it happen. Today’s consumers have more choices than ever before for video entertainment and viewing platforms, and with this explosion of choice has come complexity. Finding engaging entertainment has become time-consuming and frustrating, resulting in declining engagement and viewer satisfaction. The key to overcoming this discovery challenge lies in rich, time-based descriptive metadata, and VRmeta is your gateway to making this happen. VRmeta's patent-pending cross-hair and tactile navigation technology gives users the most precise means of applying metadata ever created. VRmeta gives every clip it touches time and in-frame location data registered with in and out points, all saved into .csv and .xmp sidecar files. VRmeta delivers AI precision now. VRmeta even learns your tagging vocabulary, offering users auto-completion for frequently used words and names. By applying time-based descriptive metadata at the production level, stakeholders create additional value at every stage of the video content lifecycle.

VRmeta stands firmly at the nexus of artificial intelligence and healthcare, and is a recognized state-of-the-art solution central to the development of emotional AI datasets. The science surrounding sentiment analysis involves natural language processing or linguistic algorithms that assign values to positive, negative, or neutral text (converting supposition into monetizable data silos); VRmeta is the ideal method for inputting this data. VRmeta is the tool of choice for broadcasters looking to develop information-rich statistical data silos for any variety of sports: think aggregate team and player performance, post-game data, and deep-dive statistics development. "Great content without accurate metadata is, after all, a missed opportunity."
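The description mentions time-based tags with in and out points and in-frame locations saved to .csv sidecar files. VRmeta's actual file layout is not given here, so the sketch below assumes a hypothetical schema (label, in/out times in seconds, normalized x/y position) to show how such a sidecar can be written, read back, and queried by timestamp using only the standard library.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields
from typing import Iterable, List, TextIO


@dataclass
class TimedTag:
    """A descriptive tag tied to an in/out time range and an in-frame point.

    Hypothetical schema: times in seconds, x and y normalized to [0, 1].
    """
    label: str
    in_s: float
    out_s: float
    x: float
    y: float

    def __post_init__(self) -> None:
        if self.out_s < self.in_s:
            raise ValueError("out point must not precede in point")
        if not (0.0 <= self.x <= 1.0 and 0.0 <= self.y <= 1.0):
            raise ValueError("x and y must be normalized to [0, 1]")


def write_sidecar(tags: Iterable[TimedTag], stream: TextIO) -> None:
    """Write a header row, then one CSV row per tag, sorted by in point."""
    writer = csv.writer(stream)
    writer.writerow([f.name for f in fields(TimedTag)])
    for tag in sorted(tags, key=lambda t: t.in_s):
        writer.writerow(astuple(tag))


def read_sidecar(stream: TextIO) -> List[TimedTag]:
    """Parse a sidecar written by write_sidecar back into TimedTag objects."""
    return [
        TimedTag(row["label"], float(row["in_s"]), float(row["out_s"]),
                 float(row["x"]), float(row["y"]))
        for row in csv.DictReader(stream)
    ]


def tags_at(tags: Iterable[TimedTag], t: float) -> List[str]:
    """Labels whose [in, out] range contains time t (both ends inclusive)."""
    return [tag.label for tag in tags if tag.in_s <= t <= tag.out_s]


buffer = io.StringIO(newline="")  # stands in for open("clip.csv", "w", newline="")
write_sidecar([TimedTag("goal", 12.0, 15.5, 0.4, 0.6),
               TimedTag("crowd", 0.0, 30.0, 0.5, 0.1)], buffer)
buffer.seek(0)
restored = read_sidecar(buffer)
print(tags_at(restored, 13.0))
```

Keeping the sidecar sorted by in point makes it easy to scan a clip in playback order; overlapping ranges are allowed, so a single timestamp can carry several tags.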

Question Answering Beyond SQuAD: Larger Datasets and New Domains, with Branden Chan, deepset.ai
Branden Chan, an NLP Engineer at deepset.ai in Berlin, presents on Question Answering Beyond SQuAD: Larger Datasets and New Domains in an online program, May 26, 2020, organized and moderated by Seth Grimes for the New York Natural Language Processing meetup (https://www.meetup.com/NY-NLP) and partners.
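Extractive QA datasets in the SQuAD family are usually scored with exact match (EM) and token-level F1 against the reference answers. The sketch below is modeled on the answer normalization in the SQuAD v1.1 evaluation script (lowercase, strip punctuation and the articles a/an/the, collapse whitespace); it is an illustration, not the talk's code, and the helper `best_over_references` is a hypothetical name for taking the best score over several reference answers.

```python
import re
import string
from collections import Counter
from typing import Callable, List

_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, ground_truth: str) -> bool:
    return normalize_answer(prediction) == normalize_answer(ground_truth)


def f1_score(prediction: str, ground_truth: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)  # multiset intersection
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_references(metric: Callable[[str, str], float],
                         prediction: str, references: List[str]) -> float:
    """Score against each reference answer and keep the best, as SQuAD does."""
    return max(float(metric(prediction, ref)) for ref in references)


print(exact_match("the Eiffel Tower", "Eiffel Tower."))
print(f1_score("Eiffel Tower in Paris", "the Eiffel Tower"))
```

F1 gives partial credit for overlapping spans, which matters more as datasets move to longer answers and new domains where annotators disagree on exact boundaries.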

How to Make Data Amazing - Intro to Deep Learning #5
Siraj Raval: In this video, we'll go through data preprocessing steps for 3 different datasets. We'll also go in depth on a dimensionality reduction technique called Principal Component Analysis (PCA).
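PCA finds the orthogonal directions of greatest variance in mean-centered data. A compact way to compute it is the singular value decomposition: the right singular vectors are the principal axes, and the squared singular values are proportional to the variance along each axis. The NumPy sketch below is one standard formulation, not the video's code; the synthetic data and the function name `pca` are assumptions.

```python
import numpy as np


def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components.

    Returns (projected, components, explained_variance_ratio); the rows of
    `components` are unit-length principal axes in decreasing-variance order.
    """
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)  # PCA operates on mean-centered data
    # Economy SVD: X_centered = U @ diag(S) @ Vt, singular values sorted descending.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    projected = X_centered @ components.T
    # Variance along each axis is proportional to S**2, so the ratio needs no n-1 factor.
    explained_ratio = S[:n_components] ** 2 / np.sum(S ** 2)
    return projected, components, explained_ratio


rng = np.random.default_rng(0)
# Synthetic 3-D data that varies mostly along one direction, plus small noise.
t = rng.normal(size=(200, 1))
X = t @ np.array([[1.0, 2.0, -1.0]]) + 0.05 * rng.normal(size=(200, 3))
Z, axes, ratio = pca(X, n_components=2)
print(Z.shape, ratio)
```

When features are measured in different units, a common preprocessing step is to standardize each column, `(X - X.mean(axis=0)) / X.std(axis=0)`, before calling `pca`, so that no feature dominates simply because of its scale.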

How to Learn from Little Data - Intro to Deep Learning #17
Siraj Raval: One-shot learning! In this last weekly video of the course, I'll explain how memory-augmented neural networks can help achieve one-shot classification for a small labeled image dataset. We'll also go over the architecture of its inspiration, the neural Turing machine.
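The core operation memory-augmented networks borrow from the neural Turing machine is content-based addressing: compare a key vector with every memory row by cosine similarity, sharpen with a strength β, and normalize with a softmax to get read weights. The NumPy sketch below shows that step, then uses it in a simplified key-value variant for one-shot classification (one stored example per class, labels stored as one-hot rows). The hand-made embeddings stand in for a trained encoder, and the class and function names are hypothetical; the NTM itself reads from the same matrix it addresses.

```python
import numpy as np


def content_weights(key, memory, beta=10.0):
    """NTM-style content addressing: softmax(beta * cosine(key, each row))."""
    eps = 1e-8  # avoids division by zero for all-zero vectors
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + eps)
    scores = beta * sims
    weights = np.exp(scores - scores.max())  # subtract max for numerical stability
    return weights / weights.sum()


class OneShotMemory:
    """Stores one (embedding, label) pair per example; reads labels by similarity."""

    def __init__(self, n_classes):
        self.n_classes = n_classes
        self.keys = []    # embeddings used for addressing
        self.values = []  # one-hot label rows that get read out

    def write(self, embedding, label):
        self.keys.append(np.asarray(embedding, dtype=float))
        self.values.append(np.eye(self.n_classes)[label])

    def classify(self, embedding, beta=10.0):
        weights = content_weights(np.asarray(embedding, dtype=float),
                                  np.stack(self.keys), beta)
        label_scores = weights @ np.stack(self.values)  # weighted read of label rows
        return int(np.argmax(label_scores)), label_scores


memory = OneShotMemory(n_classes=3)
# A single labeled example per class (hypothetical 4-D embeddings).
memory.write([1, 0, 0, 0], 0)
memory.write([0, 1, 0, 0], 1)
memory.write([0, 0, 1, 1], 2)
label, scores = memory.classify([0.9, 0.1, 0.0, 0.0])
print(label, np.round(scores, 3))
```

A larger β makes the read closer to a hard nearest-neighbor lookup; a smaller β blends several memories, which keeps the operation differentiable for training the encoder end to end.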