Feature Exploration/Learning

|description=Helpful resources for your journey with artificial intelligence; videos, articles, techniques, courses, profiles, and tools
}}
[https://www.youtube.com/results?search_query=Feature+Exploration+machine+learning+ML YouTube search...]
[https://www.google.com/search?q=Feature+Exploration+machine+learning+ML ...Google search]
  
* [https://en.wikipedia.org/wiki/Feature_selection Feature selection | Wikipedia]
* [https://www.kdnuggets.com/2018/10/notes-feature-preprocessing-what-why-how.html Notes on Feature Preprocessing: The What, the Why, and the How | Matthew Mayo - KDnuggets]
 
* [[Evaluating Machine Learning Models]]
* [[Algorithm Administration#Automated Learning|Automated Learning]]
 
* [[Principal Component Analysis (PCA)]]
* [[Representation Learning]]
* [https://bookdown.org/max/FES/ Feature Engineering and Selection: A Practical Approach for Predictive Models | Max Kuhn and Kjell Johnson]
* [https://github.com/jontupitza Jon Tupitza's Famous Jupyter Notebooks:]
** [https://github.com/JonTupitza/Data-Science-On-Ramp/blob/master/01-Parametric-Tests.ipynb Parametric Tests: Tests Designed for Normally-Distributed Data]
*** [https://github.com/JonTupitza/Data-Science-Process/blob/master/02-EDA-Univariate-Analysis.ipynb Exploratory Data Analysis - Univariate]
** [https://github.com/JonTupitza/Data-Science-On-Ramp/blob/master/02-Non-Parametric-Tests.ipynb Non-Parametric Tests: Tests Designed for Data That's Not Normally-Distributed]
*** [https://github.com/JonTupitza/Data-Science-Process/blob/master/03-EDA-Bivariate-Analysis.ipynb Exploratory Data Analysis - Bivariate]
** [https://github.com/JonTupitza/Data-Science-Process/blob/master/04-EDA-Correlation-Analysis.ipynb Exploratory Data Analysis - Correlation]
** [https://github.com/JonTupitza/Data-Science-Process/blob/master/05-Feature-Selection.ipynb Feature Selection Techniques]
 
* [[AI Governance]] / [[Algorithm Administration]]
** [[Data Science]] / [[Data Governance]]
 
* [[Visualization]]
* Tools:
** [https://www.qubole.com/solutions/by-project/ What’s Your Project? | Qubole]
** [https://www.trifacta.com/ From Messy Files To Automated Analytics | Trifacta]
** [https://databricks.com/product/automl-on-databricks Accelerate discovery with a collaborative platform | Databricks]
** [https://www.paxata.com/ The Data Prep for AI Toolkit: Smarter ML Models Through Faster, More Accurate Data Prep | Paxata]
** [https://www.alteryx.com/e-book/age-badass-analyst The Age of The Badass Analyst | Alteryx]
  
A feature is an individual measurable property or characteristic of a phenomenon being observed. The concept of a “feature” is related to that of an explanatory variable, which is used in statistical techniques such as linear regression. Feature vectors combine all of the features for a single row into a numerical vector. Part of the art of choosing features is to pick a minimum set of independent variables that explain the problem. If two variables are highly correlated, either they need to be combined into a single feature, or one should be dropped. Sometimes people perform principal component analysis to convert correlated variables into a set of linearly uncorrelated variables. Some of the transformations that people use to construct new features or reduce the dimensionality of feature vectors are simple. For example, subtract Year of Birth from Year of Death and you construct Age at Death, which is a prime independent variable for lifetime and mortality analysis. In other cases, feature construction may not be so obvious. [https://www.infoworld.com/article/3394399/machine-learning-algorithms-explained.html Machine learning algorithms explained | Martin Heller - InfoWorld]
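The transformations described above can be sketched in a few lines of [[Python]] with pandas and scikit-learn. The dataset and column names below are hypothetical, purely for illustration: a derived Age at Death feature, a correlation check, and PCA to merge two highly correlated columns.

```python
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical mortality dataset; columns are illustrative only.
df = pd.DataFrame({
    "year_of_birth": [1890, 1901, 1915, 1923],
    "year_of_death": [1960, 1985, 1990, 2001],
    "height_cm":     [170, 165, 180, 175],
    "weight_kg":     [70, 62, 85, 78],
})

# Construct a new feature: Age at Death = Year of Death - Year of Birth.
df["age_at_death"] = df["year_of_death"] - df["year_of_birth"]

# Check whether two candidate features are highly correlated.
corr = df[["height_cm", "weight_kg"]].corr().iloc[0, 1]

# If so, combine them via PCA into a single uncorrelated component.
if abs(corr) > 0.8:
    pca = PCA(n_components=1)
    df["body_size"] = pca.fit_transform(df[["height_cm", "weight_kg"]])

print(df["age_at_death"].tolist())  # [70, 84, 75, 78]
```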
  
 
{|<!-- T -->
 
<youtube>WVclIFyCCOo</youtube>
 
<b>Visualize your Data with Facets
</b><br>In this episode of AI Adventures, Yufeng explains how to use Facets, a project from Google Research, to visualize your dataset, find interesting relationships, and clean your data for machine learning.  Learn more through our hands-on labs → https://goo.gle/38ZUlTD  Associated Medium post "Visualize your data with Facets": https://goo.gl/7FDWwk  Get Facets on GitHub: https://goo.gl/Xi8dTu  Play with Facets in the browser: https://goo.gl/fFLCEV  Watch more AI Adventures on the playlist: https://goo.gl/UC5usG  Subscribe to get all the episodes as they come out: https://goo.gl/S0AS51  #AIAdventures
 
|}
|}<!-- B -->
 
<youtube>KvZ2KSxlWBY</youtube>

<b>Stephen Elston - Data Visualization and Exploration with [[Python]]
</b><br>Visualization is an essential method in any data scientist’s toolbox: it is a key data-exploration technique and a powerful tool for presenting results and understanding problems with analytics. Attendees are introduced to the [[Python]] visualization packages Matplotlib, Pandas, and Seaborn. [https://github.com/StephenElston/ExploringDataWithPython The Jupyter notebook]  Visualization of complex real-world datasets presents a number of challenges to data scientists. By developing skills in data visualization, data scientists can confidently explore and understand the relationships in complex data sets. Using the [[Python]] matplotlib, pandas plotting, and seaborn packages, attendees will learn to: • Explore complex data sets with visualization, to develop understanding of the inherent relationships. • Create multiple views of data to highlight different aspects of the inherent relationships, with different graph types. • Use plot aesthetics to project multiple dimensions. • Apply conditioning or faceting methods to project multiple dimensions.  www.pydata.org
 
|}
|<!-- M -->
  
 
= <span id="Feature Selection"></span>Feature Selection =
[https://www.youtube.com/results?search_query=Feature+Selection+machine+learning+ML YouTube search...]
[https://www.google.com/search?q=Feature+Selection+machine+learning+ML ...Google search]
  
* [https://www.datacamp.com/community/tutorials/feature-selection-python Beginner's Guide to Feature Selection in Python | Sayak Paul]  ...Learn about the basics of feature selection and how to implement and investigate various feature selection techniques in [[Python]]
* [https://machinelearningmastery.com/feature-selection-machine-learning-python/ Feature Selection For Machine Learning in Python | Jason Brownlee]
* [https://machinelearningmastery.com/feature-selection-with-categorical-data/ How to Perform Feature Selection with Categorical Data | Jason Brownlee]
  
 
{|<!-- T -->

Revision as of 19:27, 28 January 2023



AI Explained: Feature Importance
Fiddler Labs. Learn more about feature importance, the different techniques, and the pros and cons of each. #ExplainableAI



The Best Way to Visualize a Dataset Easily
Siraj Raval. In this video, we'll visualize a dataset of body metrics collected by giving people a fitness tracking device. We'll go over the steps necessary to preprocess the data, then use a technique called t-SNE to reduce the dimensionality of our data so we can visualize it.
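The t-SNE step mentioned above projects high-dimensional rows down to two components for plotting. A minimal scikit-learn sketch, using random synthetic data in place of the fitness-tracker dataset from the video:

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-in for a high-dimensional dataset: 100 rows, 20 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))

# Reduce to 2 dimensions so each row can be placed on a scatter plot.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

print(embedding.shape)  # (100, 2)
```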

Feature Selection


Pre-Modeling: Data Preprocessing and Feature Exploration in Python
April Chen. Data preprocessing and feature exploration are crucial steps in a modeling workflow. In this tutorial, I will demonstrate how to use Python libraries such as scikit-learn, statsmodels, and matplotlib to perform pre-modeling steps. Topics that will be covered include: missing values, variable types, outlier detection, multicollinearity, interaction terms, and visualizing variable distributions. Finally, I will show the impact of utilizing these techniques on model performance. Interactive Jupyter notebooks will be provided.
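A few of the pre-modeling steps listed above (missing values, variable types, scaling) can be sketched with pandas and scikit-learn. This is not the tutorial's own code; the tiny dataset and column names are made up for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical raw data with missing values and a categorical column.
df = pd.DataFrame({
    "age":    [25, np.nan, 47, 51, 38],
    "income": [40_000, 52_000, np.nan, 88_000, 61_000],
    "city":   ["NY", "SF", "NY", "LA", "SF"],
})

# Missing values: impute numeric columns with the median.
for col in ["age", "income"]:
    df[col] = df[col].fillna(df[col].median())

# Variable types: one-hot encode the categorical column.
df = pd.get_dummies(df, columns=["city"])

# Scaling: standardize numeric columns to zero mean, unit variance.
df[["age", "income"]] = StandardScaler().fit_transform(df[["age", "income"]])

print(df.shape)  # (5, 5): age, income, and three city indicator columns
```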

Recent Advances in Feature Selection: A Data Perspective part 1
Authors: Huan Liu, Department of Computer Science and Engineering, Arizona State University; Jundong Li, School of Computing, Informatics and Decision Systems Engineering, Arizona State University; Jiliang Tang, Department of Computer Science and Engineering, Michigan State University. Feature selection, as a data preprocessing strategy, is imperative in preparing high-dimensional data for a myriad of data mining and machine learning tasks. By selecting a subset of features of high quality, feature selection can help build simpler and more comprehensive models, improve data mining performance, and prepare clean and understandable data. The proliferation of big data in recent years has presented substantial challenges and opportunities for feature selection research. In this tutorial, we provide a comprehensive overview of recent advances in feature selection research from a data perspective. After we introduce some basic concepts, we review state-of-the-art feature selection algorithms and recent techniques of feature selection for structured, social, heterogeneous, and streaming data. In particular, we also discuss the role of feature selection in the context of deep learning and how feature selection is related to feature engineering. To facilitate and promote research in this community, we present an open-source feature selection repository, scikit-feature, that consists of most of the popular feature selection algorithms. We conclude our discussion with some open problems and pressing issues in future research.

Alexandru Agachi - Introductory tutorial on data exploration and statistical models
This tutorial will focus on analyzing a dataset and building statistical models from it. We will describe and visualize the data. We will then build and analyze statistical models, including linear and logistic regression, as well as chi-square tests of independence. We will then apply 4 machine learning techniques to the dataset: decision trees, random forests, lasso regression, and clustering. I would be happy to conduct an introductory level tutorial on exploring a dataset with the pandas/StatsModels/scikit-learn framework: 1. Descriptive statistics. Here we will describe each variable depending on its type, as well as the dataset overall. 2. Visualization for categorical and quantitative variables. We will learn effective visualization techniques for each type of variable in the dataset. 3. Statistical modeling for quantitative and categorical, explanatory and response variables: chi-square tests of independence, linear regression and logistic regression. We will learn to test hypotheses, and to interpret our models, their strengths, and their limitations. 4. I will then expand to the application of machine learning techniques, including decision trees, random forests, lasso regression, and clustering. Here we will explore the advantages and disadvantages of each of these techniques, as well as apply them to the dataset. This would be a very applied, introductory tutorial, to the statistical exploration of a dataset and the building of statistical models from it. I would be happy to send you the ipython notebook for this tutorial as well. www.pydata.org
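As a small illustration of one step from this tutorial, a chi-square test of independence between two categorical variables can be run with SciPy; the contingency table below is made up for the example:

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table:
# rows = smoker / non-smoker, columns = disease present / absent.
table = [[30, 70],
         [10, 90]]

chi2, p, dof, expected = chi2_contingency(table)

# A small p-value suggests the two variables are not independent.
print(dof, p < 0.05)  # 1 True
```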

Feature Selection in Machine learning| Variable selection| Dimension Reduction
Feature selection is an important step in the machine learning model-building process. The performance of a model depends on the following: the choice of algorithm and the features selected.

How do I select features for Machine Learning?
Selecting the "best" features for your Machine Learning model will result in a better performing, easier to understand, and faster running model. But how do you know which features to select? In this video, I'll discuss 7 feature selection tactics used by the pros that you can apply to your own model. At the end, I'll give you my top 3 tips for effective feature selection.

Lecture 15.6 — Anomaly Detection | Choosing What Features To Use — Andrew Ng
Artificial Intelligence - All in One


Sparse Coding - Feature Extraction

Neural networks [8.1] : Sparse coding - definition
Hugo Larochelle

Neural networks [8.8] : Sparse coding - feature extraction
Hugo Larochelle