= Feature Exploration =
{|<!-- T -->
| valign="top" |
<youtube>_XOKz5VlTQY</youtube>
<b>Recent Advances in Feature Selection: A Data [[Perspective]], part 1</b><br>Authors: Huan Liu, Department of Computer Science and Engineering, Arizona State University; Jundong Li, School of Computing, Informatics and Decision Systems Engineering, Arizona State University; Jiliang Tang, Department of Computer Science and Engineering, Michigan State University. Feature selection, as a data preprocessing strategy, is imperative in preparing high-dimensional data for a myriad of data mining and machine learning tasks. By selecting a subset of features of high quality, feature selection can help build simpler and more comprehensive models, improve data mining performance, and prepare clean and understandable data. The proliferation of big data in recent years has presented substantial challenges and opportunities for feature selection research. In this tutorial, we provide a comprehensive overview of recent advances in feature selection research from a data [[perspective]]. After we introduce some basic concepts, we review state-of-the-art feature selection algorithms and recent techniques of feature selection for structured, social, heterogeneous, and streaming data. In particular, we also discuss what the role of feature selection is in the [[context]] of deep learning and how feature selection is related to feature engineering. To facilitate and promote the research in this community, we present an open-source feature selection repository scikit-feature that consists of most of the popular feature selection algorithms. We conclude our discussion with some open problems and pressing issues in future research.
|}<!-- B -->
Latest revision as of 15:56, 28 April 2024
- Data Science ... Governance ... Preprocessing ... Exploration ... Interoperability ... Master Data Management (MDM) ... Bias and Variances ... Benchmarks ... Datasets
- Data Quality ... validity, accuracy, cleaning, completeness, consistency, encoding, padding, augmentation, labeling, auto-tagging, normalization, standardization, and imbalanced data
- Evaluating Machine Learning Models
- Artificial General Intelligence (AGI) to Singularity ... Curious Reasoning ... Emergence ... Moonshots ... Explainable AI ... Automated Learning
- Recursive Feature Elimination (RFE)
- Principal Component Analysis (PCA)
- Representation Learning
- Managed Vocabularies
- Excel ... Documents ... Database; Vector & Relational ... Graph ... LlamaIndex
- Analytics ... Visualization ... Graphical Tools ... Diagrams & Business Analysis ... Requirements ... Loop ... Bayes ... Network Pattern
- Development ... Notebooks ... AI Pair Programming ... Codeless ... Hugging Face ... AIOps/MLOps ... AIaaS/MLaaS
- Feature selection | Wikipedia
- Notes on Feature Preprocessing: The What, the Why, and the How | Matthew Mayo - KDnuggets
- Feature Engineering and Selection: A Practical Approach for Predictive Models | Max Kuhn and Kjell Johnson
- Jon Tupitza's Famous Jupyter Notebooks:
- AI Governance / Algorithm Administration
- Tools:
A feature is an individual measurable property or characteristic of a phenomenon being observed. The concept of a “feature” is related to that of an explanatory variable, which is used in statistical techniques such as linear regression. A feature vector combines all of the features for a single row into one numerical vector. Part of the art of choosing features is to pick a minimum set of independent variables that explain the problem. If two variables are highly correlated, either combine them into a single feature or drop one of them. Sometimes people perform principal component analysis to convert correlated variables into a set of linearly uncorrelated variables. Some of the transformations used to construct new features or reduce the dimensionality of feature vectors are simple: subtract Year of Birth from Year of Death and you get Age at Death, which is a prime independent variable for lifetime and mortality analysis. In other cases, feature construction may not be so obvious. (Source: Machine learning algorithms explained | Martin Heller - InfoWorld)
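The two ideas in the paragraph above, constructing a feature and dropping a redundant one, can be sketched in a few lines of plain Python. The records, the column names, and the 0.95 correlation threshold are all assumptions made for illustration; real projects would usually do this with a data-frame library.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical records: year of birth, year of death, height in cm and in inches.
rows = [
    {"birth": 1900, "death": 1975, "height_cm": 170.0, "height_in": 66.9},
    {"birth": 1920, "death": 1990, "height_cm": 182.0, "height_in": 71.7},
    {"birth": 1935, "death": 2020, "height_cm": 165.0, "height_in": 65.0},
    {"birth": 1950, "death": 2010, "height_cm": 176.0, "height_in": 69.3},
]

# Feature construction: derive Age at Death from two raw columns.
for row in rows:
    row["age_at_death"] = row["death"] - row["birth"]

# Redundancy check: the two height columns carry the same information,
# so one of them is dropped when the correlation exceeds the (assumed) threshold.
corr = pearson([row["height_cm"] for row in rows],
               [row["height_in"] for row in rows])
if abs(corr) > 0.95:
    for row in rows:
        del row["height_in"]

# One numerical feature vector per row.
feature_names = ["age_at_death", "height_cm"]
vectors = [[row[name] for name in feature_names] for row in rows]
```

Dropping one column of a correlated pair is the simplest option; principal component analysis, mentioned above, is the alternative when the correlated variables should be merged rather than discarded.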
Feature Examples
If you were building a machine learning model to predict whether someone would like a particular movie, you might use features such as the person's age, gender, and favorite movie genres. You might also use features of the movie itself, such as its genre, director, and rating. Features matter because they are the only view of the world a model gets: from them it learns to identify patterns and make predictions.
Here is a simpler example, pitched at a 7th-grade level:
Imagine you are building a machine learning model to predict whether a student will pass or fail a math test. You might use the following features:
- The student's grades on previous math tests
- The student's attendance record in math class
- The student's homework completion rate
- The student's score on the math portion of the standardized test
Your machine learning model would learn to identify patterns in this data. For example, the model might learn that students who have high grades on previous math tests and good attendance are more likely to pass the test. The model could also learn that students who miss a lot of class or have incomplete homework are more likely to fail the test. Once your machine learning model is trained, you can use it to predict whether a new student is likely to pass or fail the math test. You can do this by providing the model with the student's features, such as their grades on previous math tests and their attendance record. The model will then use this information to make a prediction.
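As a rough illustration of the training and prediction steps just described, the sketch below fits a small logistic regression in plain Python. The six training rows, the scaling of every feature to the range 0..1, and the hyperparameters are invented for the example; it is not a recommendation for how to model real student data.

```python
from math import exp

# Hypothetical training data. Each row is
# [previous test average, attendance rate, homework completion rate, standardized score],
# all scaled to 0..1. Label 1 = passed, 0 = failed.
students = [
    ([0.90, 0.95, 0.90, 0.85], 1),
    ([0.80, 0.90, 1.00, 0.75], 1),
    ([0.75, 0.85, 0.80, 0.70], 1),
    ([0.40, 0.60, 0.30, 0.45], 0),
    ([0.50, 0.40, 0.50, 0.40], 0),
    ([0.30, 0.70, 0.20, 0.35], 0),
]

def predict_pass_probability(weights, bias, features):
    """Logistic model: squash a weighted sum of the features into 0..1."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + exp(-z))

def train(data, epochs=2000, learning_rate=0.5):
    """Fit weights with plain stochastic gradient descent on the log loss."""
    weights, bias = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for features, label in data:
            error = predict_pass_probability(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

weights, bias = train(students)

# Predict for two new (hypothetical) students by passing in their features.
strong_student = [0.85, 0.90, 0.95, 0.80]
weak_student = [0.35, 0.50, 0.30, 0.40]
p_strong = predict_pass_probability(weights, bias, strong_student)
p_weak = predict_pass_probability(weights, bias, weak_student)
print(f"strong student: {p_strong:.2f}, weak student: {p_weak:.2f}")
```

The model only ever sees the four numbers per student, which is why the choice of features determines what it can learn.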
Feature Store
A feature store in AI is a system for managing and serving features to machine learning models. Features are measurable pieces of data that can be used to train and evaluate models. Feature stores provide a central repository for features, making them easier to discover, reuse, and manage. Feature stores are important because they can help to improve the quality, efficiency, and scalability of machine learning development and deployment. For example, feature stores can help to:
- Reduce the time and effort required to develop and maintain machine learning models
- Improve the performance and accuracy of machine learning models
- Make machine learning models more reproducible and scalable
- Ensure that machine learning models are using consistent and up-to-date data
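To make the idea of a central repository concrete, here is a toy in-memory sketch. The class name `InMemoryFeatureStore` and its methods are hypothetical and do not correspond to any product's API; production feature stores add persistence, versioning, and access control on top of this basic shape.

```python
from dataclasses import dataclass, field

@dataclass
class InMemoryFeatureStore:
    """Toy central repository: feature definitions plus per-entity values."""
    definitions: dict = field(default_factory=dict)  # feature name -> description
    values: dict = field(default_factory=dict)       # (entity id, feature name) -> value

    def register(self, name, description):
        """Add a feature definition so other teams can discover it."""
        self.definitions[name] = description

    def write(self, entity_id, name, value):
        """Store a value; only registered features are accepted."""
        if name not in self.definitions:
            raise KeyError(f"unknown feature: {name}")
        self.values[(entity_id, name)] = value

    def read(self, entity_id, names):
        """Return the requested features for one entity, in the order asked."""
        return [self.values.get((entity_id, name)) for name in names]

    def search(self, keyword):
        """Discovery: registered features whose description mentions a keyword."""
        return sorted(n for n, d in self.definitions.items() if keyword in d)

store = InMemoryFeatureStore()
store.register("previous_test_avg", "average grade on previous math tests")
store.register("attendance_rate", "share of math classes attended")
store.register("homework_rate", "share of math homework completed")

store.write("student-1", "previous_test_avg", 0.82)
store.write("student-1", "attendance_rate", 0.95)
store.write("student-1", "homework_rate", 0.9)

# Two different models reuse the same stored values instead of recomputing them.
pass_fail_inputs = store.read(
    "student-1", ["previous_test_avg", "attendance_rate", "homework_rate"])
tutoring_inputs = store.read("student-1", ["homework_rate", "attendance_rate"])
```

Because both models read from one place, they see the same values, which is the consistency benefit listed above.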
Offerings
- Continual Feature Store: Open source, designed for real-time machine learning
- Databricks Feature Store: Fully integrated with Databricks
- Feast: Open source, cloud-native, scalable and performant
- FeatureBase: Commercial, offered by FeatureBase (formerly Molecula), easy to use
- Feathr: Open source, originally developed at LinkedIn, scalable and performant
- Hopsworks Feature Store: Open source, versatile, offers open APIs
- Jukebox Feature Store: Commercial, cloud-based, designed for real-time serving
- Metarank Feature Store: Open source, designed for machine learning ranking
- Microsoft Azure Feature Store: Commercial, offered by Microsoft Azure, easy to use
- Nexus Feature Store: Commercial, cloud-based, designed for large-scale machine learning
- Salesforce Einstein Feature Store: Commercial, offered by Salesforce, easy to use
- Vertex AI Feature Store: Commercial, offered by Google Cloud, part of the Vertex AI platform
- Amazon SageMaker Feature Store: Commercial, offered by AWS, scalable and performant
Use Case
Returning to the math test example, you could store the student features (previous test grades, attendance, homework completion, and standardized test score) in a feature store. This would make it easy to reuse the features for different machine learning models, and it would also make it easier to manage the features over time. For example, if you wanted to add a new feature, such as the student's participation in math class, you could simply add it to the feature store.
Here are some other examples of when you might use a feature store:
- You are building a machine learning model to predict whether a customer will churn (cancel their subscription). You could use a feature store to store features such as the customer's past purchase history, their engagement with your product, and their support tickets.
- You are building a machine learning model to recommend products to customers. You could use a feature store to store features such as the customer's past purchase history, their browsing history, and their product ratings.
- You are building a machine learning model to detect fraud. You could use a feature store to store features such as the customer's transaction history, their device information, and their location.
To use a hosted feature store API, you would typically first create an account with the feature store provider; a self-hosted open-source store has to be deployed instead. Once you have access, you can use the API to create and manage features, and to query and serve features to your machine learning models.
Most feature store APIs provide the following functionality:
- Feature management: Create, read, update, and delete features.
- Feature transformation: Preprocess and transform features before serving them to machine learning models.
- Feature serving: Serve features to machine learning models in real time or in batches.
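The transformation and serving functions in the list above can be sketched as follows. The customer IDs, the spend values, and the function names are assumptions for illustration only; min-max scaling stands in for whatever preprocessing a real store would apply.

```python
def min_max_scale(values):
    """Rescale numbers to the range 0..1 (all zeros if every value is equal)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical raw feature: monthly spend per customer.
raw_spend = {"c1": 20.0, "c2": 120.0, "c3": 70.0}

# Feature transformation: preprocess once, before any model asks for the values.
customer_ids = list(raw_spend)
scaled_spend = dict(
    zip(customer_ids, min_max_scale([raw_spend[c] for c in customer_ids])))

def serve_online(customer_id):
    """Real-time serving: one entity, one lookup, low latency."""
    return scaled_spend[customer_id]

def serve_batch(ids):
    """Batch serving: many entities at once, e.g. for training or nightly scoring."""
    return [scaled_spend[c] for c in ids]

print(serve_online("c3"))
print(serve_batch(["c2", "c1"]))
```

Transforming once and serving the stored result means the training pipeline and the live model receive identically processed values.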
Many feature stores expose an HTTP API alongside language-specific SDKs. With an HTTP API, you send requests to the feature store server that specify the operation you want to perform (e.g., create a feature, query features, or serve features) and the relevant parameters. The host and endpoints in the examples below are illustrative and do not belong to a specific product.
For example, the following HTTP request would create a new feature called customer_id:

 POST /features HTTP/1.1
 Host: featurestore.example.com
 Content-Type: application/json
 
 {
   "name": "customer_id",
   "type": "string"
 }

The following HTTP request would query the feature store for the customer_id feature for a specific customer:

 GET /features/customer_id?customer_id=12345 HTTP/1.1
 Host: featurestore.example.com

The following HTTP request would serve the customer_id feature for a list of customers to a machine learning model:

 POST /features/customer_id/serve HTTP/1.1
 Host: featurestore.example.com
 Content-Type: application/json
 
 {
   "customer_ids": [12345, 67890, 24680]
 }

The feature store API would then return the appropriate response, depending on the operation you requested. For example, if you created a new feature, the API would return a confirmation message. If you queried the feature store for a feature, the API would return the value of the feature. If you served the feature to a machine learning model, the API would return a list of feature values.
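The three requests described above could be built from Python with the standard library, roughly as follows. The host is the placeholder from the text, the `https` scheme and the helper function names are assumptions, and nothing is actually sent because no such service exists; the sketch only shows how each request is assembled.

```python
import json
from urllib.request import Request

# Placeholder host taken from the examples in the text; not a real service.
BASE_URL = "https://featurestore.example.com"
JSON_HEADERS = {"Content-Type": "application/json"}

def create_feature_request(name, feature_type):
    """POST /features with the feature definition as a JSON body."""
    body = json.dumps({"name": name, "type": feature_type}).encode("utf-8")
    return Request(f"{BASE_URL}/features", data=body,
                   headers=JSON_HEADERS, method="POST")

def query_feature_request(feature, customer_id):
    """GET /features/<feature>?customer_id=<id> for a single customer."""
    url = f"{BASE_URL}/features/{feature}?customer_id={customer_id}"
    return Request(url, method="GET")

def serve_feature_request(feature, customer_ids):
    """POST /features/<feature>/serve with a list of customer IDs."""
    body = json.dumps({"customer_ids": customer_ids}).encode("utf-8")
    return Request(f"{BASE_URL}/features/{feature}/serve", data=body,
                   headers=JSON_HEADERS, method="POST")

create_req = create_feature_request("customer_id", "string")
query_req = query_feature_request("customer_id", 12345)
serve_req = serve_feature_request("customer_id", [12345, 67890, 24680])

for req in (create_req, query_req, serve_req):
    print(req.get_method(), req.full_url)
```

Against a real server, each request object would be passed to `urllib.request.urlopen` and the JSON response decoded; the actual paths and payloads would come from that provider's API documentation.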
Used this way, a feature store API gives every model one consistent, reusable path to its features, which is where the quality, efficiency, and scalability benefits of a feature store come from.
Here are some additional tips for using a feature store API:
- Use the API documentation to learn about the specific features and functionality that are available.
- Start by using the API to perform basic operations, such as creating and reading features.
- Once you have a good understanding of the API, you can start using it to perform more complex operations, such as transforming and serving features.
- If you have any questions or problems using the API, contact the feature store provider for support.
Feature Exploration
Feature Selection
- Beginner's Guide to Feature Selection in Python | Sayak Paul ... Learn about the basics of feature selection and how to implement and investigate various feature selection techniques in Python
- Feature Selection For Machine Learning in Python | Jason Brownlee
- How to Perform Feature Selection with Categorical Data | Jason Brownlee
Sparse Coding - Feature Extraction