Latest revision as of 20:30, 26 April 2024
YouTube ... Quora ... Google search ... Google News ... Bing News
- Data Science ... Governance ... Preprocessing ... Exploration ... Interoperability ... Master Data Management (MDM) ... Bias and Variances ... Benchmarks ... Datasets
- Data Quality ... validity, accuracy, cleaning, completeness, consistency, encoding, padding, augmentation, labeling, auto-tagging, normalization, standardization, and imbalanced data
- Risk, Compliance and Regulation ... Ethics ... Privacy ... Law ... AI Governance ... AI Verification and Validation
- Managed Vocabularies
- Excel ... Documents ... Database; Vector & Relational ... Graph ... LlamaIndex
- Analytics ... Visualization ... Graphical Tools ... Diagrams & Business Analysis ... Requirements ... Loop ... Bayes ... Network Pattern
- Development ... Notebooks ... AI Pair Programming ... Codeless ... Hugging Face ... AIOps/MLOps ... AIaaS/MLaaS
- Hyperparameters
- Strategy & Tactics ... Project Management ... Best Practices ... Checklists ... Project Check-in ... Evaluation ... Measures
- AI Solver ... Algorithms ... Administration ... Model Search ... Discriminative vs. Generative ... Train, Validate, and Test
- Python ... GenAI w/ Python ... JavaScript ... GenAI w/ JavaScript ... TensorFlow ... PyTorch
- Scale ... data collection, curation, labeling, and annotation
- Sklearn.preprocessing
- The 1st place solution to the Passenger Screening Kaggle challenge won in part because of its data preparation/generation.
- Data Pre Processing Techniques You Should Know | Maneesha Rajaratne - Towards Data Science
- Machine Learning(ML) — Data Preprocessing | Raji Adam Bifola
- Most Influential Data Preprocessing Algorithms | S. García, J. Luengo, F. Herrera
- How to fix an Unbalanced Dataset | Will Badr - Amazon Web Services
- Creating and Using Datasources | Amazon Web Services
- Jon Tupitza Famous Jupyter Notebooks:
- The COVID Tracking Project - software used
Splitting Data - training and testing sets
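A common first preprocessing step is to hold out part of the data so the model is evaluated on rows it never saw during training. The sketch below is a minimal, hypothetical helper (`split_train_test` is an illustrative name, not a library function). It assumes the rows are independent, so a random shuffle is appropriate, and it uses a seeded generator so the split is reproducible. In practice, scikit-learn's `train_test_split` provides the same idea with more options, such as stratification.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(
    rows: Sequence[T], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: shuffle a copy of the rows and hold out a test set.

    Assumes rows are independent (not time-ordered), so shuffling is safe.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded RNG makes the split reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    # Everything except the last n_test rows is training data.
    return shuffled[:-n_test], shuffled[-n_test:]


if __name__ == "__main__":
    train, test = split_train_test(list(range(10)), test_fraction=0.2)
    print(len(train), len(test))  # 8 2
```

Fixing the seed is a design choice for repeatable experiments. Changing it gives a different but equally valid split.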
Time-Series Data
- Backtesting
- Time-based Algorithms
- A Comparison of Time Series Databases and Netsil’s Use of Druid | Netsil
- Microsoft announces the general availability of Azure Time Series Insights | Ryan Waite - Microsoft
- Top 10 Time Series Databases | Outlyer
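With time-series data, a random shuffle would let the model train on the future and be tested on the past. Backtesting therefore splits the data by time instead. The sketch below shows one assumed scheme, an expanding-window ("walk-forward") split, using a hypothetical helper name `walk_forward_splits`. Each fold trains on all points before a cutoff and tests on the next block. scikit-learn's `TimeSeriesSplit` follows a broadly similar expanding-window idea.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int, n_splits: int, test_size: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Hypothetical helper: yield (train_indices, test_indices) for backtesting.

    Assumes index order is time order; every test index comes after every
    train index in the same fold.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start < 1:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        train_idx = list(range(0, test_start))  # expanding window: all earlier points
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx


if __name__ == "__main__":
    for train_idx, test_idx in walk_forward_splits(10, n_splits=3, test_size=2):
        print(train_idx, test_idx)
```

For 10 samples, 3 splits, and a test size of 2, the folds test on indices 4-5, 6-7, and 8-9. Each fold trains on everything that precedes its test block.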
Categorical Variables
Categorical variables require special attention in regression analysis because, unlike dichotomous or continuous variables, they cannot be entered into the regression equation as they are. Instead, they must be recoded into a series of variables that can then be entered into the regression model. Several coding systems can be used to recode categorical variables. Coding Systems for Categorical Variables In Regression Analysis | UCLA Institute for Digital Research & Education Statistical Consulting
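Dummy (indicator) coding is one of the coding systems used for this recoding. A variable with k levels becomes k-1 columns of 0/1 values. One level is left out as the reference category, which avoids perfect collinearity with the intercept. The helper below is a minimal, hypothetical sketch (`dummy_code` and the `is_<level>` column names are illustrative assumptions). pandas offers comparable behavior through `get_dummies(..., drop_first=True)`.

```python
from typing import Dict, List, Sequence


def dummy_code(values: Sequence[str], reference: str) -> Dict[str, List[int]]:
    """Hypothetical helper: recode one categorical variable into k-1 indicator columns.

    The reference level gets no column, so a row of all zeros means "reference".
    """
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found")
    return {
        f"is_{level}": [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }


if __name__ == "__main__":
    region = ["north", "south", "west", "south"]
    print(dummy_code(region, reference="north"))
    # {'is_south': [0, 1, 0, 1], 'is_west': [0, 0, 1, 0]}
```

Each regression coefficient on an `is_<level>` column is then read as the difference from the reference level ("north" here). Other coding systems change that interpretation.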
SQL Database Optimization