Tree-based...

* [[Capabilities]]
Tree-based models are a supervised machine learning method commonly used in soil survey and ecology for exploratory data analysis and prediction due to their simple nonparametric design. Instead of fitting a single global model to the data, tree-based models recursively partition the data into increasingly homogeneous groups based on split values that minimize a loss function (such as Sum of Squared Errors (SSE) for regression or the Gini index for classification) (McBratney et al., 2013). The two most common packages for generating tree-based models in R are rpart and randomForest. The rpart package creates a regression or classification tree from binary splits that maximize homogeneity and minimize impurity. The output is a single decision tree that can be further "pruned," or trimmed back, using the cross-validation error statistic to reduce over-fitting. The randomForest package is similar to rpart, but is doubly random: each tree is grown on a bootstrap sample of the observations, and each node is split using a random subset of the predictors; this process is repeated hundreds or thousands of times (as specified by the user). Unlike rpart, random forests do not produce a graphical decision tree, since predictions are averaged across hundreds or thousands of trees. Instead, random forests produce a variable importance plot and a tabular statistical summary. [http://ncss-tech.github.io/stats_for_soil_survey/chapters/8_Tree_models/treemodels.html Tree-based Models | Katey Yoast]
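The rpart/randomForest workflow described above can be sketched in Python using scikit-learn as a rough analogue (an assumption for illustration; the source chapter itself uses the R packages): a single pruned regression tree on one hand, and a forest of trees grown on bootstrapped observations with random predictor subsets on the other.

```python
# Sketch of the rpart / randomForest workflow, using scikit-learn as an
# assumed Python analogue of the R packages named in the text.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for a soil-survey dataset: 5 predictors, 1 response.
X, y = make_regression(n_samples=200, n_features=5, noise=0.5, random_state=0)

# Single regression tree: binary splits that minimize SSE; ccp_alpha
# prunes the tree back, loosely analogous to rpart's cross-validation
# pruning via the complexity parameter.
tree = DecisionTreeRegressor(ccp_alpha=0.01, random_state=0).fit(X, y)

# Random forest: each of many trees sees a bootstrap sample of the
# observations, and each split considers a random subset of predictors.
forest = RandomForestRegressor(
    n_estimators=500, max_features="sqrt", random_state=0
).fit(X, y)

# There is no single tree to plot; inspect the averaged variable
# importances instead, as the text notes.
print(forest.feature_importances_)
```

As in the R version, the forest trades the single interpretable tree diagram for averaged predictions plus a variable importance summary.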
Decision forests (regression, two-class, and multiclass), decision jungles (two-class and multiclass), and boosted decision trees (regression and two-class) are all based on decision trees, a foundational machine learning concept. There are many variants of decision trees, but they all do the same thing—subdivide the feature space into regions with mostly the same label. These can be regions of consistent category or of constant value, depending on whether you are doing classification or regression. - Dinesh Chandrasekar
  
 
https://ai2-s2-public.s3.amazonaws.com/figures/2017-08-08/16533dca42cafce4b00d224727dc5d977ef7d67e/8-Figure3-1.png

Revision as of 20:00, 3 June 2018


