Objective vs. Cost vs. Loss vs. Error Function

[https://www.youtube.com/results?search_query=ai+objective+cost+loss+error+function YouTube]
[https://www.quora.com/search?q=ai%20objective%20cost%20loss%20error%20function ... Quora]
[https://www.google.com/search?q=ai+objective+cost+loss+error+function ... Google search]
[https://news.google.com/search?q=ai+objective+cost+loss+error+function ... Google News]
[https://www.bing.com/news/search?q=ai+objective+cost+loss+error+function&qft=interval%3d%228%22 ... Bing News]
* [[Backpropagation]] ... [[Feed Forward Neural Network (FF or FFNN)|FFNN]] ... [[Forward-Forward]] ... [[Activation Functions]] ... [[Softmax]] ... [[Loss]] ... [[Boosting]] ... [[Gradient Descent Optimization & Challenges|Gradient Descent]] ... [[Algorithm Administration#Hyperparameter|Hyperparameter]] ... [[Manifold Hypothesis]] ... [[Principal Component Analysis (PCA)|PCA]]
 
"The function we want to minimize or maximize is called the objective function, or criterion. When we are minimizing it, we may also call it the cost function, [[Loss|loss function]], or error function; these terms are synonymous." The cost function is used more in optimization problems, while the [[Loss|loss function]] is used more in parameter estimation.
  
The [[Loss|loss function (or error)]] is defined for a single training example, while the cost function is defined over the entire training set (or a mini-batch, for mini-batch gradient descent). Therefore, a [[Loss|loss]] function is part of a cost function, which is in turn a type of objective function.  [http://stats.stackexchange.com/questions/179026/objective-function-cost-function-loss-function-are-they-the-same-thing Objective function, cost function, loss function: are they the same thing? | StackExchange]
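The split above can be sketched in a few lines of Python (illustrative names, not from any particular library): a loss scores one example, and a cost averages that loss over a set.

```python
def loss(prediction, label):
    # loss (or error): penalty for a single training example
    return (prediction - label) ** 2  # squared-error loss as an illustration

def cost(predictions, labels):
    # cost: the same loss averaged over the whole training set (or mini-batch)
    return sum(loss(p, y) for p, y in zip(predictions, labels)) / len(labels)

print(loss(2.5, 2.0))                # loss on one example: 0.25
print(cost([2.5, 0.0], [2.0, 1.0]))  # average over two examples: 0.625
```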
  
* <b>[[Loss|Loss function]]</b> is usually defined on a single data point, its prediction, and its label, and measures the penalty. For example:
 
** square loss l(f(x_i|θ), y_i) = (f(x_i|θ) − y_i)², used in linear [[Regression]]

** hinge loss l(f(x_i|θ), y_i) = max(0, 1 − f(x_i|θ)·y_i), used in SVM

** 0/1 loss l(f(x_i|θ), y_i) = 1 ⟺ f(x_i|θ) ≠ y_i, used in theoretical analysis and the definition of accuracy
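The three example losses can be written directly as small Python functions (a sketch; the label convention y ∈ {−1, +1} for the hinge loss is an assumption):

```python
def square_loss(f_x, y):
    # (f(x|θ) − y)², used in linear regression
    return (f_x - y) ** 2

def hinge_loss(f_x, y):
    # max(0, 1 − f(x|θ)·y), used in SVMs; assumes labels y in {−1, +1}
    return max(0.0, 1.0 - f_x * y)

def zero_one_loss(f_x, y):
    # 1 iff f(x|θ) ≠ y; used in theoretical analysis and to define accuracy
    return 1 if f_x != y else 0

print(square_loss(0.5, 1.0))  # 0.25
print(hinge_loss(0.5, 1))     # 0.5: correct side of the margin, but inside it
print(zero_one_loss(1, -1))   # 1: a misclassification
```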
  
* <b>Cost function</b> is usually more general. It might be a sum of [[Loss|loss functions]] over your training set plus a model complexity penalty (regularization). For example:
 
** Mean Squared Error MSE(θ) = (1/N) ∑_{i=1}^{N} (f(x_i|θ) − y_i)²

** SVM cost function SVM(θ) = ‖θ‖² + C ∑_{i=1}^{N} ξ_i (there are additional constraints connecting ξ_i with C and with the training set)
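As a sketch (hypothetical helper names), a regularized cost sums the per-example losses over the set and adds an L2 complexity penalty on the parameters θ:

```python
def mse(predictions, labels):
    # MSE(θ) = (1/N) Σ (f(x_i|θ) − y_i)²: average squared loss over the set
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)

def regularized_cost(predictions, labels, theta, lam):
    # data-fit term plus a model-complexity penalty, as in ridge regression
    return mse(predictions, labels) + lam * sum(t ** 2 for t in theta)

print(mse([1.0, 3.0], [0.0, 3.0]))                            # 0.5
print(regularized_cost([1.0, 3.0], [0.0, 3.0], [2.0], 0.25))  # 0.5 + 0.25·4 = 1.5
```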
  
* <b>Objective function</b> is the most general term for any function that you optimize during training. For example, the probability of generating the training set under the maximum likelihood approach is a well-defined objective function, but it is not a [[Loss|loss function]] or a cost function (though you could define an equivalent cost function). For example:
 
** MLE is a type of objective function (which you maximize)

** Divergence between classes can be an objective function, but it is hardly a cost function, unless you define something artificial, like 1 − Divergence, and name it a cost
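For instance, the log-likelihood of a sequence of coin flips is an objective you maximize; negating it gives an equivalent cost to minimize. A minimal sketch (grid search over the bias parameter, with assumed data of 7 heads and 3 tails):

```python
import math

def log_likelihood(p, heads=7, tails=3):
    # objective: log-probability of generating the observed flips given p
    return heads * math.log(p) + tails * math.log(1.0 - p)

# maximize over a grid of candidate parameters; negating log_likelihood
# would give an equivalent cost function to minimize instead
grid = [i / 100 for i in range(1, 100)]
best_p = max(grid, key=log_likelihood)
print(best_p)  # 0.7, i.e. heads / (heads + tails)
```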
  
* <b>Error function</b> - [[Backpropagation]], or automatic differentiation, is commonly used by the gradient descent optimization algorithm to adjust the [[Activation Functions#Weights|weight]]s of neurons by calculating the gradient of the [[Loss|loss function]]. The technique is also called backward propagation of errors, because the error is calculated at the output and distributed back through the network's layers. [http://en.wikipedia.org/wiki/Backpropagation Backpropagation | Wikipedia]
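A toy illustration of that loop (one weight, one training example; the data and learning rate are assumed values): gradient descent repeatedly subtracts the gradient of the loss from the weight.

```python
# model: prediction = w * x; squared-error loss L = (w*x − y)²
# gradient of the loss with respect to the weight: dL/dw = 2 * (w*x − y) * x
x, y = 2.0, 6.0    # single training example; the optimum is w = 3
w, lr = 0.0, 0.05  # initial weight and learning rate
for _ in range(100):
    grad = 2.0 * (w * x - y) * x  # error at the output drives the update
    w -= lr * grad
print(round(w, 6))  # converges to 3.0
```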
  
  

Revision as of 09:39, 31 August 2023
