What is the error function? Is it the same as the cost function?

Is the error function known or unknown?

When I get the outcome of a neural net, I compare it with the target value. The difference between the two is called the error. When I get multiple error values, e.g. when I pass a batch through the NN, I get as many error values as the size of my batch. Is the error function the plot of these points? If yes, then to me the error function would be unknown; I would only know some points on the graph of the error function.

1 Answer

In deep learning, the error function is sometimes known as the loss function or the cost function (although I do not exclude the possibility that these terms have also been used to refer to different but related functions, so take your context into account).

In statistical learning theory, the function that you want to minimize is known as the expected risk

\begin{align} R(\theta) &= \int L(y, f_{\theta}(x)) d P(x, y) \\ &= \mathbb{E}_{P(x, y)} \left[ L(y, f_{\theta}(x)) \right], \end{align} where $\theta$ are the parameters of your model $f$, and the function that you actually minimize is the empirical risk given the dataset $D = \{ (x_i, y_i) \}_{i=1}^n$,

$$ R_{\mathrm{emp}}(\theta)=\frac{1}{n} \sum_{i=1}^{n}L(y_i, f_{\theta}(x_i)), $$ which is a generalization of the commonly used cost functions, such as mean squared error (MSE) or binary cross-entropy. For example, in the case of MSE, $L = \left(y_{i}-f_\theta(x_i)\right)^{2}$, which can be called the loss function (though this terminology may not be standard).
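To make the relationship between $L$ and $R_{\mathrm{emp}}$ concrete, here is a minimal NumPy sketch (the function names, the toy model, and the data are mine, purely for illustration): it computes the per-example squared-error losses and then averages them into the empirical risk, i.e. the MSE.

```python
import numpy as np

def squared_error(y, y_pred):
    """Per-example loss: L(y, f(x)) = (y - f(x))^2."""
    return (y - y_pred) ** 2

# Toy linear model f_theta(x) = theta * x (theta and the data are made up).
theta = 2.0
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.0])
y_pred = theta * x

per_example_losses = squared_error(y, y_pred)  # one value per example
empirical_risk = per_example_losses.mean()     # R_emp: a single scalar (the MSE)

print(per_example_losses)  # [0.01 0.01 0.04 0.  ]
print(empirical_risk)      # 0.015
```

The per-example values are what a scatter plot of batch errors shows; the empirical risk is the single scalar that training actually minimizes.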

You optimize the empirical risk, a proxy for the expected risk, because the expected risk is incomputable (given that $P(x, y)$ is generally unknown), so, in this sense, the expected risk is unknown.
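As a small numerical illustration of this proxy relationship, everything below is an assumption made up for the example: a synthetic $P(x, y)$ with $y = x + \epsilon$, $\epsilon \sim \mathcal{N}(0, 0.5^2)$, the identity model $f(x) = x$, and squared-error loss, so the expected risk equals $\operatorname{Var}(\epsilon) = 0.25$ in closed form. The empirical risk then approaches the expected risk as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_dataset(n, noise_std=0.5):
    """Draw n pairs from an assumed synthetic P(x, y): y = x + Gaussian noise."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = x + rng.normal(0.0, noise_std, size=n)
    return x, y

# Model f(x) = x with squared-error loss, so the (usually unknowable)
# expected risk is E[(y - f(x))^2] = Var(noise) = 0.25 in closed form.
for n in [10, 1_000, 100_000]:
    x, y = sample_dataset(n)
    r_emp = np.mean((y - x) ** 2)  # empirical risk on n samples
    print(n, r_emp)                # drifts toward 0.25 as n grows
```

In practice, you cannot run such a check, precisely because $P(x, y)$ is unknown; the point is only that $R_{\mathrm{emp}}(\theta)$ is a sample average of the quantity whose expectation defines $R(\theta)$.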

If you use the term error function to refer to the expected risk, then, yes, the error function is typically unknown, but error function is typically used to refer to an instance of the empirical risk, so, in that case, it is known and computable.

Note that I purposely used the term loss function above to refer to $L$ and the term cost function to refer to the empirical risks, such as the MSE (i.e., in this case, I did not use loss function and cost function as synonyms), which shows that terminology is not always used consistently, so take into account your context.

  • I am not sure if this answers your question, given that I don't fully understand your doubts. If not, please, let me know. – nbro Aug 22 '20 at 17:32
  • I got a little confused when I looked up all the explanations of how the error is treated. E.g., when the error function is mentioned, something like $E = (y' - y)^2$ is meant, and a parabola is shown. Sometimes, however (maybe in another context), I see a sort of scatter plot that just shows how big the error is (for every entry, maybe in chronological order as the batch is passed through the net). I am just asking what is what. – MScott Aug 22 '20 at 18:29
  • @MScott As I said, sometimes it depends on the context. $(y' - y)^2$ could be called the **error function** and the sum of all errors could be called the **empirical risk** or often in the context of deep learning also the **error function** (yes, error function again!). – nbro Aug 22 '20 at 19:06