
Reading the abstract of the paper Neural Networks and Statistical Models, it would seem that ANNs are statistical models.

In contrast, machine learning is not just glorified statistics.

I am looking for a more concise/summarized answer with focus on ANNs.

– Leo Gallucci

3 Answers


According to Wikipedia:

A statistical model is a mathematical model that embodies a set of statistical assumptions concerning the generation of sample data (and similar data from a larger population). A statistical model represents, often in considerably idealized form, the data-generating process.

Answer to your question:

To build any neural network model, we assume that the training, test, and validation data come from some probability distribution. So, if you fit a neural network to such data, the resulting network is a statistical model.

Moreover, the cost functions of neural networks are generally those of parametric models, and parametric models are statistical models.

Please look at Goodfellow's Deep Learning book, chapter Deep Feedforward Networks, pages 174 and 175.

From Goodfellow's book

Fortunately, the cost functions for neural networks are more or less the same as those for other parametric models, such as linear models. In most cases, our parametric model defines a distribution $p(y \mid x; \theta)$ and we simply use the principle of maximum likelihood.
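Goodfellow's remark can be made concrete: training a binary classifier with binary cross-entropy is exactly maximum likelihood under a Bernoulli model $p(y \mid x; \theta)$. A minimal sketch (plain Python, illustrative values):

```python
import math

def bernoulli_nll(p_pred, y):
    """Negative log-likelihood of a Bernoulli model with P(y=1|x) = p_pred.
    This is exactly the binary cross-entropy loss used to train such networks."""
    return -(y * math.log(p_pred) + (1 - y) * math.log(1 - p_pred))

# A confident, correct prediction has low loss...
print(bernoulli_nll(0.9, 1))  # -log(0.9), about 0.105
# ...while a confident, wrong prediction has high loss.
print(bernoulli_nll(0.9, 0))  # -log(0.1), about 2.303
```

Minimizing this loss over $\theta$ is the "principle of maximum likelihood" the quote refers to.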

In conclusion, ANNs (e.g. MLPs, CNNs, etc.) are statistical models.

– Ta_Req
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackexchange.com/rooms/105466/discussion-on-answer-by-ta-req-are-neural-networks-statistical-models). – nbro Mar 12 '20 at 03:06

What is a statistical model?

According to Anthony C. Davison (in the book Statistical Models), a statistical model is a probability distribution constructed to enable inferences to be drawn or decisions made from data. The probability distribution represents the variability of the data.

Are neural networks statistical models?

Do neural networks construct or represent a probability distribution that enables inferences to be drawn or decisions made from data?

MLP for binary classification

For example, a multi-layer perceptron (MLP) trained to solve a binary classification task can be thought of as a model of the probability distribution $\mathbb{P}(y \mid x; \theta)$. In fact, many MLPs use a softmax or sigmoid as the activation function of the output layer in order to produce a probability or a probability vector. However, it's important to note that a probability (or probability vector) is not the same thing as a probability distribution: a probability alone does not fully describe a distribution, and different distributions are defined by different parameters (e.g. a Bernoulli is defined by $p$, while a Gaussian by $\mu$ and $\sigma$). That said, if you make your neural network produce a probability, i.e. model $\mathbb{P}(y = 1 \mid x; \theta)$, then, at least in the case of binary classification, you can derive the probability of the other label as $\mathbb{P}(y = 0 \mid x; \theta) = 1 - \mathbb{P}(y = 1 \mid x; \theta)$. In this example, the single parameter $p = \mathbb{P}(y = 1 \mid x; \theta)$ is all you need to define the associated Bernoulli distribution.
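As an illustration, here is a sketch of the smallest possible "network" of this kind: a single logistic unit with made-up, fixed weights. Its one output number parameterizes a full Bernoulli distribution over the two labels.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fixed weights of a tiny one-unit "network"
# (logistic regression, the simplest sigmoid-output classifier).
w, b = [0.8, -0.4], 0.1

def p_y1_given_x(x):
    """P(y=1 | x; theta): the single parameter of a Bernoulli distribution."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

x = [1.0, 2.0]
p1 = p_y1_given_x(x)
p0 = 1.0 - p1  # P(y=0 | x) follows from the same parameter
print(p1, p0)  # the two probabilities sum to 1
```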

So, these neural networks (for binary classification) that model and learn some probability distribution given the data in order to make inferences or predictions could be considered statistical models. However, note that, once the weights of the neural network are fixed, given the same input, they always produce the same output.

Generative models

Variational auto-encoders (VAEs) construct a probability distribution (e.g. a Gaussian, or a distribution $\mathbb{P}(x)$ over images, if you want to generate images), so VAEs can be considered statistical models.

Bayesian neural networks

There are also Bayesian neural networks, which are neural networks that maintain a probability distribution (usually, a Gaussian) over each weight of the network, rather than only a point estimate. BNNs can thus also be considered statistical models.
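A minimal sketch of this idea (illustrative, untrained Gaussian parameters, not any particular BNN library): because each weight is a distribution, every forward pass samples concrete weights, and the output itself becomes a random variable.

```python
import random

# Each weight is a Gaussian (mean, std) rather than a point estimate
# (made-up values, not a trained network).
weight_dists = [(0.5, 0.1), (-0.3, 0.2)]
bias_dist = (0.0, 0.05)

def sample_forward(x):
    """One stochastic forward pass: sample concrete weights, then compute."""
    w = [random.gauss(m, s) for m, s in weight_dists]
    b = random.gauss(*bias_dist)
    return sum(wi * xi for wi, xi in zip(w, x)) + b

random.seed(0)
x = [1.0, 2.0]
samples = [sample_forward(x) for _ in range(1000)]
mean = sum(samples) / len(samples)
print(mean)  # close to the mean-weight output 0.5*1 - 0.3*2 = -0.1
```

Averaging many such passes recovers the mean prediction, while the spread of the samples expresses the model's uncertainty.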

Perceptron

The perceptron may be considered a "statistical model", in the sense that it learns from data, but it doesn't produce any probability vector or distribution, i.e. it is not a probabilistic model/classifier.
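For concreteness, here is the classic Rosenblatt perceptron learning rule on a toy AND problem: it learns from data, but only ever emits hard $\pm 1$ labels, never probabilities.

```python
# Classic Rosenblatt perceptron on a linearly separable toy problem (AND).
data = [([0.0, 0.0], -1), ([0.0, 1.0], -1), ([1.0, 0.0], -1), ([1.0, 1.0], 1)]

w, b = [0.0, 0.0], 0.0
for _ in range(20):  # a few epochs suffice for separable data
    for x, y in data:
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
        if pred != y:  # mistake-driven update: nudge the hyperplane
            w = [wi + y * xi for wi, xi in zip(w, x)]
            b += y

# Hard labels only -- no probability attached to each decision.
preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1 for x, _ in data]
print(preds)  # [-1, -1, -1, 1]
```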

Conclusion

So, whether or not a neural network is a statistical model depends on your definition of a statistical model and which machine learning models you would consider neural networks. If you are interested in more formal definitions of a statistical model, take a look at this paper.

Parametric vs non-parametric

Statistical models are often also divided into parametric and non-parametric models. Neural networks are often classified as non-parametric because they make fewer assumptions than e.g. linear regression models (which are parametric) and are typically more generally applicable, but I will not dwell on this topic.

– nbro
    This is a great and detailed answer! I have a question - one can have an assumption on the probabilistic nature of the data, on which the model is trained, and search for a single optimal solution. On the other hand, a probability distribution can be imposed on the model params (like for mentioned Bayesian NN) - and the result would be like mean and the std of the optimal params. Is there any specific name for this kind of problem? – spiridon_the_sun_rotator Jul 02 '21 at 12:18
  • @spiridon_the_sun_rotator Are you asking whether there's a sub-field of ML that studies Bayesian neural networks? If that's the question, the name is usually "Bayesian deep learning", or "Bayesian ML" or even "Probabilistic ML", but note that these terms may not just refer to techniques like BNNs, but other things too, like Bayesian optimization or graphical models, as far as I remember. – nbro Jul 02 '21 at 14:55

A dataset can be thought of as a set of ordered pairs $\subset \mathbb{R}^f \times \mathbb{R}^l$, where $f$ is the feature dimension and $l$ the label dimension.

The ordered pairs give rise to a statistical model (i.e. a function from $\mathbb{R}^f$ to a probability distribution over $\mathbb{R}^l$).

This statistical model is then turned into a function $\mathbb{R}^f \to \mathbb{R}^l$ by taking the argmax.

A neural network (trained on a dataset) is a compression of this statistical model composed with argmax (think of how much smaller the GPT-3 weights are than its training dataset).

So neural networks are approximations to statistical models composed with argmax.
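The pipeline above (dataset → conditional distribution → argmax) can be sketched on a toy discrete dataset, using empirical frequencies in place of a trained network (illustrative data):

```python
from collections import Counter, defaultdict

# Toy dataset of (feature, label) pairs with repeated, noisy observations.
dataset = [(0, "a"), (0, "a"), (0, "b"), (1, "b"), (1, "b"), (1, "b")]

# The empirical statistical model: a map from a feature value to a
# probability distribution over labels.
counts = defaultdict(Counter)
for x, y in dataset:
    counts[x][y] += 1

def p_y_given_x(x):
    total = sum(counts[x].values())
    return {y: c / total for y, c in counts[x].items()}

# Composing with argmax turns the distribution into a plain function x -> label.
def predict(x):
    dist = p_y_given_x(x)
    return max(dist, key=dist.get)

print(p_y_given_x(0))        # {'a': 2/3, 'b': 1/3}
print(predict(0), predict(1))  # a b
```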

– Tom Huntington