
Consider the following passage from Chapter 5 ("Machine Learning Basics") of the book Deep Learning (by Ian Goodfellow, Yoshua Bengio, and Aaron Courville):

Machine learning tasks are usually described in terms of how the machine learning system should process an example. An example is a collection of features that have been quantitatively measured from some object or event that we want the machine learning system to process. We typically represent an example as a vector $\mathbf{x} \in \mathbb{R}^n$ where each entry $x_i$ of the vector is another feature. For example, the features of an image are usually the values of the pixels in the image.

Here, an example is described as a collection of features, which are real numbers. In probability theory, a random variable is also a real-valued function.

Can I always interpret features in machine learning as random variables or are there any exceptions for this interpretation?

nbro
hanugm

1 Answer


In general terms, yes, because what ML algorithms generally do is learn the hidden probability density function of the target examples (cats, dogs, ...). For discriminative models, this is done by learning the conditional probability distribution between the inputs, $X$, and the target outputs, $y$; for generative models, by learning the joint probability distribution.
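As a sketch of this distinction (the single binary feature and all the probabilities below are made up, just for illustration): the conditional $p(y \mid x)$ can be estimated directly from data, or recovered from an estimate of the joint $p(x, y)$ via Bayes' rule, and both routes give the same answer:

```python
import random

random.seed(0)

# Hypothetical generative process (unknown to the learner):
#   p(y=1) = 0.5,  p(x=1 | y=1) = 0.8,  p(x=1 | y=0) = 0.3
def sample():
    y = 1 if random.random() < 0.5 else 0
    p_x1 = 0.8 if y == 1 else 0.3
    x = 1 if random.random() < p_x1 else 0
    return x, y

data = [sample() for _ in range(100_000)]

# "Discriminative" estimate: p(y=1 | x) directly from counts.
def p_y1_given(x_val):
    matching = [y for x, y in data if x == x_val]
    return sum(matching) / len(matching)

# "Generative" estimate: the joint p(x, y) from counts...
def p_joint(x_val, y_val):
    return sum(1 for x, y in data if x == x_val and y == y_val) / len(data)

# ...from which the conditional follows by Bayes' rule.
def p_y1_given_via_joint(x_val):
    return p_joint(x_val, 1) / (p_joint(x_val, 0) + p_joint(x_val, 1))

print(p_y1_given(1))            # close to 0.8*0.5 / (0.8*0.5 + 0.3*0.5) ~ 0.727
print(p_y1_given_via_joint(1))  # same quantity, recovered from the joint
```

The point of the toy example is only that both model families end up describing the same underlying probability distributions over the features, just factored differently.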

More on discriminative vs. generative models and their relation to probabilities can be found in this nice article, from which I took the picture. [image: discriminative vs. generative models]

It also helps to think that, given a set of different examples (say, images of cats), what the ML algorithm really learns is the set of features that is invariant across all the examples, i.e. the cat itself. The ML algorithm learns to discriminate away all the variability (backgrounds, occluding objects) and to keep what is invariant: the representation of a cat.

This representation, since images are pixels, can be described as a probability density function over the pixel values that would represent a cat. It is as if the ML algorithm said: "If those pixel values behave close to the probability density function I learned, then those pixels must represent a cat!"
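That intuition can be sketched with a toy model, assuming a hypothetical single "pixel intensity" feature per image and a class-conditional Gaussian per class (the means and variances below are invented for illustration): a test pixel is called a cat when it is better explained by the cat density than by the dog density.

```python
import math
import random

random.seed(1)

# Toy "pixel" data: one intensity per image. Hypothetically, cat images
# tend to have bright pixels and dog images darker ones.
cats = [random.gauss(200, 15) for _ in range(5000)]
dogs = [random.gauss(90, 20) for _ in range(5000)]

def fit_gaussian(samples):
    """Fit a 1-D Gaussian by its sample mean and variance."""
    mu = sum(samples) / len(samples)
    var = sum((s - mu) ** 2 for s in samples) / len(samples)
    return mu, var

def density(x, mu, var):
    """Gaussian probability density at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

cat_mu, cat_var = fit_gaussian(cats)
dog_mu, dog_var = fit_gaussian(dogs)

# "If those pixel values behave close to the density I learned,
#  then those pixels must represent a cat!"
def looks_like_cat(pixel):
    return density(pixel, cat_mu, cat_var) > density(pixel, dog_mu, dog_var)

print(looks_like_cat(195))  # near the cat density's mode
print(looks_like_cat(95))   # better explained by the dog density
```

Here the pixel value is explicitly treated as a random variable whose class-conditional density the learner estimates, which is exactly the interpretation the question asks about.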

hanugm
JVGD
    You say "... hidden probability density function", but maybe you mean "... **unknown** probability density function"? Because $p(y \mid x)$ is typically unknown. Hidden refers to something different. For example, the hidden space in VAEs or hidden variable in Hidden Markov models. – nbro Aug 27 '21 at 13:48
    You are right, thinking it through I think we could say: “unknown but existent”, like although it is unknown, we know it is there – JVGD Aug 29 '21 at 15:54