
I'm working on machine learning projects. Most of the datasets I've worked with so far are well-known datasets that everyone uses.

Let's say I decide to build my own dataset. Is it possible that my data are so random that no relationship exists between the inputs and the outputs? This is interesting because, if it is possible, then no machine learning model will manage to find an input-output relationship in the data, and it will fail to solve the regression or classification problem.

Moreover, is it mathematically possible for some values to have absolutely no relationship between them? In other words, can it be that there is no function (linear or nonlinear) that maps those inputs to the outputs?

Now, I have thought about this problem and concluded that, if this is possible, it is most likely to happen in regression: the target outputs may lie in the same range, so the same feature values can end up paired with the same output values, and that will confuse the machine learning model.

Have you ever encountered this or a similar issue?

– basilisk
  • Well, it depends on what you mean by "relationship". For any two sets of inputs and outputs I can build a polynomial $f$ of many variables such that $\text{outputs} = f(\text{inputs})$. Is that a relationship? – Leox Mar 05 '20 at 10:24

2 Answers


Of course, it's possible to define a problem where there is no relationship between the input $x$ and the output $y$. In general, if the mutual information between $x$ and $y$ is zero (i.e., $x$ and $y$ are statistically independent), then the best prediction you can make is independent of $x$. The task of machine learning is to learn a distribution $q(y|x)$ that is as close as possible to the true data-generating distribution $p(y|x)$.

For example, looking at the common cross-entropy loss, we have $$ \begin{align} H(p,q) = -\mathbb{E}_{y,x \sim p}\left[\log q(y|x)\right] & = \mathbb{E}_{x\sim p}\left[\text{H}(p(y|x)) + \text{D}_{\text{KL}}(p(y|x)\|q(y|x))\right] \\ & = \text{H}(p(y)) + \mathbb{E}_{x \sim p}\left[\text{D}_{\text{KL}}(p(y)\|q(y|x))\right], \end{align} $$ where we have used the fact that $p(y|x)=p(y)$ since $y$ and $x$ are independent. From this, we can see that the optimal predicted distribution $q(y|x)$ is equal to $p(y)$, and actually independent of $x$. Also, the best loss you can get is equal to the entropy of $y$.
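To see this concretely, here is a minimal sketch of my own (not part of the original answer; the 70/30 class split and the logistic-regression model are arbitrary illustrative choices): when $x$ and $y$ are sampled independently, the trained model's test log-loss approaches $H(p(y))$ and its predicted distribution collapses to the marginal $p(y)$.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)

# Inputs and labels drawn independently: mutual information is zero.
X = rng.normal(size=(50_000, 10))
y = rng.choice([0, 1], size=50_000, p=[0.7, 0.3])  # marginal p(y)

model = LogisticRegression().fit(X, y)

# Fresh data from the same (independent) generating process.
X_test = rng.normal(size=(50_000, 10))
y_test = rng.choice([0, 1], size=50_000, p=[0.7, 0.3])

entropy_y = -(0.7 * np.log(0.7) + 0.3 * np.log(0.3))  # H(p(y)) in nats
print("entropy of y: ", entropy_y)                     # ~0.611
print("test log-loss:", log_loss(y_test, model.predict_proba(X_test)))

# The predicted distribution collapses to the marginal p(y) ~ (0.7, 0.3),
# independent of x, exactly as the decomposition above predicts.
print("mean predicted p(y=1|x):", model.predict_proba(X_test)[:, 1].mean())
```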

– Chris Cundy

I'm not sure I can answer the question as a whole, but a purely random input/output pairing doesn't quite have "no relationship" at all. At the very least, for any fixed training set of input/output pairs, you can use an if...then mapping to construct a one-to-one function, so that you can classify the training set with 100% accuracy (assuming no duplicate inputs).
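To make that concrete, here is a toy sketch of my own (with made-up random data; the dict-based lookup is just one way to realize the if...then mapping): it scores 100% on the training set and roughly chance on fresh data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Purely random training data: inputs and labels are independent.
X_train = rng.integers(0, 100, size=(200, 3))
y_train = rng.integers(0, 2, size=200)

# The if...then mapping: memorize every (input -> output) pair.
table = {tuple(x): label for x, label in zip(X_train, y_train)}

def memorize(x):
    # Fall back to a fixed guess for inputs never seen in training.
    return table.get(tuple(x), 0)

train_preds = [memorize(x) for x in X_train]
print("train accuracy:", np.mean(train_preds == y_train))  # 1.0 (no duplicate inputs)

# On fresh random data the table is useless: accuracy ~ chance (0.5).
X_test = rng.integers(0, 100, size=(200, 3))
y_test = rng.integers(0, 2, size=200)
test_preds = [memorize(x) for x in X_test]
print("test accuracy: ", np.mean(test_preds == y_test))
```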

In any case, I assume you mean uniformly random, because if you have something like Gaussian randomness, you can still learn some latent structure from how the random numbers are generated.

But even if you assume uniformly random data and your algorithm is only guessing, your algorithm is technically still operating optimally with respect to the data-generating distribution, which basically means it's as optimal as it gets.
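Here is the regression analogue, as a minimal sketch of my own (the distributions and sizes are arbitrary illustrative choices): when $y$ is independent of $x$, no fitted model can beat the constant guess $\hat{y} = \mathbb{E}[y]$, whose expected squared error is $\text{Var}(y)$.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# y drawn independently of x: nothing to learn beyond the marginal of y.
X = rng.uniform(size=(100_000, 5))
y = rng.normal(loc=3.0, scale=2.0, size=100_000)

model = LinearRegression().fit(X, y)

X_new = rng.uniform(size=(100_000, 5))
y_new = rng.normal(loc=3.0, scale=2.0, size=100_000)

mse_model = np.mean((model.predict(X_new) - y_new) ** 2)
mse_const = np.mean((y.mean() - y_new) ** 2)  # "always guess the mean"

print(mse_model)  # ~4.0 == Var(y); the fitted model cannot do better
print(mse_const)  # ~4.0 as well
```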

The only case I can imagine that would satisfy your question is if you had separate training and validation sets where the only element of the training input/output is [1, 1], but the validation set only has elements like [1, -1], or something along those lines.

From reading your comments, I suspect that the intent of your question was: "Can there be a relationship in the data such that no method can learn it?" To the extent that a data-generating distribution exists, then by the universal approximation theorem for neural networks, it is reasonable to expect that you can at least partially learn it.

However, it is important to note that the universal approximation theorem doesn't mean that such a data-generating distribution will be learned by a neural net; it only means that you can get as close to it as you want. More explicitly: there exists a setting of the weights that gives you results as good as you want, but gradient descent won't necessarily find it.
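A small illustration of that gap, as a hedged sketch of my own (the sine target and network size are arbitrary choices): the theorem guarantees that good weights exist, but what gradient descent actually reaches depends on the optimization budget and setup.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(2000, 1))
y = np.sin(3 * X).ravel()  # a fixed, perfectly learnable target function

# Same architecture, different optimization budgets: approximation is
# possible in principle, but training may or may not get there.
for max_iter in (50, 5000):
    mlp = MLPRegressor(hidden_layer_sizes=(100,), max_iter=max_iter,
                       random_state=0).fit(X, y)
    print(max_iter, mlp.score(X, y))  # R^2 improves with more optimization
```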

– k.c. sayz 'k.c sayz'
  • Thanks for the answer, but my goal is not to write an if-statement. If I used a machine learning model with validation and test sets, the model would overfit if I relied on if-else rules all the time! By "relationship" I mean a pattern that maps the data to an output. Precisely: is there a dataset where I can look and say, "Sorry, no model can find a relationship in this data, hence machine learning can do nothing for me in this case"? – basilisk Mar 04 '20 at 07:32
  • @basilisk Strictly speaking, an if...then algorithm is as valid a classification algorithm as any other; the main issue is that it is not differentiable, so you can't run things like gradient descent on it, and what I said applies to an overtrained model anyhow. > "is there a dataset [...] no model here that can find relationship in this data" — I have already answered your question, but to spell it out for you: strictly speaking, likely no, unless your training/validation set pair was poorly picked, and you can always overfit your set if you don't have a training/validation split. – k.c. sayz 'k.c sayz' Mar 04 '20 at 13:38
  • A relationship between data doesn't mean memorizing the values with if-then statements! A relationship is a function that can be created to map input x to output y. Furthermore, of course you can use gradient descent in classification; you just need to choose the right differentiable loss function (at least over a certain range). Your answer does not answer my question. As you said in your first paragraph, using if-then gives you 100% accuracy on training data, but who validates on training data? To capture a pattern you need to test on test data, and in your case the model gives 0% on test data. – basilisk Mar 04 '20 at 13:46
  • >"a relationship between data doesn't mean [...]" yes it does. both a "usual ML function" and an "if then statement" are functions that map from input space to output space. if you don't acknowledge this, then you lack understanding. your initial question asks for a theoretical result regarding a /fixed dataset/, and hence i was able to cheese an answer by suggesting you can always achieve your intended result by overfitting. now, your intention seems be about learning the underlying "data generating distribution", which is very different from how you initially posed the question – k.c. sayz 'k.c sayz' Mar 04 '20 at 13:56
  • And in any case, in the last paragraph I already answered your question regarding separate validation/training sets. – k.c. sayz 'k.c sayz' Mar 04 '20 at 13:57
  • My question clearly shows that I'm asking from a machine learning perspective and not just mathematics. Please read the question again; I asked whether an ML model would sometimes fail to find a pattern in data because of randomness. You think I lack understanding; I would say, who doesn't? Moreover, I think everyone knows that data can be memorized with if-then, but I'm asking about a relationship, a pattern, a mapping function. I don't know what more to say. Also, I didn't understand the last paragraph you wrote; can you clarify? – basilisk Mar 04 '20 at 17:35
  • ...Machine learning is literally math/stats. You asked if something might happen in theory, so I answered your question "as in theory", which is why I gave you an answer that is "technically correct". – k.c. sayz 'k.c sayz' Mar 04 '20 at 17:44
  • Last paragraph, tl;dr: if your validation/training sets are "obviously unrelated", then you might argue that you cannot learn any relationship between them. But the term "obviously unrelated" is loaded with tricky epistemic (as in: philosophical) issues, which makes it non-trivial to unpack, which is why I gave the example above. – k.c. sayz 'k.c sayz' Mar 04 '20 at 17:48
  • You might be interested in this: https://stats.stackexchange.com/questions/320375/what-does-it-mean-for-the-training-data-to-be-generated-by-a-probability-distrib . If you've read that, then the "formal interpretation" of what I meant is that the training/validation sets come from different "data-generating distributions". If you've read this far, you might suspect that I am again being pedantic by providing an answer like this, but I'll add something to my answer that might address the original intent of your question. – k.c. sayz 'k.c sayz' Mar 04 '20 at 17:52
  • Actually, I've come to realize that you asked the same question here: https://datascience.stackexchange.com/questions/69083/is-there-a-possibility-that-there-is-no-relationship-between-some-inputs-and-out , which is a better place to ask it anyhow. @Dave's answer there pretty much supersedes mine and is as clear as it gets. – k.c. sayz 'k.c sayz' Mar 04 '20 at 18:46