
Is it true that a bias is said to be inductive iff it is useful in generalising the data?

Or can inductive bias also refer to assumptions that may cause a decrease in performance?


Suppose I have a dataset on which I want to use a deep neural network to do my task. I think, based on some knowledge, that a DNN with 5 or 11 layers may work well. But, after implementation, suppose the 11-layer one worked well. Can I then call both of them inductive biases, or only the 11-layer assumption?

  • @nbro It seems that any type of bias can be treated as an inductive bias from both the answers. – hanugm Sep 08 '21 at 01:31

3 Answers


The inductive bias is the prior knowledge that you incorporate in the learning process that biases the learning algorithm to choose from a specific set of functions [1].

For example, if you choose the hypothesis class

$$\mathcal{H}_\text{lines} = \{f(x) = ax + b \mid a, b \in \mathbb{R} \}$$ rather than $$\mathcal{H}_\text{parabolas} = \{f(x) = ax^2 + b \mid a, b \in \mathbb{R} \},$$ then you're assuming (implicitly or explicitly, depending on whether you're aware of these concepts) that your target function (the function that you want to learn) lies in the set $\mathcal{H}_\text{lines}$. If that's the case, then your learning algorithm is more likely to find it.
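To make this concrete, here is a minimal sketch of fitting both hypothesis classes with ordinary least squares; NumPy and the synthetic data are just illustrative assumptions, not part of the definition above.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.shape)  # the target is (roughly) a line

# H_lines: f(x) = a*x + b  ->  design matrix [x, 1]
A_lines = np.column_stack([x, np.ones_like(x)])
coef_lines, *_ = np.linalg.lstsq(A_lines, y, rcond=None)

# H_parabolas: f(x) = a*x^2 + b  ->  design matrix [x^2, 1]
A_parab = np.column_stack([x ** 2, np.ones_like(x)])
coef_parab, *_ = np.linalg.lstsq(A_parab, y, rcond=None)

# Because the target really is close to a line, the first hypothesis class
# contains a much better approximation of it.
print("lines     MSE:", np.mean((A_lines @ coef_lines - y) ** 2))
print("parabolas MSE:", np.mean((A_parab @ coef_parab - y) ** 2))
```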

In most cases, you do not know the exact nature of your target function, so you might think it a good idea to choose the largest possible set of functions. However, this would make learning infeasible (you have too many functions to choose from) and could lead to over-fitting, i.e. you choose a function that performs well on your training data but is actually quite different from your target function, so it performs badly on unseen data (from your target function). This can happen because the training data may not be representative of your target function (you don't usually know this a priori, so you cannot really or completely solve this issue).

So, the definition above does not imply that the inductive bias cannot lead to over-fitting or, equivalently, that it cannot negatively affect the generalization of your chosen function. Of course, if you chose to use a CNN (rather than an MLP) because you are dealing with images, then you will probably get better performance. However, if you mistakenly assume that your target function is linear and you choose $\mathcal{H}_\text{lines}$ as the set from which your learning algorithm can pick functions, then it will choose a bad function.

Section 2.3 of the book Understanding Machine Learning: From Theory to Algorithms and section 1.4.4 of the book Machine Learning: A Probabilistic Perspective (by Murphy) provide more details about the inductive bias (the first more from a learning-theory perspective, the second more from a probabilistic one).

You may also be interested in this answer that I wrote a while ago about the difference between approximation and estimation errors (although if you know nothing about learning theory it may not be very understandable). In any case, the idea is that the approximation error (AE) can be a synonym for inductive bias because the AE is the error due to the choice of hypothesis class.

(As a side note, I think it is called "inductive bias" because this bias is the one that can make inductive inference feasible and successful [2] and maybe to differentiate it from other biases, e.g. the bias term in linear regression, although that term can also be an inductive bias).


Is it true that a bias is said to be inductive iff it is useful in generalising the data?

Or can inductive bias also refer to assumptions that may cause a decrease in performance?

Tom M. Mitchell defines bias as:

Any basis for choosing one generalization over another, other than strict consistency with the observed training instances.

Gordon and Desjardins extend the definition of bias to include any factor (including consistency with the instances) that influences the definition or selection of inductive hypotheses.

Basically, an inductive bias is any type of bias that a learning algorithm introduces in order to make predictions.

For example:

  1. In SVMs we attempt to maximize the width of the margin between two classes
  2. In nearest neighbors we assume that most of the cases in a small neighborhood in feature space belong to the same class
  3. In cross-validation, choosing the model that minimizes the cross-validation error is an inductive bias, because we are choosing one hypothesis over another

However, it must be said that the cross-validation score has a well-grounded theoretical background, and trying to avoid its bias by considering only a single hypothesis class is itself a bias, because we are discarding all the other hypotheses a priori (see the sketch below).
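Here is a rough sketch of point 3 with scikit-learn (the dataset and the two candidate models are arbitrary choices for illustration): preferring whichever hypothesis scores best on held-out folds is itself a bias of the selection procedure.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Two candidate hypotheses, each with its own inductive bias (points 1 and 2 above).
candidates = {
    "max-margin (SVM)": SVC(kernel="linear", C=1.0),
    "local neighborhood (k-NN)": KNeighborsClassifier(n_neighbors=5),
}

# Point 3: picking the hypothesis with the best cross-validation score
# is itself an inductive bias of the selection procedure.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```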

In your case:

Suppose I have a dataset on which I want to use a deep neural network to do my task. I think, based on some knowledge, that a DNN with 5 or 11 layers may work well. But, after implementation, suppose the 11-layer one worked well. Can I then call both of them inductive biases, or only the 11-layer assumption?

In this case, you introduced an inductive bias by using a neural network with between 5 and 11 layers (you chose a function class), and then you introduced another bias by choosing the number of layers that minimizes the cross-validation error.
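As a hypothetical sketch of these two biases (assuming scikit-learn's MLPClassifier and a toy dataset, since the question does not specify the architecture or the data):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

depths = [5, 11]  # first bias: the function class is restricted to these depths
cv_scores = {}
for depth in depths:
    model = MLPClassifier(hidden_layer_sizes=(64,) * depth,
                          max_iter=500, random_state=0)
    cv_scores[depth] = cross_val_score(model, X, y, cv=3).mean()

# Second bias: select the depth with the best cross-validation score.
best_depth = max(cv_scores, key=cv_scores.get)
print(cv_scores, "-> chosen depth:", best_depth)
```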

Conclusion

In the end, we want to minimize the error on the real-world task that we need to solve, for example maximizing the accuracy on MNIST. To do this, we use certain hypotheses/models and methods to compare them, e.g. the cross-validation score. The choice of models and methods is up to the researcher and their prior beliefs. All these beliefs result in choices about the learning algorithm and introduce an inductive bias.

  • You're quoting a Wikipedia article that does not cite any external resources (e.g. for the part of cross-validation that you quote). Although what you wrote doesn't seem wrong to me, I would encourage you to search for more reliable resources (e.g. reliable books). – nbro Sep 07 '21 at 16:59
  • @nbro Thank you for pointing that out, I edited the answer by including better references and, I think, a better definition of inductive bias – LetteraUnica Sep 08 '21 at 13:06

The inductive bias assumed by a CNN is that if we translate an image, the output does not change (the image has translational symmetry), and we can see that this assumption is valid. Similarly, a spherical CNN has rotational symmetry as its inductive bias, captured by the SO(3) group (the set of all special orthogonal $3 \times 3$ matrices), which is valid when the data lies on a sphere. The inductive bias of linear regression is that the data can be fit by a linear function.

We must choose algorithms whose inductive bias captures the correct assumptions about the data distribution. For example, linear regression is better than polynomial regression if the data is perfectly linear, since the inductive bias of linear regression matches the data distribution.
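A small numerical illustration of this point (the data and the polynomial degrees are made up for the example): when the data really is linear, a degree-1 fit generalizes better than a high-degree polynomial fit trained on the same points.

```python
import numpy as np

rng = np.random.default_rng(1)
x_train = np.sort(rng.uniform(-1, 1, 15))
y_train = 3.0 * x_train - 0.5 + rng.normal(scale=0.1, size=x_train.shape)
x_test = np.linspace(-1, 1, 200)
y_test = 3.0 * x_test - 0.5  # noiseless test targets from the same linear rule

for degree in (1, 9):
    coefs = np.polyfit(x_train, y_train, degree)
    mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    print(f"degree {degree}: test MSE = {mse:.4f}")
```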

    So, can you conclude that the inductive bias doesn't necessarily lead to better "performance"? – nbro Sep 07 '21 at 20:29
  • Inductive bias leads to better generalization. For example, we use a CNN for image classification rather than a plain NN: the NN would need a lot of translated images to learn that the task has translational symmetry, whereas the CNN already assumes translational symmetry (the inductive bias of the CNN). Using a plain NN would also cause a curse of dimensionality, and we may not have enough samples to learn that symmetry. – Swakshar Deb Sep 08 '21 at 07:37
  • If the inductive bias matches the assumptions about the data, then it leads to better results. For example, a CNN is better than linear regression for images, since the CNN's inductive bias captures the underlying assumption but linear regression's inductive bias does not. – Swakshar Deb Sep 08 '21 at 07:41