Is algorithmic bias due to the training dataset used?

Question

I recently read about algorithmic bias in facial recognition.

Is algorithmic bias due to the training dataset used, or is it due to something else?

score 4 · Answer 1 · answered Jun 05 '17 at 10:03

As the name implies, algorithmic bias is related to the algorithm used. Because of the way it was programmed or devised, the algorithm will be biased on some of the samples it processes.

From Communications of the ACM:

[Algorithms] often inadvertently pick up the human biases that are incorporated when the algorithm is programmed, or when humans interact with that algorithm.

A machine learning model can also be biased if the wrong dataset is used, of course. That is usually referred to simply as bias, and is often associated with the bias-variance tradeoff.

Tshilidzi Mudau · Answer 2 · 2017-06-07T14:29:57.697

Just to add to what has already been said in @BlueMoon93's answer:

Algorithmic bias is the bias built into the algorithm. Now for the long answer:

As stated by the so-called No Free Lunch theorem: regardless of the algorithm you use, you can't get learning "for free" (i.e., just by looking at the training examples). The reason is that everything you know about the data comes from the limited examples you have seen in the training set. To generalize, your algorithm has to make some assumptions about the underlying nature of the dataset and the manner in which it can be represented or interpreted.

Built into every algorithm is a set of assumptions about the dataset. For example, built into the convolutional neural network is the assumption that the dataset (e.g., the images you are using to train it) can be understood by mimicking how the human eye is known to work (i.e., a bias). Some algorithms have biases so strong that they are incapable of learning certain kinds of functions. For example, linear models assume that the underlying data is linear (a bias). This might not be the case for the dataset at hand, in which case the bias built into the model is a poor fit for that dataset.
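To make the linear-model point concrete, here is a minimal sketch (my own toy setup, not from any reference): the data is generated from y = x^2, and a straight-line fit is compared with a quadratic fit using NumPy's `polyfit`. The data, the helper name `fit_and_score`, and the choice of degrees are assumptions made purely for illustration.

```python
import numpy as np

# Hypothetical toy data: y = x^2 on a symmetric grid, with no noise.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2

def fit_and_score(degree):
    """Least-squares polynomial fit of the given degree; returns the training MSE."""
    coeffs = np.polyfit(x, y, deg=degree)
    preds = np.polyval(coeffs, x)
    return float(np.mean((preds - y) ** 2))

linear_mse = fit_and_score(1)     # model family assumes y is linear in x
quadratic_mse = fit_and_score(2)  # model family contains the true function

print(f"linear MSE:    {linear_mse:.4f}")
print(f"quadratic MSE: {quadratic_mse:.4f}")
```

No amount of extra data helps the straight line here: its error stays well above zero because the assumption built into the model does not match the data, while the quadratic fit is essentially exact.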

As pointed out by @BlueMoon93, there is another form of bias, commonly referred to simply as "bias", which is introduced by the dataset used.
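A rough sketch of that dataset-driven bias, on purely synthetic data: two hypothetical groups, A and B, have the same task but shifted feature values, and the training set is 95% group A. The groups, the shift of 2.0, the sample sizes, and the one-threshold classifier are all assumptions chosen for illustration; this is not a model of any real facial recognition system.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

def sample_group(n, shift):
    """Draw n labelled points; class 1 sits 2 units above class 0, and the whole group is offset by `shift`."""
    labels = rng.integers(0, 2, size=n)
    features = rng.normal(loc=shift + 2.0 * labels, scale=1.0)
    return features, labels

def error_rate(features, labels, threshold):
    """Fraction of points misclassified by the rule `predict 1 if x > threshold`."""
    predictions = (features > threshold).astype(int)
    return float(np.mean(predictions != labels))

def best_threshold(features, labels):
    """Pick the cut-off (among the observed values) with the lowest training error."""
    candidates = np.sort(features)
    errors = [error_rate(features, labels, t) for t in candidates]
    return float(candidates[int(np.argmin(errors))])

# Skewed training set: 1900 samples from group A, only 100 from group B.
xa, ya = sample_group(1900, shift=0.0)
xb, yb = sample_group(100, shift=2.0)
threshold = best_threshold(np.concatenate([xa, xb]), np.concatenate([ya, yb]))

# Evaluate on equally sized held-out samples from each group.
test_a = sample_group(2000, shift=0.0)
test_b = sample_group(2000, shift=2.0)
error_a = error_rate(*test_a, threshold)
error_b = error_rate(*test_b, threshold)

print(f"learned threshold: {threshold:.2f}")
print(f"group A error: {error_a:.3f}")
print(f"group B error: {error_b:.3f}")
```

The learning rule is identical for both groups; the gap in error rates comes from the training set being dominated by group A, so the learned threshold suits A and is poorly placed for B.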
