How do I determine the time complexity of the forward pass algorithm of a feedforward neural network? How many multiplications are done to generate the output?

1 Answer

Let's suppose that we have an MLP with $15$ inputs, $20$ hidden neurons and $2$ output neurons. The operations performed are only in the hidden and output neurons, given that the input neurons only represent the inputs (so they do not perform any operation).

Each hidden neuron performs a linear combination of its inputs followed by the application of a non-linear (or activation) function. So, each hidden neuron $j$ performs the following operation

\begin{align} o_j = \sigma \left(\sum_{i=1}^{15} w_{ij}x_i \right),\tag{1}\label{1} \end{align}

where $x_i$ is the input coming from the input neuron $i$, $w_{ij}$ is the weight of the connection from the input neuron $i$ to the hidden neuron $j$, and $o_j$ denotes the output of the hidden neuron $j$.

There are $20$ hidden neurons and, for each of them, according to equation $\ref{1}$, we perform $15$ multiplications (ignoring any multiplications that might be associated with the activation function), so $15*20 = 300$ multiplications are performed at the (only) hidden layer. In general, if there are $n$ inputs and $m$ hidden neurons, then $n*m$ multiplications will be performed in the first hidden layer.
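For concreteness, here is a minimal Python sketch of this step (the names `hidden_layer` and `sigma` are illustrative assumptions, not part of the answer), which applies equation $\ref{1}$ to a whole hidden layer and counts the multiplications in the linear combinations.

```python
import numpy as np

def hidden_layer(x, W, sigma=np.tanh):
    """Compute o_j = sigma(sum_i w_ij * x_i) for every hidden neuron j and
    count the multiplications performed in the linear combinations."""
    n, m = W.shape          # n inputs, m hidden neurons
    o = sigma(x @ W)        # one multiplication per (input, neuron) pair
    return o, n * m         # e.g. 15 * 20 = 300 multiplications

x = np.random.randn(15)      # 15 inputs
W = np.random.randn(15, 20)  # weights from 15 inputs to 20 hidden neurons
o, mults = hidden_layer(x, W)
print(mults)                 # 300
```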

Now, each neuron $j$ at the next layer (in this case, the output layer), also performs a linear combination followed by the application of an activation function

\begin{align} o_j = \tau \left(\sum_{i=1}^{20} w_{ij}x_i \right),\tag{2}\label{2} \end{align}

where $\tau$ is another activation function, which may or may not be equal to $\sigma$, but we ignore all multiplications that might be involved in the application of the activation functions (we just want to count the ones in the linear combinations). Of course, in this case, $x_i$ corresponds to the activation of neuron $i$ of the hidden layer.

Similarly to the previous reasoning, there are $2$ output neurons and, to compute the output of each of them, $20$ multiplications are performed (in the linear combination), so there are a total of $2*20 = 40$ multiplications at the output layer.

So, an MLP with $15$ inputs, $20$ hidden neurons and $2$ output neurons will perform $15*20 + 20*2 = 340$ multiplications (excluding activation functions). Of course, in this case, the number of multiplications depends not only on the number of neurons but also on the input size.

In general, an MLP with $n$ inputs, $M$ hidden layers, where the $i$th hidden layer contains $m_i$ hidden neurons, and $k$ output neurons will perform the following number of multiplications (excluding activation functions)

\begin{align} nm_{1} + m_{1}m_{2} + m_{2}m_{3} + \dots + m_{M-1}m_{M} + m_{M}k = nm_{1} + m_{M}k + \sum_{i=1}^{M-1} m_{i}m_{i+1} \end{align}

which, in asymptotic notation, can be written as

\begin{align} \Theta\left(nm_{1} + m_{M}k + \sum_{i=1}^{M-1} m_{i}m_{i+1} \right) \end{align}

where $\Theta(\cdot)$ is used (as opposed to $\mathcal{O}(\cdot)$) because this is a tight bound. If you have just one hidden layer, the number of multiplications becomes

\begin{align} \Theta\left(nm_{1} + m_{1}k \right) \end{align}
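To make the general formula above concrete, here is a small Python sketch (the helper `count_multiplications` is hypothetical) that evaluates $nm_{1} + \sum_{i=1}^{M-1} m_{i}m_{i+1} + m_{M}k$ for an arbitrary architecture.

```python
def count_multiplications(n, hidden, k):
    """Multiplications in the linear combinations of an MLP with n inputs,
    hidden layer sizes hidden = [m_1, ..., m_M] and k output neurons."""
    sizes = [n] + list(hidden) + [k]
    # n*m_1 + m_1*m_2 + ... + m_{M-1}*m_M + m_M*k
    return sum(a * b for a, b in zip(sizes, sizes[1:]))

print(count_multiplications(15, [20], 2))      # 15*20 + 20*2 = 340
print(count_multiplications(15, [20, 10], 2))  # 15*20 + 20*10 + 10*2 = 520
```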

Of course, at each layer, the number of multiplications can be computed independently of the multiplications of the other layers (you can think of each layer as a perceptron), hence we sum (and do not, e.g., multiply) the per-layer counts when computing the total number of multiplications of the whole MLP.

In general, when analyzing the time complexity of an algorithm, we do it with respect to the size of the input. However, in this case, the time complexity (more precisely, the number of multiplications involved in the linear combinations) also depends on the number of layers and the size of each layer. The time complexity of a forward pass of a trained MLP is thus architecture-dependent (a concept similar to that of an output-sensitive algorithm).

You can easily include other operations (sums, etc.) in this reasoning to calculate the actual time complexity of a trained MLP.
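As a sketch of that reasoning (assuming a hypothetical `forward` function, with biases included for completeness even though the equations above omit them), the following counts both the multiplications and the additions of a forward pass.

```python
import numpy as np

def forward(x, weights, biases, act=np.tanh):
    """Forward pass of an MLP, counting the multiplications and additions
    in the linear combinations (activation-function costs are ignored)."""
    mults = adds = 0
    for W, b in zip(weights, biases):
        n, m = W.shape
        x = act(x @ W + b)         # linear combination + activation
        mults += n * m             # one product per weight
        adds += (n - 1) * m + m    # sums inside each dot product, plus the bias
    return x, mults, adds

# The 15-20-2 MLP from the answer
weights = [np.random.randn(15, 20), np.random.randn(20, 2)]
biases = [np.zeros(20), np.zeros(2)]
_, mults, adds = forward(np.random.randn(15), weights, biases)
print(mults, adds)  # 340 multiplications, 340 additions
```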

  • Thank you for your complete answer. It helped me a lot. Unfortunately, my reputation is not yet enough to upvote you, so I virtually give you +1 for your good explanation :). Thanks. – Artificial Jul 28 '19 at 10:08
  • From a different perspective, if we see Neural Networks as plain matrix products (post training feedforward), shouldn't it be something of the form $O(n^3)$? Just my two cents... – Karan Shah Jul 29 '19 at 16:17
  • @KaranShah No, it shouldn't, but what is $n$ in your case? The complexity is proportional to the number of layers and the number of neurons per layer. – nbro Jul 29 '19 at 16:25
  • Yes, it is a function of the number of layers. So assuming $O(n^3)$ for 1 matrix product, it scales for L layers in the same manner, that's how I'm seeing it... – Karan Shah Jul 29 '19 at 17:04
  • @KaranShah Matrix-matrix multiplication is $O(n^3)$ (actually, you can have a better bound than this). Anyway, the input is a vector, so you multiply a vector by a matrix (and not a matrix by another matrix), which is not $n^3$. See [https://en.wikipedia.org/wiki/Computational_complexity_of_mathematical_operations#Matrix_algebra](https://en.wikipedia.org/wiki/Computational_complexity_of_mathematical_operations#Matrix_algebra). – nbro Jul 29 '19 at 17:09
  • Ah, I got it! When $m$, $n$, $p$ become equal, that is similar to $O(n^3)$. We're multiplying vectors (not matrices), so it makes sense that it is the way you showed it. Thanks. Solved. – Karan Shah Jul 29 '19 at 17:27
  • @KaranShah Yes, but, in this case, either $p$ or $n$ is $1$, so that becomes e.g. $nm$ (which is what I am actually saying in my answer, if you have noticed). – nbro Jul 29 '19 at 20:34
  • @nbro I've read the comments and got a little confused. If we use a NN within a for loop, and the for loop is O(n) (before adding the NN), what will the overall complexity be after adding the NN? Suppose we use the same NN I mentioned in the question: its input and output sizes are always the same, and it performs the same number of multiplications (same hidden layers and neurons) in each iteration. Will the overall complexity be O((some-constant)*n) or just O(n)? Could you please clarify this? – Artificial Aug 01 '19 at 13:07
  • @Artificial If you execute a forward pass of the NN with $15$ inputs, $20$ hidden neurons and $2$ output neurons inside a for loop with $n$ iterations, then the total number of multiplications is $340n$, i.e. $\Theta(n)$, given that, at each iteration, you perform $340$ multiplications (one forward pass) and there are $n$ iterations. – nbro Aug 01 '19 at 13:20
  • +1 for your valuable time editing my post. Thank you dear nbro. – Artificial Aug 22 '19 at 08:04
  • Why take into account only the multiplications and not also the bias additions? All these time-complexity computation approaches seem very fuzzy, more like the author's thoughts than any real analysis. What about the space complexity? – user3352632 Jul 28 '22 at 07:48