Questions tagged [backpropagation]

For questions about the back-propagation (also known as "backprop" and often abbreviated as "BP") algorithm, which is used to compute the gradient of the objective function (e.g. the mean squared error) with respect to the parameters (or weights) of a neural network when it is trained with gradient descent.

260 questions
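
For orientation, here is a minimal back-propagation sketch (illustrative only, not drawn from any question below): a one-hidden-layer network trained with gradient descent on a mean-squared-error objective, with all shapes and names chosen arbitrarily.

```python
import numpy as np

# Minimal back-propagation sketch: one hidden layer, MSE objective.
# All shapes and names are illustrative assumptions.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))           # 8 examples, 3 features
Y = rng.normal(size=(8, 1))           # regression targets
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

# Forward pass
Z1 = X @ W1 + b1
H = np.tanh(Z1)                       # hidden activations
Y_hat = H @ W2 + b2                   # linear output
loss = np.mean((Y_hat - Y) ** 2)      # MSE objective

# Backward pass: apply the chain rule layer by layer
dY_hat = 2 * (Y_hat - Y) / len(X)     # dLoss/dY_hat
dW2 = H.T @ dY_hat
db2 = dY_hat.sum(axis=0)
dH = dY_hat @ W2.T                    # propagate the error to the hidden layer
dZ1 = dH * (1 - np.tanh(Z1) ** 2)     # through the tanh non-linearity
dW1 = X.T @ dZ1
db1 = dZ1.sum(axis=0)

# Gradient-descent update
lr = 0.1
W1 -= lr * dW1
b1 -= lr * db1
W2 -= lr * dW2
b2 -= lr * db2
```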
40 votes · 4 answers

What is the time complexity for training a neural network using back-propagation?

Suppose that an NN has $n$ hidden layers with $n_i$ nodes in each layer, and that we have $m$ training examples with $x$ features. What is the time complexity of training this NN using back-propagation? I have a basic idea about how they find the time complexity of…
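
For context, a common back-of-the-envelope estimate (a sketch only, not an answer taken from the question) counts the multiply-accumulates of the forward and backward passes through fully connected layers:

$$ T_{\text{epoch}} \approx O\!\left(m \sum_{i} n_{i-1}\, n_i\right), \qquad T_{\text{total}} \approx O\!\left(e \cdot m \sum_{i} n_{i-1}\, n_i\right) $$

where the sum runs over consecutive layer pairs (with $n_0 = x$ the input width), $m$ is the number of training examples and $e$ the number of epochs; the backward pass has the same asymptotic cost as the forward pass.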
18 votes · 1 answer

Are these two versions of back-propagation equivalent?

Just for fun, I am trying to develop a neural network. Now, for backpropagation I saw two techniques. The first one is used here and in many other places too. What it does is: It computes the error for each output neuron. It backpropagates it into…
14 votes · 2 answers

Is the mean-squared error always convex in the context of neural networks?

Multiple resources I referred to mention that MSE is great because it's convex, but I don't see why, especially in the context of neural networks. Let's say we have the following: $X$: the training dataset, $Y$: the targets, $\Theta$: the set of parameters…
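
For context, the usual resolution (a sketch in the question's notation, with $f_\Theta$ denoting the network) is that MSE is convex in the network's prediction but generally not in its parameters:

$$ L(\hat{Y}) = \frac{1}{m}\lVert \hat{Y} - Y \rVert^2 \;\; \text{(convex in } \hat{Y}\text{)}, \qquad L(\Theta) = \frac{1}{m}\lVert f_\Theta(X) - Y \rVert^2 \;\; \text{(generally non-convex in } \Theta\text{)} $$

The composition with a non-linear $f_\Theta$ is what breaks convexity with respect to $\Theta$.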
11 votes · 5 answers

What is "backprop"?

What does "backprop" mean? Is the "backprop" term basically the same as "backpropagation" or does it have a different meaning?
kenorb • 10,423 • 3 • 43 • 91
10 votes · 1 answer

Can a non-differentiable layer be used in a neural network if it's not learned?

For example, AFAIK, the pooling layer in a CNN is not differentiable, but it can be used because it is not learned. Is that always true?
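
For context, max pooling is the typical example: it has no weights, yet it is still piecewise differentiable, and backprop simply routes the upstream gradient to the element that produced each maximum. A minimal NumPy sketch (1D pooling with window size 2, illustrative values):

```python
import numpy as np

# Forward: non-overlapping 1D max pooling with window size 2.
x = np.array([1.0, 5.0, 3.0, 2.0])
pooled = x.reshape(-1, 2).max(axis=1)             # [5., 3.]

# Backward: the gradient flowing in from above is routed to the
# position that achieved each maximum; all other positions get zero.
grad_pooled = np.array([0.7, -0.2])               # upstream gradient
argmax = x.reshape(-1, 2).argmax(axis=1)          # winner in each window
grad_x = np.zeros_like(x)
grad_x[np.arange(len(pooled)) * 2 + argmax] = grad_pooled
print(grad_x)                                     # [ 0.   0.7 -0.2  0. ]
```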
10 votes · 2 answers

What are the learning limitations of neural networks trained with backpropagation?

In 1969, Seymour Papert and Marvin Minsky showed that perceptrons could not learn the XOR function. This was later addressed by backpropagation networks with at least one hidden layer; this type of network can learn the XOR function. I believe I was once…
10 votes · 2 answers

What advantages do evolutionary algorithms have over conventional backpropagation methods?

What advantages does employing evolutionary algorithms to design and train artificial neural networks have over using conventional backpropagation algorithms?
9 votes · 2 answers

What exactly is averaged when doing batch gradient descent?

I have a question about how the averaging works when doing mini-batch gradient descent. I think I now understand the general gradient descent algorithm, but only for online learning. When doing mini-batch gradient descent, do I have to: forward…
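
For context, a minimal sketch (illustrative NumPy names, not the asker's code) of what is typically averaged: each example's gradient is computed with the same frozen weights, the per-example gradients are averaged, and one update is applied per mini-batch.

```python
import numpy as np

def per_example_grad(w, x, y):
    # Gradient of the per-example loss 0.5 * (w @ x - y)**2 w.r.t. w.
    return (w @ x - y) * x

w = np.array([0.5, -1.0])
batch_x = np.array([[1.0, 2.0], [3.0, 0.5], [0.0, 1.0]])
batch_y = np.array([1.0, -0.5, 2.0])

# All per-example gradients use the same (frozen) weights w.
grads = np.stack([per_example_grad(w, x, y) for x, y in zip(batch_x, batch_y)])
avg_grad = grads.mean(axis=0)   # average over the mini-batch
w = w - 0.1 * avg_grad          # one update per mini-batch, not per example
```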
9 votes · 1 answer

Is back-propagation applied for each data point or for a batch of data points?

I am new to deep learning and trying to understand the concept of back-propagation. I am unsure about when back-propagation is applied. Assume that I have a training data set of 1000 images of handwritten letters. Is back-propagation…
8 votes · 3 answers

How do I know if my backpropagation is implemented correctly?

I'm working on an implementation of the backpropagation algorithm for a simple neural network, which predicts the probability of survival (1 or 0). However, I can't get the accuracy above 80%, no matter how I tune the hyperparameters. I suspect…
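
For context, the standard sanity check is numerical gradient checking: compare the analytic gradient produced by the backprop code against a central finite difference. A hedged sketch (illustrative function names):

```python
import numpy as np

def numerical_grad(loss_fn, params, eps=1e-6):
    # Central finite-difference estimate of d loss / d params.
    grad = np.zeros_like(params)
    for i in range(params.size):
        p_plus, p_minus = params.copy(), params.copy()
        p_plus.flat[i] += eps
        p_minus.flat[i] -= eps
        grad.flat[i] = (loss_fn(p_plus) - loss_fn(p_minus)) / (2 * eps)
    return grad

def relative_error(analytic_grad, num_grad):
    # Values around 1e-7 .. 1e-5 usually indicate a correct backward pass.
    return np.linalg.norm(analytic_grad - num_grad) / (
        np.linalg.norm(analytic_grad) + np.linalg.norm(num_grad) + 1e-12)
```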
8 votes · 1 answer

What do symmetric weights mean, and how do they make backpropagation biologically implausible?

I was reading a paper on alternatives to backpropagation as a learning algorithm in neural networks. In this paper, the author talks about the disadvantages of backpropagation, and one of the disadvantages stated is that backpropagation requires…
0jas • 83 • 4
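
For context, the "symmetric weights" point usually refers to the standard backward recursion, which reuses the transpose of the forward weight matrices (a sketch in common textbook notation, not drawn from the paper the asker cites):

$$ \delta^{(l)} = \left(W^{(l+1)}\right)^{\!\top} \delta^{(l+1)} \odot f'\!\left(z^{(l)}\right) $$

The backward pathway needs exactly $(W^{(l+1)})^{\top}$, i.e. connections symmetric to the forward ones, which biological synapses are not known to provide.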
8 votes · 3 answers

How does backprop work through the random sampling layer in a variational autoencoder?

Implementations of variational autoencoders that I've looked at all include a sampling layer as the last layer of the encoder block. The encoder learns to generate a mean and standard deviation for each input and samples from the resulting distribution to get the input's…
Luke Wolcott • 183 • 4
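
For context, the usual answer involves the reparameterization trick. A minimal sketch (illustrative NumPy names, not the asker's implementation) of how the sampling step is rewritten so gradients can flow through the mean and standard deviation:

```python
import numpy as np

# Instead of sampling z ~ N(mu, sigma^2) directly (which blocks gradients),
# sample eps ~ N(0, 1) and compute z = mu + sigma * eps. The encoder outputs
# mu and log_var then enter through ordinary, differentiable arithmetic.
rng = np.random.default_rng(0)
mu = np.array([0.3, -1.2])            # encoder output: mean
log_var = np.array([0.1, -0.5])       # encoder output: log variance
sigma = np.exp(0.5 * log_var)

eps = rng.standard_normal(mu.shape)   # noise, independent of the parameters
z = mu + sigma * eps                  # differentiable w.r.t. mu and log_var

# Local derivatives backprop would use at this layer:
dz_dmu = np.ones_like(mu)             # dz/dmu = 1
dz_dlog_var = 0.5 * sigma * eps       # dz/dlog_var = 0.5 * sigma * eps
```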
7 votes · 1 answer

How is division by zero avoided when implementing back-propagation for a neural network with sigmoid at the output neuron?

I am building a neural network in which the single output neuron at the end uses the sigmoid activation function. Since the sigmoid function takes any number and returns a value between 0 and 1, this is causing…
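
For context, one common remedy (a sketch assuming the loss is binary cross-entropy, where the division by zero typically appears) is to fold the sigmoid into the loss and work with logits, so neither $\log(0)$ nor a division by a saturated sigmoid ever occurs:

```python
import numpy as np

def bce_with_logits(a, y):
    # Numerically stable binary cross-entropy computed from the logit a
    # (the pre-sigmoid value): max(a, 0) - a*y + log(1 + exp(-|a|)).
    return np.maximum(a, 0) - a * y + np.log1p(np.exp(-np.abs(a)))

def bce_grad(a, y):
    # Gradient w.r.t. the logit is sigmoid(a) - y, so the backward pass
    # never divides by sigmoid(a) or by 1 - sigmoid(a).
    return 1.0 / (1.0 + np.exp(-a)) - y
```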
7 votes · 3 answers

How to compute the derivative of the error with respect to the input of a convolutional layer when the stride is bigger than 1?

I read that computing the derivative of the error with respect to the input of a convolutional layer is the same as performing a convolution between the deltas of the next layer and the weight matrix rotated by $180°$, i.e. something…
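
For context, one common way to handle a stride greater than 1 (a sketch for a 1D convolution with no padding, illustrative names) is to dilate the deltas with zeros and then perform a full convolution with the flipped, i.e. $180°$-rotated, kernel:

```python
import numpy as np

def conv1d_input_grad(deltas, kernel, stride, input_len):
    # Gradient of the loss w.r.t. the layer input for a 1D, no-padding
    # convolution. Deltas are spread onto a stride-1 grid (zeros in between),
    # then fully convolved with the flipped kernel.
    K = len(kernel)
    dilated = np.zeros(input_len - K + 1)
    dilated[np.arange(len(deltas)) * stride] = deltas
    flipped = kernel[::-1]                       # "180 degree rotation" in 1D
    padded = np.pad(dilated, K - 1)              # pad for a "full" convolution
    return np.array([padded[i:i + K] @ flipped for i in range(input_len)])

# Example: input length 7, kernel size 3, stride 2 -> 3 output positions.
deltas = np.array([1.0, -2.0, 0.5])
kernel = np.array([0.2, -0.1, 0.4])
print(conv1d_input_grad(deltas, kernel, stride=2, input_len=7))
```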
7 votes · 1 answer

How is the gradient calculated for the middle layer's weights?

I am trying to understand backpropagation. I used a simple neural network with one input $x$, one hidden layer $h$ and one output layer $y$, with weight $w_1$ connecting $x$ to $h$, and $w_2$ connecting $h$ to $y$ $$ x \rightarrow (w_1) \rightarrow…
Eka • 1,036 • 8 • 23
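
For context, the chain-rule expansion for this two-weight network (a sketch in the asker's notation, assuming an activation $f$ at each unit, a loss $E$, $h = f(w_1 x)$ and $y = f(w_2 h)$) reads:

$$ \frac{\partial E}{\partial w_2} = \frac{\partial E}{\partial y}\, f'(w_2 h)\, h, \qquad \frac{\partial E}{\partial w_1} = \frac{\partial E}{\partial y}\, f'(w_2 h)\, w_2\, f'(w_1 x)\, x $$

The middle factor $\partial y / \partial h = f'(w_2 h)\, w_2$ is what carries the error back to the earlier weight $w_1$.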