
I was following Daniel Shiffman's tutorials on how to write your own neural network from scratch. I specifically looked into his videos and the code he provided there. I rewrote his code in Python; however, 3 out of 4 of my outputs are the same. The neural network has two input nodes, one hidden layer with two nodes, and one output node. Can anyone help me find my mistake? Here is my full code.

import random

import numpy as np

# Two input nodes, one hidden layer with two nodes, one output node
nn = NeuralNetwork(2, 2, 1)

# XOR training data
inputs  = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
targets = np.array([[0], [1], [1], [0]])
list_zipped = list(zip(inputs, targets))

# Train on a randomly chosen sample each iteration
for _ in range(9000):
    x, y = random.choice(list_zipped)
    nn.train(x, y)

output = [nn.feedforward(i) for i in inputs]

for i in output:
    print("Output ", i)

#Output  [ 0.1229546]  when it should be around 0
#Output  [ 0.6519492]  ~1
#Output  [ 0.65180228] ~1
#Output  [ 0.66269853] ~0

EDIT_1: I tried debugging my code by setting all of the weights and biases to 0.5, both in my code and in Daniel's. Unsurprisingly, that produced the same value for every output.

After that I widened the random initialization range for the weights and biases from [0, 1) to [-1, 1). Running this a few times, I would sometimes get the correct output:

[ 0.93749991] # should be ~1
[ 0.93314793] # ~1 
[ 0.07001175] # ~0
[ 0.06576194] # ~0

If I run nn.train() 100,000 times, I get the correct output about 2 out of 3 times. Is this an issue with gradient descent, where it converges to a local minimum?
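For reference, the widened initialization looks roughly like this inside the constructor (this is only a sketch; the class and attribute names below are placeholders, not necessarily the ones my actual port uses):

import numpy as np

# Sketch of the initialization change only -- class and attribute names
# are placeholders, not the real implementation.
class NeuralNetworkInitSketch:
    def __init__(self, input_nodes, hidden_nodes, output_nodes):
        # Weights and biases drawn uniformly from [-1, 1) instead of [0, 1)
        self.weights_ih = np.random.uniform(-1, 1, (hidden_nodes, input_nodes))
        self.weights_ho = np.random.uniform(-1, 1, (output_nodes, hidden_nodes))
        self.bias_h = np.random.uniform(-1, 1, (hidden_nodes, 1))
        self.bias_o = np.random.uniform(-1, 1, (output_nodes, 1))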

Gabriele

1 Answer


Local minima.

You have the exact same issue as in this question. If you randomize your initial weights, you'll see that sometimes you get the correct results and other times you won't. That's because, for certain ranges of initial weight values, the network converges to a local minimum that it cannot escape with a low learning rate.

A simple solution is to increase the size of your hidden layer, which will make the network more robust to such issues.
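Concretely, with the constructor from your question, that just means passing a larger hidden-layer size. A minimal sketch reusing your own names (8 hidden nodes is an arbitrary choice; anything noticeably larger than 2 helps):

# Same training loop as in the question, but with a wider hidden layer.
nn = NeuralNetwork(2, 8, 1)

for _ in range(9000):
    x, y = random.choice(list_zipped)
    nn.train(x, y)

for x in inputs:
    print("Output ", nn.feedforward(x))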

When you have only 2 dimensions, a local minimum exists. When you have more dimensions, such a minimum gets harder and harder to reach, as its likelihood decreases. Intuitively, you have a lot more directions along which you can improve than if you only had 2 dimensions.

The problem still exists: even with 1000 neurons you could find a specific set of weights that is a local minimum. It just becomes much less likely.
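If you want to see this effect for yourself, you can repeat the whole training run with fresh random weights and count how often it solves XOR for different hidden-layer sizes. A rough sketch, reusing the names from your question (run_once, the 20-run count, and the 0.5 decision threshold are my own choices):

import random

# Repeat training from scratch with fresh random weights and count how
# often the network solves XOR, for different hidden-layer sizes.
def run_once(hidden_nodes, steps=9000):
    nn = NeuralNetwork(2, hidden_nodes, 1)
    for _ in range(steps):
        x, y = random.choice(list_zipped)
        nn.train(x, y)
    # A run counts as solved if every prediction is on the right side of 0.5
    return all((nn.feedforward(x)[0] > 0.5) == bool(y[0])
               for x, y in list_zipped)

for hidden in (2, 4, 8):
    solved = sum(run_once(hidden) for _ in range(20))
    print(hidden, "hidden nodes:", solved, "/ 20 runs solved")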

BlueMoon93
  • XD So that is why 2 layers failed. I have some changes to make. – FreezePhoenix Apr 26 '18 at 12:46
  • I see. So is it just pure luck that if I run Daniel's code, I get the correct answer 9/10 times, and if I run mine, I get the correct answer only 2/10 times? – Gabriele Apr 27 '18 at 08:15
  • It has to do with initialization, most likely. I haven't seen his code, but I assume he uses some initialization method different from yours. – BlueMoon93 Apr 27 '18 at 08:18