I am writing a recurrent neural network (RNN) using only NumPy for a binary classification problem. When I initialize the weights with np.random.randn, the network reaches only ~60% accuracy after 1000 epochs, whereas when I first divide the initial weights by 1000, it reaches 100% accuracy after the same number of epochs.
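To make the comparison concrete, here is a minimal sketch of the two initializations. It is not my actual code: the layer sizes, the variable names, the fixed seed, and the tanh activation in `rnn_step` are all assumptions made for illustration.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the comparison is repeatable (assumption)

# Hypothetical sizes, chosen only for this sketch.
input_size, hidden_size = 8, 16

# Setup A: weights drawn straight from a standard normal distribution.
W_xh_large = np.random.randn(hidden_size, input_size)
W_hh_large = np.random.randn(hidden_size, hidden_size)

# Setup B: the same weights divided by 1000 before training.
W_xh_small = W_xh_large / 1000
W_hh_small = W_hh_large / 1000


def rnn_step(x, h_prev, W_xh, W_hh):
    """One recurrent step, assuming a tanh hidden activation and no bias."""
    return np.tanh(W_xh @ x + W_hh @ h_prev)


# Run a single step on one random input to compare hidden activation sizes.
x = np.random.randn(input_size)
h0 = np.zeros(hidden_size)

h_large = rnn_step(x, h0, W_xh_large, W_hh_large)
h_small = rnn_step(x, h0, W_xh_small, W_hh_small)

print("mean |h| with randn weights:     ", np.abs(h_large).mean())
print("mean |h| with randn/1000 weights:", np.abs(h_small).mean())
```

The only difference between the two setups is the scale of the initial weights; everything else (data, learning rate, number of epochs) is kept the same.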
Why is this? Do RNNs train better with smaller initial weights, or is there something special about the factor of 1000?
Any and all help is welcome, thanks.