Most Popular

1500 questions
5 votes · 1 answer

How to detect vanishing gradients?

Can vanishing gradients be detected by the change in distribution (or lack thereof) of my convolution's kernel weights throughout the training epochs? And if so how? For example, if only 25% of my kernel's weights ever change throughout the epochs,…
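One way to make the idea in this question concrete is to snapshot a layer's kernel weights at each epoch and measure what fraction of them actually moved. A minimal NumPy sketch (the function name and tolerance are hypothetical, not from the question):

```python
import numpy as np

def fraction_changed(w_before, w_after, tol=1e-8):
    """Fraction of kernel weights whose value moved by more than tol
    between two epochs. A persistently low fraction for a layer can be
    one symptom (though not proof) of vanishing gradients there."""
    delta = np.abs(np.asarray(w_after) - np.asarray(w_before))
    return float(np.mean(delta > tol))

# Toy example: only 1 of 4 kernel weights changes between epochs.
w0 = np.zeros(4)
w1 = np.array([0.5, 0.0, 0.0, 0.0])
print(fraction_changed(w0, w1))  # 0.25
```

In practice one would also inspect the gradient norms per layer directly, since stagnant weights can have other causes (e.g. dead ReLUs or a low learning rate).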
5 votes · 1 answer

How to define an action space when an agent can take multiple sub-actions in a step?

I'm attempting to design an action space in OpenAI's gym and hitting the following roadblock. I've looked at this post which is closely related but subtly different. The environment I'm writing needs to allow an agent to make between $1$ and $n$…
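One common encoding for "between $1$ and $n$ sub-actions per step" is a fixed-length binary mask, which in Gym corresponds to `gym.spaces.MultiBinary(n)`. A sketch using NumPy only (the at-least-one constraint is enforced here by rejection sampling; the function name is hypothetical):

```python
import numpy as np

def sample_mask_action(n, rng):
    """Sample a length-n binary mask with at least one sub-action active,
    mimicking gym.spaces.MultiBinary(n) plus a non-empty constraint."""
    while True:
        mask = rng.integers(0, 2, size=n)
        if mask.any():  # reject the all-zeros mask (zero sub-actions)
            return mask

rng = np.random.default_rng(0)
action = sample_mask_action(5, rng)
print(action.sum() >= 1)  # True: at least one sub-action is chosen
```

Inside a real environment one would declare the space as `MultiBinary(n)` and, if the policy emits all zeros, either reject or penalize it.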
5 votes · 1 answer

Why not more TD(λ) in actor-critic algorithms?

Is there either an empirical or theoretical reason that actor-critic algorithms with eligibility traces have not been more fully explored? I was hoping to find a paper or implementation or both for continuous tasks (not episodic) in continuous…
5 votes · 1 answer

Is there a reason to use TensorFlow over PyTorch for research purposes?

I've been using PyTorch to do research for a while, and it seems quite easy to implement new things with. It is also easy to learn, and I haven't had any problems following other researchers' code so far. However, I wonder whether…
SpiderRico · 960 · 8 · 18
5 votes · 1 answer

Is the LSTM component a neuron or a layer?

Given the standard illustrative feed-forward neural net model, with the dots as neurons and the lines as neuron-to-neuron connections, what part is the (unfolded) LSTM cell (see picture)? Is it a neuron (a dot) or a layer?
MScott · 445 · 4 · 12
5 votes · 1 answer

How powerful is OpenAI's Gym and Universe in board games area?

I'm a big fan of computer board games and would like to make Python chess/go/shogi/mancala programs. Having heard of reinforcement learning, I decided to look at OpenAI Gym. But first of all, I would like to know, is it possible using OpenAI…
Taissa · 63 · 4
5 votes · 2 answers

What are examples of approaches to dimensionality reduction of feature vectors?

Given a pre-trained CNN model, I extract feature vectors of images in the reference and query datasets, each with several thousand elements. I would like to apply some techniques to reduce the feature vector dimension to speed up cosine…
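A standard answer to this question is PCA: project the feature vectors onto their top-k principal components before running cosine-similarity search. A minimal sketch via SVD in NumPy (shapes and k are illustrative assumptions):

```python
import numpy as np

def pca_reduce(X, k):
    """Project rows of X (n_samples, n_features) onto the top-k
    principal components, computed from the centered data via SVD."""
    Xc = X - X.mean(axis=0)                       # center features
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                          # (n_samples, k)

# Hypothetical CNN features: 100 images, 512-dim vectors -> 64 dims.
X = np.random.default_rng(0).normal(size=(100, 512))
Z = pca_reduce(X, 64)
print(Z.shape)  # (100, 64)
```

In practice the projection matrix `Vt[:k]` would be fit on the reference set once and reused for queries; libraries such as scikit-learn wrap this in `PCA(n_components=k)`.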
5 votes · 2 answers

In deep learning, is it possible to use discontinuous activation functions?

In deep learning, is it possible to use discontinuous activation functions (e.g. one with jump discontinuity)? (My guess: for example, ReLU is non-differentiable at a single point, but it still has a well-defined derivative. If an activation…
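The core issue the question hints at can be seen with the Heaviside step function, a jump-discontinuous activation: its derivative is zero everywhere away from the jump, so backpropagation receives no useful signal. A small illustration (function names are mine, not from the question):

```python
import numpy as np

def heaviside(x):
    """Jump-discontinuous activation: 1 for x >= 0, else 0."""
    return (x >= 0).astype(float)

def heaviside_grad(x):
    """The derivative is 0 everywhere except at the jump (x = 0),
    where it is undefined -- gradient descent gets no signal."""
    return np.zeros_like(x)

x = np.array([-1.0, 0.5, 2.0])
print(heaviside(x))       # [0. 1. 1.]
print(heaviside_grad(x))  # [0. 0. 0.]
```

By contrast, ReLU is continuous and non-differentiable only at a single point, where frameworks simply pick a subgradient (usually 0), so training still works.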
5 votes · 1 answer

Which deep learning models are suitable for image-to-image mapping?

I am working on a problem in which I need to train a neural network to map one or more input images to one or more output images (1 channel per image). Below I report some examples of input & output. In this case I report 1 input and 1 output image,…
5 votes · 1 answer

Autoencoder produces repeated artifacts after convergence

As an experiment, I have tried using an autoencoder to encode height data from the Alps; however, the decoded image is very pixellated after training for several hours, as shown in the image below. This repeating pattern is larger than the final kernel…
5 votes · 1 answer

Why is a softmax used rather than dividing each activation by the sum?

Just wondering why a softmax is typically used in practice on outputs of most neural nets rather than just summing the activations and dividing each activation by the sum. I know it's roughly the same thing but what is the mathematical reasoning…
user8714896 · 717 · 1 · 4 · 21
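Part of the answer to the softmax question can be shown numerically: dividing by the plain sum breaks down when activations can be negative (or sum to zero), while exponentiating first guarantees a valid probability distribution. A quick sketch:

```python
import numpy as np

def softmax(z):
    """Exponentiate, then normalize. The max-shift changes nothing
    mathematically but avoids overflow."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def naive_normalize(z):
    """Just divide each activation by the sum -- not safe in general."""
    z = np.asarray(z, dtype=float)
    return z / z.sum()

z = np.array([2.0, -1.0, 0.0])
print(softmax(z))          # all entries positive, sums to 1
print(naive_normalize(z))  # contains a negative "probability"
```

Softmax is also the gradient-friendly choice: paired with cross-entropy loss it yields the simple gradient `p - y`, which plain sum-normalization does not.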
5 votes · 1 answer

Why do we average gradients and not loss in distributed training?

I'm running some distributed trainings in Tensorflow with Horovod. It runs training separately on multiple workers, each of which uses the same weights and does a forward pass on unique data. Computed gradients are averaged within the communicator…
pSoLT · 161 · 2
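The key fact behind this question is that differentiation is linear: the gradient of the averaged loss equals the average of the per-worker gradients, so averaging gradients is the same update one would get from averaging losses first. A toy check with a hypothetical scalar loss `L_i(w) = (w - x_i)^2` per worker:

```python
import numpy as np

def grad(w, x):
    """dL_i/dw for the per-worker loss L_i(w) = (w - x_i)^2."""
    return 2.0 * (w - x)

w = 3.0
xs = np.array([1.0, 2.0, 4.0])                 # one data point per worker
avg_of_grads = np.mean([grad(w, x) for x in xs])
grad_of_avg_loss = 2.0 * (w - xs.mean())       # d/dw of mean_i (w - x_i)^2
print(np.isclose(avg_of_grads, grad_of_avg_loss))  # True
```

Averaging gradients rather than losses is then a systems choice: each worker can backpropagate locally, and only the gradient tensors need to be communicated (e.g. via Horovod's allreduce).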
5 votes · 1 answer

Is running more epochs really a direct cause of overfitting?

I've seen some comments in online articles/tutorials or Stack Overflow questions which suggest that increasing the number of epochs can result in overfitting. But my intuition tells me that there should be no direct relationship at all between the…
Alexander Soare · 1,319 · 2 · 11 · 26
5 votes · 1 answer

What is a "batch" in batch normalization?

I'm working on an example of CNN with the MNIST hand-written numbers dataset. Currently I've got convolution -> pool -> dense -> dense, and for the optimiser I'm using Mini-Batch Gradient Descent with a batch size of 32. Now this concept of batch…
Alexander Soare · 1,319 · 2 · 11 · 26
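For this question, the "batch" in batch normalization is exactly the mini-batch the optimiser is already using (here, 32 examples): the layer normalizes each feature using the mean and variance computed over the batch axis. A minimal NumPy sketch of the training-time computation (without the learned scale/shift parameters):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each feature using statistics over the batch axis
    (axis 0). x has shape (batch_size, n_features)."""
    mu = x.mean(axis=0)          # per-feature mean over the 32 examples
    var = x.var(axis=0)          # per-feature variance over the batch
    return (x - mu) / np.sqrt(var + eps)

# Hypothetical activations: batch of 32 examples, 4 features each.
x = np.random.default_rng(0).normal(5.0, 3.0, size=(32, 4))
y = batch_norm(x)
print(np.allclose(y.mean(axis=0), 0.0, atol=1e-6))  # True: ~zero mean
```

At inference time frameworks replace the batch statistics with running averages accumulated during training, since a single example has no meaningful batch mean.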
5 votes · 2 answers

How to understand the concept of self-supervised learning in AI?

I am new to self-supervised learning and it all seems a little magical at the moment. The only way I can get an intuitive understanding is to assume that, for real-world problems, features are still embedded at a per-object level. For example, to…