Most Popular
1500 questions
16 votes, 4 answers
1 hidden layer with 1000 neurons vs. 10 hidden layers with 100 neurons
These types of questions may be problem-dependent, but I have tried to find research that addresses the question of whether the number of hidden layers and their size (number of neurons in each layer) really matter or not.
So my question is, does it…

Stephen Johnson
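To make the comparison in the question concrete, here is a minimal Python sketch that counts the trainable parameters (weights plus biases) of the two fully connected architectures. The input size of 784 and output size of 10 are hypothetical values chosen for illustration, since the question does not specify them. Both networks have 1000 hidden neurons in total, yet their parameter counts differ a lot; on its own, that says nothing about which one generalizes better.

```python
def mlp_param_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input first and output last.
    """
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical input/output sizes; the original question does not give them.
N_INPUTS, N_OUTPUTS = 784, 10

wide = [N_INPUTS, 1000, N_OUTPUTS]            # 1 hidden layer with 1000 neurons
deep = [N_INPUTS] + [100] * 10 + [N_OUTPUTS]  # 10 hidden layers with 100 neurons

print(mlp_param_count(wide))  # 795010
print(mlp_param_count(deep))  # 170410
```

Most of the difference comes from the first layer: 784 x 1000 input weights in the wide network versus 784 x 100 in the deep one.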
16 votes, 3 answers
What roles do knowledge bases play now, and what roles will they play in the future?
Nowadays, artificial intelligence seems almost synonymous with machine learning, especially deep learning. Some have said that deep learning will replace human experts in this field, who have traditionally been very important for feature engineering. It is said that…

Lerner Zhang
16 votes, 2 answers
Why does GPT-2 exclude the transformer encoder?
After looking into transformers, BERT, and GPT-2, from what I understand, GPT-2 essentially uses only the decoder part of the original transformer architecture and uses masked self-attention that can only look at prior tokens.
Why does GPT-2 not…

Athena Wisdom
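The masked self-attention mentioned in the question can be illustrated with a minimal pure-Python sketch. It is a simplification, not GPT-2's implementation: there is a single head and no learned query/key/value projections (the token vectors are used directly), so only the causal mask is shown. Position i gets a score of -inf for every later position j > i, which the softmax turns into an attention weight of exactly zero.

```python
import math


def causal_attention_weights(x):
    """Scaled dot-product attention weights with a causal (look-back-only) mask.

    x is a list of token vectors. Simplification: queries, keys and values are
    the vectors themselves (no learned projections, single head).
    """
    n, d = len(x), len(x[0])
    weights = []
    for i in range(n):
        # Token i may attend to positions 0..i only; future positions get -inf.
        scores = [
            sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d) if j <= i else -math.inf
            for j in range(n)
        ]
        m = max(scores)  # finite, because position i itself is never masked
        exps = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def causal_self_attention(x):
    """Output i is the attention-weighted average of the vectors at positions 0..i."""
    w = causal_attention_weights(x)
    n, d = len(x), len(x[0])
    return [[sum(w[i][j] * x[j][k] for j in range(n)) for k in range(d)] for i in range(n)]


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(causal_attention_weights(tokens)[0])  # [1.0, 0.0, 0.0]
```

Changing the last token leaves the first two outputs unchanged, which is what prevents a position from seeing the tokens that come after it.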
16 votes, 2 answers
What is the difference between Q-learning, Deep Q-learning and Deep Q-network?
Q-learning uses a table to store the values of all state-action pairs. Q-learning is a model-free RL algorithm, so how can there be one called Deep Q-learning, since "deep" means using a DNN? Or maybe the state-action table (Q-table) is still there but the DNN is…

Dee
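For reference, the table-based Q-learning the question describes can be sketched in a few lines of Python. The Q-table here is a dictionary keyed by (state, action); the function name, the state/action labels and the alpha/gamma values are illustrative assumptions. Deep Q-learning uses the same kind of bootstrapped target, reward + gamma * max_a' Q(next_state, a'), but replaces the table lookup with a neural network that outputs Q-values for a state.

```python
from collections import defaultdict


def q_learning_update(q_table, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99, done=False):
    """Apply one tabular Q-learning update and return the new Q(state, action).

    Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = 0.0 if done else max(q_table[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (target - q_table[(state, action)])
    return q_table[(state, action)]


q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
q[("s1", "right")] = 2.0
new_value = q_learning_update(q, "s0", "right", reward=1.0, next_state="s1",
                              actions=["left", "right"], alpha=0.5, gamma=0.9)
print(new_value)  # 0.5 * (1.0 + 0.9 * 2.0) = 1.4, up to float rounding
```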
16 votes, 2 answers
Why is Sanskrit the best language for AI?
According to NASA scientist Rick Briggs, Sanskrit is the best language for AI. I want to know how Sanskrit is useful. What's the problem with other languages? Is Sanskrit really used in AI programming, or will it be? What part of an AI…

Rahul
16 votes, 1 answer
Why is automated theorem proving so hard?
The problem of automated theorem proving (ATP) seems to be very similar to playing board games (e.g. chess, Go, etc.): it can also be naturally stated as a decision-tree traversal problem. However, there is a dramatic difference in progress on…

Ivan Ivanov
16 votes, 8 answers
How to classify spiral-shaped data?
I have been messing around in TensorFlow Playground. One of the input datasets is a spiral. No matter what input parameters I choose, no matter how wide and deep I make the neural network, I cannot fit the spiral. How do data scientists fit data of…

Souradeep Nanda
16 votes, 1 answer
Will parameter sweeping on one split of data followed by cross validation discover the right hyperparameters?
Let's call our dataset splits train/test/evaluate. Our situation requires months of data, so we prefer to use the evaluation dataset as infrequently as possible to avoid polluting our results. Instead, we do 10-fold cross-validation…

Philipp Cannons
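The 10-fold cross-validation step mentioned in the question can be sketched as follows, with the separate evaluation split kept out of the loop entirely. `k_fold_indices` and `cross_val_score` are hypothetical helper names, and `score_fn` stands in for whatever training-and-scoring routine is used. The folds here are contiguous and unshuffled, which is an assumption; shuffled or time-ordered splits may be more appropriate depending on the data.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, val_idx) lists for k contiguous, non-overlapping folds."""
    if not 1 < k <= n_samples:
        raise ValueError("need 1 < k <= n_samples")
    # The first n_samples % k folds get one extra sample so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, val_idx
        start = stop


def cross_val_score(score_fn, n_samples, k=10):
    """Average score_fn(train_idx, val_idx) over the k folds.

    score_fn is a hypothetical callback that trains on train_idx and returns a
    validation score on val_idx. Only the train/test data is indexed here; the
    evaluation split stays untouched until the very end.
    """
    scores = [score_fn(tr, va) for tr, va in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)
```

In a parameter sweep, `cross_val_score` would be called once per hyperparameter setting, and only the winning setting would ever be scored on the evaluation split.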
16 votes, 2 answers
How can I automate the choice of the architecture of a neural network for an arbitrary problem?
Assume that I want to solve a problem with a neural network that either I can't fit to existing architectures (perceptron, Kohonen, etc.), or I'm simply not aware of such architectures, or I'm unable to understand their mechanics, and I rely on my…

Zoltán Schmidt
16 votes, 1 answer
How can I stay up to date as a researcher in the ML/RL community?
As a student who wants to work on machine learning, I would like to know how to start my studies and how to keep up to date afterwards. For example, I want to work on RL and multi-armed bandit (MAB) problems, but there is a huge literature on…

Amin
16 votes, 3 answers
Is the optimal policy always stochastic if the environment is also stochastic?
Is the optimal policy always stochastic (that is, a map from states to a probability distribution over actions) if the environment is also stochastic?
Intuitively, if the environment is deterministic (that is, if the agent is in a state $s$ and…

nbro
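The definition in the question, a stochastic policy as a map from states to probability distributions over actions, can be written directly as a nested dictionary. This Python sketch uses hypothetical state and action names; a deterministic policy is simply the special case that puts probability 1 on a single action in every state.

```python
import random


def sample_action(policy, state, rng=random):
    """Draw an action from policy[state], a dict mapping action -> probability."""
    actions, probs = zip(*policy[state].items())
    return rng.choices(actions, weights=probs, k=1)[0]


# Hypothetical two-state example.
stochastic_policy = {"s0": {"left": 0.3, "right": 0.7}, "s1": {"left": 0.5, "right": 0.5}}
deterministic_policy = {"s0": {"right": 1.0}, "s1": {"left": 1.0}}

rng = random.Random(0)
print(sample_action(deterministic_policy, "s0", rng))  # always "right"
print(sample_action(stochastic_policy, "s0", rng))     # "left" or "right"
```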
15 votes, 1 answer
What is the difference between a receptive field and a feature map?
In a CNN, the receptive field is the portion of the image used to compute the filter's output. But one filter's output (which is also called a "feature map") is the next filter's input.
What's the difference between a receptive field and a feature…

Monica Heddneck
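The distinction can be made concrete: a feature map is the output array of a filter, while the receptive field of one element of that array is the region of the input image it depends on. The Python sketch below computes the receptive-field size after a stack of layers using the standard recurrence r_out = r_in + (k - 1) * j_in and j_out = j_in * s, where k is the kernel size, s the stride and j the spacing, in input pixels, between adjacent elements of the current feature map. It assumes square kernels and no dilation.

```python
def receptive_field(layers):
    """Receptive-field size (in input pixels, per side) of one unit after a layer stack.

    layers: (kernel_size, stride) pairs in forward order, for conv or pooling layers.
    Assumes square kernels and no dilation; padding does not change the size.
    """
    r, j = 1, 1  # a single input pixel "sees" itself; adjacent pixels are 1 apart
    for k, s in layers:
        r += (k - 1) * j  # k neighboring units, each j input pixels apart
        j *= s            # striding spreads the next map's units further apart
    return r


print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7: three stacked 3x3 convs
print(receptive_field([(3, 1), (2, 2), (3, 1)]))  # 8: conv, 2x2 pool with stride 2, conv
```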
15 votes, 1 answer
Why does the policy network in AlphaZero work?
In AlphaZero, the policy network (or head of the network) maps game states to a distribution of the likelihood of taking each action. This distribution covers all possible actions from that state.
How is such a network possible? The possible actions…

chessprogrammer
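One common way to get a distribution over only the legal moves from a network whose output has a fixed size (one logit per possible action) is to mask illegal actions before normalizing. The sketch below is a minimal Python illustration of that idea, not AlphaZero's actual code; the logits and legality mask are hypothetical inputs. Setting an illegal action's logit to -inf gives it probability zero, and the softmax renormalizes over what remains.

```python
import math


def masked_policy(logits, legal):
    """Softmax over a fixed-size action space, restricted to legal actions.

    logits: one score per possible action (the network's fixed-size output).
    legal:  booleans of the same length; at least one must be True.
    """
    if not any(legal):
        raise ValueError("at least one action must be legal")
    masked = [x if ok else -math.inf for x, ok in zip(logits, legal)]
    m = max(masked)  # finite because some action is legal
    exps = [math.exp(x - m) for x in masked]  # illegal actions -> exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


probs = masked_policy([2.0, 1.0, 0.5, 3.0], [True, False, True, False])
print(probs)  # indices 1 and 3 get 0.0; indices 0 and 2 share all the mass
```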
15 votes, 4 answers
What does "stationary" mean in the context of reinforcement learning?
I think I've seen the expressions "stationary data", "stationary dynamics" and "stationary policy", among others, in the context of reinforcement learning. What does it mean? I think stationary policy means that the policy does not depend on time,…

Paula Vega
15 votes, 3 answers
Does Monte Carlo tree search qualify as machine learning?
To the best of my understanding, the Monte Carlo tree search (MCTS) algorithm is an alternative to minimax for searching a tree of nodes. It works by choosing a move (generally, the one with the highest chance of being the best), and then performing…

Inertial Ignorance
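The move-selection step described in the question is often implemented with the UCB1 rule (the UCT variant of MCTS), which balances a move's average result against how rarely it has been tried. This Python sketch shows only that selection step. The (total_value, visit_count) child representation, the exploration constant c = sqrt(2), and using the sum of the children's visits as the parent count are assumptions for illustration.

```python
import math


def uct_select(children, c=math.sqrt(2)):
    """Return the index of the child with the highest UCB1 score.

    children: list of (total_value, visit_count) pairs, one per move.
    Score = total_value / visits + c * sqrt(ln(parent_visits) / visits).
    Unvisited children are tried first.
    """
    for i, (_, visits) in enumerate(children):
        if visits == 0:
            return i
    # Assumption: the parent's visit count equals the sum of its children's visits.
    parent_visits = sum(visits for _, visits in children)
    scores = [value / visits + c * math.sqrt(math.log(parent_visits) / visits)
              for value, visits in children]
    return max(range(len(children)), key=scores.__getitem__)


print(uct_select([(6.0, 10), (3.0, 5), (0.0, 0)]))  # 2: the unvisited move
print(uct_select([(6.0, 10), (1.0, 2)]))            # 1: less explored, bonus wins
print(uct_select([(6.0, 10), (1.0, 2)], c=0.0))     # 0: pure exploitation
```

A larger c favors exploration; c = 0 always picks the best average so far.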