Questions tagged [proofs]
For questions that ask about or call for proofs of specific assertions, whether they are proofs of theorems or corollaries, proofs of concept through a working implementation, counterproofs, or counterexamples.
116 questions
27
votes
3 answers
Where can I find the proof of the universal approximation theorem?
The Wikipedia article for the universal approximation theorem cites a version of the universal approximation theorem for Lebesgue-measurable functions from this conference paper. However, the paper does not include the proofs of the theorem. Does…
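For reference, the classical continuous-function version of the theorem (Cybenko, 1989), stated here informally and distinct from the Lebesgue-measurable version cited by Wikipedia, says that for any continuous $f$ on $[0,1]^n$, any continuous sigmoidal $\sigma$, and any $\varepsilon > 0$, there exist $N$, $v_i, b_i \in \mathbb{R}$ and $w_i \in \mathbb{R}^n$ such that
$$\sup_{x \in [0,1]^n} \left| f(x) - \sum_{i=1}^{N} v_i \, \sigma(w_i^\top x + b_i) \right| < \varepsilon.$$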

Leroy Od
- 435
- 1
- 4
- 4
22
votes
3 answers
Why doesn't Q-learning converge when using function approximation?
The tabular Q-learning algorithm is guaranteed to find the optimal $Q$ function, $Q^*$, provided the following conditions (the Robbins-Monro conditions) regarding the learning rate are satisfied
$\sum_{t} \alpha_t(s, a) = \infty$
$\sum_{t}…
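For completeness, the full Robbins–Monro conditions, which the excerpt truncates, require for every state–action pair $(s, a)$ that
$$\sum_{t} \alpha_t(s, a) = \infty \quad \text{and} \quad \sum_{t} \alpha_t^2(s, a) < \infty.$$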

nbro
- 39,006
- 12
- 98
- 176
14
votes
1 answer
What are the implications of the "No Free Lunch" theorem for machine learning?
The No Free Lunch (NFL) theorem states (see the paper Coevolutionary Free Lunches by David H. Wolpert and William G. Macready)
any two algorithms are equivalent when their performance is averaged across all possible problems
Is the "No Free Lunch"…
user9947
13
votes
1 answer
Why is A* optimal if the heuristic function is admissible?
A heuristic is admissible if it never overestimates the true cost to reach the goal node from $n$. If a heuristic is consistent, then the heuristic value of $n$ is never greater than the step cost to its successor $n'$ plus the successor's heuristic…
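In symbols, writing $h^*(n)$ for the true optimal cost from $n$ to the goal and $c(n, n')$ for the step cost to a successor $n'$, the two properties in the excerpt read
$$\text{admissible: } h(n) \le h^*(n), \qquad \text{consistent: } h(n) \le c(n, n') + h(n').$$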

Wizard
- 303
- 1
- 2
- 6
13
votes
5 answers
Is there a rigorous proof that AGI is possible, at least, in theory?
It is often implicitly assumed in computer science that the human mind, or at least some mechanical calculations that humans perform (see the Church-Turing thesis), can be replicated with a Turing machine; therefore, Artificial General Intelligence…

yters
- 387
- 2
- 10
11
votes
2 answers
How do we prove the n-step return error reduction property?
In section 7.1 (about n-step bootstrapping) of the book Reinforcement Learning: An Introduction (2nd edition), by Richard S. Sutton and Andrew G. Barto, the authors write about what they call the "n-step return error reduction property":
But they…
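The property in question (equation 7.3 in the book, reproduced here from memory, so it is worth checking against the text) bounds the worst-case error of the expected $n$-step return by $\gamma^n$ times the worst-case error of the current value estimate:
$$\max_s \left| \mathbb{E}_\pi\!\left[ G_{t:t+n} \mid S_t = s \right] - v_\pi(s) \right| \le \gamma^n \max_s \left| V_{t+n-1}(s) - v_\pi(s) \right|.$$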

123learn
- 111
- 3
9
votes
2 answers
Why is baseline conditional on state at some timestep unbiased?
In the homework for the Berkeley RL class, problem 1, it asks you to show that the policy gradient is still unbiased if the baseline subtracted is a function of the state at time step $t$.
$$ \nabla_\theta \sum_{t=1}^T \mathbb{E}_{(s_t,a_t)…
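The heart of the standard argument is that a baseline depending only on the state vanishes in expectation over the action:
$$\mathbb{E}_{a_t \sim \pi_\theta(\cdot \mid s_t)}\!\left[ \nabla_\theta \log \pi_\theta(a_t \mid s_t) \, b(s_t) \right] = b(s_t) \sum_{a} \nabla_\theta \pi_\theta(a \mid s_t) = b(s_t) \, \nabla_\theta \sum_{a} \pi_\theta(a \mid s_t) = b(s_t) \, \nabla_\theta 1 = 0.$$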

Laura C
- 91
- 2
9
votes
2 answers
Why are the Bellman operators contractions?
In these slides, it is written
\begin{align}
\left\|T^{\pi} V-T^{\pi} U\right\|_{\infty} & \leq \gamma\|V-U\|_{\infty} \tag{9} \label{9} \\
\|T V-T U\|_{\infty} & \leq \gamma\|V-U\|_{\infty} \tag{10} \label{10}
\end{align}
where
$F$ is the space of…
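A sketch of why \eqref{9} holds, taking a deterministic policy $\pi$ for simplicity: for every state $s$,
$$\left| (T^{\pi} V)(s) - (T^{\pi} U)(s) \right| = \gamma \left| \sum_{s'} P(s' \mid s, \pi(s)) \big( V(s') - U(s') \big) \right| \le \gamma \sum_{s'} P(s' \mid s, \pi(s)) \, \| V - U \|_{\infty} = \gamma \| V - U \|_{\infty},$$
and taking the supremum over $s$ gives \eqref{9}; the argument for \eqref{10} additionally uses the fact that $|\max_a f(a) - \max_a g(a)| \le \max_a |f(a) - g(a)|$.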

kevin
- 191
- 1
- 4
8
votes
1 answer
How can a neural network approximate all functions when the weights are not allowed to grow exponentially?
It has been proven in the paper "Approximation by Superpositions of a Sigmoidal Function" (by Cybenko, in 1989) that neural networks are universal function approximators. I have a related question.
Assume the neural network's input and output…

Yan King Yin
- 235
- 1
- 8
8
votes
1 answer
Can deep learning be used to help mathematical research?
I am currently learning about deep learning and artificial intelligence and exploring its possibilities, and, as a mathematician at heart, I am curious about how it can be used to solve problems in mathematics.
Seeing how well recurrent neural…

Antoine Labelle
- 141
- 6
8
votes
2 answers
What is the proof that policy evaluation converges to the optimal solution?
Although I know how the algorithm of iterative policy evaluation using dynamic programming works, I am having a hard time seeing why it actually converges.
It appeals to intuition that, with each iteration, we get a better and better…
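Convergence follows because the Bellman expectation operator is a $\gamma$-contraction in the sup norm, so by the Banach fixed-point theorem repeated application from any starting point converges to its unique fixed point $v_\pi$. A minimal numerical sketch in Python, assuming a small made-up MDP with transition matrix P and reward vector r under a fixed policy:

import numpy as np

# Hypothetical 3-state MDP under a fixed policy:
# P[s, s'] is the transition probability, r[s] the expected immediate reward.
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.2, 0.8]])
r = np.array([1.0, 0.0, -1.0])
gamma = 0.9

v = np.zeros(3)                       # arbitrary initial guess
for _ in range(1000):
    v_new = r + gamma * P @ v         # one sweep of iterative policy evaluation
    if np.max(np.abs(v_new - v)) < 1e-10:
        break
    v = v_new

v_exact = np.linalg.solve(np.eye(3) - gamma * P, r)   # closed-form fixed point v_pi
print(np.allclose(v_new, v_exact, atol=1e-8))         # True: the iteration converged to v_pi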

SAGALPREET SINGH
- 147
- 1
- 6
7
votes
2 answers
How is this Pytorch expression equivalent to the KL divergence?
I found the following PyTorch code (from this link)
-0.5 * torch.sum(1 + sigma - mu.pow(2) - sigma.exp())
where mu is the mean parameter that comes out of the model and sigma is the sigma parameter out of the encoder. This expression is apparently…
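One way to see the equivalence: in the usual VAE parameterization, the variable called sigma in the snippet is actually the log-variance $\log \sigma^2$, not the standard deviation. Under that assumption, a quick numerical check against torch.distributions (variable names here are made up):

import torch
from torch.distributions import Normal, kl_divergence

mu = torch.randn(4)
log_var = torch.randn(4)   # the snippet's "sigma", read as log(sigma^2)

closed_form = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())

q = Normal(mu, (0.5 * log_var).exp())        # q(z|x) = N(mu, sigma)
p = Normal(torch.zeros(4), torch.ones(4))    # p(z)   = N(0, I)
print(torch.allclose(closed_form, kl_divergence(q, p).sum()))   # True up to float error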

user8714896
- 717
- 1
- 4
- 21
7
votes
1 answer
Why does a negative reward for every step really encourage the agent to reach the goal as quickly as possible?
If we shift the rewards by any constant (which is a type of reward shaping), the optimal state-action value function (and so optimal policy) does not change. The proof of this fact can be found here.
If that's the case, then why does a negative…
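One way to make the intuition concrete: in an undiscounted episodic task with a reward of $-1$ on every step, the return from the start state is just the negative episode length,
$$G_0 = \sum_{t=0}^{T-1} (-1) = -T,$$
so maximizing the expected return is the same as minimizing the expected number of steps to the goal. Roughly, the constant-shift invariance quoted above concerns the infinite-horizon discounted setting, whereas here the episode length $T$ itself depends on the policy.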

nbro
- 39,006
- 12
- 98
- 176
7
votes
0 answers
Is the Bellman equation that uses sampling weighted by the Q values (instead of max) a contraction?
It is proved that the Bellman update is a contraction (1).
Here is the Bellman update that is used for Q-Learning:
$$Q_{t+1}(s, a) = Q_{t}(s, a) + \alpha \left( r(s, a, s') + \gamma \max_{a^*} Q_{t}(s', a^*) - Q_t(s, a) \right) \tag{1} \label{1}$$
The proof…
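For context, the step in the standard proof that relies on the max is the non-expansiveness of the max operator,
$$\left| \max_{a} Q_1(s', a) - \max_{a} Q_2(s', a) \right| \le \max_{a} \left| Q_1(s', a) - Q_2(s', a) \right|,$$
so the question amounts to whether a $Q$-weighted (e.g. softmax/Boltzmann) average over actions satisfies an analogous bound.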

sirfroggy
- 71
- 3
6
votes
0 answers
Proof that there always exists a dominating policy in an MDP
I think that it is common knowledge that for any infinite horizon discounted MDP $(S, A, P, r, \gamma)$, there always exists a dominating policy $\pi$, i.e. a policy $\pi$ such that for all policies $\pi'$: $$V_\pi (s) \geq V_{\pi'}(s) \quad…

MMM
- 185
- 3