Questions tagged [proofs]
For questions that ask about or call for proofs of specific assertions, whether they are proofs of theorems or corollaries, proofs of concept through a working implementation, counterproofs, or counterexamples.
116 questions
27
votes
3 answers
Where can I find the proof of the universal approximation theorem?
The Wikipedia article for the universal approximation theorem cites a version of the universal approximation theorem for Lebesgue-measurable functions from this conference paper. However, the paper does not include the proofs of the theorem. Does…
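For reference, the classical continuous-function version of the theorem (Cybenko, 1989), stated here informally and distinct from the Lebesgue-measurable version cited by Wikipedia, says that for any continuous $f$ on $[0,1]^n$, any continuous sigmoidal $\sigma$, and any $\varepsilon > 0$, there exist $N$, $v_i, b_i \in \mathbb{R}$ and $w_i \in \mathbb{R}^n$ such that
$$\sup_{x \in [0,1]^n} \left| f(x) - \sum_{i=1}^{N} v_i \, \sigma(w_i^\top x + b_i) \right| < \varepsilon.$$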

Leroy Od
- 435
- 1
- 4
- 4
22
votes
3 answers
Why doesn't Q-learning converge when using function approximation?
The tabular Q-learning algorithm is guaranteed to find the optimal $Q$ function, $Q^*$, provided the following conditions (the Robbins-Monro conditions) regarding the learning rate are satisfied
$\sum_{t} \alpha_t(s, a) = \infty$
$\sum_{t}…
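For completeness, the full Robbins–Monro conditions, which the excerpt truncates, require for every state–action pair $(s, a)$ that
$$\sum_{t} \alpha_t(s, a) = \infty \quad \text{and} \quad \sum_{t} \alpha_t^2(s, a) < \infty.$$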

nbro
- 39,006
- 12
- 98
- 176
14
votes
1 answer
What are the implications of the "No Free Lunch" theorem for machine learning?
The No Free Lunch (NFL) theorem states (see the paper Coevolutionary Free Lunches by David H. Wolpert and William G. Macready)
any two algorithms are equivalent when their performance is averaged across all possible problems
Is the "No Free Lunch"…
user9947
13
votes
1 answer
Why is A* optimal if the heuristic function is admissible?
A heuristic is admissible if it never overestimates the true cost to reach the goal node from $n$. If a heuristic is consistent, then the heuristic value of $n$ is never greater than the step cost to its successor $n'$ plus the successor's heuristic…
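In symbols, writing $h^*(n)$ for the true optimal cost from $n$ to the goal and $c(n, n')$ for the step cost to a successor $n'$, the two properties in the excerpt read
$$\text{admissible: } h(n) \le h^*(n), \qquad \text{consistent: } h(n) \le c(n, n') + h(n').$$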

Wizard
- 303
- 1
- 2
- 6
13
votes
5 answers
Is there a rigorous proof that AGI is possible, at least, in theory?
It is often implicitly assumed in computer science that the human mind, or at least some mechanical calculations that humans perform (see the Church-Turing thesis), can be replicated with a Turing machine; therefore, Artificial General Intelligence…

yters
- 387
- 2
- 10
11
votes
2 answers
How do we prove the n-step return error reduction property?
In section 7.1 (about n-step bootstrapping) of the book Reinforcement Learning: An Introduction (2nd edition), by Richard S. Sutton and Andrew G. Barto, the authors write about what they call the "n-step return error reduction property":
But they…
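The property in question (equation 7.3 in the book, reproduced here from memory, so it is worth checking against the text) bounds the worst-case error of the expected $n$-step return by $\gamma^n$ times the worst-case error of the current value estimate:
$$\max_s \left| \mathbb{E}_\pi\!\left[ G_{t:t+n} \mid S_t = s \right] - v_\pi(s) \right| \le \gamma^n \max_s \left| V_{t+n-1}(s) - v_\pi(s) \right|.$$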

123learn
- 111
- 3
9
votes
2 answers
Why is baseline conditional on state at some timestep unbiased?
In the homework for the Berkeley RL class, problem 1, it asks you to show that the policy gradient is still unbiased if the baseline subtracted is a function of the state at time step $t$.
$$ \nabla_\theta \sum_{t=1}^T \mathbb{E}_{(s_t,a_t)…
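The heart of the standard argument is that a baseline depending only on the state vanishes in expectation over the action:
$$\mathbb{E}_{a_t \sim \pi_\theta(\cdot \mid s_t)}\!\left[ \nabla_\theta \log \pi_\theta(a_t \mid s_t) \, b(s_t) \right] = b(s_t) \sum_{a} \nabla_\theta \pi_\theta(a \mid s_t) = b(s_t) \, \nabla_\theta \sum_{a} \pi_\theta(a \mid s_t) = b(s_t) \, \nabla_\theta 1 = 0.$$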

Laura C
- 91
- 2
9
votes
2 answers
Why are the Bellman operators contractions?
In these slides, it is written
\begin{align}
\left\|T^{\pi} V-T^{\pi} U\right\|_{\infty} & \leq \gamma\|V-U\|_{\infty} \tag{9} \label{9} \\
\|T V-T U\|_{\infty} & \leq \gamma\|V-U\|_{\infty} \tag{10} \label{10}
\end{align}
where
$F$ is the space of…
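A sketch of why \eqref{9} holds, taking a deterministic policy $\pi$ for simplicity: for every state $s$,
$$\left| (T^{\pi} V)(s) - (T^{\pi} U)(s) \right| = \gamma \left| \sum_{s'} P(s' \mid s, \pi(s)) \big( V(s') - U(s') \big) \right| \le \gamma \sum_{s'} P(s' \mid s, \pi(s)) \, \| V - U \|_{\infty} = \gamma \| V - U \|_{\infty},$$
and taking the supremum over $s$ gives \eqref{9}; the argument for \eqref{10} additionally uses the fact that $|\max_a f(a) - \max_a g(a)| \le \max_a |f(a) - g(a)|$.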

kevin
- 191
- 1
- 4
8
votes
1 answer
How can a neural network approximate all functions when the weights are not allowed to grow exponentially?
It has been proven in the paper "Approximation by Superpositions of a Sigmoidal Function" (by Cybenko, in 1989) that neural networks are universal function approximators. I have a related question.
Assume the neural network's input and output…

Yan King Yin
- 235
- 1
- 8
8
votes
1 answer
Can deep learning be used to help mathematical research?
I am currently learning about deep learning and artificial intelligence and exploring its possibilities, and, as a mathematician at heart, I am curious about how it can be used to solve problems in mathematics.
Seeing how well recurrent neural…

Antoine Labelle
- 141
- 6
8
votes
2 answers
What is the proof that policy evaluation converges to the optimal solution?
Although I know how the algorithm of iterative policy evaluation using dynamic programming works, I am having a hard time seeing why it actually converges.
It appeals to intuition that, with each iteration, we get a better and better…
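Convergence follows because the Bellman expectation operator is a $\gamma$-contraction in the sup norm, so by the Banach fixed-point theorem repeated application from any starting point converges to its unique fixed point $v_\pi$. A minimal numerical sketch in Python, assuming a small made-up MDP with transition matrix P and reward vector r under a fixed policy:

import numpy as np

# Hypothetical 3-state MDP under a fixed policy:
# P[s, s'] is the transition probability, r[s] the expected immediate reward.
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.2, 0.8]])
r = np.array([1.0, 0.0, -1.0])
gamma = 0.9

v = np.zeros(3)                       # arbitrary initial guess
for _ in range(1000):
    v_new = r + gamma * P @ v         # one sweep of iterative policy evaluation
    if np.max(np.abs(v_new - v)) < 1e-10:
        break
    v = v_new

v_exact = np.linalg.solve(np.eye(3) - gamma * P, r)   # closed-form fixed point v_pi
print(np.allclose(v_new, v_exact, atol=1e-8))         # True: the iteration converged to v_pi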

SAGALPREET SINGH
- 147
- 1
- 6
7
votes
2 answers
How is this Pytorch expression equivalent to the KL divergence?
I found the following PyTorch code (from this link)
-0.5 * torch.sum(1 + sigma - mu.pow(2) - sigma.exp())
where mu is the mean parameter that comes out of the model and sigma is the sigma parameter out of the encoder. This expression is apparently…
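One way to see the equivalence: in the usual VAE parameterization, the variable called sigma in the snippet is actually the log-variance $\log \sigma^2$, not the standard deviation. Under that assumption, a quick numerical check against torch.distributions (variable names here are made up):

import torch
from torch.distributions import Normal, kl_divergence

mu = torch.randn(4)
log_var = torch.randn(4)   # the snippet's "sigma", read as log(sigma^2)

closed_form = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())

q = Normal(mu, (0.5 * log_var).exp())        # q(z|x) = N(mu, sigma)
p = Normal(torch.zeros(4), torch.ones(4))    # p(z)   = N(0, I)
print(torch.allclose(closed_form, kl_divergence(q, p).sum()))   # True up to float error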

user8714896
- 717
- 1
- 4
- 21
7
votes
1 answer
Why does a negative reward for every step really encourage the agent to reach the goal as quickly as possible?
If we shift the rewards by any constant (which is a type of reward shaping), the optimal state-action value function (and so optimal policy) does not change. The proof of this fact can be found here.
If that's the case, then why does a negative…
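One way to make the intuition concrete: in an undiscounted episodic task with a reward of $-1$ on every step, the return from the start state is just the negative episode length,
$$G_0 = \sum_{t=0}^{T-1} (-1) = -T,$$
so maximizing the expected return is the same as minimizing the expected number of steps to the goal. Roughly, the constant-shift invariance quoted above concerns the infinite-horizon discounted setting, whereas here the episode length $T$ itself depends on the policy.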

nbro
- 39,006
- 12
- 98
- 176
7
votes
0 answers
Is the Bellman equation that uses sampling weighted by the Q values (instead of max) a contraction?
It is proved that the Bellman update is a contraction (1).
Here is the Bellman update that is used for Q-Learning:
$$Q_{t+1}(s, a) = Q_{t}(s, a) + \alpha \left( r(s, a, s') + \gamma \max_{a^*} Q_{t}(s', a^*) - Q_t(s, a) \right) \tag{1} \label{1}$$
The proof…
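For context, the step in the standard proof that relies on the max is the non-expansiveness of the max operator,
$$\left| \max_{a} Q_1(s', a) - \max_{a} Q_2(s', a) \right| \le \max_{a} \left| Q_1(s', a) - Q_2(s', a) \right|,$$
so the question amounts to whether a $Q$-weighted (e.g. softmax/Boltzmann) average over actions satisfies an analogous bound.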

sirfroggy
- 71
- 3
6
votes
0 answers
Proof that there always exists a dominating policy in an MDP
I think that it is common knowledge that for any infinite horizon discounted MDP $(S, A, P, r, \gamma)$, there always exists a dominating policy $\pi$, i.e. a policy $\pi$ such that for all policies $\pi'$: $$V_\pi (s) \geq V_{\pi'}(s) \quad…

MMM
- 185
- 3