Questions tagged [notation]

For questions related to notation (in general).

72 questions
11
votes
1 answer

What is the meaning of $V(D,G)$ in the GAN objective function?

Here is the GAN objective function. $$\min_{G} \max_{D} V(D, G)=\mathbb{E}_{\boldsymbol{x} \sim p_{\text{data}}(\boldsymbol{x})}[\log D(\boldsymbol{x})]+\mathbb{E}_{\boldsymbol{z} \sim p_{\boldsymbol{z}}(\boldsymbol{z})}[\log…
8
votes
1 answer

How is the policy gradient calculated in REINFORCE?

Reading Sutton and Barto, I see the following in the description of policy gradients: How is the gradient calculated with respect to an action (taken at time t)? I've read implementations of the algorithm, but conceptually I'm not sure I understand how the…
6
votes
2 answers

How are the reward functions $R(s)$, $R(s, a)$ and $R(s, a, s')$ equivalent?

In this video, the lecturer states that $R(s)$, $R(s, a)$ and $R(s, a, s')$ are equivalent representations of the reward function. Intuitively, this is the case, according to the same lecturer, because $s$ can be made to represent the state and the…
5
votes
1 answer

What does the argmax of the expectation of the log likelihood mean?

What does the following equation mean? What does each part of the formula represent or mean? $$\theta^* = \underset {\theta}{\arg \max} \Bbb E_{x \sim p_{data}} \log {p_{model}(x|\theta) }$$
5
votes
1 answer

What does the notation $\mathcal{N}(z; \mu, \sigma)$ stand for in statistics?

I know that the notation $\mathcal{N}(\mu, \sigma)$ stands for a normal distribution. But I'm reading the book "An Introduction to Variational Autoencoders" and in it, there is this notation: $$\mathcal{N}(z; 0, I)$$ What does it mean? picture of…
5
votes
1 answer

What is the meaning of the square brackets in ant colony optimization?

I'm studying the paper "Minimizing Total Tardiness on a Single Machine Using Ant Colony Optimization", which proposes applying ant colony optimization to the SMTWTP. According to this paper: Each artificial ant iteratively and independently decides…
5
votes
1 answer

Understanding the equation of TD(0) in the paper "Learning to predict by the methods of temporal differences"

In the paper "Learning to predict by the methods of temporal differences" (p. 15), the weights in temporal difference learning are updated according to the equation $$ \Delta w_t = \alpha \left(P_{t+1} - P_t\right) \sum_{k=1}^{t}{\lambda^{t-k}…
4
votes
1 answer

Is the Bandit Problem an MDP?

I've read Sutton and Barto's introductory RL book. They define a policy as a mapping from states to probabilities of selecting each possible action. If the agent is following policy $\pi$ at time $t$, then $\pi(a|s)$ is the probability of taking…
4
votes
2 answers

Why do we use $X_{I_t,t}$ and $v_{I_t}$ to denote the reward received at time step $t$ and the distribution of the chosen arm $I_t$?

I'm doing some introductory research on classical (stochastic) MABs. However, I'm a little confused about the common notation (e.g. in the popular paper of Auer (2002) or Bubeck and Cesa-Bianchi (2012)). As in the latter study, let us consider an…
4
votes
1 answer

What does the term $|\mathcal{A}(s)|$ mean in the $\epsilon$-greedy policy?

I've been looking online for a while for a source that explains these computations, but I can't find anywhere what $|A(s)|$ means. I guess $A$ is the action set, but I'm not sure about that notation: $$\frac{\varepsilon}{|\mathcal{A}(s)|}…
4
votes
2 answers

Why are the value functions sometimes written with capital letters and other times with lower-case letters?

Why are the state-value and action-value functions sometimes written in lower-case letters and other times in capitals? For instance, why are capitals used in the Q-learning algorithm (page 131 of Barto and Sutton's book, among other places), $Q(S,…
d56
4
votes
1 answer

What do the subscripts mean in $N_{t,n,\sigma,L}$?

A neural network can apparently be denoted as $N_{t,n,\sigma,L}$. What do these subscripts $t, n, \sigma$ and $L$ mean? Could you link me to a paper, article or webpage with an explanation for this?
J. Doe
4
votes
1 answer

Confusion about distribution notation in the Deep Learning book

In chapter 5 of the Deep Learning book by Ian Goodfellow, some of the notation in the loss function below really confuses me. I understand $x,y \sim p_{data}$ to mean a sample $(x, y)$ drawn from the original dataset distribution (or $y$ is the…
David Ng
3
votes
1 answer

What does the notation $\nabla_\theta \mathcal{L}$ mean?

Here's the general algorithm of maximum entropy inverse reinforcement learning. This uses a gradient descent algorithm. What I do not understand is that there is only a single gradient value $\nabla_\theta \mathcal{L}$, and it is used to…
3
votes
1 answer

How to interpret the policy notation $\pi_{\theta}(a_{t}|s_{t})$ in Reinforcement Learning?

In the context of reinforcement learning, I have seen that the policy $\pi$ (for some algorithms) is nothing but a neural network architecture (for example, a feedforward neural network). This policy is usually denoted as $\pi_{\theta}$, suggesting…
moth123