For questions about AI theory that relies on knowledge of a probability distribution over one or more variables. Such a distribution may be discrete (for example, bucketed into quartiles, octiles, or percentiles) or continuous, given by some closed-form (algebraic) expression. Probability distributions are central to planning, natural language processing, and many other AI objectives.
Questions tagged [probability-distribution]
78 questions
8 votes · 1 answer
What are the main benefits of using Bayesian networks?
I have some trouble understanding the benefits of Bayesian networks.
Am I correct that the key benefit of the network is that one does not need to use the chain rule of probability in order to calculate joint distributions?
So, using the chain…

asked by Sebastian Dine
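For the Bayesian-network question above, a small worked example (my illustration, not from the question: three binary variables with a hypothetical structure $A \to B$ and $A \to C$) of how the network's conditional independencies shrink the joint distribution:

$$P(A, B, C) = P(A)\,P(B \mid A)\,P(C \mid A),$$

which needs only $1 + 2 + 2 = 5$ parameters instead of the $2^3 - 1 = 7$ required for an unconstrained joint; the chain rule still applies, but the graph tells you which conditioning variables can be dropped.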
7 votes · 2 answers
Why is KL divergence used so often in Machine Learning?
The KL divergence is quite easy to compute in closed form for simple distributions (such as Gaussians), but it has some not-very-nice properties. For example, it is not symmetric (thus it is not a metric) and it does not respect the triangular…

asked by Federico Taschin
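For the KL-divergence question above, a minimal sketch in plain NumPy (the function name `kl_gauss` is mine; the formula is the standard closed form for univariate Gaussians) illustrating the asymmetry mentioned in the excerpt:

```python
import numpy as np

def kl_gauss(m1, s1, m2, s2):
    """Closed-form KL( N(m1, s1^2) || N(m2, s2^2) ) for univariate Gaussians."""
    return np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

# Swapping the arguments gives a different value, so KL is not symmetric
# and therefore not a metric.
print(kl_gauss(0.0, 1.0, 1.0, 2.0))  # KL(p || q), roughly 0.44
print(kl_gauss(1.0, 2.0, 0.0, 1.0))  # KL(q || p), roughly 1.31
```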
7 votes · 1 answer
What loss function to use when labels are probabilities?
What loss function is most appropriate when training a model with target values that are probabilities? For example, I have a 3-output model. I want to train it with a feature vector $x=[x_1, x_2, \dots, x_N]$ and a target $y=[0.2, 0.3, 0.5]$.
It…

asked by Thomas Johnson
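For the probability-target question above, one common answer is the cross-entropy loss, which remains well defined for soft labels; a minimal NumPy sketch (the predicted vector is a made-up example):

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """H(y_true, y_pred) = -sum_i y_true[i] * log(y_pred[i]), valid for soft targets."""
    return -np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)))

y = np.array([0.2, 0.3, 0.5])    # target probabilities from the question
p = np.array([0.25, 0.25, 0.5])  # hypothetical softmax output of the 3-output model
print(cross_entropy(y, p))       # minimized over p exactly when p equals y
```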
5 votes · 1 answer
Many of the best probabilistic models represent probability distributions only implicitly
I am currently studying Deep Learning by Goodfellow, Bengio, and Courville. In section 5.1.2, The Performance Measure, $P$, the authors say the following:
The choice of performance measure may seem straightforward and objective, but it is often…

asked by The Pointer
5 votes · 1 answer
Why is the Jensen-Shannon divergence preferred over the KL divergence in measuring the performance of a generative network?
I have read articles on why the Jensen-Shannon divergence is preferred over the Kullback-Leibler divergence for measuring how well a distribution mapping is learned in a generative network, because the JS divergence better measures distribution similarity…

asked by ashenoy
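For the JS-vs-KL question above, a minimal sketch with discrete distributions (function names are mine) showing the two properties usually cited: JS is symmetric and bounded by $\log 2$, while KL is neither:

```python
import numpy as np

def kl(p, q, eps=1e-12):
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q))

def js(p, q):
    """Jensen-Shannon divergence: average KL to the mixture m = (p + q) / 2."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.9, 0.1])
q = np.array([0.1, 0.9])
print(kl(p, q), kl(q, p))  # asymmetric
print(js(p, q), js(q, p))  # symmetric, never exceeds log(2) ~ 0.693
```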
4 votes · 1 answer
Why do we sample vectors from a standard normal distribution for the generator?
I am new to GANs. I noticed that everybody generates a random vector (usually 100 dimensional) from a standard normal distribution $N(0, 1)$. My question is: why? Why don't they sample these vectors from a uniform distribution $U(0, 1)$? Does the…

asked by dato nefaridze
4 votes · 1 answer
In deep learning, do we learn a continuous distribution based on the training dataset?
At some level, perhaps not always end-to-end, deep learning always learns a function: essentially a mapping from a domain to a range. In most cases, both the domain and the range are multivariate.
So, when a model learns a…

asked by ashenoy
4 votes · 1 answer
How are the parameters of the Bernoulli distribution learned?
In the paper Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask, they learn a mask for the network by setting the mask parameters as $M_i = \text{Bern}(\sigma(v_i))$, where $M$ is the parameter mask ($f(x; \theta, M) = f(x; M \odot \theta)$),…

asked by mshlis
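For the supermask question above, a minimal NumPy sketch (not the paper's code; the values of $v$ are made up) of how a binary mask can be sampled from learnable scores:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(0)
v = rng.normal(size=5)                      # learnable per-weight scores v_i
p = sigmoid(v)                              # Bernoulli parameters sigma(v_i)
mask = (rng.random(5) < p).astype(float)    # M_i ~ Bern(sigma(v_i))
print(p, mask)
# Sampling is not differentiable, so in practice gradients are passed back
# to v through a relaxation or a straight-through-style estimator.
```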
3 votes · 0 answers
Relation between SDE diffusion and DDPM/DDIM
Background & Definitions
In DDPM, the diffusion backward step is described as follows (where $z\sim \mathcal{N}(0,I)$ and $x_{T}\sim \mathcal{N}(0,I)$):
and in DDIM we have
while in the SDE formulation (from the Fokker-Planck equation) the step…

asked by snatchysquid
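For background on the question above, the DDPM reverse step is usually written in the following standard form (Ho et al., 2020); this is offered only as a reference point, since the exact expressions are elided in the excerpt:

$$x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(x_t, t)\right) + \sigma_t z, \qquad z \sim \mathcal{N}(0, I).$$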
3 votes · 1 answer
How can I make an MNIST digit recognizer that rejects out-of-distribution data?
I've built an MNIST digit-recognition neural network.
When you feed it images that are completely unlike its training data, it still tries to classify them as digits, and sometimes it confidently classifies nonsense data as a specific digit.
I am…

asked by river
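For the out-of-distribution question above, a minimal baseline sketch (threshold and logits are made up; real OOD detection usually needs more than this) that rejects inputs whose maximum softmax probability is low:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def classify_or_reject(logits, threshold=0.9):
    """Return the predicted digit, or None when the network is not confident."""
    probs = softmax(logits)
    return int(np.argmax(probs)) if probs.max() >= threshold else None

print(classify_or_reject(np.array([0.1, 8.0, 0.2] + [0.0] * 7)))  # peaked -> predicts 1
print(classify_or_reject(np.array([1.0, 1.1, 0.9] + [1.0] * 7)))  # diffuse -> None
```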
3 votes · 1 answer
How can a probability density value be used for the likelihood calculation?
Consider our parametric model $p_\theta$ of an underlying probability distribution $p_{data}$.
Now, the likelihood of an observation $x$ is generally defined as $L(\theta|x) = p_{\theta}(x)$.
The purpose of the likelihood is to quantify how good…

asked by hanugm
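For the likelihood question above, a minimal SciPy sketch (a Gaussian model with hypothetical data) showing that the density value, read as a function of the parameter for fixed observations, is exactly the quantity being maximized:

```python
import numpy as np
from scipy.stats import norm

x = np.array([1.9, 2.1, 2.0, 1.8])  # observed data (made-up values)

def log_likelihood(mu):
    """Log-likelihood of the sample under the model N(mu, 1)."""
    return norm.logpdf(x, loc=mu, scale=1.0).sum()

for mu in [0.0, 1.0, 1.95, 3.0]:
    print(mu, log_likelihood(mu))   # largest near the sample mean (about 1.95)
```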
3 votes · 1 answer
Is this referring to the true underlying distribution, or the distribution of our sample?
I am currently studying the paper Learning and Evaluating Classifiers under Sample Selection Bias by Bianca Zadrozny. In the introduction, the author says the following:
One of the most common assumptions in the design of learning algorithms is…

asked by The Pointer
3 votes · 2 answers
When should one prefer the total variation divergence over the KL divergence in RL?
In RL, both the KL divergence ($D_{KL}$) and the total variation divergence ($D_{TV}$) are used to measure the distance between two policies. I'm most familiar with using $D_{KL}$ as an early-stopping metric during policy updates to ensure the new policy doesn't…

asked by mugoh
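For the KL-vs-TV question above, a minimal NumPy sketch comparing the two on a pair of hypothetical discrete action distributions; TV is always bounded by 1, while KL can explode when one policy puts almost no mass on actions the other policy takes:

```python
import numpy as np

def kl(p, q, eps=1e-12):
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q))

def tv(p, q):
    """Total variation distance: half the L1 distance between the distributions."""
    return 0.5 * np.sum(np.abs(p - q))

old_policy = np.array([0.98, 0.01, 0.01])
new_policy = np.array([0.50, 0.25, 0.25])
print(kl(old_policy, new_policy), tv(old_policy, new_policy))
```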
3 votes · 1 answer
How does $\mathbb{E}$ suddenly change to $\mathbb{E}_{\pi'}$ in this equation?
In Sutton & Barto's book, on page 63 (81 of the PDF):
$$\mathbb{E}[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t=s,A_t=\pi'(s)] = \mathbb{E}_{\pi'}[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_{t} = s]$$
How does $\mathbb{E}$ suddenly change to…

asked by ZERO NULLS
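For the notation question above, a short intermediate step (my reading of the convention, not a quote from the book): because $\pi'$ picks the action deterministically, conditioning on $A_t = \pi'(s)$ is the same as following $\pi'$ for that one step, so

$$\mathbb{E}\big[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t = s, A_t = \pi'(s)\big] = \sum_{s',\,r} p(s', r \mid s, \pi'(s))\big[r + \gamma v_\pi(s')\big] = \mathbb{E}_{\pi'}\big[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t = s\big].$$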
3 votes · 1 answer
What is the difference between model and data distributions?
Is there any difference between the model distribution and data distribution, or are they the same?

asked by Bhuwan Bhatt