Questions tagged [probability]

For questions involving probability as it relates to AI methods. (This tag is for general usage. Feel free to use it in conjunction with the "math" tag and more specific probability tags.)

https://en.wikipedia.org/wiki/Probability

59 questions
17
votes
2 answers

Are softmax outputs of classifiers true probabilities?

BACKGROUND: The softmax function is the most common choice for an activation function for the last dense layer of a multiclass neural network classifier. The outputs of the softmax function have mathematical properties of probabilities and are--in…
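Since the excerpt turns on exactly this point, here is a minimal sketch (illustrative code, not from any answer) of the two properties that make softmax outputs probabilities in the mathematical sense, non-negativity and summing to 1, independent of whether they are calibrated:

```python
import math

def softmax(logits):
    """Map raw scores to a vector that satisfies the axioms of a
    categorical distribution: non-negative entries that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
assert all(p > 0 for p in probs)
assert abs(sum(probs) - 1.0) < 1e-9
```

Whether these numbers are *calibrated* probabilities is a separate empirical question, which is what the question above is really asking.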
10
votes
2 answers

Is Nassim Taleb right about AI not being able to accurately predict certain types of distributions?

So Taleb has two heuristics to generally describe data distributions. One is Mediocristan, which basically covers things that follow a Gaussian distribution, such as the height or weight of people. The other is called Extremistan, which describes a…
7
votes
2 answers

What is a Markov chain and how can it be used in creating artificial intelligence?

I believe a Markov chain is a sequence of events where each subsequent event depends probabilistically on the current event. What are examples of the application of a Markov chain and can it be used to create artificial intelligence? Would a…
WilliamKF
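As a hedged illustration of the definition in the question (the states and transition probabilities below are made up), a Markov chain can be sampled by repeatedly drawing the next state from a distribution that depends only on the current state:

```python
import random

# Hypothetical two-state weather model; each row depends only on the
# current state, which is the Markov property.
transitions = {
    "sunny": [("sunny", 0.8), ("rainy", 0.2)],
    "rainy": [("sunny", 0.4), ("rainy", 0.6)],
}

def step(state, rng=random):
    """Draw the next state given only the current one."""
    states, weights = zip(*transitions[state])
    return rng.choices(states, weights=weights)[0]

def walk(start, n, seed=0):
    """Sample a length-(n+1) trajectory starting from `start`."""
    rng = random.Random(seed)
    chain = [start]
    for _ in range(n):
        chain.append(step(chain[-1], rng))
    return chain
```

Hidden Markov models, Markov decision processes, and PageRank-style link analysis are all built on this one-step dependence.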
6
votes
2 answers

Are probabilistic models dead ends in AI?

I am a strong believer in Marvin Minsky's ideas about Artificial General Intelligence (AGI), and one of his thoughts was that probabilistic models are dead ends in the field of AGI. I would really like to know the thoughts and ideas of people who…
Parth Raghav
5
votes
1 answer

What does the argmax of the expectation of the log likelihood mean?

What does the following equation mean? What does each part of the formula represent or mean? $$\theta^* = \underset {\theta}{\arg \max} \Bbb E_{x \sim p_{data}} \log {p_{model}(x|\theta) }$$
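A hedged reading of the formula, under the usual conventions ($p_{data}$ the true data distribution, $p_{model}(x \mid \theta)$ the model's density): $\theta^*$ is the parameter setting that maximizes the expected log-likelihood of data drawn from $p_{data}$, which in practice is approximated by an average over $N$ training samples,
$$\theta^* = \underset{\theta}{\arg \max}\, \Bbb E_{x \sim p_{data}} \log p_{model}(x \mid \theta) \approx \underset{\theta}{\arg \max}\, \frac{1}{N} \sum_{i=1}^{N} \log p_{model}(x_i \mid \theta),$$
i.e. maximum-likelihood estimation.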
3
votes
1 answer

How does $\mathbb{E}$ suddenly change to $\mathbb{E}_{\pi'}$ in this equation?

In Sutton-Barto's book on page 63 (81 of the pdf): $$\mathbb{E}[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t=s,A_t=\pi'(s)] = \mathbb{E}_{\pi'}[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_{t} = s]$$ How does $\mathbb{E}$ suddenly change to…
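One hedged way to fill in the step, assuming (as in that part of Sutton and Barto) that $\pi'$ is deterministic: conditioning on $A_t = \pi'(s)$ fixes the action to the one $\pi'$ selects in $s$, which is exactly what taking a one-step expectation under $\pi'$ means, so
$$\mathbb{E}[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t=s, A_t=\pi'(s)] = \sum_{s',r} p(s',r \mid s, \pi'(s))\big[r + \gamma v_\pi(s')\big] = \mathbb{E}_{\pi'}[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t=s].$$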
3
votes
1 answer

How can supervised learning be viewed as a conditional probability of the labels given the inputs?

In the literature and textbooks, one often sees supervised learning expressed as a conditional probability, e.g., $$\rho(\vec{y}|\vec{x},\vec{\theta})$$ where $\vec{\theta}$ denotes a learned set of network parameters, $\vec{x}$ is an arbitrary…
3
votes
1 answer

Viterbi versus filtering

In Chapter 15 of Russell and Norvig's Artificial Intelligence -- A Modern Approach (Third Edition), they describe three basic tasks in temporal inference: filtering, likelihood, and finding the most likely sequence. My question is on the…
vdbuss
2
votes
1 answer

SEIF motion update algorithm doubt

I want to implement Sparse Extended Information Filter (SEIF) SLAM. There are four steps to implement it. The algorithm is available in the Probabilistic Robotics book at page 310, Table 12.3. In this algorithm, line 13 is not very clear to me. I have 15 landmarks.…
2
votes
1 answer

Why do I get small probabilities when implementing a multinomial naive Bayes text classification model?

When applying multinomial naive Bayes text classification, I get very small probabilities (around $10^{-48}$), so there's no way for me to know which classes are valid predictions and which ones are not. I'd like the probabilities to be in the interval…
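A common fix for the underflow the question describes (a sketch with hypothetical class scores, not the asker's data) is to keep per-class scores in log space and normalize with the log-sum-exp trick, which yields posteriors in $[0, 1]$ that sum to 1:

```python
import math

def normalize_log_scores(log_scores):
    """Convert per-class log joint scores into posterior probabilities,
    avoiding underflow by shifting by the maximum before exponentiating."""
    m = max(log_scores.values())
    total = sum(math.exp(v - m) for v in log_scores.values())
    return {c: math.exp(v - m) / total for c, v in log_scores.items()}

# Raw joint probabilities here would be on the order of 1e-48, but their
# *ratio* is all that matters once normalized:
posteriors = normalize_log_scores({"spam": -110.0, "ham": -112.3})
```

The absolute joint probabilities are tiny because they multiply many per-word likelihoods; only the normalized comparison between classes is meaningful.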
2
votes
1 answer

How can I improve this word-prediction AI?

I'm relatively new to AI, and I've tried to create one that "speaks". Here's how it works: 1. Get training data, e.g. 'Jim ran to the shop to buy candy' 2. The data gets split into overlapping 'chains' of three, e.g. ['Jim ran to', 'ran to the', 'to the…
user117279
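One standard way to turn such overlapping chains into a predictor (a sketch assuming the question's trigram scheme; the helper names are hypothetical) is to index each two-word context to the words observed after it, then sample among them:

```python
import random
from collections import defaultdict

def build_chains(text, n=3):
    """Index each (n-1)-word context to the words observed after it."""
    words = text.split()
    table = defaultdict(list)
    for i in range(len(words) - n + 1):
        *context, nxt = words[i:i + n]
        table[tuple(context)].append(nxt)
    return table

def predict(table, context, rng=random):
    """Sample the next word; repeated continuations are proportionally likelier."""
    return rng.choice(table[tuple(context)])

table = build_chains("Jim ran to the shop to buy candy")
```

Storing duplicates in the list makes this a maximum-likelihood bigram-context model; common improvements are longer contexts with backoff and smoothing for unseen contexts.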
2
votes
1 answer

What exactly is a Parzen?

I came across the term "Parzen" while reading the research paper titled Generative Adversarial Nets. It is used in the paper in two contexts. #1: In the phrase "Parzen window": We estimate probability of the test set data under $p_g$ by…
hanugm
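For context on the term (a hedged one-dimensional sketch; the GAN paper fits a Gaussian Parzen window to generated samples): a Parzen window estimate places a kernel, the "window", on every sample and averages them to estimate a density:

```python
import math

def parzen_log_density(x, samples, sigma=1.0):
    """Log of a Parzen-window (kernel density) estimate at x: the average
    of Gaussian densities of width sigma centred on each sample, computed
    in log space for numerical stability."""
    n = len(samples)
    logs = [-0.5 * ((x - s) / sigma) ** 2
            - math.log(sigma * math.sqrt(2 * math.pi))
            for s in samples]
    m = max(logs)
    return m + math.log(sum(math.exp(l - m) for l in logs) / n)
```

The paper uses this to assign a (rough) log-likelihood to held-out data from a model that only produces samples, since the generator's density itself is not available.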
2
votes
0 answers

PPO2: Intuition behind Gumbel Softmax and Exploration?

I'm trying to understand the logic behind the magic of using the Gumbel distribution for action sampling inside the PPO2 algorithm. This code snippet implements the action sampling, taken from here: def sample(self): u =…
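The snippet is cut off, but the underlying idea is the Gumbel-max trick, sketched below with illustrative names (not the PPO2 code): adding independent Gumbel(0, 1) noise to the logits and taking the argmax draws an exact sample from the corresponding softmax distribution, which is how exploratory, stochastic actions come out of a deterministic argmax:

```python
import math
import random

def gumbel_noise(rng):
    # Gumbel(0, 1) via inverse CDF; the epsilons guard against log(0).
    u = rng.random()
    return -math.log(-math.log(u + 1e-20) + 1e-20)

def sample_action(logits, rng=random):
    """Argmax over Gumbel-perturbed logits is one draw from softmax(logits)."""
    noisy = [z + gumbel_noise(rng) for z in logits]
    return max(range(len(noisy)), key=noisy.__getitem__)
```

Low-probability actions still win the perturbed argmax occasionally, which is exactly the exploration behaviour the question asks about.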
2
votes
1 answer

Aren't scores in the Wasserstein GAN probabilities?

I am quite new to GANs and I am reading about WGAN vs DCGAN. Regarding the Wasserstein GAN (WGAN), I read here: Instead of using a discriminator to classify or predict the probability of generated images as being real or fake, the WGAN changes or…
2
votes
0 answers

Is the generator distribution in GANs continuous or discrete?

I have some trouble with the probability densities described in the original paper. My question is based on Goodfellow's paper and tutorial, respectively: Generative Adversarial Networks and NIPS 2016 Tutorial: Generative Adversarial Networks. When…