For questions related to maximum likelihood estimation (MLE), which is a frequentist approach for estimating the parameters of an assumed probability distribution given some observed data. This is done by maximizing a likelihood function so that, under the assumed statistical model, the observed data is most probable. The resulting estimate is known as the maximum likelihood estimate.
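As an illustration of the idea in the tag description, here is a minimal sketch (not part of the tag wiki itself) of maximum likelihood estimation for an assumed normal distribution. Maximizing the log-likelihood over the mean and variance has a closed-form solution: the sample mean and the (biased, divide-by-$n$) sample variance.

```python
import random

def normal_mle(data):
    """Maximum likelihood estimates (mu_hat, var_hat) for a normal model.

    These are the values of (mu, sigma^2) that maximize the likelihood
    of the observed data under the assumed Gaussian model.
    """
    n = len(data)
    mu_hat = sum(data) / n
    # MLE of the variance divides by n, not n - 1 (it is slightly biased).
    var_hat = sum((x - mu_hat) ** 2 for x in data) / n
    return mu_hat, var_hat

# Draw a synthetic sample from N(mean=5, std=2) and recover the parameters.
random.seed(0)
sample = [random.gauss(5.0, 2.0) for _ in range(10_000)]
mu_hat, var_hat = normal_mle(sample)
print(mu_hat, var_hat)  # close to 5.0 and 4.0
```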
Questions tagged [maximum-likelihood]
14 questions
3
votes
1 answer
How can a probability density value be used for the likelihood calculation?
Consider our parametric model $p_\theta$ for an underlying probability distribution $p_{data}$.
Now, the likelihood of an observation $x$ is generally defined as $L(\theta|x) = p_{\theta}(x)$.
The purpose of the likelihood is to quantify how good…

hanugm
- 3,571
- 3
- 18
- 50
3
votes
0 answers
Is maximum likelihood estimation meaningless for a dataset of only outliers?
From my understanding, maximum likelihood estimation chooses the set of parameters for the estimator that maximizes the likelihood with respect to the ground truth distribution.
I always interpreted it as the training set having a tendency to have most examples…

ashenoy
- 1,409
- 4
- 18
3
votes
1 answer
What is the relationship between MLE and naive Bayes?
I have found various references describing Naive Bayes, and they all demonstrate that it uses MLE for the calculation. However, this is my understanding:
$P(y=c|x)$ $\propto$ $P(x|y=c)P(y=c)$
where $c$ is the class the model may assign to $y$.
And…

Shrike Danny
- 31
- 1
2
votes
1 answer
Understanding the math behind using maximum likelihood for linear regression
I understand both terms, linear regression and maximum likelihood, but when it comes to the math I am totally lost. So I am reading the article The Principle of Maximum Likelihood (by Suriyadeepan Ramamoorthy). It is really well written, but, as…

xava
- 423
- 1
- 3
- 9
2
votes
1 answer
What is the empirical distribution in MLE?
I was reading the book Deep Learning by Ian Goodfellow. I have a doubt about the maximum likelihood estimation section (p. 131). I understand up to Eq. 5.58, which describes what is being maximized in the problem.
$$
\theta_{\text{ML}} =…

ANIRUDH BUVANESH
- 23
- 3
2
votes
0 answers
Can the cross-entropy loss be used for an NLP task with an LSTM?
I am trying to build an LSTM model to generate Shakespeare-like poems. I have a training set $\{s_1,s_2, \dots,s_m\}$, which consists of sentences from Shakespeare's poems, and each sentence contains words $\{w_1,w_2, \dots,w_n\}$.
To my understanding, each…

Leey
- 43
- 3
1
vote
1 answer
I am confused about the derivation steps of MAP for linear regression
I am taking an ML course and I am confused about some of the mathematical derivations.
Could you explain the two steps I marked on the slides? For the first step, I thought $P(\beta|X,y) = \frac{P(X,y|\beta)P(\beta)}{P(X,y)}$, but I don't know the further steps to…

tesio
- 185
- 4
1
vote
0 answers
Is VAE the same as the E-step of the EM algorithm?
EM (Expectation-Maximization)
Target: maximize $p_\theta(x)$
$p_\theta(x)=\frac{p_\theta(x, z)}{p_\theta(z \mid x)}$
Take log on both sides:
$\log p_\theta(x)=\log p_\theta(x, z)-\log p_\theta(z \mid x)$
Introduce distribution $q_\phi(z)$:
$…

Garfield
- 11
- 1
1
vote
0 answers
Optimize parametric Log-Likelihood with a Decision Tree
Suppose there are some objects with features, and the target is parametric density estimation. Density estimation is model-based. Parameters are obtained by maximizing log-likelihood.
$LL = \sum_{i \in I_1} \log \left( \sum_{j \in K_i} \theta_j…

nekrald
- 11
- 2
1
vote
0 answers
Why can't a recurrent neural network handle a large corpus for obtaining embeddings?
In order to learn the embeddings, we need to train a model based on some objective function. The model can be an RNN and the objective function can be the likelihood. We learn the embeddings by calculating the likelihood, and the embeddings are…

hanugm
- 3,571
- 3
- 18
- 50
1
vote
0 answers
Estimating $\sigma_i$ according to maximum likelihood method
Let be a Bayesian multivariate normal distribution classifier with distinct covariance matrices for each class and isotropic, i.e. with equal values over the entire diagonal and zero otherwise, $\mathbf{\Sigma}_i=\sigma_i^2\mathbf{I},~\forall…

David
- 113
- 3
1
vote
2 answers
Can maximum likelihood be used as a classifier?
I am confused about understanding maximum likelihood as a classifier. I know what a Bayesian network is, and I know that ML is used for estimating the parameters of models. Also, I read that there are two methods to learn the parameters of a Bayesian…

Atena
- 131
- 1
0
votes
1 answer
Can I sample finite or infinite images with AutoRegressive Models?
I'm learning about AutoRegressive Models used on images, and I've studied the training phase, where you model each pixel on the basis of the previous ones using a certain model architecture.
My question is about generating new images…

SuperFluo
- 1
- 1
0
votes
1 answer
Does the Bayesian MAP give a probability distribution over unseen t*?
I'm working my way through the Bayesian world. So far, I've understood that the MLE and the MAP are point estimates, so models using them output one specific value and not a distribution.
Moreover, vanilla neural networks do in fact…

Micha Christ
- 19
- 2