From the part titled "Introducing Latent Variables" under subsection 2.2 of this tutorial:
Introducing Latent Variables. Suppose we want to model an $m$-dimensional unknown probability distribution $q$ (e.g., each component of a sample corresponds to one of $m$ pixels of an image). Typically, not all variables $\mathbf{X} = (X_v)_{v \in V}$ in an MRF need to correspond to some observed component, and the number of nodes is larger than $m$. We split $\mathbf{X}$ into visible (or observed) variables $\mathbf{V} = (V_1,...,V_m)$ corresponding to the components of the observations and latent (or hidden) variables $\mathbf{H} = (H_1,...,H_n)$ given by the remaining $n = |V| - m$ variables. Using latent variables allows one to describe complex distributions over the visible variables by means of simple (conditional) distributions. In this case the Gibbs distribution of an MRF describes the joint probability distribution of $(\mathbf{V},\mathbf{H})$, and one is usually interested in the marginal distribution of $\mathbf{V}$, which is given by: $$p(\mathbf{v}) = \sum_{\mathbf{h}} p(\mathbf{v},\mathbf{h}) = \frac{1}{Z} \sum_{\mathbf{h}} e^{-E(\mathbf{v},\mathbf{h})}$$ where $Z = \sum_{\mathbf{v},\mathbf{h}} e^{-E(\mathbf{v},\mathbf{h})}$. While the visible variables correspond to the components of an observation, the latent variables introduce dependencies between the visible variables (e.g., between pixels of an input image).
I have a question about this part:
While the visible variables correspond to the components of an observation, the latent variables introduce dependencies between the visible variables (e.g., between pixels of an input image).
Given the set of nodes $\mathbf{X}$ of a Markov Random Field $G$, the joint distribution of all the nodes is given by:
$$p(\mathbf{X}) = \frac{1}{Z} \prod_{c \in C} \phi(c)$$
where $Z$ is the partition function, $C$ is the set of cliques of $G$, and each factor $\phi(c)$ depends only on the variables in clique $c$. To ensure that the joint distribution is strictly positive, the factors can be chosen as:
$$\phi(c) = e^{-E(c)}$$
Such that:
$$p(\mathbf{X}) = \frac{1}{Z} e^{-\sum_{c \in C} E(c)}$$
Where $E$ is the energy function.
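To check my understanding of this factorization, I verified it numerically on a toy binary chain MRF (the Ising-style pairwise clique energies and coupling values below are just numbers I made up for illustration):

```python
import itertools
import math

# Toy 3-node binary chain MRF with cliques {0,1} and {1,2}.
# The Ising-style pairwise energy and the couplings W are
# arbitrary illustrative choices, not from the tutorial.
def E_pair(a, b, w):
    return -w * a * b  # clique energy E(c) for a pairwise clique

W = [0.7, -1.3]  # one coupling per clique

def energies(x):
    return [E_pair(x[0], x[1], W[0]), E_pair(x[1], x[2], W[1])]

states = list(itertools.product([0, 1], repeat=3))
Z = sum(math.exp(-sum(energies(x))) for x in states)

for x in states:
    # product of clique potentials phi(c) = exp(-E(c)) ...
    via_potentials = math.prod(math.exp(-e) for e in energies(x)) / Z
    # ... equals the exponential of the negative energy sum
    via_energy_sum = math.exp(-sum(energies(x))) / Z
    assert abs(via_potentials - via_energy_sum) < 1e-12

print("factorized and energy-sum forms agree; Z =", round(Z, 4))
```

So the two ways of writing $p(\mathbf{X})$ are indeed the same distribution, which I believe I understand.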
I am not sure why there is a need to introduce hidden variables and express $p(\mathbf{v})$ as a marginalization of $p(\mathbf{v},\mathbf{h})$ over $\mathbf{h}$. Why can't $p(\mathbf{v})$ be expressed directly as:
$$p(\mathbf{v}) = \frac{1}{Z} e^{-\sum_{c \in C} E(c)}$$
with the cliques $C$ taken over the visible variables only? I think it may be because the factors only encode dependencies between variables within a clique, so they may not be able to encode dependencies between variables that lie in two separate cliques. The purpose of the hidden variables would then be to encode these "long-range" dependencies between visible variables that do not share a clique. However, I am not sure about this reasoning.
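As a sanity check on this reasoning, I also tried a tiny numerical experiment (the RBM-like energy and the weights are made-up illustrative values): two binary visible units connected to one binary hidden unit, with no clique containing both visibles. Marginalizing out the hidden unit does seem to make the visibles dependent:

```python
import itertools
import math

# Tiny model: two binary visibles v1, v2 and one binary hidden h.
# The energy has NO direct v1-v2 term -- the only cliques are
# {v1, h} and {v2, h}. The weights are arbitrary illustrative values.
w1, w2 = 1.5, 1.5

def E(v1, v2, h):
    return -(w1 * v1 * h + w2 * v2 * h)

states = list(itertools.product([0, 1], repeat=3))
Z = sum(math.exp(-E(*s)) for s in states)

def p_v(v1, v2):
    # marginal over the hidden unit: p(v) = (1/Z) * sum_h exp(-E(v, h))
    return sum(math.exp(-E(v1, v2, h)) for h in [0, 1]) / Z

p1 = sum(p_v(1, v2) for v2 in [0, 1])   # p(v1 = 1)
p2 = sum(p_v(v1, 1) for v1 in [0, 1])   # p(v2 = 1)

# After summing out h, the visibles are dependent even though they
# never appear together in a clique:
print("p(v1=1, v2=1)     =", round(p_v(1, 1), 4))
print("p(v1=1) * p(v2=1) =", round(p1 * p2, 4))
```

The two printed numbers differ, so $p(v_1, v_2) \neq p(v_1)\,p(v_2)$, i.e., the hidden unit has induced a dependency between visibles that share no clique. Is this the right way to think about it?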
Any help would be greatly appreciated.
By the way, I am aware of this question, but I think the answer is not specific enough.