
I don't quite understand why, in a Conditional Variational Autoencoder (CVAE), we concatenate the conditioning vector twice, once at the encoder and once at the decoder. After we concatenate it at the encoder input, isn't the latent distribution already going to incorporate knowledge about the label?

I know that, from a practical perspective, we need to concatenate it at the decoder as well, since at generation time we want to be able to sample new instances associated with a specific label, but I'm missing the mathematical motivation for concatenating it twice.
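
To make the setup concrete, here is a minimal sketch of the two concatenations. This is my own toy PyTorch code with made-up layer sizes, not taken from any particular paper:

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    """Toy CVAE: the conditioning vector c is concatenated twice,
    once at the encoder input and once at the decoder input."""
    def __init__(self, x_dim=784, c_dim=10, z_dim=20, h_dim=400):
        super().__init__()
        # Encoder q_phi(z | x, c): takes [x; c]
        self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, h_dim), nn.ReLU())
        self.fc_mu = nn.Linear(h_dim, z_dim)
        self.fc_logvar = nn.Linear(h_dim, z_dim)
        # Decoder p_theta(x | z, c): takes [z; c]
        self.dec = nn.Sequential(
            nn.Linear(z_dim + c_dim, h_dim), nn.ReLU(),
            nn.Linear(h_dim, x_dim), nn.Sigmoid(),
        )

    def forward(self, x, c):
        h = self.enc(torch.cat([x, c], dim=1))            # first concatenation
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        x_hat = self.dec(torch.cat([z, c], dim=1))        # second concatenation
        return x_hat, mu, logvar
```

At generation time, the encoder is skipped entirely and the decoder is fed `torch.cat([z, c], dim=1)` with `z` drawn from the prior, which is the practical reason the decoder needs $\textbf{c}$ as an input.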

To be more precise, let's consider how the objective function of the CVAE is defined:

$$\mathcal{L}_{CVAE} = \mathbb{E}_{\mathbf{z} \sim q_\phi(\mathbf{z}|\mathbf{x},\mathbf{c})}[\log p_\theta(\mathbf{x}|\mathbf{z},\mathbf{c})] - D_{KL}[q_\phi(\mathbf{z}|\mathbf{x},\mathbf{c}) \,\|\, p(\mathbf{z}|\mathbf{c})]$$
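
For context, as far as I understand it, this objective is the standard ELBO applied to the conditional log-likelihood $\log p_\theta(\mathbf{x}|\mathbf{c})$ (this derivation sketch is my own, so corrections are welcome):

$$
\begin{aligned}
\log p_\theta(\mathbf{x}|\mathbf{c})
&= \mathbb{E}_{q_\phi(\mathbf{z}|\mathbf{x},\mathbf{c})}\left[\log \frac{p_\theta(\mathbf{x},\mathbf{z}|\mathbf{c})}{q_\phi(\mathbf{z}|\mathbf{x},\mathbf{c})}\right] + D_{KL}[q_\phi(\mathbf{z}|\mathbf{x},\mathbf{c}) \,\|\, p_\theta(\mathbf{z}|\mathbf{x},\mathbf{c})] \\
&\geq \mathbb{E}_{q_\phi(\mathbf{z}|\mathbf{x},\mathbf{c})}[\log p_\theta(\mathbf{x}|\mathbf{z},\mathbf{c})] - D_{KL}[q_\phi(\mathbf{z}|\mathbf{x},\mathbf{c}) \,\|\, p(\mathbf{z}|\mathbf{c})],
\end{aligned}
$$

where I factor $p_\theta(\mathbf{x},\mathbf{z}|\mathbf{c}) = p_\theta(\mathbf{x}|\mathbf{z},\mathbf{c})\,p(\mathbf{z}|\mathbf{c})$ and drop the non-negative KL term at the end of the first line.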

My question: Why do we condition on $\textbf{c}$ in both terms?

  • To make sure I understand your question, in the objective function, $c$ is what you call _label_? [This paper](https://papers.nips.cc/paper/2015/file/8d55a249e6baa5c06772297520da2051-Paper.pdf) seems to use a different notation for the objective function. It might be a good idea to provide the reference you're using. – nbro Mar 06 '22 at 00:04
  • Exactly, $\textbf{c}$ is the label in the objective function I have specified. – James Arten Mar 06 '22 at 00:16
  • So, your question is: "In CVAE, why are both the encoder and decoder defined as a function or probability distribution that is conditioned on the label"? – nbro Mar 07 '22 at 08:59
  • Yes, that's pretty much what I wanted to ask. – James Arten Mar 07 '22 at 09:52
  • Ok, can you please provide the source of this equation? Because I'm looking at the original paper, as I already said, and it uses a different notation, and it's harder for me to associate the symbols there with yours. It seems that you're using $c$ to denote what the authors of the paper denote by $x$, and you use $x$ to denote what they denote by $y$, the label or output variable. $z$ is the latent vector both in your case and in the original paper. $c$ seems to be the input to the auto-encoder, not a label. – nbro Mar 17 '22 at 10:04
  • So, they are not conditioning on a "label" (which would be $y$ in the paper and $x$ in your notation), but conditioning on the inputs. So, please, edit your post to reflect this comment and to either make your notation equal to the original paper's one or to provide a reference that uses this notation. Moreover, your question "Why do we condition both on encoder and decoder?" is not specific and clear enough. Because, even in the original ELBO, you condition on something. You want to ask why we're conditioning on $c$ in both terms. – nbro Mar 17 '22 at 10:05
  • That's exactly what I'm trying to better frame. Why do we condition on $\textbf{c}$ in both terms of the loss? I've modified the question to express this doubt clearly. – James Arten Mar 17 '22 at 14:14
  • I had understood that this was your question, but I recommend that you follow my other suggestions in the comments above. – nbro Mar 17 '22 at 14:15

0 Answers