Let $f(x)$ be the output of a neural network with input $x$.
Each data point is a pair $(x,y)$, and my loss is a function of $f(x)$ and $f(y)$, i.e., $g(f(x),f(y))$.
What kind of architecture enables this kind of learning? I could make two copies of the same neural network, but can the weights of the two copies be kept coordinated (shared) during backpropagation?
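For concreteness, here is a minimal sketch of the setup I have in mind, written in PyTorch; the network `Net`, the layer sizes, and the particular loss `g` are just placeholders I made up for illustration. The point is that the same module (a single set of weights) is applied to both $x$ and $y$:

```python
import torch
import torch.nn as nn

# Hypothetical encoder f(.); the actual architecture is irrelevant here.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 8)
        )

    def forward(self, x):
        return self.layers(x)

# Placeholder for the pairwise loss g(f(x), f(y)); here a simple squared distance.
def g(fx, fy):
    return (fx - fy).pow(2).sum(dim=1).mean()

f = Net()                              # one network, one set of weights
optimizer = torch.optim.SGD(f.parameters(), lr=1e-2)

x = torch.randn(32, 10)                # a batch of x's
y = torch.randn(32, 10)                # the paired y's

loss = g(f(x), f(y))                   # same weights used for both inputs
loss.backward()                        # gradients from both branches accumulate in f's parameters
optimizer.step()
```

Is this ("two copies" that are really one shared set of parameters) the standard way to do it, or is there a dedicated architecture for this?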