Given the samples $\vec{x}_i \in \mathbb{R}^d$, $i \in \{1,\dots,l\}$, where $l$ is the number of training samples and $d$ is the number of input features, the related target values $y_i \in \mathbb{R}$, and the $l \times l$ matrix defined below:
$$S_{i,j} = e^{-\gamma_S \lVert \vec{x}_i - \vec{x}_j \rVert^2} = e^{-\gamma_S \left(\vec{x}_i'\vec{x}_i - 2\,\vec{x}_i'\vec{x}_j + \vec{x}_j'\vec{x}_j\right)}$$
where $i, j \in \{1,\dots,l\}$ and $\gamma_S$ is another hyper-parameter, we would like to use the following custom loss with PyTorch for a regression task:
$$\sum_{i=1}^{l} \sum_{j=1}^{l} \sqrt{\lvert p_i - y_i \rvert}\,\sqrt{\lvert p_j - y_j \rvert}\,S_{i,j}$$
where $p_i$ is the $i$-th prediction.
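For context, $S$ depends only on the training inputs, so it can be precomputed once. A minimal sketch, assuming the samples are stacked row-wise in a tensor X of shape $(l, d)$ (the name rbf_matrix is ours, not from the original code):

    import torch

    def rbf_matrix(X: torch.Tensor, gamma_s: float) -> torch.Tensor:
        # S_ij = exp(-gamma_S * ||x_i - x_j||^2) for every pair of samples
        sq_dists = torch.cdist(X, X, p=2.0) ** 2  # pairwise squared Euclidean distances
        return torch.exp(-gamma_s * sq_dists)

The full matrix 'stra' used below could then be produced by a single call such as stra = rbf_matrix(X, gamma_s).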
Our loss is implemented with this code:
def ourLoss(out, lab):
    global stra, sc
    # element-wise absolute errors |p_i - y_i|, flattened to a 1-D vector
    abserr = torch.abs(out - lab).flatten().float()
    # element-wise square roots sqrt(|p_i - y_i|)
    serr = torch.sqrt(abserr)
    # block of S corresponding to the current batch
    bm = stra[sc : sc + out.shape[0], sc : sc + out.shape[0]].float()
    # serr' S serr = sum_ij sqrt|p_i - y_i| * sqrt|p_j - y_j| * S_ij
    loss = torch.dot(serr, torch.matmul(bm, serr))
    return loss
where 'stra' is $S$ and 'sc' is a counter used for batch evaluation; during training, the Adam optimizer then returns a NaN loss value...
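One plausible cause, stated as an assumption rather than a confirmed diagnosis: the derivative of $\sqrt{x}$ is $\frac{1}{2\sqrt{x}}$, which diverges as $x \to 0$, so whenever some prediction exactly equals its label the backward pass produces an infinite gradient and the loss subsequently turns into NaN. A minimal guarded sketch under that assumption, where eps and the explicit S_block argument are our additions:

    def ourLossStable(out, lab, S_block, eps=1e-8):
        # eps keeps the sqrt gradient 1/(2*sqrt(.)) finite when out == lab
        serr = torch.sqrt(torch.abs(out - lab).flatten().float() + eps)
        # quadratic form serr' S serr, as in the original loss
        return torch.dot(serr, torch.matmul(S_block.float(), serr))

Passing the batch block of $S$ explicitly (e.g. stra[sc : sc + out.shape[0], sc : sc + out.shape[0]]) avoids the globals and makes the slicing easier to test in isolation.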