
Given the samples $\vec{x_i} \in \mathbb{R}^d, i \in [1,..,l]$, where $l$ is the number of training samples and $d$ is the number of input features, the related target values $y_i \in \mathbb{R}$, and the $l \times l$ matrix defined below:

$$S_{i,j} = e^{-\gamma_S \|\vec{x_i} - \vec{x_j}\|^2} = e^{-\gamma_S \left( \vec{x_i} \cdot \vec{x_i} - 2\, \vec{x_i} \cdot \vec{x_j} + \vec{x_j} \cdot \vec{x_j} \right)}$$

where $i \in [1,..,l]$, $j \in [1,..,l]$, and $\gamma_S$ is another hyper-parameter, we would like to use the following custom loss in PyTorch for a regression task:

$$\sum_{i=1}^{l} \sum_{j=1}^{l} \sqrt{|p_i - y_i|}\, \sqrt{|p_j - y_j|}\; S_{i,j}$$

where $p_i$ is the $i$-th estimation.
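
For illustration, here is a minimal sketch of the two formulas above in PyTorch; the data, shapes, and $\gamma_S$ value are placeholders, not taken from the post:

import torch

# Placeholder data: l samples with d features, targets y, predictions p.
l, d = 8, 3
gamma_S = 0.1
X = torch.randn(l, d)                      # samples x_i
y = torch.randn(l)                         # targets y_i
p = torch.randn(l, requires_grad=True)     # predictions p_i

# S_{i,j} = exp(-gamma_S * ||x_i - x_j||^2), computed once from the inputs
S = torch.exp(-gamma_S * torch.cdist(X, X) ** 2)

# loss = sum_{i,j} sqrt(|p_i - y_i|) * sqrt(|p_j - y_j|) * S_{i,j}
serr = torch.sqrt(torch.abs(p - y))
loss = torch.dot(serr, S @ serr)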

Our loss is implemented with this code:

def ourLoss(out, lab):
    global stra, sc
    # absolute errors |p_i - y_i| for the current batch
    abserr = torch.abs(out - lab).flatten().float()
    # element-wise square roots sqrt(|p_i - y_i|)
    serr = torch.sqrt(abserr)
    # block of S corresponding to the samples in the current batch
    bm = stra[sc : sc + out.shape[0], sc : sc + out.shape[0]].float()
    # serr^T S_batch serr = sum_{i,j} sqrt(|p_i - y_i|) sqrt(|p_j - y_j|) S_{i,j}
    loss = torch.dot(serr, torch.matmul(bm, serr))
    return loss

where 'stra' is $S$ and 'sc' is an offset counter used for batch evaluation; training with the Adam optimizer then returns a NaN loss value...
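
For context, one plausible way the batch offset 'sc' could be driven is sketched below; the model, data, batch size, and $\gamma_S$ are all placeholders and not the actual setup from the post:

import torch

# 'sc' tracks the row/column offset of the current batch inside 'stra',
# which only matches the data if batches are taken in a fixed order (no shuffling).
X = torch.randn(16, 4)
y = torch.randn(16)
stra = torch.exp(-0.1 * torch.cdist(X, X) ** 2)   # placeholder S with gamma_S = 0.1
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

sc = 0
for xb, yb in zip(X.split(4), y.split(4)):        # in-order batches of 4
    out = model(xb).flatten()
    loss = ourLoss(out, yb)                       # the function shown above
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    sc += out.shape[0]                            # advance the offset by the batch size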

  • Can you try to lower your learning rate (substantially)? It might be that it just explodes because of exploding gradients. – Robin van Hoorn Feb 25 '23 at 16:26
  • We tried, but other NaNs appear. We think the problem is related to the fact that as some serr goes to $0$, its derivative goes to infinity. We tried to clip the gradient, but it did not work. – Filippo Portera Feb 27 '23 at 12:03
  • We also tried adding a small value (1E-7) to serr, but we still get NaN loss values. – Filippo Portera May 18 '23 at 08:07
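
A minimal check of the hypothesis raised in the comments above, with made-up values: the derivative of $\sqrt{x}$ is $1/(2\sqrt{x})$, which is infinite at exactly zero, so a zero absolute error yields an infinite gradient.

import torch

# Made-up values: the first entry has zero absolute error.
abserr = torch.tensor([0.0, 0.25, 1.0], requires_grad=True)
serr = torch.sqrt(abserr)
serr.sum().backward()
print(abserr.grad)   # tensor([inf, 1.0000, 0.5000]) -- the inf at x = 0 propagates into the update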

0 Answers