
Given the samples $\vec{x_i} \in \mathbb{R}^d, i \in [1,..,l]$, where $l$ is the number of training samples and $d$ is the number of input features, the related target values $y_i \in \mathbb{R}$, and the $l \times l$ matrix defined below:

$$S_{i,j} = e^{-\gamma_S \|\vec{x_i} - \vec{x_j}\|^2} = e^{-\gamma_S \left( \vec{x_i} \cdot \vec{x_i} - 2\, \vec{x_i} \cdot \vec{x_j} + \vec{x_j} \cdot \vec{x_j} \right)}$$

where $i \in [1,..,l]$, $j \in [1,..,l]$, and $\gamma_S$ is another hyper-parameter, we would like to use the following custom loss in PyTorch for a regression task:

$$\sum_{i=1}^{l} \sum_{j=1}^{l} \sqrt{|p_i - y_i|}\, \sqrt{|p_j - y_j|}\; S_{i,j}$$

where $p_i$ is the $i$-th estimation.
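
For illustration, here is a minimal sketch of the two formulas above in PyTorch; the data, shapes, and $\gamma_S$ value are placeholders, not taken from the post:

import torch

# Placeholder data: l samples with d features, targets y, predictions p.
l, d = 8, 3
gamma_S = 0.1
X = torch.randn(l, d)                      # samples x_i
y = torch.randn(l)                         # targets y_i
p = torch.randn(l, requires_grad=True)     # predictions p_i

# S_{i,j} = exp(-gamma_S * ||x_i - x_j||^2), computed once from the inputs
S = torch.exp(-gamma_S * torch.cdist(X, X) ** 2)

# loss = sum_{i,j} sqrt(|p_i - y_i|) * sqrt(|p_j - y_j|) * S_{i,j}
serr = torch.sqrt(torch.abs(p - y))
loss = torch.dot(serr, S @ serr)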

Our loss is implemented with this code:

def ourLoss(out, lab):
    global stra, sc
    # absolute errors |p_i - y_i| for the current batch
    abserr = torch.abs(out - lab).flatten().float()
    # element-wise square roots sqrt(|p_i - y_i|)
    serr = torch.sqrt(abserr)
    # block of S corresponding to the samples in the current batch
    bm = stra[sc : sc + out.shape[0], sc : sc + out.shape[0]].float()
    # serr^T S_batch serr = sum_{i,j} sqrt(|p_i - y_i|) sqrt(|p_j - y_j|) S_{i,j}
    loss = torch.dot(serr, torch.matmul(bm, serr))
    return loss

where 'stra' is $S$ and 'sc' is an offset counter used for batch evaluation; training with the Adam optimizer then returns a NaN loss value...
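
For context, one plausible way the batch offset 'sc' could be driven is sketched below; the model, data, batch size, and $\gamma_S$ are all placeholders and not the actual setup from the post:

import torch

# 'sc' tracks the row/column offset of the current batch inside 'stra',
# which only matches the data if batches are taken in a fixed order (no shuffling).
X = torch.randn(16, 4)
y = torch.randn(16)
stra = torch.exp(-0.1 * torch.cdist(X, X) ** 2)   # placeholder S with gamma_S = 0.1
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

sc = 0
for xb, yb in zip(X.split(4), y.split(4)):        # in-order batches of 4
    out = model(xb).flatten()
    loss = ourLoss(out, yb)                       # the function shown above
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    sc += out.shape[0]                            # advance the offset by the batch size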

  • Can you try to lower your learning rate (substantially)? It might be that it just explodes because of exploding gradients. – Robin van Hoorn Feb 25 '23 at 16:26
  • We tried, but other NaNs appear. We think the problem is related to the fact that as some serr goes to $0$, its derivative goes to infinity. We tried to clip the gradient, but it did not work. – Filippo Portera Feb 27 '23 at 12:03
  • We also tried adding a small value (1E-7) to serr, but we still get NaN loss values. – Filippo Portera May 18 '23 at 08:07
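
A minimal check of the hypothesis raised in the comments above, with made-up values: the derivative of $\sqrt{x}$ is $1/(2\sqrt{x})$, which is infinite at exactly zero, so a zero absolute error yields an infinite gradient.

import torch

# Made-up values: the first entry has zero absolute error.
abserr = torch.tensor([0.0, 0.25, 1.0], requires_grad=True)
serr = torch.sqrt(abserr)
serr.sum().backward()
print(abserr.grad)   # tensor([inf, 1.0000, 0.5000]) -- the inf at x = 0 propagates into the update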

0 Answers