I'm reading the paper *Global-Locally Self-Attentive Dialogue State Tracker* and following along with the published GLAD implementation.

I was wondering if someone can clarify which variable or score is used to compute the global and local self-attention scores shown in Figure 4 (the heatmap).

It is not clear to me how to derive these scores. The only score whose dimensions match the figure is $p_{utt} = \text{softmax}(a_{utt})$ in the scoring module. However, I do not see anywhere in their implementation where this value is actually used.
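For reference, my understanding of where $a_{utt}$ and $p_{utt}$ come from is a standard masked dot-product attention over the encoded utterance. The sketch below is only an illustration of that computation in NumPy (the names `attend_sketch`, `seq`, `cond`, and `lens` are my own, not from the GLAD code), mirroring an `attend(seq, cond, lens)`-style helper that returns both the context vector and the normalized scores:

```python
import numpy as np

def attend_sketch(seq, cond, lens):
    """Masked softmax attention (illustrative sketch, not the GLAD code).

    seq:  (batch, time, dim) encoded utterance, e.g. H_utt
    cond: (batch, dim)       conditioning vector, e.g. a slot-value encoding
    lens: true sequence lengths, used to mask padded positions
    Returns the context vector and the scores p = softmax(a).
    """
    # raw scores a: dot product between cond and every time step of seq
    a = np.einsum('btd,bd->bt', seq, cond)
    # mask padding before the softmax so it gets zero weight
    for i, l in enumerate(lens):
        a[i, l:] = -np.inf
    a = a - a.max(axis=1, keepdims=True)       # numerical stability
    p = np.exp(a)
    p = p / p.sum(axis=1, keepdims=True)       # p_utt = softmax(a_utt)
    context = np.einsum('bt,btd->bd', p, seq)  # attention-weighted sum
    return context, p
```

If the heatmap in Figure 4 is built from these scores, it would presumably be the `p` (post-softmax) values per token, but that is exactly what I cannot confirm from the repository.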

So, what I did was the following:

    q_utts = []
    a_utts = []
    for c_val in C_vals:
        # attend returns the context vector q_utt and the attention scores a_utt
        q_utt, a_utt = attend(H_utt, c_val.unsqueeze(0).expand(len(batch), *c_val.size()), lens=utterance_len)
        q_utts.append(q_utt)
        a_utts.append(a_utt)
    # average the scores over all value candidates
    attention_score = torch.mean(torch.stack(a_utts, dim=1), dim=1)

But the resulting attention scores differ substantially from what I would expect based on the heatmap in the paper.
