I'm reading the paper Global-Locally Self-Attentive Dialogue State Tracker and following along with the published GLAD implementation.
I was wondering if someone could clarify which variable or score is used to compute the global and local self-attention scores shown in Figure 4 (the heatmap).
It is not clear to me how to derive these scores. The only quantity whose dimensions would match is $p_{utt} = \mathrm{softmax}(a_{utt})$ from the scoring module. However, I do not see the implementation doing anything further with this value.
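For context, here is my reading of the `attend` helper that my snippet below calls (a paraphrase, not the verbatim source; the signature matches how it is called, but please double-check against the GLAD code):

```python
import torch
import torch.nn.functional as F

def attend(seq, cond, lens):
    """Paraphrase of GLAD's attend helper (not the verbatim source).

    seq:  (batch, time, hidden) encoded sequence, e.g. H_utt
    cond: (batch, hidden) condition vector, e.g. a value encoding c_val
    lens: true sequence lengths, used to mask padding
    """
    # dot-product score between the condition and every time step
    scores = seq.bmm(cond.unsqueeze(2)).squeeze(2)     # (batch, time)
    for i, l in enumerate(lens):
        scores.data[i, l:] = float('-inf')             # ignore padded positions
    scores = F.softmax(scores, dim=1)                  # normalized weights
    # attention-weighted summary of the sequence
    context = torch.einsum('bt,bth->bh', scores, seq)  # (batch, hidden)
    return context, scores
```

If the second return value is indeed already softmax-normalized, then the `a_utt` in my snippet below would correspond to the paper's $p_{utt}$ rather than the pre-softmax $a_{utt}$, which may be part of my confusion.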
So, what I did was the following:
q_utts = []
a_utts = []
for c_val in C_vals:
    # attend over the encoded utterance, conditioned on one value encoding
    q_utt, a_utt = attend(
        H_utt,
        c_val.unsqueeze(0).expand(len(batch), *c_val.size()),
        lens=utterance_len,
    )
    q_utts.append(q_utt)
    a_utts.append(a_utt)
# average the per-value attention weights across all values of the slot
attention_score = torch.mean(torch.stack(a_utts, dim=1), dim=1)
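For reference, these are the shapes I believe are involved (assuming hidden size `d` and padded utterance length `T`; the names are from my snippet, not necessarily the repo's):

```python
# H_utt:           (batch, T, d)           encoded utterance
# c_val:           (d,)                    one value encoding from C_vals
# a_utt:           (batch, T)              attention weights for that value
# stacked a_utts:  (batch, num_values, T)  one row per value of the slot
# attention_score: (batch, T)              mean over the value dimension
```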
But the resulting attention scores differ substantially from what I expect based on the heatmaps in Figure 4.