I would like to classify the subject of a conversation. I could classify each message of the conversation separately, but I would lose some information, because the messages are related to one another.

I also need to do this incrementally, as the conversation progresses, and not only at the end of the conversation.

I looked into recurrent neural networks and connectionist classification, but I'm not sure they really address my problem.

nbro
  • I think you are looking for "text summarization". Maybe take a look at [this paper](https://www.sciencedirect.com/science/article/pii/S1319157820303712) (although I haven't yet read it). – nbro Sep 24 '20 at 15:52
  • @nbro Summarisation is a different task: it usually selects salient sentences from a text, and would not work in a conversation. – Oliver Mason Sep 25 '20 at 09:03
  • @OliverMason Ok. I don't have experience with any "text summarization" technique, but aren't these techniques also able to summarise single sentences, and don't they attempt to extract the important parts of the text anyway (which may be the subject of the conversation, assuming the text is a conversation)? – nbro Sep 25 '20 at 10:02
  • @nbro, thanks for the pointer to "text summarization". I had never heard of it. It's not really what I want, because I'm looking for a classification. But the idea is good and could be another approach to finding the subject of a conversation without predefined labels. – Benjamin Binard Sep 25 '20 at 14:58

2 Answers

This is a difficult problem.

First, how do you define 'subject'? Do you have a (closed) list of labels you want to assign? What about subjects that overlap, or don't occur in your list? What even is a subject? This is a non-trivial issue.

Second, and this is even harder, how do you want to recognise subjects? A simple solution could be using a list of associated keywords, but this is problematic as many words have multiple meanings, and words are not really a good indicator of a conversation topic in the first place.
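
To make the keyword idea and its weakness concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the labels, the keyword lists, and all function names are invented for this example and do not come from any library. It keeps a running score per label and re-evaluates the guess after every message, so it works gradually rather than only at the end of the conversation.

```python
from collections import Counter

# Hypothetical label-to-keyword lists, invented for this illustration.
KEYWORDS = {
    "banking": {"bank", "account", "loan", "interest"},
    "fishing": {"bank", "river", "rod", "bait"},
}


def update_scores(scores: Counter, message: str) -> Counter:
    """Add one message's keyword hits to the running per-label scores."""
    words = {w.strip(".,!?;:").lower() for w in message.split()}
    for label, keywords in KEYWORDS.items():
        scores[label] += len(words & keywords)
    return scores


def current_subject(scores: Counter):
    """Return the leading label, or None if no label leads unambiguously."""
    ranked = scores.most_common(2)
    if not ranked or ranked[0][1] == 0:
        return None  # no keyword seen yet
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: the keywords seen so far are ambiguous
    return ranked[0][0]


def classify_incrementally(messages):
    """Return the subject guess after each message of the conversation."""
    scores = Counter()
    guesses = []
    for message in messages:
        update_scores(scores, message)
        guesses.append(current_subject(scores))
    return guesses
```

With the messages "I sat on the bank all day.", "The river was calm." and "I need a loan.", the word "bank" counts for both labels, so the first guess is undecided, the second leans towards "fishing", and the third is undecided again. That is exactly the multiple-meanings problem described above.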

Instead of jumping to an implementation method, be clear about how you want to tackle these two items first. Start by annotating a conversation transcript by hand. You will then get a feeling for the problems and possible solutions. After you have done this, you can think about how to get a machine to do it efficiently.

UPDATE: For a scheme to annotate the functions of lines within a conversation, have a look at Francis & Hunston (1992) Analysing Everyday Conversation. In Coulthard, M. (ed.) "Advances in Spoken Discourse Analysis". London: Routledge. pp. 123-161. This is more oriented towards linguistics, but might give you some ideas on how to proceed.

Oliver Mason
  • Hello @oliver-mason. Yes, the subject is one of a set of labels. I haven't determined them yet, but I will. I have read the literature on classifying texts, so that could be a good start for classifying the messages of the conversation. But we could lose some information because the messages are interconnected. So my second idea is to concatenate all the messages of a conversation into one large text and classify that. What do you think? – Benjamin Binard Sep 25 '20 at 13:21
  • That wouldn't work; conversations are interleaved. Have a look at papers in Conversation Analysis, which discuss turn-taking etc. – Oliver Mason Sep 25 '20 at 15:59
  • Ok, thanks, I will look at Conversation Analysis. – Benjamin Binard Sep 28 '20 at 06:43

Thank you very much for your help, all of you.

I finally found the right keywords on the Internet: "Dialog act classification".

I don't know yet how to implement it, but it's a good start!