
I'm processing a semi-structured scientific document and trying to extract some specific concepts. I've made good progress without machine learning so far, but I've reached a block of true free text and I'm wondering whether a narrowly scoped NLP/learning algorithm can help.

Specifically, there are concepts I know to be important that are discussed in this section, but I'll need some NLP to get the 'sentiment'. I thought this might be 'entity sentiment' analysis; however, I'm not trying to capture the writer's emotion about a concept. It's literally whether the writer of the text thinks the entity is present, absent, or uncertain.

Simple example. "First, there are horns. And second, the sheer size of this enormous fossil record argues for an herbivore or omnivore. The jaws are large, but this is not a carnivore."

And say my entities are horns (presence or absence), and type of dinosaur (herbivore, omnivore, carnivore). Desired output:

Horns (present)
Carnivore (absent)
Herbivore (possible/present) -- fine if it thinks 'present'
Omnivore (possible/present) -- fine if it thinks 'present'

What is the class of NLP analysis that takes an explicit input entity (or list of entities) and tries to assess, based on context, whether that entity is present or absent according to the writer of the input text? It's actually fine if this isn't a learning algorithm (maybe better). Bonus if you have suggestions for Python packages that could be used in this narrow sense. I've looked casually through the NLTK and spaCy packages, but they're both vast and it wasn't obvious which class of model or functions I'd need to solve this problem.
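To make the non-learning variant concrete, here is a minimal sketch of what I have in mind, using only the standard library. It is similar in spirit to rule-based negation detection: find the clause that mentions each entity, then look for negation or uncertainty cues in that clause. The cue lists and the names `NEGATION`, `UNCERTAIN`, `split_clauses`, and `assertion_status` are my own placeholders, not part of any package, and the clause splitting is deliberately crude.

```python
import re

# Hypothetical, hand-written cue lexicons; a real project would need to extend them.
NEGATION = re.compile(r"\b(?:no|not|never|without|absent|lacks?)\b|n't\b", re.I)
UNCERTAIN = re.compile(
    r"\b(?:argues for|suggests?|possibl[ey]|may|might|perhaps|probably|likely)\b",
    re.I,
)


def split_clauses(text):
    """Split text into sentences, then into clauses at 'but' or ';'."""
    clauses = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        parts = re.split(r",?\s*\bbut\b\s*|;\s*", sentence)
        clauses.extend(part for part in parts if part)
    return clauses


def assertion_status(text, entities):
    """Label each entity using the first clause that mentions it."""
    clauses = split_clauses(text)
    status = {}
    for entity in entities:
        # Match the entity as a whole word, allowing a trailing plural 's'.
        pattern = re.compile(r"\b" + re.escape(entity) + r"s?\b", re.I)
        status[entity] = "not mentioned"
        for clause in clauses:
            if not pattern.search(clause):
                continue
            if NEGATION.search(clause):  # negation takes priority
                status[entity] = "absent"
            elif UNCERTAIN.search(clause):
                status[entity] = "possible"
            else:
                status[entity] = "present"
            break
    return status


text = (
    "First, there are horns. And second, the sheer size of this enormous "
    "fossil record argues for an herbivore or omnivore. The jaws are large, "
    "but this is not a carnivore."
)
result = assertion_status(text, ["horns", "herbivore", "omnivore", "carnivore"])
print(result)
# {'horns': 'present', 'herbivore': 'possible', 'omnivore': 'possible', 'carnivore': 'absent'}
```

This works on the example above, but a cue applies to every entity in the same clause, so a sentence like "an herbivore, not a carnivore" would mark both as absent. Handling that presumably needs proper cue scoping, for instance via a dependency parse.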

vector07
  • _If_ I understood you correctly, you are looking for: Aspect Based Sentiment Analysis (ABSA), but in your case you replace the sentiment with your "present, absent, possible/present". – Isbister Oct 07 '20 at 18:49
  • Right. I think many of the sentiment-based methods seem to 'hardcode' the sentiment part though, with dictionaries of positive and negative emotions for example. I guess I need a dictionary of presence/absence/equivocal terms. Does that exist, and if so where? – vector07 Oct 07 '20 at 19:19
  • I would not use dictionaries: they take a lot of manual work and don't use the power of embeddings. What about annotating some data and building a multi-label classifier for this purpose? You should focus on generating ambiguous examples, since those will be the hardest! – Isbister Oct 07 '20 at 19:45

0 Answers