
I am trying to build an entity linking system that links entities mentioned in a text to their Wikipedia pages. If no Wikipedia page seems to match an entity, the system assumes it has found a new entity and adds it to an internal knowledge base. The pipeline I am currently building works like this:

  1. Extract mentions of entities in the text: in the sentence "Napoleon commissioned a grand reconstruction of Paris", "Napoleon" is extracted because it refers to an entity.
  2. Search Wikipedia for the alias "Napoleon" and compute embeddings of the candidate pages that are returned.
  3. Compute the embedding of the sentence containing the mention and compare it against the embeddings of the candidate pages. In this case, Napoleon III should be the best fit, even though Napoleon I has a higher prior probability, given his popularity.
  4. If no Wikipedia page exceeds a given similarity threshold, assume the entity has no corresponding Wikipedia page and store it in our entity knowledge base (KB).
  5. Save the previously computed embeddings of the Wikipedia pages in our KB.
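
To make steps 3 to 5 concrete, here is a minimal sketch of the comparison logic I have in mind. It is only an illustration under assumptions: `embed` is a toy bag-of-words stand-in for a real sentence encoder, the candidate page texts are made-up snippets rather than real Wikipedia content, and the names `link_mention`, `kb`, and the threshold of 0.5 are hypothetical placeholders.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a real encoder: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_mention(sentence, candidates, kb, threshold=0.5):
    """Return (title, score) of the best candidate, or (None, score) if
    nothing reaches the threshold, i.e. the mention is treated as new.

    candidates: dict mapping page title -> page text (result of step 2).
    kb: dict used as a cache of page embeddings (step 5).
    """
    query = embed(sentence)
    best_title, best_score = None, 0.0
    for title, page_text in candidates.items():
        if title not in kb:
            kb[title] = embed(page_text)  # cache so pages are embedded once
        score = cosine(query, kb[title])
        if score > best_score:
            best_title, best_score = title, score
    if best_score >= threshold:
        return best_title, best_score
    return None, best_score  # step 4: no page is similar enough


# Made-up candidate snippets, standing in for pages found via the alias.
candidates = {
    "Napoleon": "Napoleon Bonaparte emperor of the French military campaigns battles",
    "Napoleon III": "Napoleon III commissioned a grand reconstruction of Paris by Haussmann",
}
kb = {}
title, score = link_mention(
    "Napoleon commissioned a grand reconstruction of Paris", candidates, kb
)
print(title)  # Napoleon III
```

With these toy inputs the context sentence selects "Napoleon III" even though both pages share the alias, which is the behaviour I am hoping for from real embeddings. The sketch ignores the prior probability of each candidate entirely; whether and how to combine it with the similarity score is part of what I am unsure about.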

Can this approach work effectively? How can I improve it?

Any help you can provide would be greatly appreciated!
