Should we use a pre-trained model or a blank model for custom entity training of NER in spacy?

Question

Further to my last question, I am training a custom entity, FOODITEM, to be recognized by spaCy's Named Entity Recognition engine. I am following tutorials online, and the following is the advice given in most of them:

Load the model or create an empty model

We can create an empty model and train it with our annotated dataset, or we can take an existing spaCy model and retrain it with our annotated data.
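In spaCy v3 terms, the two options above roughly correspond to the sketch below. It assumes spaCy v3 is installed and, for the pre-trained path, that the en_core_web_sm package has been downloaded; the helper name start_pipeline is hypothetical.

```python
import spacy


def start_pipeline(base_model=None, label="FOODITEM"):
    """Hypothetical helper: return a pipeline whose NER component knows `label`.

    base_model=None              -> blank English pipeline (train from scratch)
    base_model="en_core_web_sm"  -> pre-trained pipeline (transfer learning)
    """
    if base_model is None:
        # Blank model: just a tokenizer, plus a fresh, untrained NER component.
        nlp = spacy.blank("en")
        ner = nlp.add_pipe("ner")
    else:
        # Pre-trained model: reuse its already-trained NER component.
        nlp = spacy.load(base_model)
        ner = nlp.get_pipe("ner")
    # Register the new entity type we want the model to learn.
    ner.add_label(label)
    return nlp


if __name__ == "__main__":
    blank_nlp = start_pipeline()
    print(blank_nlp.pipe_names)
    # The pre-trained route, assuming the package is installed:
    # pretrained_nlp = start_pipeline("en_core_web_sm")
```

With the blank pipeline, NER is the only trained component and every weight is learned from your FOODITEM data; with en_core_web_sm, the new label is added on top of components that already recognize entity types such as PERSON and ORG.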

But none of the tutorials explain how or why to choose between the two options. I also don't understand how the choice will affect my final output or the training of the model.

How do I make the choice between a pre-trained model or a blank model? What are the factors to consider?

score 1 · Accepted Answer · answered Jan 06 '21 at 05:02

The reason you would load a pre-existing model is that it already offers something of value to your task (e.g. named entity recognition for food), and training an equivalent model from scratch would cost more than it is worth. For example, training GPT-3 from scratch would cost several million dollars. Typically, someone will take a model like BERT and fine-tune it; this is called transfer learning. With spaCy you will typically start from en_core_web_sm, which was trained on the OntoNotes corpus and already includes named entities. A custom food NER built on top of en_core_web_sm should be more accurate than one built from scratch. If you have a GPU, you should be able to build a good model both with and without transfer learning fairly quickly.
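A minimal sketch of the training step, assuming spaCy v3. In code, the practical difference between the two starting points is how the optimizer is created: nlp.initialize() gives the components fresh random weights (and would reset a loaded model's weights), whereas nlp.resume_training() keeps the pre-trained weights so they are fine-tuned instead of discarded. The train_ner helper and TRAIN_DATA are hypothetical toy examples, far too small to produce a useful model.

```python
import spacy
from spacy.training import Example

# Hypothetical toy data: (text, {"entities": [(start_char, end_char, label)]}).
TRAIN_DATA = [
    ("I had a bagel for breakfast", {"entities": [(8, 13, "FOODITEM")]}),
    ("She ordered sushi and ramen",
     {"entities": [(12, 17, "FOODITEM"), (22, 27, "FOODITEM")]}),
]


def train_ner(nlp, train_data, n_iter=20, pretrained=False, seed=0):
    """Hypothetical helper: update the NER component and return the last epoch's losses."""
    spacy.util.fix_random_seed(seed)
    examples = [Example.from_dict(nlp.make_doc(text), annots)
                for text, annots in train_data]
    if pretrained:
        # Transfer learning: continue from the existing weights.
        optimizer = nlp.resume_training()
    else:
        # From scratch: random weights; labels are inferred from the examples.
        optimizer = nlp.initialize(get_examples=lambda: examples)
    losses = {}
    # Update only "ner" (assumes it does not share a tok2vec with other components).
    with nlp.select_pipes(enable="ner"):
        for _ in range(n_iter):
            losses = {}
            for batch in spacy.util.minibatch(examples, size=8):
                nlp.update(batch, sgd=optimizer, drop=0.2, losses=losses)
    return losses


if __name__ == "__main__":
    # Blank starting point; for transfer learning, load en_core_web_sm instead,
    # add the FOODITEM label to its "ner" pipe and pass pretrained=True.
    nlp = spacy.blank("en")
    nlp.add_pipe("ner")
    print(train_ner(nlp, TRAIN_DATA, n_iter=10))
```

Comparing the two routes on the same held-out annotated data is the most direct way to check whether the pre-trained starting point actually helps for your FOODITEM entity.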

