What are "development test sets" used for?

Question

This is a theoretical question. I am a newbie to artificial intelligence and machine learning, and the more I read, the more I like it. So far, I have been reading about the evaluation of language models (I am focused on ASR), but I still don't get the concept of development test sets.

The clearest explanation I have come across is the following, taken from chapter 3 of the book Speech and Language Processing (3rd ed. draft) by Dan Jurafsky and James H. Martin:

Sometimes we use a particular test set so often that we implicitly tune to its characteristics. We then need a fresh test set that is truly unseen. In such cases, we call the initial test set the development test set or, devset.

In any case, I still don't understand why an additional test set has to be used. In other words, why aren't training and test sets enough?

score 6 · Accepted Answer · edited Jan 21 '21 at 00:11

In machine learning, you typically split your data into three parts, for example in an 80/10/10 ratio.
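As a rough illustration, an 80/10/10 split could be sketched as below. The function name `split_train_dev_test` and its parameters are made up for this example, and it assumes the examples can simply be shuffled (for ASR you might instead need to split by speaker or by recording).

```python
import random

def split_train_dev_test(examples, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once, then cut the data into train / dev / test partitions."""
    items = list(examples)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(items)
    n_dev = int(len(items) * dev_frac)
    n_test = int(len(items) * test_frac)
    n_train = len(items) - n_dev - n_test
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]
    return train, dev, test

# Placeholder data: 1000 integer IDs standing in for real examples.
train, dev, test = split_train_dev_test(range(1000))
print(len(train), len(dev), len(test))  # 800 100 100
```

The three partitions do not overlap, so no example is used both for fitting and for evaluation.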

The first part (80% of your initial data) is for the training of your ML model: this is known as the training dataset.

The second part (10%) is the development set (or dataset), also known as the validation set. It is used to measure your model's performance under different hyperparameter settings (e.g. in neural networks: layer size).

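After you have found your best hyperparameters, you train the model again on the training set with those settings, and then test it on your test dataset (10%), which the model has never seen before. Your result on the test data is now a good indicator of your model's prediction quality in the real world, because nothing was ever optimized for this test data. If you had picked the hyperparameters on the test set instead, its score would be optimistically biased, which is why training and test sets alone are not enough.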
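The workflow can be sketched roughly as follows. This is only a toy illustration under several assumptions: the data is synthetic (three independent draws stand in for the 80/10/10 partitions), the model is a small hand-written k-nearest-neighbour classifier, and `k` plays the role of the hyperparameter. The helper names `make_data`, `knn_predict` and `accuracy` are invented for this example.

```python
import random

def make_data(n, rng):
    """Synthetic 1-D points; the label is 1 when x > 0.5, with 20% label noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.2:
            y = 1 - y  # flip the label to simulate noise
        data.append((x, y))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(y for _, y in nearest)
    return int(2 * votes > k)

def accuracy(train, eval_set, k):
    """Fraction of eval_set examples that are classified correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in eval_set)
    return correct / len(eval_set)

rng = random.Random(0)
train, dev, test = make_data(800, rng), make_data(100, rng), make_data(100, rng)

# Step 1: compare hyperparameter candidates on the dev set only.
candidate_ks = [1, 5, 15, 45]
dev_scores = {k: accuracy(train, dev, k) for k in candidate_ks}
best_k = max(dev_scores, key=dev_scores.get)

# Step 2: touch the test set exactly once, with the chosen setting.
test_score = accuracy(train, test, best_k)
print("dev accuracy per k:", dev_scores)
print("chosen k:", best_k, "| test accuracy:", test_score)
```

The dev score of the chosen `k` was selected as the maximum over several candidates, so it tends to be slightly optimistic; the test score was not involved in any choice and is the more honest estimate.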
