Suppose I'm training a machine learning model to predict people's ages from pictures of their faces. Let's say I have a dataset of people aged from 1 to 100 years. I want to train on just 9 of those 100 ages (the number 9 is arbitrary), and the model should still be able to predict the age of any given person. My question is: how should I choose the 9 ages so that the trained model performs well across most of the age range?
The model would presumably perform best if I trained it on the entire population, so the question is how to approximate that performance with the smallest possible selection of ages (minimizing the number of distinct ages, not the number of observations).
Specifically, I need to answer the following questions:
- How many ages should I select from the entire age spectrum?
- Which ages are the best ones to select for training the model?
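
To make the setup concrete, here is a minimal sketch of a naive baseline I could compare other selections against: pick the ages evenly spaced across the range. This is only an assumption on my part, not a claim that even spacing is optimal. The function names `evenly_spaced_ages` and `max_gap_to_nearest` are hypothetical helpers written for illustration, and the "gap to the nearest selected age" is just a rough coverage proxy, not a measure of model accuracy.

```python
def evenly_spaced_ages(min_age=1, max_age=100, k=9):
    """Return k ages spread evenly over [min_age, max_age], endpoints included.

    Hypothetical baseline: assumes even spacing is a reasonable starting point.
    """
    if k < 2:
        raise ValueError("k must be at least 2 to include both endpoints")
    step = (max_age - min_age) / (k - 1)
    return [round(min_age + i * step) for i in range(k)]


def max_gap_to_nearest(selected, min_age=1, max_age=100):
    """Largest distance (in years) from any age in the range to its nearest selected age.

    A rough coverage proxy only; it says nothing about actual prediction error.
    """
    return max(
        min(abs(age - s) for s in selected)
        for age in range(min_age, max_age + 1)
    )


if __name__ == "__main__":
    ages = evenly_spaced_ages(1, 100, 9)
    print("selected ages:", ages)
    print("worst-case gap:", max_gap_to_nearest(ages))

    # Sweeping k shows how coverage changes with the number of selected ages.
    for k in range(2, 21):
        candidate = evenly_spaced_ages(1, 100, k)
        print(k, max_gap_to_nearest(candidate))
```

To actually compare selections, I would still have to train the model on each candidate set of ages and measure the error on held-out people of all 100 ages; the sweep over `k` above only shows how the coverage proxy changes with the number of ages.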