I am not sure whether this solves the problem at hand, but one approach you could look into is k-fold Cross-Validation (CV).
In this approach, you split your combined train, development, and test data into $k$ randomized and equally-sized partitions.
Afterwards, you train and evaluate your model $k$ times. In the $i^{th}$ iteration, you train your model on all partitions except the $i^{th}$ one; once training on those $k-1$ partitions is done, you evaluate the model on the $i^{th}$ partition. You repeat this process for all $i \in \{1, 2, \dots, k\}$.
To be clear, you keep your initially randomized partitions fixed during k-fold CV.
Then, you take the average performance over the $k$ test runs to assess the quality of your model. Afterwards, you could train your model on all of the train, development, and test data and deliver the resulting model as your final one.
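To make the procedure above concrete, here is a minimal sketch in Python, assuming scikit-learn; the synthetic data and the `LogisticRegression` model are just placeholders for your own data and model.

```python
# Minimal k-fold CV sketch (placeholder data and model, assuming scikit-learn).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=500, random_state=0)  # stand-in for your combined data

k = 5
kf = KFold(n_splits=k, shuffle=True, random_state=0)  # the randomized partitions stay fixed

scores = []
for train_idx, test_idx in kf.split(X):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])                 # train on the k-1 partitions
    scores.append(model.score(X[test_idx], y[test_idx]))  # evaluate on the i-th partition

print(np.mean(scores))  # averaged test performance

# Afterwards, fit the final model on all available data:
final_model = LogisticRegression(max_iter=1000).fit(X, y)
```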
In the most extreme case, you would perform Leave-One-Out CV (LOOCV), where you set $k$ equal to the total number of data points at your disposal. That is the most expensive approach, but it yields the most accurate performance estimates.
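Again just a sketch with placeholder data and model: scikit-learn exposes this variant directly, so LOOCV is simply the same loop with $k$ equal to the number of samples.

```python
# Leave-One-Out CV sketch (placeholder data and model, assuming scikit-learn).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=100, random_state=0)  # placeholder data
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=LeaveOneOut())
print(scores.mean())  # one score per left-out sample, averaged
```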
For more information, see this website.
Generally, using that approach, you don't waste any data by reserving it exclusively for development/testing. It might also be worth mentioning that this approach is compatible with other sampling techniques. For example, in the $i^{th}$ iteration of your CV algorithm, you could apply stratified sampling to the $k-1$ partitions used for training during that ($i^{th}$) iteration (see the sketch below).
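One common way to combine the two is scikit-learn's `StratifiedKFold`, which keeps the class proportions roughly constant across the partitions; this is a sketch with placeholder data and model, not necessarily the exact stratification scheme you have in mind.

```python
# Stratified k-fold CV sketch (placeholder imbalanced data, assuming scikit-learn).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)  # imbalanced classes
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=skf)
print(scores.mean())
```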
I am not entirely sure whether I get the second part of your question right.
If it is about how to later introduce the new feature to a given model, I would say the following.
When it comes to introducing new features, I think you are pretty much out of luck with respect to recycling your old model. Of course, assuming that introducing new features to the existing model is technically possible at all, there are types of models that allow continual learning under certain circumstances. In the worst case, however, this can cause catastrophic forgetting, since adding new features changes the distribution of your underlying training data, and not all models can deal with that.
An example of this case is when you add more diverse training images for a given Convolutional Neural Net (CNN), which the CNN then has to learn to map to an already existing set of classes.
In other cases, introducing new features might not even be technically possible if their introduction would require adding new input (or output) nodes to an existing model.
However, if the second part of your question asks how to fill gaps in your older data caused by missing values, there are different strategies you could try for imputing the missing data, some of which are briefly mentioned here.
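For example, a very simple baseline is mean imputation; the sketch below assumes numeric features with `NaN` gaps and uses scikit-learn's `SimpleImputer` (more sophisticated strategies exist, of course).

```python
# Mean-imputation sketch (placeholder data with missing values, assuming scikit-learn).
import numpy as np
from sklearn.impute import SimpleImputer

X_old = np.array([[1.0, 2.0],
                  [np.nan, 3.0],
                  [4.0, np.nan]])  # placeholder data with gaps

imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X_old)  # each NaN is replaced by its column mean
print(X_filled)
```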