Questions tagged [testing]

For questions related to the concept of testing (or evaluating) machine learning models and algorithms, e.g. in terms of some performance measure (such as accuracy or cumulative reward).

34 questions
7
votes
4 answers

Why is my test error lower than the training error?

I am trying to train a CNN regression model using the Adam optimizer, dropout and weight decay. My test accuracy is better than my training accuracy. But, as far as I know, the training accuracy is usually better than the test accuracy. So I wonder how…
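One common explanation is that regularisation such as dropout is only active while training, so the loss logged during optimisation comes from a weakened network. A minimal sketch of that effect (PyTorch; the small network below is hypothetical, not the CNN from the question):

```python
import torch
import torch.nn as nn

# Hypothetical regression model with dropout, standing in for the CNN in the question.
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(64, 1))
criterion = nn.MSELoss()
x, y = torch.randn(32, 10), torch.randn(32, 1)

model.train()                       # dropout active: loss computed on a "thinned" network
loss_train_mode = criterion(model(x), y).item()

model.eval()                        # dropout off: same data, typically a lower loss
with torch.no_grad():
    loss_eval_mode = criterion(model(x), y).item()

print(loss_train_mode, loss_eval_mode)
```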
5
votes
1 answer

How to decide a train-test split?

In almost every ML model, a train-test (or train-validation-test) split is critical to assess the model's performance. However, I have always wondered what the rationale is for deciding on a particular train-test split. I've seen that some people like an 80-20…
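For reference, the 80-20 convention mentioned in the question looks like this in scikit-learn (a minimal sketch on synthetic data; the ratio is one common choice, not a rule):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0  # hold out 20% of the data for testing
)
print(X_train.shape, X_test.shape)  # (800, 20) (200, 20)
```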
4
votes
4 answers

What is the difference between training and testing in reinforcement learning?

In reinforcement learning (RL), what is the difference between training and testing an algorithm/agent? If I understand correctly, testing is also referred to as evaluation. As I see it, both imply the same procedure: select an action, apply it to the…
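A toy sketch of the usual distinction (a hypothetical chain environment with tabular Q-learning, not tied to any specific answer): during training the agent explores and updates its values; during testing the learned policy is frozen and followed greedily.

```python
import random

N_STATES, GOAL = 5, 4
Q = [[0.0, 0.0] for _ in range(N_STATES)]      # Q[state][action]; actions: 0 = left, 1 = right

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def run_episode(train, epsilon=0.1, alpha=0.5, gamma=0.9):
    s, total = 0, 0.0
    for _ in range(50):
        # Training: epsilon-greedy exploration plus Q-updates. Testing: greedy action, no updates.
        if train and random.random() < epsilon:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda act: Q[s][act])
        s2, r, done = step(s, a)
        if train:
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        total, s = total + r, s2
        if done:
            break
    return total

for _ in range(200):
    run_episode(train=True)                                      # training phase
print(sum(run_episode(train=False) for _ in range(20)) / 20)     # testing: frozen greedy policy
```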
4
votes
1 answer

How to evaluate an RL algorithm when used in a game?

I'm planning to create a web-based RL board game, and I wondered how I would evaluate the performance of the RL agent. How would I be able to say, "Version X performed better than version Y, as we can see that Z is much better/higher/lower." I…
3
votes
0 answers

Are there standardized forms of the Turing Test?

Most computer science instructors will tell you that the Turing Test is more a theoretical or conceptual thought experiment than an actual exam that someone (or something!) can formally sit and receive a score on. A thread here on AI Stack Exchange…
3
votes
1 answer

What is the most statistically acceptable method for tuning neural network hyperparameters on very small datasets?

Neural networks are usually evaluated by dividing a dataset into three splits: training, validation, and test. The idea is that critical hyperparameters of the network, such as the number of epochs and the learning rate, can be tuned by testing the…
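One option that keeps the estimate honest on a small dataset is nested cross-validation: hyperparameters are tuned inside each outer fold, and the outer folds are used only for the final estimate. A hedged sketch with scikit-learn (the model and grid are placeholders):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Inner loop: tune the hyperparameter grid. Outer loop: estimate generalisation.
inner = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=KFold(5, shuffle=True, random_state=0))
outer_scores = cross_val_score(inner, X, y, cv=KFold(5, shuffle=True, random_state=1))
print(outer_scores.mean())   # no sample is used both for tuning and for its own fold's test score
```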
3
votes
1 answer

Should we also shuffle the test dataset when training with SGD?

When training machine learning models (e.g. neural networks) with stochastic gradient descent, it is common practice to (uniformly) shuffle the training data into batches/sets of different samples from different classes. Should we also shuffle the…
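The usual convention, sketched below with PyTorch data loaders on synthetic tensors: reshuffle the training set every epoch so SGD sees varied mini-batches, but leave the test loader unshuffled, since aggregate test metrics do not depend on sample order.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

train_ds = TensorDataset(torch.randn(800, 10), torch.randint(0, 2, (800,)))
test_ds = TensorDataset(torch.randn(200, 10), torch.randint(0, 2, (200,)))

train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)   # reshuffled each epoch
test_loader = DataLoader(test_ds, batch_size=32, shuffle=False)    # fixed order; metrics unaffected
```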
3
votes
2 answers

How can I predict the true label for data with incomplete features based on the trained model with data with more features?

Suppose I have a model that was trained with a dataset that contains the features (f1, f2, f3, f4, f5, f6). However, my test dataset does not contain all features of the training dataset, but only (f1, f2, f3). How can I predict the true label of…
3
votes
1 answer

Is the test time the phase when the model's accuracy is calculated with test data set?

When papers talk about the "test time", does this mean the phase when the model is passed with new data instances to derive the accuracy of the test data set? Or is "test time" the phase when the model is fully trained and launched for real-world…
3
votes
1 answer

Should I use leave-one-out cross-validation for testing?

I am currently working with a small dataset of 20x300. Since I have so few data points, I was wondering if I could use an approach similar to leave-one-out cross-validation but for testing. Here's what I was thinking: train/test split the data,…
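For what the question describes, scikit-learn's LeaveOneOut splitter already implements the idea: with ~20 samples it trains 20 models, each tested on the single held-out point. A minimal sketch (the classifier is a placeholder):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=20, n_features=300, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=LeaveOneOut())
print(scores.mean())   # fraction of held-out points classified correctly
```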
3
votes
1 answer

What is the reason behind using a test batch size?

If one examines the SSD: Single Shot MultiBox Detector code from this GitHub repository, it can be seen that, for the testing phase (evaluating the network on the test data set), there is a test batch size parameter. It is not mentioned in the paper. I am not…
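In most codebases the test batch size is only a memory knob: evaluation is also run in batches so the whole test set never has to fit on the device at once, and the aggregate metric does not depend on the batch size chosen. A small illustrative sketch (the model and data are placeholders, not from the SSD repository):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(10, 2)
data = TensorDataset(torch.randn(1000, 10), torch.randint(0, 2, (1000,)))
loader = DataLoader(data, batch_size=64)    # "test batch size": bounds memory, not a hyperparameter

model.eval()
correct = 0
with torch.no_grad():
    for xb, yb in loader:
        correct += (model(xb).argmax(dim=1) == yb).sum().item()
print(correct / len(data))                  # same accuracy regardless of the batch size used
```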
2
votes
0 answers

Which evaluation metrics should be used in training, validation and testing of a model?

Which specific performance evaluation metrics are used in training, validation, and testing, and why? I am thinking error metrics (RMSE, MAE, MSE) are used in validation, and testing should use a wide variety of metrics? I don't think performance is…
2
votes
1 answer

Is it a good idea to train a CNN to detect the hydration value (percentage) in skin images and evaluate it with the MSE?

I have a large dataset of skin images, each one associated with a hydration value (percentage). Now I'm looking into predicting the hydration value from an image. My thinking: train a CNN on the dataset and evaluate the model with a mean square…
2
votes
1 answer

Why doesn't dropout mislead results during evaluation?

I have seen that, usually, the dropout layer is used differently in training and evaluation modes, i.e. it is recommended during training but not in evaluation/testing. Dropout removes a few nodes at random so that the model does not end up…
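A minimal PyTorch sketch of why evaluation is not misled: in eval mode the dropout layer is an identity, and in train mode the surviving activations are rescaled by 1/(1-p) ("inverted dropout"), so the expected activation matches between the two modes.

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(4, 4)

drop.train()
print(drop(x))                       # random zeros; surviving entries scaled to 2.0

drop.eval()
print(torch.equal(drop(x), x))       # True: evaluation sees the activations untouched
```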
1
vote
1 answer

How do I check that the combination of these models is good?

I've selected more than 10 discriminative (classification) models, each wrapped with a BaggingClassifier object, optimized with a GridSearchCV, and all of them placed within a VotingClassifier object. Alone, they all bring around 70% accuracy, on a…
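One way to check whether the combination actually helps is to score each bagged model and the voting ensemble on the same held-out test set (a hedged sketch; the two member models and their grids are placeholders, not the ten models from the question):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

members = [
    ("lr", GridSearchCV(BaggingClassifier(LogisticRegression(max_iter=1000)),
                        {"n_estimators": [10, 20]}, cv=3)),
    ("dt", GridSearchCV(BaggingClassifier(DecisionTreeClassifier()),
                        {"n_estimators": [10, 20]}, cv=3)),
]
ensemble = VotingClassifier(members, voting="hard").fit(X_tr, y_tr)

for name, est in members:
    print(name, est.fit(X_tr, y_tr).score(X_te, y_te))   # individual test accuracies
print("ensemble", ensemble.score(X_te, y_te))            # ideally at least as good as the members
```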