Questions tagged [testing]

For questions related to the concept of testing (or evaluating) machine learning models and algorithms, e.g. in terms of some performance measure (such as accuracy or cumulative reward).

34 questions
7
votes
4 answers

Why is my test error lower than the training error?

I am trying to train a CNN regression model using the Adam optimizer, dropout and weight decay. My test accuracy is better than my training accuracy. But, as far as I know, the training accuracy is usually better than the test accuracy. So I wonder how…
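One common explanation is that regularisation such as dropout is only active while training, so the loss logged during optimisation comes from a weakened network. A minimal sketch of that effect (PyTorch; the small network below is hypothetical, not the CNN from the question):

```python
import torch
import torch.nn as nn

# Hypothetical regression model with dropout, standing in for the CNN in the question.
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(64, 1))
criterion = nn.MSELoss()
x, y = torch.randn(32, 10), torch.randn(32, 1)

model.train()                       # dropout active: loss computed on a "thinned" network
loss_train_mode = criterion(model(x), y).item()

model.eval()                        # dropout off: same data, typically a lower loss
with torch.no_grad():
    loss_eval_mode = criterion(model(x), y).item()

print(loss_train_mode, loss_eval_mode)
```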
5
votes
1 answer

How to decide a train-test split?

In almost every ML model, a train-test (or train-validation-test) split is critical to assess the model's performance. However, I have always wondered what the rationale is for deciding on a particular train-test split. I've seen that some people like an 80-20…
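For reference, the 80-20 convention mentioned in the question looks like this in scikit-learn (a minimal sketch on synthetic data; the ratio is one common choice, not a rule):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0  # hold out 20% of the data for testing
)
print(X_train.shape, X_test.shape)  # (800, 20) (200, 20)
```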
4
votes
4 answers

What is the difference between training and testing in reinforcement learning?

In reinforcement learning (RL), what is the difference between training and testing an algorithm/agent? If I understand correctly, testing is also referred to as evaluation. As I see it, both imply the same procedure: select an action, apply it to the…
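A toy sketch of the usual distinction (a hypothetical chain environment with tabular Q-learning, not tied to any specific answer): during training the agent explores and updates its values; during testing the learned policy is frozen and followed greedily.

```python
import random

N_STATES, GOAL = 5, 4
Q = [[0.0, 0.0] for _ in range(N_STATES)]      # Q[state][action]; actions: 0 = left, 1 = right

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def run_episode(train, epsilon=0.1, alpha=0.5, gamma=0.9):
    s, total = 0, 0.0
    for _ in range(50):
        # Training: epsilon-greedy exploration plus Q-updates. Testing: greedy action, no updates.
        if train and random.random() < epsilon:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda act: Q[s][act])
        s2, r, done = step(s, a)
        if train:
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        total, s = total + r, s2
        if done:
            break
    return total

for _ in range(200):
    run_episode(train=True)                                      # training phase
print(sum(run_episode(train=False) for _ in range(20)) / 20)     # testing: frozen greedy policy
```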
4
votes
1 answer

How to evaluate an RL algorithm when used in a game?

I'm planning to create a web-based RL board game, and I wondered how I would evaluate the performance of the RL agent. How would I be able to say, "Version X performed better than version Y, as we can see that Z is much better/higher/lower." I…
3
votes
0 answers

Are there standardized forms of the Turing Test?

Most computer science instructors will tell you that the Turing Test is more a theoretical or conceptual thought experiment than an actual exam that someone (or something!) can formally sit and receive a score on. A thread here on AI Stack Exchange…
3
votes
1 answer

What is the most statistically acceptable method for tuning neural network hyperparameters on very small datasets?

Neural networks are usually evaluated by dividing a dataset into three splits: training, validation, and test. The idea is that critical hyperparameters of the network, such as the number of epochs and the learning rate, can be tuned by testing the…
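One option that keeps the estimate honest on a small dataset is nested cross-validation: hyperparameters are tuned inside each outer fold, and the outer folds are used only for the final estimate. A hedged sketch with scikit-learn (the model and grid are placeholders):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Inner loop: tune the hyperparameter grid. Outer loop: estimate generalisation.
inner = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=KFold(5, shuffle=True, random_state=0))
outer_scores = cross_val_score(inner, X, y, cv=KFold(5, shuffle=True, random_state=1))
print(outer_scores.mean())   # no sample is used both for tuning and for its own fold's test score
```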
3
votes
1 answer

Should we also shuffle the test dataset when training with SGD?

When training machine learning models (e.g. neural networks) with stochastic gradient descent, it is common practice to (uniformly) shuffle the training data into batches/sets of different samples from different classes. Should we also shuffle the…
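The usual convention, sketched below with PyTorch data loaders on synthetic tensors: reshuffle the training set every epoch so SGD sees varied mini-batches, but leave the test loader unshuffled, since aggregate test metrics do not depend on sample order.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

train_ds = TensorDataset(torch.randn(800, 10), torch.randint(0, 2, (800,)))
test_ds = TensorDataset(torch.randn(200, 10), torch.randint(0, 2, (200,)))

train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)   # reshuffled each epoch
test_loader = DataLoader(test_ds, batch_size=32, shuffle=False)    # fixed order; metrics unaffected
```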
3
votes
2 answers

How can I predict the true label for data with incomplete features based on the trained model with data with more features?

Suppose I have a model that was trained with a dataset that contains the features (f1, f2, f3, f4, f5, f6). However, my test dataset does not contain all features of the training dataset, but only (f1, f2, f3). How can I predict the true label of…
3
votes
1 answer

Is the test time the phase when the model's accuracy is calculated with test data set?

When papers talk about the "test time", does this mean the phase when the model is passed with new data instances to derive the accuracy of the test data set? Or is "test time" the phase when the model is fully trained and launched for real-world…
3
votes
1 answer

Should I use leave-one-out cross-validation for testing?

I am currently working with a small dataset of 20x300. Since I have so few data points, I was wondering if I could use an approach similar to leave-one-out cross-validation but for testing. Here's what I was thinking: train/test split the data,…
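For what the question describes, scikit-learn's LeaveOneOut splitter already implements the idea: with ~20 samples it trains 20 models, each tested on the single held-out point. A minimal sketch (the classifier is a placeholder):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=20, n_features=300, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=LeaveOneOut())
print(scores.mean())   # fraction of held-out points classified correctly
```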
3
votes
1 answer

What is the reason behind using a test batch size?

If one examines the SSD: Single Shot MultiBox Detector code from this GitHub repository, it can be seen that, for the testing phase (evaluating the network on the test data set), there is a test batch size parameter. It is not mentioned in the paper. I am not…
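In most codebases the test batch size is only a memory knob: evaluation is also run in batches so the whole test set never has to fit on the device at once, and the aggregate metric does not depend on the batch size chosen. A small illustrative sketch (the model and data are placeholders, not from the SSD repository):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(10, 2)
data = TensorDataset(torch.randn(1000, 10), torch.randint(0, 2, (1000,)))
loader = DataLoader(data, batch_size=64)    # "test batch size": bounds memory, not a hyperparameter

model.eval()
correct = 0
with torch.no_grad():
    for xb, yb in loader:
        correct += (model(xb).argmax(dim=1) == yb).sum().item()
print(correct / len(data))                  # same accuracy regardless of the batch size used
```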
2
votes
0 answers

Which evaluation metrics should be used in training, validation and testing of a model?

Which specific performance evaluation metrics are used in training, validation, and testing, and why? I am thinking error metrics (RMSE, MAE, MSE) are used in validation, and testing should use a wide variety of metrics? I don't think performance is…
2
votes
1 answer

Is it a good idea to train a CNN to detect the hydration value (percentage) in skin images and evaluate it with the MSE?

I have a large dataset of skin images, each one associated with a hydration value (percentage). Now I'm looking into predicting the hydration value from an image. My thinking: train a CNN on the dataset and evaluate the model with a mean square…
2
votes
1 answer

Why doesn't dropout mislead results during evaluation?

I have seen that, usually, the dropout layer is used differently in training and evaluation modes, i.e. it is recommended during training but not in evaluation/testing. Dropout removes a few nodes at random so that the model does not end up…
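A minimal PyTorch sketch of why evaluation is not misled: in eval mode the dropout layer is an identity, and in train mode the surviving activations are rescaled by 1/(1-p) ("inverted dropout"), so the expected activation matches between the two modes.

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(4, 4)

drop.train()
print(drop(x))                       # random zeros; surviving entries scaled to 2.0

drop.eval()
print(torch.equal(drop(x), x))       # True: evaluation sees the activations untouched
```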
1
vote
1 answer

How do I check that the combination of these models is good?

I've selected more than 10 discriminative (classification) models, each wrapped with a BaggingClassifier object, optimized with a GridSearchCV, and all of them placed within a VotingClassifier object. Alone, they all bring around 70% accuracy, on a…
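One way to check whether the combination actually helps is to score each bagged model and the voting ensemble on the same held-out test set (a hedged sketch; the two member models and their grids are placeholders, not the ten models from the question):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

members = [
    ("lr", GridSearchCV(BaggingClassifier(LogisticRegression(max_iter=1000)),
                        {"n_estimators": [10, 20]}, cv=3)),
    ("dt", GridSearchCV(BaggingClassifier(DecisionTreeClassifier()),
                        {"n_estimators": [10, 20]}, cv=3)),
]
ensemble = VotingClassifier(members, voting="hard").fit(X_tr, y_tr)

for name, est in members:
    print(name, est.fit(X_tr, y_tr).score(X_te, y_te))   # individual test accuracies
print("ensemble", ensemble.score(X_te, y_te))            # ideally at least as good as the members
```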