Questions tagged [test-datasets]
For questions related to test (or testing) datasets in the context of machine learning. A test dataset is any dataset that is not used to train the model but only to evaluate it, in particular its ability to generalize to unseen data.
16 questions
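Before the list, a minimal sketch of the idea in the tag description: hold part of the data out of training and use it only to estimate generalization. The dataset, model, and 80/20 split below are illustrative choices, not tied to any particular question.

```python
# Minimal sketch: hold out part of the data purely for evaluation.
# The 80/20 split and random_state are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# The test set is never shown to the model during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy: ", model.score(X_test, y_test))  # estimate of generalization
```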
4 votes · 1 answer
What are "development test sets" used for?
This is a theoretical question. I am a newbie to artificial intelligence and machine learning, and the more I read, the more I like it. So far, I have been reading about the evaluation of language models (I am focused on ASR), but I still don't get…

little_mice
3 votes · 1 answer
How do I select the (number of) negative cases, if I'm given a set of positive cases?
We were given a list of labeled data (around 100) of known positive cases, i.e. people that have a certain disease, so all these people are labeled with the same class (disease). We also have a much larger amount of data that we can label as…

Otto
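One common starting point for this kind of setup, sketched below with made-up data, is to subsample the large "assumed negative" pool to a chosen positive-to-negative ratio and compare a few ratios on a validation set. The 1:1 ratio here is just one heuristic, not a recommendation from the question.

```python
# Hypothetical sketch: subsample the large pool treated as negative to a chosen
# positive-to-negative ratio. Data and the 1:1 ratio are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

X_pos = rng.normal(loc=1.0, size=(100, 5))        # ~100 known positive cases
X_neg_pool = rng.normal(loc=0.0, size=(5000, 5))  # much larger pool labeled as negative

neg_per_pos = 1  # illustrative ratio; try several and compare validation metrics
n_neg = neg_per_pos * len(X_pos)

idx = rng.choice(len(X_neg_pool), size=n_neg, replace=False)
X_neg = X_neg_pool[idx]

X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_neg))])
print(X.shape, y.mean())  # fraction of positives in the assembled training set
```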
3 votes · 1 answer
What is the reason behind using a test batch size?
If one examines the SSD: Single Shot MultiBox Detector code from this GitHub repository, it can be seen that, for the testing phase (evaluating the network on the test data set), there is a parameter called test batch size. It is not mentioned in the paper.
I am not…

carobnodrvo
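One plausible reading, sketched below, is that a test batch size only controls how many images are pushed through the network at once during evaluation, a memory/speed trade-off that does not change the computed metric. The model and tensor sizes are placeholders, not the SSD code from the question.

```python
# Sketch: evaluating in batches so the whole test set never has to fit in
# memory at once. test_batch_size affects memory/speed, not the final result.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # placeholder model
model.eval()

# Placeholder "test set" of 1000 random images with 10 classes.
test_ds = TensorDataset(torch.randn(1000, 3, 32, 32), torch.randint(0, 10, (1000,)))

test_batch_size = 64  # illustrative; larger is faster but uses more memory
loader = DataLoader(test_ds, batch_size=test_batch_size, shuffle=False)

correct = 0
with torch.no_grad():
    for images, labels in loader:
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()

print("test accuracy:", correct / len(test_ds))
```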
1 vote · 1 answer
How to perform PCA on the validation/test set?
I was using PCA on my whole dataset (and, after that, I would split it into training, validation, and test datasets). However, after a little bit of research, I found out that this is the wrong way to do it.
I have a few questions:
Are there some…

LVoltz
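A hedged illustration of the usual remedy (not taken from the question's answers): fit PCA, and any scaler, on the training split only, then merely transform the validation/test splits so no information leaks from them. The dataset and component count below are placeholders.

```python
# Sketch: fit PCA on the training data only, then apply the learned projection
# (and any scaling) to the held-out data to avoid leakage.
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

scaler = StandardScaler().fit(X_train)                 # statistics from the training set only
pca = PCA(n_components=5).fit(scaler.transform(X_train))

X_train_pca = pca.transform(scaler.transform(X_train))
X_test_pca = pca.transform(scaler.transform(X_test))   # test set is only transformed
print(X_train_pca.shape, X_test_pca.shape)
```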
1 vote · 0 answers
Why does the SVM perform poorly on test data that has a different class distribution than the training data?
Do you know why the SVM performs poorly on test data that has a different class distribution than the training data? The training data has around 15 classes, and the additional testing data has around 6 classes (a subset of the 15 classes). I found that…

Allie
1 vote · 1 answer
What is meant by overfitting the test set?
Consider the following statement from p. 14 of Naive Bayes and Sentiment Classification:
While the use of a devset avoids overfitting the test set, having a
fixed training set, devset, and test set creates another problem: in
order to save lots of…

hanugm
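The quoted passage contrasts a devset with the test set. As a hedged illustration of that protocol (not taken from the textbook): hyperparameters are chosen on the dev set, and the test set is scored only once at the end. Dataset and classifier are placeholders.

```python
# Sketch: a fixed train / dev (validation) / test split. Choices are made on
# the dev set; the test set is scored only once, so it is not "overfit".
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)

# First carve off the test set, then split the rest into train and dev.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_dev, y_train, y_dev = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# "Tune" a hyperparameter on the dev set only.
best_k, best_acc = None, 0.0
for k in (1, 3, 5, 7):
    acc = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_dev, y_dev)
    if acc > best_acc:
        best_k, best_acc = k, acc

final = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
print("dev accuracy:", best_acc, "| test accuracy:", final.score(X_test, y_test))
```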
1 vote · 2 answers
Are the held-out datasets used for testing, validation or both?
I came across the new term "held-out corpora" and I am confused about its usage in the NLP domain.
Consider the following three paragraphs from N-gram Language Models
#1: held-out corpora as non-training data
For an intrinsic evaluation of a language…

hanugm
1 vote · 0 answers
Is there a way, while training (with contrastive learning) the embedding network, to find the test accuracy?
I aim to do action recognition in videos on a private dataset.
To compare with the existing state-of-the-art implementations, other authors have published their code on GitHub, like the one here (for the paper Self-supervised Video Representation Learning…

krishna chaitanya
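One common way to get a test-time signal while training an embedding network is a periodic linear probe: freeze the encoder, fit a simple classifier on training embeddings, and score it on test embeddings. The sketch below uses scikit-learn and stand-in embeddings; it is not taken from the repository mentioned in the question.

```python
# Sketch of a linear probe: the random arrays stand in for the output of a
# frozen contrastive encoder; a simple classifier gives a proxy test accuracy.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for encoder outputs; in practice these come from the frozen network.
train_emb = rng.normal(size=(500, 128))
train_labels = rng.integers(0, 10, size=500)
test_emb = rng.normal(size=(100, 128))
test_labels = rng.integers(0, 10, size=100)

probe = LogisticRegression(max_iter=1000).fit(train_emb, train_labels)
print("probe test accuracy:", probe.score(test_emb, test_labels))
```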
1 vote · 1 answer
How to build a test set for a model in industry?
Most of the tutorials only teach us to split the whole dataset into three parts: training set, development set, and test set. But in industry, we are essentially doing test-driven development, and what matters most is the building of our test…

Lerner Zhang
1 vote · 0 answers
Wouldn't training the model with this data lead to inaccuracies since the testing data would not be normalized in a similar way?
I was trying to normalize my input images for feeding to my convolutional neural network and wanted to standardize my input data.
I referred to this post, which says that featurewise_center and featurewise_std_normalization scale the images…

user33681
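A hedged sketch of the usual remedy: compute the normalization statistics on the training images only and reuse exactly those statistics on the test images (plain NumPy here rather than the Keras generator the question refers to).

```python
# Sketch: per-channel mean/std computed on the training images only, then the
# same statistics are applied to the test images.
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(0, 255, size=(200, 32, 32, 3)).astype("float32")
x_test = rng.uniform(0, 255, size=(50, 32, 32, 3)).astype("float32")

mean = x_train.mean(axis=(0, 1, 2))        # training-set statistics
std = x_train.std(axis=(0, 1, 2)) + 1e-7

x_train_norm = (x_train - mean) / std
x_test_norm = (x_test - mean) / std        # test data normalized with the *training* statistics

print(x_train_norm.mean(axis=(0, 1, 2)), x_test_norm.mean(axis=(0, 1, 2)))
```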
0 votes · 0 answers
How can validation accuracy be more than test accuracy?
I have been trying to implement DenseNet on a small dataset using k-fold cross-validation. Training accuracy is 94%, validation accuracy is 73%, whereas test accuracy is 90%. I have taken 10% of my total dataset as the test set. I know some overfitting is…

srij
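For context on how the three numbers can differ, here is a hedged sketch of the usual protocol: carve off the test set first, run k-fold cross-validation on the remainder to get the validation estimate, then score the final model once on the test set. The classifier and data are placeholders, not the DenseNet setup from the question.

```python
# Sketch: held-out test set + k-fold cross-validation on the remaining data.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_digits(return_X_y=True)

# 10% held out as a test set, as in the question.
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.1, random_state=0)

model = LogisticRegression(max_iter=2000)
cv_scores = cross_val_score(model, X_dev, y_dev, cv=5)   # validation estimate
print("mean validation accuracy:", cv_scores.mean())

model.fit(X_dev, y_dev)
print("test accuracy:", model.score(X_test, y_test))     # single final test score
```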
0 votes · 2 answers
Why is the WMT16 dataset favoured for evaluating machine translation models?
The Workshop on Statistical Machine Translation has released translation challenges each year from 2004 on, which feature a dataset of sentence pairs in a variety of languages.
Even though the conference has been taking place each year, with ever…

Zwiebak
0 votes · 1 answer
Given a dataset of people with and without cancer, should I split it into training and test datasets such that the same person is not in both?
I have a database that contains healthy persons and lung cancer patients. I need to design a deep neural network for the binary classification problem (cancer/no cancer). I need to split the dataset into 70% train and 30% test.
How can I do the…

Noha
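A minimal sketch of a patient-level split with scikit-learn's GroupShuffleSplit, so that all samples from one person land on the same side of the 70/30 split; the patient IDs and features here are made up.

```python
# Sketch: split by patient ID so no person appears in both train and test.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_samples = 300
X = rng.normal(size=(n_samples, 20))                 # placeholder features (e.g. per scan)
y = rng.integers(0, 2, size=n_samples)               # cancer / no cancer
patient_ids = rng.integers(0, 60, size=n_samples)    # several samples per patient

splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=patient_ids))

# No patient ID should appear on both sides of the split.
assert set(patient_ids[train_idx]).isdisjoint(patient_ids[test_idx])
print(len(train_idx), "train samples,", len(test_idx), "test samples")
```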
0 votes · 1 answer
What are possible ways to combat overfitting or improve the test accuracy in my case?
I asked a question here, and one of the comments suggested that this is a case of severe overfitting. I made a neural network that uses residual boosting (done via a KNN), and I am still only able to get < 50% accuracy on the test…

jr123456jr987654321
0 votes · 0 answers
Why doesn't U-Net work with images different from the dataset?
I have implemented a U-Net, similar to this implementation, but for a different dataset, this one, to segment roads.
It works fine on the images in the test folder, but, for example, when I take a screenshot from Bing Maps and try to run inference with the trained…

FourZeroFive
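One frequent cause of this behavior, sketched below, is that the new image is not preprocessed the way the training images were (size, channel order, value range, normalization), so matching the training pipeline at inference time is the first thing to check. The target size, statistics, file name, and model object below are placeholders, not the actual pipeline of the linked implementation.

```python
# Sketch: apply the *same* preprocessing at inference that was used in training.
# The target size, value range, and normalization below are placeholders.
import numpy as np
from PIL import Image

TARGET_SIZE = (256, 256)            # must match the training input size
TRAIN_MEAN, TRAIN_STD = 0.5, 0.25   # placeholder training statistics

def preprocess(path):
    img = Image.open(path).convert("RGB").resize(TARGET_SIZE)
    x = np.asarray(img, dtype="float32") / 255.0   # same value range as training
    x = (x - TRAIN_MEAN) / TRAIN_STD               # same normalization as training
    return x[None, ...]                            # add batch dimension

# x = preprocess("bing_maps_screenshot.png")  # hypothetical file name
# mask = unet_model.predict(x)                # assumes a Keras-style trained U-Net
```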