For questions related to the concept of a metric, i.e. a function that defines the distance between pairs of elements in a set. Note, however, that the word "metric" often refers to an evaluation measure that is not necessarily a metric in the mathematical sense.
Questions tagged [metric]
42 questions
8
votes
2 answers
Why is perplexity a good evaluation metric for chatbots?
A few papers I have come across say that BLEU is not an appropriate evaluation metric for chatbots, so they use perplexity instead.
First of all, what is perplexity? How is it calculated? And why is it a good evaluation metric for chatbots?

RuiZhang1993
- 89
- 2
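For reference, perplexity is the exponentiated average negative log-likelihood that the model assigns to a held-out sequence. A minimal sketch (the helper name and inputs are illustrative):

```python
import math

def perplexity(token_log_probs):
    # token_log_probs: natural-log probabilities the model assigned
    # to each token of the held-out sequence.
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# Toy check: assigning probability 0.25 to each of 4 tokens
# gives perplexity exp(log 4) = 4.
print(perplexity([math.log(0.25)] * 4))  # -> 4.0
```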
7
votes
2 answers
How is the F1 score calculated in a question-answering system?
I have an NLP model for answer extraction: basically, I have a paragraph and a question as input, and my model extracts the span of the paragraph that corresponds to the answer to the question.
I need to know how to compute the F1 score for such…

HLeb
- 549
- 5
- 10
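For reference, the convention popularized by the SQuAD evaluation script is a token-overlap F1 between the predicted span and the gold answer. A minimal sketch, assuming whitespace tokenization and no answer normalization:

```python
from collections import Counter

def span_f1(prediction, ground_truth):
    # Token-overlap F1: precision and recall over the bag-of-tokens overlap.
    pred_tokens = prediction.split()
    gold_tokens = ground_truth.split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(span_f1("the red fox", "a red fox"))  # 2 of 3 tokens overlap -> ~0.67
```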
6
votes
2 answers
What evaluation metrics are used for sequence-to-sequence prediction problems?
I am solving many sequence-to-sequence prediction problems using RNN/LSTM.
What type of evaluation metrics can be used for sequence prediction problems?
One metric is the mean squared error (MSE), which we can give as a parameter during the training…

Asif Khan
- 181
- 1
- 6
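For real-valued sequence outputs, MSE is simply averaged elementwise over the predicted and target sequences. A toy illustration with fabricated values:

```python
import numpy as np

y_pred = np.array([0.9, 1.8, 3.2])  # fabricated model outputs
y_true = np.array([1.0, 2.0, 3.0])  # fabricated targets
mse = np.mean((y_pred - y_true) ** 2)
print(mse)  # (0.01 + 0.04 + 0.04) / 3 = 0.03
```

For discrete output sequences (e.g. text), overlap-based metrics such as BLEU or ROUGE are more common than MSE.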
4
votes
2 answers
Which metric should I use to assess the quality of the clusters?
I have a model that outputs a latent N-dimensional embedding for all data points, trained so that data points from the same class cluster together while staying separated from the clusters of other classes.
The…

jaeger6
- 308
- 1
- 7
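One widely used internal measure for this setting is the silhouette coefficient, which rewards tight, well-separated clusters. A minimal sketch, with toy 2-D points standing in for the learned embeddings:

```python
import numpy as np
from sklearn.metrics import silhouette_score

# Two tight, well-separated toy clusters; in the question's setting,
# X would be the N-dimensional embeddings and labels the class labels.
X = np.array([[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]])
labels = np.array([0, 0, 1, 1])
print(silhouette_score(X, labels))  # close to 1.0 here
```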
4
votes
0 answers
When computing the ROC-AUC score for multi-class classification problems, when should we use One-vs-Rest and One-vs-One?
The sklearn documentation for the method roc_auc_score states that the parameter multi_class can take the value 'OvR' (which stands for One-vs-Rest) or 'OvO' (which stands for One-vs-One). These values are only applicable for multi-class…

Leockl
- 151
- 1
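For reference, sklearn's actual API takes the lowercase strings 'ovr' and 'ovo', and for multi-class input expects per-class probabilities whose rows sum to 1. A minimal sketch with fabricated scores:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 1, 2, 2, 1, 0])
y_score = np.array([  # fabricated per-class probabilities, rows sum to 1
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.2, 0.7],
    [0.2, 0.2, 0.6],
    [0.3, 0.5, 0.2],
    [0.6, 0.3, 0.1],
])
print(roc_auc_score(y_true, y_score, multi_class="ovr"))  # One-vs-Rest
print(roc_auc_score(y_true, y_score, multi_class="ovo"))  # One-vs-One
```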
4
votes
2 answers
How do you measure multi-label classification accuracy?
Multi-label assignment is the machine learning task of assigning to each input value a set of categories from a fixed vocabulary, where the categories need not be statistically independent, precluding building a set of independent classifiers each…

Nick
- 251
- 1
- 5
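Two standard conventions are subset (exact-match) accuracy, which only credits a sample when all of its labels match, and Hamming loss, which scores each label slot independently. A minimal sketch with fabricated label-indicator matrices:

```python
import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss

# Rows are samples, columns are labels (1 = label assigned).
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 1], [0, 1, 1], [1, 0, 0]])

print(accuracy_score(y_true, y_pred))  # subset accuracy: 1/3 rows match exactly
print(hamming_loss(y_true, y_pred))    # 2 wrong label slots out of 9
```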
3
votes
1 answer
How should we interpret all the different metrics in reinforcement learning?
I'm trying to train some deep RL agents using policy gradient methods like AC and PPO. While training, I monitor a ton of different metrics.
I understand that the ultimate goal is to maximize the reward or return per episode.
But there…

bluekaterpillar
- 51
- 2
3
votes
1 answer
What is meant by the expected BLEU cost when training with BLEU and SIMILE?
Recently, I was reading a paper that introduces a new evaluation metric, SIMILE. In one section, a validation loss comparison is made between SIMILE and BLEU. The plot shows the expected BLEU cost when training with BLEU and with SIMILE.
What I'm unable to…

develop97
- 31
- 2
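In minimum-risk-style training, the "expected BLEU cost" usually denotes the model-probability-weighted average of (1 - BLEU) over a set of sampled candidate outputs; whether the SIMILE paper uses exactly this formulation should be checked against the paper itself. A hedged sketch with fabricated candidates and probabilities:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["the", "cat", "sat", "on", "the", "mat"]
candidates = [  # (candidate tokens, renormalized model probability)
    (["the", "cat", "sat", "on", "the", "mat"], 0.6),
    (["a", "cat", "sat", "on", "a", "mat"], 0.4),
]
smooth = SmoothingFunction().method1
expected_cost = sum(
    p * (1.0 - sentence_bleu([reference], hyp, smoothing_function=smooth))
    for hyp, p in candidates
)
print(expected_cost)  # lower when probability mass sits on high-BLEU candidates
```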
3
votes
1 answer
Why is there more than one way of calculating the accuracy?
Some sources consider the true negatives (TN) when computing the accuracy, while others don't.
Source 1:
https://medium.com/greyatom/performance-metrics-for-classification-problems-in-machine-learning-part-i-b085d432082b
Source…

Stephen Philip
- 317
- 2
- 9
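For reference, the textbook definition of accuracy does include the true negatives; a formula without TN is typically recall (TP / (TP + FN)) or precision (TP / (TP + FP)) being presented under a different name. A worked comparison with fabricated counts:

```python
# Fabricated confusion-matrix counts.
TP, TN, FP, FN = 40, 50, 5, 5

accuracy = (TP + TN) / (TP + TN + FP + FN)  # standard accuracy
recall = TP / (TP + FN)                     # the TN-free quantity: recall
print(accuracy, recall)  # 0.9 vs ~0.889: two different metrics
```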
3
votes
1 answer
Using True Positive as a Cost Function
I wanted to use the True Positive (and True Negative) counts in my cost function, to modify the ROC shape of my classifier. I was told, and have also read, that these counts are not differentiable and therefore not usable as a cost function for a neural network.
In the…

Léonard Barras
- 31
- 2
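The hard TP/TN counts are indeed non-differentiable, but a common workaround is to replace them with "soft" expected counts computed from the predicted probabilities, which do admit gradients. A hedged PyTorch sketch (the soft-F1 loss assembled here is one illustrative choice, not the only one):

```python
import torch

def soft_count_loss(probs, targets):
    # probs: sigmoid outputs in (0, 1); targets: 0/1 labels.
    soft_tp = (probs * targets).sum()        # expected true positives
    soft_fn = ((1 - probs) * targets).sum()  # expected false negatives
    soft_fp = (probs * (1 - targets)).sum()  # expected false positives
    soft_f1 = 2 * soft_tp / (2 * soft_tp + soft_fp + soft_fn + 1e-8)
    return 1.0 - soft_f1  # minimize (1 - soft F1)

logits = torch.randn(8, requires_grad=True)
targets = torch.tensor([1., 0., 1., 1., 0., 0., 1., 0.])
loss = soft_count_loss(torch.sigmoid(logits), targets)
loss.backward()  # gradients flow through the soft counts
```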
2
votes
2 answers
How can we compare, in terms of similarity, two pieces of text?
How can we compare, in terms of similarity (and/or meaning), two pieces of text (or documents)?
For example, let's say that I want to determine whether a document is a plagiarized version of another document. Which approach should I use? Could I use…

cuong tran
- 33
- 1
- 5
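A common lexical baseline is TF-IDF vectors compared with cosine similarity; it only captures word overlap, which is often sufficient for near-duplicate (plagiarism-style) detection, while meaning-level similarity generally calls for embeddings. A minimal sketch:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "The quick brown fox jumps over the lazy dog.",
    "A quick brown fox jumped over a lazy dog.",
]
tfidf = TfidfVectorizer().fit_transform(docs)
print(cosine_similarity(tfidf[0], tfidf[1]))  # near 1 for near-duplicates
```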
2
votes
0 answers
Which evaluation metrics should be used in training, validation and testing of a model?
Which specific performance evaluation metrics are used in training, validation, and testing, and why? I am thinking that error metrics (RMSE, MAE, MSE) are used in validation, while testing should use a wider variety of metrics. I don't think performance is…

user9645302
- 53
- 3
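For what it's worth, the error metrics named in the question are one-liners to compute on a held-out split; a minimal sketch with fabricated values:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([3.0, 5.0, 2.5])  # fabricated validation targets
y_pred = np.array([2.5, 5.0, 3.0])  # fabricated predictions

mse = mean_squared_error(y_true, y_pred)
print(mse, np.sqrt(mse), mean_absolute_error(y_true, y_pred))  # MSE, RMSE, MAE
```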
2
votes
1 answer
Why does the pass@k metric not "behave like" probability?
pass@k is a metric for evaluating models that generate code; it was used, for example, to evaluate Codex. To compute pass@k, you have a dataset of natural language/code pairs, and you pass each NL prompt to the model. For each prompt, it generates k code…

Jack M
- 242
- 1
- 8
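For reference, the Codex paper estimates pass@k with an unbiased estimator computed from n ≥ k generated samples, of which c pass the unit tests, rather than from a single size-k draw. The paper's numerically stable formulation:

```python
import numpy as np

def pass_at_k(n, c, k):
    # Unbiased estimator: 1 - C(n - c, k) / C(n, k), computed as a
    # stable running product. n: samples generated, c: samples that
    # pass the tests, k: budget (k <= n).
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

print(pass_at_k(200, 10, 10))  # 200 samples, 10 correct
```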
2
votes
1 answer
How to calculate a meaningful distance between multidimensional tensors
TLDR: given two tensors $t_1$ and $t_2$, both with shape $(c, h, w)$, how should the distance between them be measured?
More Info: I'm working on a project in which I'm trying to distinguish between an anomalous sample (specifically from MNIST) and a…

Hadar Sharvit
- 371
- 1
- 12
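Two common starting points are flattening the tensors and taking an Lp (e.g. Euclidean) distance, or using cosine distance when scale should not matter; whether either is "meaningful" for anomaly detection depends on the representation the tensors live in. A minimal PyTorch sketch:

```python
import torch

t1 = torch.randn(3, 28, 28)  # fabricated (c, h, w) tensors
t2 = torch.randn(3, 28, 28)

l2 = torch.norm(t1 - t2)  # Euclidean distance over all elements
cos = torch.nn.functional.cosine_similarity(t1.flatten(), t2.flatten(), dim=0)
print(l2.item(), (1 - cos).item())  # cosine *distance* is 1 - similarity
```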
2
votes
2 answers
Is it possible that every class has a higher recall than precision for multi-class classification?
I am a student who recently started learning machine learning, and one thing keeps confusing me; I have tried multiple sources but failed to find an answer.
As the following table shows (this is from some paper):
Is it possible that every class has a higher…

Cheleeger Ken
- 73
- 5
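A quick way to explore this is to compute per-class precision and recall directly from a confusion matrix. One counting fact for single-label multi-class problems: every misclassified sample is simultaneously a false negative for its true class and a false positive for the predicted class, so the FP and FN totals across classes are equal; since recall exceeds precision for a class (with at least one true positive) exactly when its FP count exceeds its FN count, the excess cannot go the same way for every class. A fabricated example:

```python
import numpy as np

# Fabricated 3-class confusion matrix: rows = true class, cols = predicted.
cm = np.array([[8, 1, 1],
               [2, 7, 1],
               [1, 2, 7]])
tp = np.diag(cm)
recall = tp / cm.sum(axis=1)     # TP / (TP + FN) per class
precision = tp / cm.sum(axis=0)  # TP / (TP + FP) per class
print(recall)     # [0.8   0.7  0.7  ]
print(precision)  # [0.727 0.7  0.778]
```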