Highest Voted 'benchmarks' Questions - Artificial Intelligence Stack Exchange

22 votes, 4 answers

Why does ChatGPT fail in playing "20 questions"?

IBM Watson's success in playing "Jeopardy!" was a landmark in the history of artificial intelligence. In the seemingly simpler game of "Twenty questions" where player B has to guess a word that player A thinks of by asking questions to be answered…

natural-language-processing chatgpt benchmarks

asked May 24 '23 at 13:32

Hans-Peter Stricker


10 votes, 3 answers

How can AI researchers avoid "overfitting" to commonly-used benchmarks as a community?

In fields such as Machine Learning, we typically (somewhat informally) say that we are overfitting if we improve our performance on a training set at the cost of reduced performance on a test set / the true population from which data is sampled. More…

machine-learning research academia benchmarks

asked Aug 12 '18 at 12:08

Dennis Soemers


6 votes, 1 answer

Interesting examples of discrete stochastic games

SGs are a generalization of MDPs to multiple agents. Like this previous question on MDPs, are there any interesting examples of zero-sum, discrete SGs—preferably with small state and action spaces? I'm hoping to use such examples as benchmarks, but…

game-theory environment markov-decision-process benchmarks

asked Dec 23 '19 at 20:12

user76284


6 votes, 1 answer

Benchmarks for reinforcement learning in discrete MDPs

To compare the performance of various algorithms for perfect information games, reasonable benchmarks include reversi and m,n,k-games (generalized tic-tac-toe). For imperfect information games, something like simplified poker is a reasonable…

reinforcement-learning environment markov-decision-process benchmarks

asked Sep 01 '19 at 18:11

user76284


6 votes, 2 answers

What are the most compact Real-Time Strategy Games?

There was a recent informal question on chat about RTS games suitable for AI benchmarks, and I thought it would be useful to ask a question about them in relation to AI research. Compact is defined as the fewest mechanics, elements, and smallest…

game-ai rts benchmarks

asked Jun 05 '19 at 18:45

DukeZhou


4 votes, 1 answer

Why is chess still a benchmark for Artificial Intelligence?

Even though modern chess-playing programs have demonstrated themselves to be as strong as (or stronger than) even the best human players for nearly 20 years now (since 1997, when IBM's Deep Blue defeated the world chess champion Garry Kasparov), why would a…

chess intelligence-testing alphago benchmarks

asked Dec 15 '17 at 20:39

DJ2


2 votes, 1 answer

Are there benchmarks for assessing the speed of the forward-pass of neural networks?

I have a task where I would like to use a convolutional neural network (CNN). I would like to incrementally start from the fastest models, fine-tune and see whether they fit my "budget". At the moment, I'm just looking at object detection CNN-based…

convolutional-neural-networks computer-vision feedforward-neural-networks benchmarks

asked Jul 05 '18 at 11:56

The Nomadic Coder


2 votes, 0 answers

NLP annotation tool online and other tools to compare performances of different NLP algorithms

I do text annotations (POS tagging, NER, chunking, synset) by using a specific annotation tool for Natural Language Processing. I would like to make the same annotations with different tools to compare their performance. Furthermore, for I…

natural-language-processing chat-bots benchmarks

asked Nov 20 '19 at 19:21

franz1


1 vote, 1 answer

Is there a benchmark for multi-objective evolutionary algorithms?

I'm working on a project for an evolutionary algorithms course, and the problem we're trying to solve is multi-objective. We'll use NSGA-II, but we also wanted to compare with some other MOEAs; however, we haven't been able to find good…

reference-request evolutionary-algorithms benchmarks moea nsga-2

asked Oct 15 '21 at 23:17

Matías Santurio


1 vote, 0 answers

Benchmark models for Text Classification / Sentiment Classification

I am currently working on a novel application in NLP where I try to classify empathic and non-empathic texts. I would like to compare the performance of my model to some benchmark models. As I am working with models based on Word2Vec embeddings, the…

natural-language-processing word-embedding text-classification word2vec benchmarks

asked Oct 14 '20 at 17:26

Jakub


1 vote, 0 answers

What is the efficiency of trained neural networks?

Training neural networks takes a while. My question is, how efficient is a neural network that is completely trained (assuming it's not a model that is constantly learning)? I understand that this is a vague and simply difficult question to answer,…

neural-networks gpt efficiency computational-complexity benchmarks

asked Sep 04 '20 at 14:08

Anton


1 vote, 0 answers

Benchmarking SAC on Pybullet

So far I have seen TD3 and DDPG benchmarks on Pybullet environments, but I am looking for SAC benchmarks on Pybullet too. Can anyone help?

reinforcement-learning pytorch ddpg benchmarks

asked May 14 '20 at 08:21

ASA


0 votes, 0 answers

A technique to show what tokens are relatively predicted by an LLM

I’m picturing a technique where you can see what an LLM is likely to respond with, which updates in real time. It’s a bit trippy, but it’s like GitHub Copilot, in that there is predicted text while you type, but it’s predicting what an LLM would say…

natural-language-processing language-model large-language-models information-theory benchmarks

asked Jul 07 '23 at 06:40

hmltn


Questions tagged [benchmarks]