Most Popular
1500 questions
27
votes
4 answers
Why does C++ seem less widely used than Python in AI?
I just want to know why do machine learning engineers and AI programmers use languages like Python to perform AI tasks and not C++, even though C++ is technically a more powerful language than Python.

Mark ellon
- 489
- 1
- 5
- 6
27
votes
2 answers
What is sample efficiency, and how can importance sampling be used to achieve it?
For instance, the title of this paper reads: "Sample Efficient Actor-Critic with Experience Replay".
What is sample efficiency, and how can importance sampling be used to achieve it?

Gokul NC
- 423
- 1
- 4
- 7
27
votes
4 answers
Why is ChatGPT bad at math?
As opposed to How does ChatGPT know math?, I've been seeing some things floating around the Twitterverse about how ChatGPT can actually be very bad at math. For instance, I asked it "If it takes 5 machines 5 minutes to make 5 devices, how long would…

Mithical
- 2,885
- 5
- 27
- 39
27
votes
1 answer
What is the "temperature" in the GPT models?
What does the temperature parameter mean when talking about the GPT models?
I know that a higher temperature value means more randomness, but I want to know how randomness is introduced.
Does temperature mean we add noise to the weights/activations…

Tom Dörr
- 393
- 1
- 3
- 7
27
votes
2 answers
Is Prolog still used in AI?
According to Wikipedia,
Prolog is a general-purpose logic programming language associated with artificial intelligence and computational linguistics.
Is it still used for AI?
This is based off of a question on the 2014 closed beta. The author had…

Mithical
- 2,885
- 5
- 27
- 39
27
votes
3 answers
Where can I find the proof of the universal approximation theorem?
The Wikipedia article for the universal approximation theorem cites a version of the universal approximation theorem for Lebesgue-measurable functions from this conference paper. However, the paper does not include the proofs of the theorem. Does…

Leroy Od
- 435
- 1
- 4
- 4
27
votes
1 answer
What is the Bellman operator in reinforcement learning?
In mathematics, the word operator can refer to several distinct but related concepts. An operator can be defined as a function between two vector spaces, it can be defined as a function where the domain and the codomain are the same, or it can be…

nbro
- 39,006
- 12
- 98
- 176
26
votes
1 answer
How is BERT different from the original transformer architecture?
As far as I can tell, BERT is a type of Transformer architecture. What I do not understand is:
How is Bert different from the original transformer architecture?
What tasks are better suited for BERT, and what tasks are better suited for the…

chessprogrammer
- 2,215
- 2
- 12
- 23
26
votes
1 answer
What exactly are the "parameters" in GPT-3's 175 billion parameters and how are they chosen/generated?
When I studied neural networks, parameters were learning rate, batch size etc. But even GPT3's ArXiv paper does not mention anything about what exactly the parameters are, but gives a small hint that they might just be sentences.
Even tutorial…

Nav
- 481
- 1
- 5
- 10
25
votes
3 answers
How do I handle large images when training a CNN?
Suppose that I have 10K images of sizes $2400 \times 2400$ to train a CNN.
How do I handle such large image sizes without downsampling?
Here are a few more specific questions.
Are there any techniques to handle such large images which are to be…

WaterRocket8236
- 403
- 1
- 4
- 7
25
votes
4 answers
What is a Dynamic Computational Graph?
Frameworks like PyTorch and TensorFlow through TensorFlow Fold support Dynamic Computational Graphs and are receiving attention from data scientists.
However, there seems to be a lack of resource to aid in understanding Dynamic Computational…

Blaszard
- 1,027
- 2
- 11
- 25
24
votes
2 answers
Are there other approaches to deal with variable action spaces?
This question is about Reinforcement Learning and variable action spaces for every/some states.
Variable action space
Let's say you have an MDP, where the number of actions varies between states (for example like in Figure 1 or Figure 2). We can…

Rikard Olsson
- 341
- 1
- 3
- 8
24
votes
8 answers
What is artificial intelligence?
What is the definition of artificial intelligence?

Rana Wasif
- 369
- 1
- 6
24
votes
3 answers
How to choose an activation function for the hidden layers?
I choose the activation function for the output layer depending on the output that I need and the properties of the activation function that I know. For example, I choose the sigmoid function when I'm dealing with probabilities, a ReLU when I'm…

gvgramazio
- 696
- 2
- 7
- 19
24
votes
1 answer
Meaning of roles in the API of GPT-4/ChatGPT (system/user/assistant)
In the API of GPT-4 and ChatGPT, the prompt for a chat conversation is a list of messages, each marked as one of three roles: system, user or assistant.*
I understand which information this represents - but what does the model with that…

Volker Siegel
- 589
- 1
- 4
- 17