Questions tagged [inference]
25 questions
12 votes, 4 answers
Why do LLMs and RNNs learn so fast during inference but, ironically, so slowly during training?
Why do LLMs learn so fast during inference but, ironically, so slowly during training? That is, if you teach an AI a new concept in a prompt, it will learn and use the concept perfectly and flawlessly through the whole prompt after just one shot.…

MaiaVictor (355 rep)
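
For context on the question above, a minimal illustration of what "learning during inference" means in practice: the model's weights are never updated, and the new concept lives only in the prompt (in-context learning). The made-up word and the expected answer are illustrative assumptions, not from the question.

```python
# In-context learning: the "training data" for the new concept is the
# prompt itself; no gradient step ever happens.
few_shot_prompt = """A 'blorp' of a word is that word spelled backwards.
Example: the blorp of 'cat' is 'tac'.
Example: the blorp of 'dog' is 'god'.
What is the blorp of 'bird'?"""

# Sent to an instruction-tuned LLM, this typically yields 'drib': the
# concept is "learned" purely by conditioning the forward pass on the
# prompt, which is why it feels instant compared to gradient training.
```
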
5 votes, 2 answers
Are both the training and inference systems required in the same application?
From what I understand, there are two stages in deep learning: the first is training and the second is inference. The first is often done on GPUs because of their massive parallelism, among other things. The second, inference, while it…

Mahmoud Abdel-Mon'em (113 rep)
5 votes, 0 answers
Training and inference for highly context-sensitive information
What is the best way to train, and to do inference, when the context strongly determines what the inferred result should be?
For example, in the image below all the people are standing upright, but because of the camera's perspective, their location…

g491 (101 rep)
5 votes, 1 answer
Is the Mask Needed for Masked Self-Attention During Inference with GPT-2?
My understanding is that masked self-attention is necessary during training of GPT-2, as otherwise it would be able to directly see the correct next output at each iteration. My question is whether the attention mask is necessary, or even possible,…

D_s (51 rep)
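
A minimal sketch of the mask the question asks about, in PyTorch (single attention head, no batching; the shapes are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

T, d = 8, 16                        # sequence length, head dimension
q, k, v = (torch.randn(T, d) for _ in range(3))

scores = q @ k.T / d ** 0.5         # (T, T) attention logits
causal = torch.tril(torch.ones(T, T, dtype=torch.bool))
masked = scores.masked_fill(~causal, float("-inf"))
out = F.softmax(masked, dim=-1) @ v

# The last row of `masked` equals the last row of `scores`: the newest
# token is allowed to attend to everything anyway. In a multi-layer
# model the mask still matters, because earlier positions (which the
# mask does change) feed into the later layers.
```
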
3 votes, 0 answers
How to use a TPU for real-time, low-latency inference?
I use Google's Cloud TPU hardware extensively with TensorFlow for training models and for inference; however, when I run inference I do it in large batches. The TPU takes about 3 minutes to warm up before it runs the inference. But when I read the…

adng (51 rep)
2 votes, 1 answer
Why is exact inference in a Bayesian network both NP-hard and #P-hard?
I should show that exact inference in a Bayesian network (BN) is NP-hard and #P-hard by using a 3-SAT problem.
So I formulated a 3-SAT problem by defining the 3-CNF formula
$$(x_1 \lor x_2) \land (\neg x_3 \lor x_2) \land (x_3 \lor x_1)$$
I reduced it to…

xava (423 rep)
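
For context, a sketch of the textbook construction (not necessarily the asker's exact reduction): give each variable $x_i$ a root node with $P(X_i = 1) = \tfrac{1}{2}$, make each clause $C_j$ a deterministic OR of its literals, and let $Y$ be the AND of all clause nodes. Then

$$P(Y = 1) = \frac{\#\{\text{satisfying assignments}\}}{2^n},$$

so deciding whether $P(Y = 1) > 0$ decides satisfiability (NP-hardness), while computing the marginal exactly counts the satisfying assignments (#P-hardness).
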
2 votes, 2 answers
What is a beam?
For example, faster-whisper's transcribe function takes an argument
beam_size: Beam size to use for decoding.
What does "beam" mean?

Geremia (163 rep)
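
A beam is one of the $k$ best partial transcriptions kept alive during decoding, instead of committing greedily to a single one. A minimal sketch (not faster-whisper's implementation; `step_log_probs` is an assumed stand-in for the model):

```python
def beam_search(step_log_probs, beam_size=5, eos=0, max_len=20):
    """Keep the `beam_size` best hypotheses at every step.
    `step_log_probs(prefix)` is assumed to return {token: log_prob}."""
    beams = [([], 0.0)]                        # (token prefix, score)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix and prefix[-1] == eos:   # finished hypothesis
                candidates.append((prefix, score))
                continue
            for tok, lp in step_log_probs(prefix).items():
                candidates.append((prefix + [tok], score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]         # prune back to the beam
    return beams[0][0]
```

With `beam_size=1` this degenerates to greedy decoding; larger beams explore more alternatives at proportionally higher compute cost.
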
2 votes, 0 answers
Why does the BatchNormalization layer produce different outputs during training and inference?
I modified the ResNet50 architecture to get a regression network. I just added BatchNorm1d and ReLU layers just before the fully connected layer. During training, the output of the BatchNorm1d layer is nearly equal to 3, and this gives good results for…

Bedrick Kiq (141 rep)
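
A minimal PyTorch demonstration of the behaviour the question describes: in train mode BatchNorm normalizes with the current batch's statistics, in eval mode with the running averages accumulated during training, so the same input produces different outputs.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(4)
x = torch.randn(8, 4) * 5 + 3      # stats deliberately far from N(0, 1)

bn.train()
y_train = bn(x)                    # batch mean/var; also updates running stats

bn.eval()
y_eval = bn(x)                     # uses running_mean / running_var instead

print(torch.allclose(y_train, y_eval))  # False: the two modes disagree
```

Forgetting `model.eval()` at inference time, or training with tiny batches so the running statistics are noisy, is a common source of exactly this train/test mismatch.
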
1 vote, 0 answers
Inference process and flow, and the roles of the GPU, CPU, and RAM
This is a noob question.
I load a HuggingFace transformer model onto the GPU and create a HuggingFace pipeline using that model. Then I run inference on the model using the pipeline.
I would be glad to read in some depth about the actual process flow of…

ahron (131 rep)
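
A minimal sketch of the flow the question describes (the model name is an illustrative choice): weights are read from disk into CPU RAM, copied once into GPU memory, and at inference the CPU tokenizes and launches GPU kernels, so only small tensors of token IDs and logits cross the CPU-to-GPU boundary.

```python
from transformers import pipeline

# device=0 places the model on the first GPU; tokenization stays on CPU.
generator = pipeline("text-generation", model="gpt2", device=0)
print(generator("Hello, world", max_new_tokens=20)[0]["generated_text"])
```
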
1 vote, 1 answer
What if we drop the causal mask in auto-regressive Transformer?
I understand that the triangular causal mask in attention is used to prevent tokens from "looking into the future", but why do we want to prevent that?
Let's suppose we have a model with context length $T = 8$. At inference time, we want to predict…

nalzok (251 rep)
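
For context on why the mask exists: training optimizes, at every position simultaneously,

$$\mathcal{L}(\theta) = -\sum_{t=1}^{T-1} \log p_\theta(x_{t+1} \mid x_{\le t}),$$

and without the causal mask the representation at position $t$ can read $x_{t+1}$ directly, so the loss is driven to zero by copying rather than predicting. The model then fails at inference time, where future tokens genuinely do not exist yet.
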
1 vote, 2 answers
How to optimize transformer inference for prompts shorter than the maximum sequence length?
As far as I understand, a Transformer has a specific maximum input sequence length that depends on its architecture. So a model like GPT-4 has a sequence length of 8192 tokens. As such, I am interested in what happens when the input prompt is shorter than…

janekb04 (121 rep)
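
A minimal sketch of what happens with short prompts in practice (GPT-2 here stands in for any decoder-only model): the input is not padded out to the architectural maximum; when batching, shorter prompts are padded only to the longest prompt in the batch, and an attention mask marks the real tokens so padding is ignored.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token          # GPT-2 defines no pad token by default
batch = tok(["a short prompt", "a somewhat longer prompt here"],
            padding=True, return_tensors="pt")

print(batch["input_ids"].shape)        # padded to the longest prompt, not to 1024
print(batch["attention_mask"])         # 1 = real token, 0 = padding
```
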
1 vote, 0 answers
Inference time of VGG16 when initialised with different weights
I'm trying to understand the differences in inference time and training time between two models:
VGG16 with weights initialised from a Glorot uniform distribution, and the same network with the only difference being that the weights are initialised to…

kiril avramov (11 rep)
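
A minimal timing sketch for the comparison above (torchvision's default random initialisation stands in for the Glorot case, a constant fill for the other): a dense forward pass performs the same arithmetic regardless of the weight values, so the two should time out essentially identically, barring edge cases such as subnormal floats on CPU.

```python
import time
import torch
from torchvision.models import vgg16

def time_forward(model, x, reps=10):
    model.eval()
    with torch.no_grad():
        start = time.perf_counter()
        for _ in range(reps):
            model(x)
    return (time.perf_counter() - start) / reps

x = torch.randn(1, 3, 224, 224)
a = vgg16(weights=None)                # default (random) initialisation
b = vgg16(weights=None)
for p in b.parameters():
    torch.nn.init.constant_(p, 0.01)   # same shapes, different values

print(time_forward(a, x), time_forward(b, x))
```
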
1 vote, 0 answers
How can I use this Reformer to extract entities from a new sentence?
I have been looking at the NER example with Trax in this notebook. However, the notebook only gives an example for training the model. I can't find any examples of how to use this model to extract entities from a new string of text.
I've tried the…

Alan Buxton (121 rep)
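
A hedged sketch of the missing inference step, assuming `model`, `vocab` (token to id) and `tag_map` (id to tag name) are the objects built in that notebook; the essential point is to preprocess the new sentence exactly as the training data was.

```python
import numpy as np

sentence = "Peter Parker lives in New York"
tokens = sentence.split()
token_ids = np.array([[vocab.get(t, vocab["UNK"]) for t in tokens]])  # assumed 'UNK' key

log_probs = model(token_ids)                 # shape (1, seq_len, n_tags)
pred_ids = np.argmax(log_probs[0], axis=-1)  # best tag id per token
print(list(zip(tokens, (tag_map[i] for i in pred_ids))))
```
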
1 vote, 1 answer
In RL as probabilistic inference, why do we take a probability to be $\exp(r(s_t, a_t))$?
In section 2 of the paper Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review, the author discusses formulating the RL problem as a probabilistic graphical model. They introduce a binary optimality variable…

David (4,591 rep)
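
For context: in that tutorial, a binary variable $\mathcal{O}_t$ means "the agent acted optimally at step $t$", with

$$p(\mathcal{O}_t = 1 \mid s_t, a_t) = \exp\big(r(s_t, a_t)\big),$$

which is a valid probability once rewards are shifted so that $r \le 0$. Conditioning a trajectory $\tau$ on optimality at every step then gives

$$p(\tau \mid \mathcal{O}_{1:T} = 1) \propto p(\tau)\, \exp\Big(\sum_{t=1}^{T} r(s_t, a_t)\Big),$$

so high-reward trajectories are exactly the high-posterior ones, and maximizing reward becomes inference in the graphical model.
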
1 vote, 0 answers
Algorithm which learns to select from proposed options
My goal is to write a program that automatically selects a routing from multiple proposed options.
The data consists of the proposed options, each with the attributes time, cost, and whether there is a transhipment, and also which of the…

Nui (11 rep)
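
One standard formulation for this kind of problem, as a hedged sketch (the three-feature layout of time, cost, and a transhipment flag is an assumption): score every proposed option with a shared network and train with cross-entropy over the candidate set, i.e. a listwise choice model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Shared scorer: one scalar score per option from its feature vector.
scorer = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 1))

options = torch.tensor([[4.0, 120.0, 1.0],   # time, cost, transhipment?
                        [6.0,  80.0, 0.0],
                        [5.0, 100.0, 1.0]])
chosen = torch.tensor([1])                   # index the human actually picked

scores = scorer(options).squeeze(-1)         # (num_options,)
loss = F.cross_entropy(scores.unsqueeze(0), chosen)
loss.backward()                              # train; at inference: scores.argmax()
```
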