Questions tagged [padding]
For questions related to padding (the input) in the context of neural networks.
9 questions
3
votes
0 answers
What is the difference between zero-padding and character-padding in Recurrent Neural Networks?
For RNNs to work efficiently, we vectorize the operations, which results in an input matrix of shape
(m, max_seq_len)
where m is the number of examples, e.g. sentences, and max_seq_len is the maximum length that a sentence can have. Some examples…
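A minimal sketch of what such zero-padding looks like in practice, using made-up token-id sequences (the ids and lengths are illustrative, not from the question):

    import numpy as np

    # Toy batch of token-id sequences with different lengths (made up).
    sequences = [[4, 7, 1], [9, 2], [5, 8, 3, 6]]

    m = len(sequences)
    max_seq_len = max(len(s) for s in sequences)

    # Zero-pad each sequence on the right, yielding the (m, max_seq_len)
    # input matrix described in the excerpt.
    X = np.zeros((m, max_seq_len), dtype=np.int64)
    for i, seq in enumerate(sequences):
        X[i, :len(seq)] = seq

    print(X)
    # [[4 7 1 0]
    #  [9 2 0 0]
    #  [5 8 3 6]]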

PhysicsMan
- 31
- 1
2
votes
0 answers
How do neural networks deal with inputs of different sizes that are padded in order to have them of the same size?
I am trying to create an environment for RL where the size of my input (observation space) is not fixed. As a workaround, I thought about padding the input to a maximum size and assigning "null" to the values that do not exist. Now, these…
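A minimal sketch of that idea, assuming a hypothetical maximum observation size MAX_OBS and using zeros plus a validity mask as the "null" marker:

    import numpy as np

    MAX_OBS = 8  # assumed maximum observation size (illustrative)

    def pad_observation(obs):
        """Pad a variable-length observation to MAX_OBS and return a
        mask that tells the agent which entries are real."""
        obs = np.asarray(obs, dtype=np.float32)
        padded = np.zeros(MAX_OBS, dtype=np.float32)
        mask = np.zeros(MAX_OBS, dtype=np.float32)
        padded[:obs.size] = obs
        mask[:obs.size] = 1.0  # 1 = real value, 0 = "null" padding
        return padded, mask

    padded, mask = pad_observation([0.3, -1.2, 0.7])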

user101464
- 61
- 3
1
vote
2 answers
How is the padding mask incorporated in the attention formula?
I have been looking for the answer in other questions, but none of them tackles it. I want to ask how the padding mask is considered in the attention formula.
The attention formula taking into account a causal mask is:
$Attention(Q, K, V) =…
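For reference, the standard masked scaled dot-product attention formula the excerpt truncates is:

$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^\top}{\sqrt{d_k}} + M\right)V$

where $M$ is $0$ at allowed positions and $-\infty$ at masked positions (future positions for a causal mask, padded key positions for a padding mask), so the softmax assigns those positions zero weight.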

Daviiid
- 563
- 3
- 15
1
vote
1 answer
Is reconciling shape discrepancies the only purpose of padding?
Padding is a technique used in several domains of artificial intelligence.
Data is generally available in different shapes, but a model in deep learning allows only a particular shape of data to…

hanugm
- 3,571
- 3
- 18
- 50
1
vote
1 answer
Is it a good practice to pad signal before feature extraction?
Is padding before feature extraction with VGGish a good practice?
Our padding technique is to find the longest signal (a loaded .wav signal) and then pad every shorter signal with zeros up to the size of the longest one. We need to use it…
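A minimal sketch of that padding scheme, assuming the signals are already loaded as 1-D NumPy sample arrays (the lengths here are made up):

    import numpy as np

    # Hypothetical loaded .wav signals of different lengths (1-D sample arrays).
    signals = [np.random.randn(16000), np.random.randn(9000), np.random.randn(12500)]

    longest = max(len(s) for s in signals)

    # Zero-pad each shorter signal at the end to the length of the longest
    # one, as the excerpt describes, before feature extraction.
    padded = np.stack([np.pad(s, (0, longest - len(s))) for s in signals])

    print(padded.shape)  # (3, 16000)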

Dawid_K
- 13
- 2
1
vote
0 answers
Transpose convolution in TiF-GAN: How does "same" padding work?
This question should be quite generic, but I faced the problem in the case of the TiF-GAN generator, so I am going to use it as an example. (Link to paper)
If you check the penultimate page of the paper, you can find the architecture design of the…
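As background (this is not the TiF-GAN code itself): in Keras, a transpose convolution with padding="same" produces an output whose spatial size is exactly input_size * stride, regardless of kernel size, e.g.:

    import tensorflow as tf

    # "same" padding for a transpose convolution: output size is
    # input_size * stride, independent of the kernel size.
    x = tf.random.normal((1, 8, 8, 64))
    layer = tf.keras.layers.Conv2DTranspose(
        filters=32, kernel_size=5, strides=2, padding="same")
    print(layer(x).shape)  # (1, 16, 16, 32)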

MattSt
- 597
- 1
- 5
- 12
0
votes
2 answers
While fine-tuning a decoder only LLM like LLaMA on chat dataset, what kind of padding should one use?
While fine-tuning a decoder only LLM like LLaMA on chat dataset, what kind of padding should one use?
Many papers use left padding, but is right padding wrong, given that transformers emits the following warning when using right padding: "A decoder-only…
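A sketch of the usual left-padding setup with a Hugging Face tokenizer (the checkpoint name is just an example; LLaMA-style tokenizers also need a pad token assigned):

    from transformers import AutoTokenizer

    # Decoder-only models generate after the last prompt token, so batched
    # generation expects left padding; the attention mask hides the pads.
    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
    tokenizer.padding_side = "left"
    tokenizer.pad_token = tokenizer.eos_token  # LLaMA has no pad token by default

    batch = tokenizer(["Hello", "A longer prompt"],
                      padding=True, return_tensors="pt")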

basujindal
- 3
- 2
0
votes
0 answers
How is padding masking considered in the Attention Head of a Transformer?
For purely educational purposes, my goal is to implement the basic Transformer architecture from scratch. So far I have focused on the encoder for classification tasks and assumed that all samples in a batch have the same length. This means I didn't care…
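A minimal sketch of how a key-padding mask is usually applied inside scaled dot-product attention (PyTorch, single head; the names are illustrative):

    import torch
    import torch.nn.functional as F

    def attention(q, k, v, key_padding_mask):
        # q, k, v: (batch, seq, d); key_padding_mask: (batch, seq), True = pad.
        scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
        # Broadcast over query positions; padded keys get -inf so that
        # softmax assigns them zero attention weight.
        scores = scores.masked_fill(key_padding_mask[:, None, :], float("-inf"))
        return F.softmax(scores, dim=-1) @ v

    q = k = v = torch.randn(2, 4, 8)
    pad = torch.tensor([[False, False, True, True],
                        [False, False, False, True]])
    out = attention(q, k, v, pad)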

Christian
- 101
- 1
0
votes
1 answer
Text classification of unequal-length texts: should I pad left or right?
Text classification of equal-length texts works without padding, but in practice texts never have the same length.
For example, spam filtering on a blog article:
thanks for sharing [3 tokens] --> 0 (Not spam)
this article is great [4…
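A minimal sketch of both choices with Keras' pad_sequences utility (the token ids are made up):

    from tensorflow.keras.preprocessing.sequence import pad_sequences

    # Toy token-id sequences for the two excerpt examples (ids are made up).
    seqs = [[11, 12, 13],       # "thanks for sharing"
            [21, 22, 23, 24]]   # "this article is great"

    left = pad_sequences(seqs, padding="pre")    # zeros before the tokens
    right = pad_sequences(seqs, padding="post")  # zeros after the tokens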

Dee
- 1,283
- 1
- 11
- 35