For example, faster-whisper's transcribe
function takes an argument
beam_size: Beam size to use for decoding.
What does "beam" mean?
For example, faster-whisper's transcribe
function takes an argument
beam_size: Beam size to use for decoding.
What does "beam" mean?
The beam search is a algorithm to find probable output sequences for an input sequence, so it has been used for decoding in the context of sequence-to-sequence tasks, like machine translation. It's an alternative to a greedy search, which just samples the most likely token at every step (which is not guaranteed to find the most likely sequence, though), that can keep a set of solutions.
I think I first encountered this algorithm a few years ago while reading this important paper, which refers to this Alex Graves' paper, which actually contains a description and the pseudocode of a beam search, but I had forgotten about it, so I decided to review it :)
A good description can be found here (section 7.2.3) or here.
If you're familiar with search algorithms like breadth-first search (BFS), think of beam search as a similar search algorithm, where nodes are the tokens from some vocabulary and the edges are the probabilities of selecting a certain token given the previously selected tokens (important!). In fact, it can be viewed as a generalization of BFS (exhaustive search) - if you set the beam size equal to the vocabulary size, you get BFS.
The following image (taken from here, which provides a nice explanation) should give you the intuition
Now, I don't know the origin of the term beam in this context or why it's called beam search. In some papers I've come across (example), the beam acts like an object that has properties like the size (beam size).
However, it could refer to the paths or the polygon formed by the external nodes at each layer from the first to the last, which is kind of similar to or could remind us of a light beam (image taken from here).
The terminology is not very important, unless you're only interested in the history of the algorithm, which I assume not
Here is Guillaume Klein's answer at the issue section of the Git repository:
"Beam Search" in Wikipedia:
Additionally, the beam size/beam width is controlling the number of paths that are explored at each step when generating an output.