I want some clarification on how residual vector quantization works in ColBERT v2.
The model independently encodes document terms offline using a BERT-like model, and a linear projection layer reduces each term embedding to a lower dimension (128-d). These 128-d term vectors are first coarse-quantized using k-means clustering; for each term vector, a residual is computed as the difference between the vector and its nearest cluster centroid.
Each of these residuals is further quantized using b bits per dimension (b = 1 or 2). How does this b-bit quantization work exactly? I am familiar with product quantization, where each vector is divided into sub-vectors and each sub-vector is quantized with its own k-means codebook; but how does b-bit-per-dimension quantization work? Is it the same as PQ, with each sub-vector being an individual bit?
Thanks!