Chapter 15 of Russell and Norvig's Artificial Intelligence: A Modern Approach (Third Edition) describes three basic tasks in temporal inference:

  1. Filtering,
  2. Likelihood, and
  3. Finding the Most Likely Sequence.

My question is about the difference between the first and third tasks. Finding the Most Likely Sequence determines, given evidence $e_1,\dots,e_n$, the most likely sequence of states $S_1,\dots,S_n$. This is done using the Viterbi algorithm. Filtering, on the other hand, provides the probability distribution over states after seeing $e_1,\dots,e_n$; you could then pick the state with the highest probability, call it $S'_n$. I am guessing that $S'_n$ should always equal $S_n$. Likewise, you can do the same after any prefix $e_1,\dots,e_i$, again picking the most likely state $S'_i$. I would love to see a simple example where $S'_1,\dots,S'_n$ differs from the sequence $S_1,\dots,S_n$ produced by the Viterbi algorithm.

vdbuss

1 Answer


Welcome to AI.SE @vdbuss, and great first question!

This point is touched on in Section 15.2.3 (page 576 in my copy), in the second paragraph, and there's a good exercise at the end of the chapter (15.4) that is designed to get you to think through exactly why these are different procedures. If you want to really absorb it, I suggest trying to work out that exercise! If you want the quick answer, read on.

The basic action of filtering is to generate a probability distribution $P(X_{t+1} \mid e_{1:t+1})$ using only two pieces of information: the current state distribution $P(X_t \mid e_{1:t})$ and the new piece of evidence $e_{t+1}$. Because it summarizes the entire past as a single distribution over the current state, picking the most likely state at each step cannot take into account which state sequences are actually possible, while Viterbi can.
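To make that concrete, here is a minimal sketch of a single filtering update, assuming a discrete hidden state with a transition matrix `T` (where `T[i, j]` = $P(X_{t+1}=j \mid X_t=i)$) and an observation likelihood vector `obs` (where `obs[j]` = $P(e_{t+1} \mid X_{t+1}=j)$). These names and the NumPy representation are my own illustration, not notation from the book:

```python
import numpy as np

def filter_step(belief, T, obs):
    """One forward (filtering) update: predict one step, then condition on evidence.

    belief -- current distribution P(X_t | e_{1:t}) as a 1-D array
    T      -- transition matrix, T[i, j] = P(X_{t+1}=j | X_t=i)
    obs    -- observation likelihoods, obs[j] = P(e_{t+1} | X_{t+1}=j)
    """
    predicted = belief @ T           # prediction: P(X_{t+1} | e_{1:t})
    unnormalized = predicted * obs   # multiply in the new evidence
    return unnormalized / unnormalized.sum()
```

Note that the function's only inputs are the previous belief and the new evidence, exactly the two pieces of information mentioned above; nothing about the history of earlier states survives except what is captured in `belief`.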

Here's a simple example: suppose I tell you that I'm going to drop you in a maze at one of two locations. I drop you near the top right corner with probability 0.75, and near the bottom left corner with probability 0.25. Suppose further that a Grue is known (with certainty) to live somewhere near the bottom left corner. Using filtering, your maximum a posteriori estimate for your location after being dropped in the maze ($t=1$) is the top right corner. You then move one step to the right and see a Grue. Clearly your estimate for your position at the second timestep ($t=2$) must be the bottom left, because Grues live only there. But you certainly can't reach the bottom left by moving right from the top right, so this sequence has probability zero overall, despite using the maximum a posteriori estimate for your position at every step. To avoid this, Viterbi uses a linear amount of extra space to select the maximum a posteriori sequence, which in this case is clearly that you are near the bottom left in both timesteps.
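You can check this with a tiny two-state model of the maze. The numbers below are hypothetical choices that match the story: state 0 is the top right (TR), state 1 is the bottom left (BL), neither corner can reach the other in one step, and seeing the Grue rules out TR at $t=2$:

```python
import numpy as np

prior = np.array([0.75, 0.25])       # P(X_1): TR with 0.75, BL with 0.25
T = np.array([[1.0, 0.0],            # TR -> TR only (can't cross the maze)
              [0.0, 1.0]])           # BL -> BL only
obs = [np.array([1.0, 1.0]),         # t=1: no informative evidence
       np.array([0.0, 1.0])]         # t=2: Grue seen, so BL is certain

def filtering_argmax(prior, T, obs):
    """Pick the most likely state at each timestep independently (filtering)."""
    belief = prior * obs[0]
    belief /= belief.sum()
    states = [int(belief.argmax())]
    for e in obs[1:]:
        belief = (belief @ T) * e
        belief /= belief.sum()
        states.append(int(belief.argmax()))
    return states

def viterbi(prior, T, obs):
    """Most likely joint sequence of states, recovered via backpointers."""
    delta = prior * obs[0]           # best path probability ending in each state
    back = []
    for e in obs[1:]:
        scores = delta[:, None] * T  # scores[i, j]: best path via i, then i -> j
        back.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) * e
    path = [int(delta.argmax())]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return path[::-1]
```

Running these, `filtering_argmax` returns `[0, 1]` (TR then BL), a sequence the transition model assigns probability zero, while `viterbi` returns `[1, 1]` (BL in both timesteps), just as argued above.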

John Doucette