In Chapter 15 of Russell and Norvig's Artificial Intelligence: A Modern Approach (Third Edition), the authors describe three basic tasks in temporal inference:
- Filtering,
- Likelihood, and
- Finding the Most Likely Sequence.
My question is about the difference between the first and third tasks. Finding the Most Likely Sequence determines, given evidence $e_1,\dots,e_n$, the most likely sequence of states $S_1,\dots,S_n$. This is done using the Viterbi algorithm. Filtering, on the other hand, provides the probability distribution over the current state after seeing $e_1,\dots,e_n$. You could then pick the state with the highest probability; call it $S'_n$. I am guessing that $S'_n$ should always be equal to $S_n$.

Likewise, you can do the same after every prefix $e_1,\dots,e_i$, each time picking the most likely state $S'_i$ under the filtered distribution. I would love to have a simple example where the sequence $S'_1,\dots,S'_n$ is not equal to the sequence $S_1,\dots,S_n$ produced by the Viterbi algorithm.
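
To make the comparison concrete, here is a minimal sketch of the two procedures I have in mind. The two-state model and all of its numbers (`PRIOR`, `TRANS`, `EMIT`, the symbols `a`, `b`, `c`) are made up by me for illustration and are not taken from the book. `filtered_argmax` picks $S'_i$ after each prefix, and `viterbi` returns the single most likely sequence given all the evidence.

```python
# Toy two-state HMM. All numbers are illustrative assumptions, not from the book.
PRIOR = [0.6, 0.4]                  # P(S_1)
TRANS = [[0.9, 0.1],                # TRANS[i][j] = P(S_{t+1} = j | S_t = i)
         [0.1, 0.9]]
EMIT = [{"a": 0.5, "b": 0.1, "c": 0.4},   # EMIT[j][e] = P(e | S = j)
        {"a": 0.5, "b": 0.4, "c": 0.1}]
STATES = range(2)


def filtered_argmax(evidence):
    """For each prefix e_1..e_i, pick the most likely state under P(S_i | e_1..e_i)."""
    picks = []
    belief = None
    for e in evidence:
        if belief is None:
            predicted = PRIOR
        else:
            # One-step prediction: push the previous belief through the transition model.
            predicted = [sum(belief[i] * TRANS[i][j] for i in STATES) for j in STATES]
        # Weight by the evidence likelihood, then normalize.
        unnorm = [predicted[j] * EMIT[j][e] for j in STATES]
        total = sum(unnorm)
        belief = [u / total for u in unnorm]
        picks.append(max(STATES, key=lambda j: belief[j]))
    return picks


def viterbi(evidence):
    """Return the single most likely state sequence given all the evidence."""
    # score[j] = probability of the best path that ends in state j.
    score = [PRIOR[j] * EMIT[j][evidence[0]] for j in STATES]
    backptrs = []
    for e in evidence[1:]:
        best_prev = [max(STATES, key=lambda i: score[i] * TRANS[i][j]) for j in STATES]
        score = [score[best_prev[j]] * TRANS[best_prev[j]][j] * EMIT[j][e] for j in STATES]
        backptrs.append(best_prev)
    # Follow the back-pointers from the best final state.
    state = max(STATES, key=lambda j: score[j])
    path = [state]
    for best_prev in reversed(backptrs):
        state = best_prev[state]
        path.append(state)
    return path[::-1]


evidence = ["a", "b"]
print(filtered_argmax(evidence))  # [0, 1]
print(viterbi(evidence))          # [1, 1]
```

If I have set this up correctly, the toy model already separates the two on the evidence `a, b`: the first observation is uninformative, so filtering picks state 0 from the prior, while the second observation favors state 1 and the sticky transitions let Viterbi revise the first state to 1 in hindsight.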