Let's assume we need to train an RL model that drops duplicates in a tabular dataset. The actions should probably be defined as "drop" or "do nothing".
But what should the agent itself be, then? To me, it doesn't make sense to see it as just a navigator that loops over the states (dataset indices from the first to the last) and decides which rows to drop.
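
To make the formulation I'm doubting concrete, here is a minimal sketch of it in plain Python. Everything in it is my own assumption for illustration: the names `DedupEnv` and `run_episode`, treating only exact row matches as duplicates, and the +1/-1 reward, which presumes we can tell during training whether a row really is a duplicate.

```python
class DedupEnv:
    """Toy environment (hypothetical): the agent walks the rows in order
    and chooses KEEP or DROP for each one."""

    KEEP, DROP = 0, 1

    def __init__(self, rows):
        self.rows = list(rows)
        self.reset()

    def reset(self):
        self.index = 0   # position of the row currently being judged
        self.kept = []   # rows the agent has decided to keep so far
        return self.index

    def step(self, action):
        row = self.rows[self.index]
        # Assumption: a duplicate is an exact match with an already kept row.
        is_duplicate = row in self.kept
        # Assumed reward: +1 for the right decision, -1 for the wrong one.
        correct = (action == self.DROP) == is_duplicate
        reward = 1 if correct else -1
        if action == self.KEEP:
            self.kept.append(row)
        self.index += 1
        done = self.index >= len(self.rows)
        return self.index, reward, done


def run_episode(env, actions):
    """Apply a fixed list of actions, one per row, and return the total reward."""
    env.reset()
    total = 0
    for action in actions:
        _, reward, done = env.step(action)
        total += reward
        if done:
            break
    return total


rows = [(1, "a"), (2, "b"), (1, "a")]
env = DedupEnv(rows)
total = run_episode(env, [DedupEnv.KEEP, DedupEnv.KEEP, DedupEnv.DROP])
```

In this sketch the "agent" is nothing more than whatever picks the action at each index, which is exactly the part that feels wrong to me.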