We will describe the input to the network as a vector, called the feature vector. Each component of this vector usually corresponds to some piece of "real world" information, for example "age of the person", "number of atoms", and so on.
Quite often, a specific component of the input vector takes almost always the same value. This is especially common for binary components, or components with a small set of possible values.
However, on the occasions when such a component does take a value different from its most usual one, it tends to be very important and informative.
Features of this kind are called infrequent (or rare) features.
(Example: in my city, "rains?" is "false" 99.9% of the time. However, when it is true, it is a key factor in almost any question about the behavior of the population.)
The problem with these features is that, since their unusual values are infrequent, the network has few chances to learn from them, and some learning algorithms may fail to give them the weight they deserve (even though, as said above, these components are very important when they take a value different from the most frequent one).
Some adaptive learning rate algorithms, such as AdaGrad, try to mitigate this issue.
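As a rough illustration of why per-weight adaptation helps, here is a minimal AdaGrad sketch (the toy data, the "rains?"-style rare feature, and the learning rate are invented for this example): each weight divides its step by the square root of its own accumulated squared gradient, so a weight tied to a rarely active feature, which accumulates gradient on very few examples, keeps a much larger effective step size than a weight tied to a dense feature.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stream: feature 0 is active on every example, feature 1 is active
# on only ~1% of examples (an infrequent feature like "rains?" above).
n = 10_000
X = np.zeros((n, 2))
X[:, 0] = 1.0
X[rng.random(n) < 0.01, 1] = 1.0
y = 0.5 * X[:, 0] + 5.0 * X[:, 1]  # the rare feature, when active, matters a lot

lr, eps = 0.1, 1e-8                # hypothetical hyperparameters for the sketch
w = np.zeros(2)
G = np.zeros(2)                    # per-weight sum of squared gradients
for x_i, y_i in zip(X, y):
    grad = (w @ x_i - y_i) * x_i   # squared-error gradient for one sample
    G += grad ** 2
    w -= lr * grad / (np.sqrt(G) + eps)  # AdaGrad: per-weight adapted step

# The dense feature accumulates gradient on every example, so its G is much
# larger and its effective learning rate lr / sqrt(G) has decayed far more
# than the rare feature's.
print("accumulated G per weight:", G)
print("effective step sizes:    ", lr / (np.sqrt(G) + eps))
```

With a single global learning rate, the rare feature's weight would receive only ~100 updates out of 10,000; AdaGrad compensates by keeping that weight's effective step size large relative to the dense one's, which is the behavior the printout shows.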