
I am reading this blog post: https://ruder.io/optimizing-gradient-descent/index.html. In the section about AdaGrad, it says:

It adapts the learning rate to the parameters, performing smaller updates (i.e. low learning rates) for parameters associated with frequently occurring features, and larger updates (i.e. high learning rates) for parameters associated with infrequent features.

But I am not sure about the meaning of infrequent features: is it that the value of a given feature changes rarely?

nbro

1 Answer


We describe the input to the network as a vector, called the feature vector. Each component of this vector usually corresponds to some "real-world" information, for example "age of the person", "number of atoms", and so on.

Very often, a specific component of the input vector has almost always the same value. This is more common for binary components or components that have a small set of possible values.

However, in the cases where this component takes a value different from the most common one, it is usually very important and informative.

Such rarely occurring values of these components are called infrequent features.

(Example: in my city, "rains?" is "false" 99.9% of the time. However, when it is true, it is a key factor in all questions about the behavior of the population.)

The problem with these features is that, because the unusual values are infrequent, the network has few chances to learn from them, and some learning algorithms may fail to give them the weight they deserve (bearing in mind that, as noted above, these components are very important when they take a value different from the most frequent one).

Some adaptive learning rate algorithms, such as AdaGrad, try to solve this issue.
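To make this concrete, here is a minimal sketch of how AdaGrad's per-parameter step size ends up larger for a rarely active feature. It assumes the rare feature is encoded as zero most of the time, so that its weight receives a zero gradient on those steps (as in a linear model, where the gradient of a weight is proportional to its input). The function name `adagrad_effective_rates` and the toy gradients are hypothetical and only for illustration.

```python
import numpy as np

def adagrad_effective_rates(grads, lr=0.1, eps=1e-8):
    """Per-parameter effective step size after seeing all `grads`.

    grads: array of shape (steps, n_params), one gradient vector per step.
    """
    accum = np.zeros(grads.shape[1])
    for g in grads:
        accum += g ** 2  # AdaGrad accumulates squared gradients per parameter
    # The base learning rate is divided by the root of the accumulated sum.
    return lr / np.sqrt(accum + eps)

# Hypothetical toy gradients: 1000 steps, two parameters.
steps = 1000
grads = np.zeros((steps, 2))
grads[:, 0] = 1.0      # parameter 0: its feature is active at every step
grads[::100, 1] = 1.0  # parameter 1: its feature is active once every 100 steps

rates = adagrad_effective_rates(grads)
print(rates)  # the second entry (rare feature) is the larger one
```

In this toy setup the frequent parameter accumulates 100 times more squared gradient than the rare one, so its effective step size is about 10 times smaller. When the rare feature does show up, its weight is updated with a comparatively large step.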

pasaba por aqui
  • Thank you very much for your answer, it is very clear :) – ava_punksmash Oct 23 '20 at 12:15
  • @ava_punksmash: you are welcome. A suggestion: upvote answers you like, but do not accept any answer until a day or two after posting the question. It often happens that what seems like a good answer is later improved on by another one, or even that an accepted answer is later shown to be incorrect. – pasaba por aqui Oct 23 '20 at 12:18
  • OK, thank you for the advice! Edit: for now my vote is recorded but not counted because I am a new user, but I do appreciate your help – ava_punksmash Oct 23 '20 at 12:27