
Gradients are used in optimization algorithms. Based on the values of the gradients, we update the weights of a neural network.

It is known that gradients have a direction, and the direction opposite to the gradient should be used for the weight update. For a function in two dimensions (one input and one output), there are only two possible directions for any gradient: left or right.

Is the number of gradient directions infinite in higher dimensions ($\ge 3$)? Or is the number of possible directions $2n$, where $n$ is the number of input variables?

hanugm

1 Answer


Let's look at the definition of gradient:

In vector calculus, the gradient of a scalar-valued differentiable function $f$ of several variables is the vector field (or vector-valued function) $\nabla f$ whose value at a point $p$ is the vector whose components are the partial derivatives of $f$ at $p$. That is, for $f: \mathbb{R}^{n} \rightarrow \mathbb{R}$, its gradient $\nabla f: \mathbb{R}^{n} \rightarrow \mathbb{R}^{n}$ is defined at the point $p=\left(x_{1}, \ldots, x_{n}\right)$ in $n$-dimensional space as the vector: $$ \nabla f(p)=\left[\begin{array}{c} \frac{\partial f}{\partial x_{1}}(p) \\ \vdots \\ \frac{\partial f}{\partial x_{n}}(p) \end{array}\right] $$
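For concreteness, a small worked example of my own (not part of the quoted definition): take $f(x, y) = x^2 + y^2$. Then

$$ \nabla f(x, y)=\left[\begin{array}{c} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{array}\right]=\left[\begin{array}{c} 2x \\ 2y \end{array}\right], \qquad \text{so e.g. } \nabla f(1, 2)=\left[\begin{array}{c} 2 \\ 4 \end{array}\right]. $$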

First of all, the gradient is not a single value or a vector: it's an operator that, given a function, returns another function (note that in the definition $\nabla f$ maps from $\mathbb{R}^n$ to $\mathbb{R}^n$ again), which can be used to compute a vector for each point of a field. So it doesn't really make sense to talk of a gradient direction per se, since the direction actually belongs to the single vectors associated with each point of the field. How many directions do these vectors have? Well, it depends on the field. A plane has 2 dimensions, hence 2 directions in which you can move; in the same way, $\mathbb{R}^n$ will have $n$ dimensions, hence $n$ directions in which you can move.
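To make this concrete, here is a minimal sketch (my own illustration, assuming JAX is available; any autodiff library or a hand-written derivative would do the same job): the gradient of a scalar function of two variables is itself a function that returns a 2-dimensional vector at each point, and those vectors generally point in different directions at different points of the field.

```python
# Minimal sketch (assuming JAX): the gradient of f: R^2 -> R is itself a
# function from R^2 to R^2, giving one vector per point of the field.
import jax.numpy as jnp
from jax import grad

def f(p):
    # f(x, y) = x^2 + y^2, a scalar-valued function of two variables
    return jnp.sum(p ** 2)

grad_f = grad(f)  # grad_f is a function: point in R^2 -> gradient vector in R^2

print(grad_f(jnp.array([1.0, 2.0])))   # [2. 4.]
print(grad_f(jnp.array([-3.0, 0.5])))  # [-6.  1.]  a different direction at a different point
```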

Note also that in gradient descent the gradient of the cost (loss) function is computed with respect to the weights:

$$w_{t+1} = w_t - \alpha \frac{\partial C}{\partial w_t}$$

Mathematically this means that:

  • we have a set of weights $w$
  • we use these weights to produce an output given a specific input
  • we then compute an error (or cost, or loss) between the produced output and the desired output, i.e. we see at which point of the cost function's field we end up due to our current weights
  • we then compute the gradient of the cost function at that point, i.e. the vector that tells us in which direction, and how steeply, the cost increases the most from where we are, and finally
  • we update the weights by taking a step in the direction opposite to that vector, so that the next time we compute the cost function its value is lower (a minimal code sketch of this loop is given below).
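For illustration, here is a minimal gradient-descent sketch (my own toy example with a made-up quadratic cost, assuming JAX; it is not the code of any particular library) that implements the update rule $w_{t+1} = w_t - \alpha \, \partial C / \partial w_t$ from above:

```python
# Toy gradient-descent loop (my own sketch, assuming JAX and a hypothetical
# quadratic cost). Each step moves the weights opposite to the gradient.
import jax.numpy as jnp
from jax import grad

def cost(w):
    # Hypothetical cost: squared distance of the weights from the target (3, -1)
    target = jnp.array([3.0, -1.0])
    return jnp.sum((w - target) ** 2)

grad_cost = grad(cost)        # gradient of the cost w.r.t. the weights
w = jnp.array([0.0, 0.0])     # initial weights
alpha = 0.1                   # learning rate

for t in range(100):
    g = grad_cost(w)          # gradient vector at the current weights
    w = w - alpha * g         # step in the direction opposite to the gradient

print(w)                      # ~[3. -1.], where the toy cost is minimal
```

Here the loop converges because the toy cost is convex; for a real network the same update is applied to all weights, with the gradient obtained via backpropagation.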
Edoardo Guerriero
    When people say "gradient", as in the case of gradient descent, people usually refer to the gradient vector and not the gradient operator. It's important to distinguish the two, as you're doing, but it's also important to say that people usually refer to the gradient vector. You also say "A plane has 2 dimensions, hence 2 directions in which you can move". I would say that it has 2 degrees of freedom for the direction, but the directions of these vectors will not just be 1 of 2 possibilities, in case that's what you mean. In other words, in a 2d space, you have infinitely many directions. – nbro Aug 21 '21 at 10:54