3

In mathematics, there is a proof that the following infinite series converges to an irrational constant, denoted by $e$ and called Euler's number (or Napier's constant). The value of $e$ lies between 2 and 3.

$$1 + \dfrac{1}{1!} + \dfrac{1}{2!} + \dfrac{1}{3!} + \cdots$$
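For intuition, here is a minimal Python sketch (standard library only) that evaluates the partial sums of this series; they converge rapidly to $e$:

```python
# A minimal sketch: approximating e by the partial sums of the series above.
import math

total = 0.0
for n in range(15):
    total += 1.0 / math.factorial(n)  # n = 0 contributes the leading 1, since 0! = 1

print(total)   # 2.718281828458995 -- already close to e after 15 terms
print(math.e)  # 2.718281828459045
```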

The natural exponential function, defined as follows, has some interesting properties:

$$f: \mathbb{R} \rightarrow \mathbb{R}$$ $$f(x) = e^x$$

It is used in several algorithms and in the definitions of functions like softmax. I am interested in knowing the mathematical characteristics that make this function useful in artificial intelligence.

The following are the properties I am aware of, but I am not sure how some of them are useful.

  1. Non-linearity: Activation functions are intended to provide non-linearity, so the natural exponential function is a candidate for activation functions due to this property. You can check its graph here.

  2. Differentiability: Loss functions used in the back-propagation algorithm need to be differentiable, so it can be a candidate for use in loss functions too.

$$\dfrac{d}{dx} e^x = e^x \text{ for all } x \in \mathbb{R}$$

  3. Continuity: I am not sure how this property is useful in algorithms. Intuitively, you can see from the graph mentioned above that it is continuous.

  4. Smoothness: I am not sure how this property is useful in algorithms, but it seems useful. The natural exponential function has the smoothness property (a numerical check of the derivative identity appears after this list).

$$\dfrac{d^n}{dx^n} e^x = e^x \text{ for all } x \in \mathbb{R} \text{ and } n \in \mathbb{N}$$
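Here is a minimal sketch, assuming Python's standard library, that checks the derivative identity numerically with a central finite difference (the step size and the test points are arbitrary):

```python
# A minimal sketch: checking d/dx e^x = e^x with a central finite difference.
import math

h = 1e-6  # small step for the finite-difference approximation
for x in (-2.0, 0.0, 1.5):
    numeric = (math.exp(x + h) - math.exp(x - h)) / (2 * h)
    exact = math.exp(x)  # the property says the derivative equals the function itself
    print(f"x={x}: numeric={numeric:.8f}, exact={exact:.8f}")
```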

Are there any other properties, like non-linearity, differentiability, smoothness, etc., of the natural exponential function that make it particularly suitable for use in AI algorithms?

Glorfindel
hanugm
  • I think this question is a bit broad because a function may be used in different ways, in different contexts, and for different purposes. I think you're interested in ML (and maybe specifically in neural networks) and not the whole AI. If that's the case, I would start by replacing "AI" with "ML" or be more specific, then I would try to motivate a little more why you're asking this question. For example, in logic-based AI, maybe the characteristics that you mention are not even useful. Where would a function be used in those contexts? These are all issues that I see with this post. – nbro Mar 02 '22 at 14:48
  • @nbro But I am not sure there are many branches of AI that use $e^x$. It is true that there are plenty of branches, but if we view it from the perspective of the function alone, $e^x$ has only finitely many properties. Am I wrong anywhere? – hanugm Mar 03 '22 at 10:13

3 Answers

2

The question should be split in two:

Use in softmax: softmax is based on the concepts of cross-entropy, logistic regression, the logistic function, etc. All of these include an exponential or a logarithm in their formulation.
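For concreteness, a minimal sketch of softmax, assuming Python with NumPy; the max-subtraction step is a common numerical-stability device, not part of the mathematical definition:

```python
# A minimal sketch of softmax, built directly from the exponential function.
import numpy as np

def softmax(z):
    shifted = z - np.max(z)      # subtract the max to avoid overflow in exp
    exp_z = np.exp(shifted)
    return exp_z / exp_z.sum()   # normalize so the outputs sum to 1

logits = np.array([2.0, 1.0, 0.1])  # made-up example scores
print(softmax(logits))              # approx. [0.659 0.242 0.099]
```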

Use in activation functions:

  • monotonic: a non-monotonic function can give the same output for different inputs, disturbing the notion of distance/error.
  • non-linear: a linear activation function collapses the layer with the next one, making it useless (see the sketch after this list).
  • differentiable: necessary for back-propagation.
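To illustrate the non-linearity point, a minimal NumPy sketch showing that two stacked linear layers without an activation collapse into a single linear layer (the shapes and random weights are arbitrary, for illustration only):

```python
# A minimal sketch: two stacked linear layers (no activation) collapse into one.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)           # arbitrary input vector
W1 = rng.normal(size=(4, 3))     # weights of layer 1 (shapes chosen for illustration)
W2 = rng.normal(size=(2, 4))     # weights of layer 2

two_layers = W2 @ (W1 @ x)       # layer 2 applied to layer 1's output
one_layer = (W2 @ W1) @ x        # an equivalent single linear layer
print(np.allclose(two_layers, one_layer))  # True: without a non-linearity, depth adds nothing
```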
pasaba por aqui
1

While all the properties you list are true, the essential parts are the differentiability and the fact that you can build a nice probability distribution from it.

The unnormalized scores the neural network puts out before we apply softmax may very well contain negative values. Applying the exponential function is a great way to build a probability distribution from a given vector, regardless of the sign of the values.
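A minimal sketch of this, assuming NumPy; the score values are made up for illustration:

```python
# A minimal sketch: exp maps negative scores to positive values, so the
# result is still a valid probability distribution.
import numpy as np

scores = np.array([-3.0, -1.0, 2.0])  # made-up unnormalized outputs, some negative
probs = np.exp(scores) / np.exp(scores).sum()
print(probs)        # all entries positive, approx. [0.006 0.047 0.946]
print(probs.sum())  # 1.0
```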

tnfru
-1

The key properties of the softmax function that are useful in machine learning are:

  1. It takes as input a vector of real numbers.
  2. After applying softmax, each component will be in the interval $[0, 1]$ and the components will add up to 1. Thus, a probability distribution.

In short, it transforms an arbitrary vector of real numbers into probabilities (each in $[0, 1]$, summing to 1), which is useful for (1) classification/regression problems and (2) minimizing the cost/loss function used in neural networks.
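A minimal sketch, assuming NumPy, that checks both listed properties on an arbitrary real-valued vector:

```python
# A minimal sketch: checking both properties on an arbitrary real-valued vector.
import numpy as np

rng = np.random.default_rng(42)
z = rng.normal(size=10)              # property 1: any vector of real numbers is a valid input
p = np.exp(z) / np.exp(z).sum()

assert np.all((p >= 0.0) & (p <= 1.0))  # property 2a: each component lies in [0, 1]
assert np.isclose(p.sum(), 1.0)         # property 2b: the components add up to 1
print(p)
```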

codecypher