3

In mathematics, there is a proof that the following infinite series converges to an irrational constant, denoted by $e$ and called Euler's number (or Napier's constant). The value of $e$ lies between 2 and 3.

$$1 + \dfrac{1}{1!} + \dfrac{1}{2!} + \dfrac{1}{3!} + \cdots$$
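For intuition, here is a minimal Python sketch (standard library only) that evaluates the partial sums of this series; they converge rapidly to $e$:

```python
# A minimal sketch: approximating e by the partial sums of the series above.
import math

total = 0.0
for n in range(15):
    total += 1.0 / math.factorial(n)  # n = 0 contributes the leading 1, since 0! = 1

print(total)   # 2.718281828458995 -- already close to e after 15 terms
print(math.e)  # 2.718281828459045
```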

The natural exponential function, defined as follows, has some interesting properties:

$$f: \mathbb{R} \rightarrow \mathbb{R}$$ $$f(x) = e^x$$

It is used in several algorithms and in the definitions of functions like softmax. I am interested in knowing the mathematical characteristics that make this function useful in artificial intelligence.

The following are the properties I am aware of, but I am not sure how some of them are useful.

  1. Non-linearity: Activation functions are intended to provide non-linearity, so the natural exponential function is a candidate for activation functions due to this property. You can check its graph here.

  2. Differentiability: Loss functions used in the back-propagation algorithm need to be differentiable, so it can be a candidate for use in loss functions too.

$$\dfrac{d}{dx} e^x = e^x \text{ for all } x \in \mathbb{R}$$

  3. Continuity: I am not sure how this property is useful in algorithms. Intuitively, you can see from the graph mentioned above that it is continuous.

  4. Smoothness: I am not sure how this property is useful in algorithms, but it seems useful. The natural exponential function has the smoothness property (a numerical check of the derivative identity appears after this list).

$$\dfrac{d^n}{dx^n} e^x = e^x \text{ for all } x \in \mathbb{R} \text{ and } n \in \mathbb{N}$$
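Here is a minimal sketch, assuming Python's standard library, that checks the derivative identity numerically with a central finite difference (the step size and the test points are arbitrary):

```python
# A minimal sketch: checking d/dx e^x = e^x with a central finite difference.
import math

h = 1e-6  # small step for the finite-difference approximation
for x in (-2.0, 0.0, 1.5):
    numeric = (math.exp(x + h) - math.exp(x - h)) / (2 * h)
    exact = math.exp(x)  # the property says the derivative equals the function itself
    print(f"x={x}: numeric={numeric:.8f}, exact={exact:.8f}")
```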

Are there any other properties, like non-linearity, differentiability, smoothness, etc., of the natural exponential function that make it particularly suitable for use in AI algorithms?

Glorfindel
hanugm
  • I think this question is a bit broad because a function may be used in different ways, in different contexts, and for different purposes. I think you're interested in ML (and maybe specifically in neural networks) and not the whole AI. If that's the case, I would start by replacing "AI" with "ML" or be more specific, then I would try to motivate a little more why you're asking this question. For example, in logic-based AI, maybe the characteristics that you mention are not even useful. Where would a function be used in those contexts? These are all issues that I see with this post. – nbro Mar 02 '22 at 14:48
  • @nbro But I am not sure there are many branches of AI that use $e^x$. It is true that there are plenty of branches, but if we view it from the perspective of the function alone, $e^x$ has only finitely many properties. Am I wrong anywhere? – hanugm Mar 03 '22 at 10:13

3 Answers

2

The question should be split in two:

Use in softmax: softmax is based on the concepts of cross-entropy, logistic regression, the logistic function, etc. All of these include an exponential or a logarithm in their formulation.
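For concreteness, a minimal sketch of softmax, assuming Python with NumPy; the max-subtraction step is a common numerical-stability device, not part of the mathematical definition:

```python
# A minimal sketch of softmax, built directly from the exponential function.
import numpy as np

def softmax(z):
    shifted = z - np.max(z)      # subtract the max to avoid overflow in exp
    exp_z = np.exp(shifted)
    return exp_z / exp_z.sum()   # normalize so the outputs sum to 1

logits = np.array([2.0, 1.0, 0.1])  # made-up example scores
print(softmax(logits))              # approx. [0.659 0.242 0.099]
```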

Use in activation functions:

  • monotonic: a non-monotonic function can give the same output for different inputs, disturbing the notion of distance/error.
  • non-linear: a linear activation function collapses the layer with the next one, making it useless (see the sketch after this list).
  • differentiable: necessary for back-propagation.
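To illustrate the non-linearity point, a minimal NumPy sketch showing that two stacked linear layers without an activation collapse into a single linear layer (the shapes and random weights are arbitrary, for illustration only):

```python
# A minimal sketch: two stacked linear layers (no activation) collapse into one.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)           # arbitrary input vector
W1 = rng.normal(size=(4, 3))     # weights of layer 1 (shapes chosen for illustration)
W2 = rng.normal(size=(2, 4))     # weights of layer 2

two_layers = W2 @ (W1 @ x)       # layer 2 applied to layer 1's output
one_layer = (W2 @ W1) @ x        # an equivalent single linear layer
print(np.allclose(two_layers, one_layer))  # True: without a non-linearity, depth adds nothing
```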
pasaba por aqui
1

While all the properties you list are true, the essential parts are the differentiability and the fact that you can build a nice probability distribution from it.

The unnormalized scores the neural network puts out before we apply softmax may very well contain negative values. Applying the exponential function is a great way to build a probability distribution from a given vector, regardless of the sign of the values.
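A minimal sketch of this, assuming NumPy; the score values are made up for illustration:

```python
# A minimal sketch: exp maps negative scores to positive values, so the
# result is still a valid probability distribution.
import numpy as np

scores = np.array([-3.0, -1.0, 2.0])  # made-up unnormalized outputs, some negative
probs = np.exp(scores) / np.exp(scores).sum()
print(probs)        # all entries positive, approx. [0.006 0.047 0.946]
print(probs.sum())  # 1.0
```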

tnfru
-1

The key properties of the softmax function that are useful in machine learning are:

  1. It takes as input a vector of real numbers.
  2. After applying softmax, each component will be in the interval $[0, 1]$ and the components will add up to 1. Thus, a probability distribution.

In short, it transforms an arbitrary vector of real numbers into probabilities (each in $[0, 1]$, summing to 1), which is useful for (1) classification/regression problems and (2) minimizing the cost/loss function used in neural networks.
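A minimal sketch, assuming NumPy, that checks both listed properties on an arbitrary real-valued vector:

```python
# A minimal sketch: checking both properties on an arbitrary real-valued vector.
import numpy as np

rng = np.random.default_rng(42)
z = rng.normal(size=10)              # property 1: any vector of real numbers is a valid input
p = np.exp(z) / np.exp(z).sum()

assert np.all((p >= 0.0) & (p <= 1.0))  # property 2a: each component lies in [0, 1]
assert np.isclose(p.sum(), 1.0)         # property 2b: the components add up to 1
print(p)
```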

codecypher