What is "bad local minima"?
The following papers all mention this expression.
What is "bad local minima"?
The following papers all mention this expression.
The adjective bad isn't mathematically descriptive. A better term is sub-optimal, which implies the state of learning might appear optimal based on current information but the optimal solution from among all possibilities is not yet located.
Consider a graph representing a loss function, one of the names to measure disparity between the current learning state and the optimal. Some papers use the term error. In all learning and adaptive control cases, it is the quantification of disparity between current and optimal states. This is a 3D plot, so it only visualizes the loss as a function of two real number parameters. There can be thousands, but the two will suffice to visualize the meaning of local versus global minima. Some may recall this phenomenon from pre-calculus or analytic geometry.
If pure gradient descent is used with an initial value to the far left, the local minimum in the loss surface would be located first. Climbing the slope to test the loss value at the global minimum further to the right would not generally occur. The gradient will in most cases cause the next iteration in learning algorithms to travel down hill, thus the term gradient descent.
This is a simple case beyond just the visualization of two parameters, since there can be thousands of local minima in the loss surface.
There are many strategic approaches to improve the speed and reliability in the search for the global minimum loss. These are just a few.
Interestingly, the only way to prove that the optimal state is found is to try every possibility by checking each one, which is not nearly feasible in most cases, or by relying on a model from which the global optimal may be determined. Most theoretical frameworks target a particular accuracy, reliability, speed, and minimal input information quantity as part of an AI project, so that no such exhaustive search or model perfection is required.
In practical terms, for example, an automated vehicle control system is adequate when the unit testing, alpha functional testing, and eventually beta functional testing all indicate a lower incidence of both injury and loss of property than when humans drive. It is a statistical quality assurance, as in the case of most service and manufacturing businesses.
The graph above was developed for another answer, which has additional information for those interested.
As mentioned in the abstract of on of these papers, bad local minima is a suboptimal local minimum which means a local minimum that is near to a global minimum.