Questions tagged [optimization]

For questions about implementing and improving optimization algorithms used in creating AI programs, or optimization in general.

In mathematics, computer science, and operations research, mathematical optimization (alternatively spelled optimisation) or mathematical programming is the selection of a best element (with regard to some criterion) from some set of available alternatives.

In the simplest case, an optimization problem consists of maximizing or minimizing a real function by systematically choosing input values from within an allowed set and computing the value of the function. The generalization of optimization theory and techniques to other formulations constitutes a large area of applied mathematics. More generally, optimization includes finding "best available" values of some objective function given a defined domain (or input), including a variety of different types of objective functions and different types of domains.
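To make the "simplest case" concrete, here is a minimal sketch (the objective and the allowed set are invented for illustration): minimize a real function by systematically trying input values from an allowed set and comparing the resulting function values.

```python
def f(x):
    return (x - 3) ** 2 + 1  # toy objective; its minimum is at x = 3

candidates = [i / 10 for i in range(-100, 101)]  # the allowed set of inputs
best = min(candidates, key=f)                    # systematic comparison
print(best, f(best))                             # -> 3.0 1.0
```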

211 questions
14 votes, 1 answer

What are the implications of the "No Free Lunch" theorem for machine learning?

The No Free Lunch (NFL) theorem states (see the paper Coevolutionary Free Lunches by David H. Wolpert and William G. Macready) that any two algorithms are equivalent when their performance is averaged across all possible problems. Is the "No Free Lunch"…
user9947
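A hedged illustration of the statement (the domain size and both search orders are invented): for deterministic, non-repeating search, enumerating every possible objective on a tiny domain shows two different visit orders achieving identical average performance.

```python
from itertools import product

X = [0, 1, 2]                     # a tiny search space
orders = {"A": [0, 1, 2],         # algorithm A: left-to-right search
          "B": [2, 0, 1]}         # algorithm B: a different fixed order
m = 2                             # number of evaluations allowed

for name, order in orders.items():
    total = 0
    for f in product([0, 1], repeat=len(X)):   # every possible objective
        total += max(f[x] for x in order[:m])  # best value found in m steps
    # averaged over all 2^3 objectives, both algorithms score the same
    print(name, total / 2 ** len(X))           # -> 0.75 for A and for B
```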
12 votes, 1 answer

What are hyper-heuristics, and how are they different from meta-heuristics?

I wanted to know what the differences between hyper-heuristics and meta-heuristics are, and what their main applications are. Which problems are suited to be solved by hyper-heuristics?
bmwalide
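A hedged sketch of the usual distinction (the problem and both operators are invented): a meta-heuristic searches the space of solutions directly, while a selection hyper-heuristic searches the space of low-level heuristics, crediting whichever operator has recently improved the solution.

```python
import random

random.seed(0)

def flip_one(s):                  # low-level heuristic 1: flip one bit
    i = random.randrange(len(s))
    return s[:i] + [1 - s[i]] + s[i + 1:]

def flip_two(s):                  # low-level heuristic 2: flip two bits
    return flip_one(flip_one(s))

def fitness(s):                   # toy objective: number of ones
    return sum(s)

heuristics = [flip_one, flip_two]
scores = [1.0, 1.0]               # running credit per heuristic
sol = [0] * 20

for _ in range(300):
    k = random.choices(range(2), weights=scores)[0]  # pick a heuristic
    cand = heuristics[k](sol)
    if fitness(cand) >= fitness(sol):
        sol, scores[k] = cand, scores[k] + 1.0       # reward improvement
    else:
        scores[k] = max(0.5, scores[k] - 0.1)        # penalise failure

print(fitness(sol))               # climbs toward the all-ones optimum (20)
```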
10 votes, 2 answers

What are the limitations of the hill climbing algorithm and how to overcome them?

What are the limitations of the hill climbing algorithm? How can we overcome these limitations?
Abbas Ali
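A hedged sketch of both halves of the question (the two-peaked objective is invented): plain hill climbing stops at whatever peak is uphill from its starting point, and random restarts, one standard remedy, recover the global peak.

```python
import math, random

def f(x):
    # two peaks: a local optimum near x = -1, the global one near x = 2
    return 3 * math.exp(-(x - 2) ** 2) + math.exp(-(x + 1) ** 2)

def hill_climb(x, step=0.05, iters=10000):
    for _ in range(iters):
        best = max((x, x - step, x + step), key=f)
        if best == x:              # no uphill neighbour: a (possibly local) peak
            return x
        x = best
    return x

print(round(hill_climb(-3.0), 2))  # stuck at the local peak near -1
restarts = [hill_climb(random.uniform(-5, 5)) for _ in range(10)]
print(round(max(restarts, key=f), 2))  # random restarts find the peak near 2
```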
10 votes, 1 answer

Loss jumps abruptly when I decay the learning rate with Adam optimizer in PyTorch

I'm training an auto-encoder network with the Adam optimizer (with amsgrad=True) and MSE loss for a single-channel audio source separation task. Whenever I decay the learning rate by a factor, the network loss jumps abruptly and then decreases until the…
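For readers unfamiliar with the setup described, a hedged sketch (the architecture, sizes, and decay schedule are invented; only Adam with amsgrad=True, MSE loss, and a stepwise learning-rate decay come from the question):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 128))
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, amsgrad=True)
# decay the learning rate by a factor of 0.5 every 20 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)

for epoch in range(60):
    x = torch.randn(32, 128)       # stand-in batch; real audio features in practice
    loss = criterion(model(x), x)  # an auto-encoder reconstructs its input
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()               # the per-epoch decay step the question refers to
```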
10 votes, 1 answer

When should you not use the bias in a layer?

I'm not really that experienced with deep learning, and I've been looking at research code (mostly PyTorch) for deep neural networks, specifically GANs, and, in many cases, I see the authors setting bias=False in some layers without much…
Nikos Tsakas
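One common pattern behind the observation, shown as a hedged sketch (layer sizes invented): when a layer is immediately followed by batch normalization, its bias is redundant, because BatchNorm subtracts the batch mean and adds its own learned shift.

```python
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),  # bias dropped
    nn.BatchNorm2d(64),  # its learnable shift (beta) replaces the conv bias
    nn.ReLU(),
)
```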
10 votes, 1 answer

What is the difference between reinforcement learning and evolutionary algorithms?

What is the difference between reinforcement learning (RL) and evolutionary algorithms (EA)? I am trying to understand the basics of RL, but I do not yet have practical experience with RL. I know slightly more about EAs, but not enough to understand…
10 votes, 2 answers

Can artificial intelligence be thought of as optimization?

In this video an expert says, "One way of thinking about what intelligence is [specifically with regard to artificial intelligence], is as an optimization process." Can intelligence always be thought of as an optimization process, and can artificial…
dynrepsys
9 votes, 1 answer

How does weight normalization work?

I was reading the paper Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks about improving the learning of an ANN using weight normalization. They consider standard artificial neural networks where the…
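The paper's reparameterization writes each weight vector as $w = \frac{g}{\lVert v \rVert} v$, learning the direction $v$ and the scale $g$ separately; a hedged sketch (tensor shapes invented):

```python
import torch

v = torch.randn(64, 128, requires_grad=True)  # direction parameters
g = torch.ones(64, 1, requires_grad=True)     # per-output-unit scale
w = g * v / v.norm(dim=1, keepdim=True)       # effective weights, w = g*v/||v||
# PyTorch also ships this reparameterization as a wrapper:
# torch.nn.utils.weight_norm(torch.nn.Linear(128, 64))
```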
9 votes, 1 answer

Given a list of integers $\{c_1, \dots, c_N \}$, how do I find an integer $D$ that minimizes the sum of remainders $\sum_i c_i \text{ mod } D$?

I have a set of fixed integers $S = \{c_1, \dots, c_N \}$. I want to find a single integer $D$, greater than a certain threshold $T$, i.e. $D > T \geq 0$, that, when divided into each $c_i$, leaves a remainder $r_i \geq 0$, i.e. $r_i$ can be written as $r_i =…
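Since the excerpt is cut off, only a hedged brute-force baseline (the bounds and the example are invented): any $D$ dividing every $c_i$ makes the remainder sum zero, and scanning $T < D \le \max_i c_i$ suffices, because any larger $D$ leaves each $c_i$ unchanged.

```python
def best_divisor(c, T):
    # assumes T < max(c); any D > max(c) gives remainder sum equal to sum(c)
    return min(range(T + 1, max(c) + 1),
               key=lambda D: sum(x % D for x in c))

print(best_divisor([12, 18, 30], T=5))  # -> 6, with remainder sum 0
```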
9 votes, 3 answers

How is it possible that the MSE used to train neural networks with gradient descent has multiple local minima?

We often train neural networks by optimizing the mean squared error (MSE), which, as a function of the prediction, is a parabola $y=x^2$, with gradient descent. We also say that weight adjustment in a neural network by the gradient descent algorithm can hit a local…
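A hedged sketch of the resolution (the tiny network and data are invented): MSE is a parabola in the prediction, but the prediction is a non-linear function of the weights, and hidden-unit symmetry alone already produces multiple separated minima.

```python
import math

def net(w1, w2, x):               # two hidden tanh units, weights w1, w2
    return math.tanh(w1 * x) + math.tanh(w2 * x)

xs = [-2.0, -1.0, 1.0, 2.0]
ys = [net(1.0, 2.0, x) for x in xs]        # data generated with w = (1, 2)

def mse(w1, w2):
    return sum((net(w1, w2, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

print(mse(1.0, 2.0))   # 0.0   -> a global minimum
print(mse(2.0, 1.0))   # 0.0   -> its mirror (swap the hidden units)
print(mse(1.5, 1.5))   # > 0   -> the point between them is worse, so the
                       #          loss over the weights is not convex
```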
8 votes, 2 answers

Why does a one-hidden-layer network become more robust to poor initialization as the number of hidden neurons grows?

In a nutshell: I want to understand why a one-hidden-layer neural network converges to a good minimum more reliably when a larger number of hidden neurons is used. Below is a more detailed explanation of my experiment: I am working on a simple 2D…
Chrigi
8 votes, 2 answers

Why should the number of neurons in a hidden layer be a power of 2?

I have read somewhere on the web (I lost the reference) that the number of units (or neurons) in a hidden layer should be a power of 2 because it helps the learning algorithm to converge faster. Is this a fact? If it is, why is this true? Does it…
8 votes, 2 answers

Why is the perceptron criterion function differentiable?

I'm reading chapter one of the book Neural Networks and Deep Learning by Aggarwal. In section 1.2.1.1 of the book, I'm learning about the perceptron. One thing the book says is that, if we use the sign function in the following loss function:…
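Since the excerpt is cut off, a hedged sketch of the contrast the chapter draws (the vectors are invented): the 0/1 loss built from the sign function is piecewise constant with zero gradient almost everywhere, whereas the perceptron criterion $\max(0, -y \, w^\top x)$ is piecewise linear and admits a usable (sub)gradient.

```python
def perceptron_loss(w, x, y):     # y in {-1, +1}
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return max(0.0, -margin)      # zero once the point is classified correctly

def perceptron_grad(w, x, y):     # subgradient of the loss w.r.t. w
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return [0.0] * len(w) if margin > 0 else [-y * xi for xi in x]

print(perceptron_loss([0.5, -0.2], [1.0, 2.0], +1))  # 0.0: correct side
print(perceptron_grad([0.5, -0.2], [1.0, 2.0], -1))  # [1.0, 2.0]: a usable update
```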
8 votes, 1 answer

Why is the learning rate generally beneath 1?

In all examples I've ever seen, the learning rate of an optimisation method is always less than $1$. However, I've never found an explanation as to why this is. In addition to that, there are some cases where having a learning rate bigger than 1 is…
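A hedged toy illustration of why large steps fail (the quadratic is invented): on $f(x) = x^2$ the gradient-descent update is $x \leftarrow (1 - 2\eta)x$, which contracts only when $\eta < 1$; beyond that, every step overshoots the minimum.

```python
def run(lr, x=1.0, steps=10):
    for _ in range(steps):
        x -= lr * 2 * x            # gradient of x**2 is 2x
    return x

print(run(0.1))    # ~0.107: shrinks toward the minimum at 0
print(run(1.5))    # 1024.0: each step overshoots and the iterates blow up
```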
7 votes, 1 answer

What is an objective function?

Local search algorithms are useful for solving pure optimization problems, in which the aim is to find the best state according to an objective function. My question is what is the objective function?
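A hedged example of what such a function looks like in a local-search setting (the 8-queens encoding is a standard textbook choice, not from the question): the objective assigns each candidate state a score, here the number of non-attacking queen pairs, which local search then tries to maximize.

```python
def objective(queens):                 # queens[i] = row of the queen in column i
    n, attacks = len(queens), 0
    for i in range(n):
        for j in range(i + 1, n):
            if queens[i] == queens[j] or abs(queens[i] - queens[j]) == j - i:
                attacks += 1           # same row or same diagonal
    return (n * (n - 1)) // 2 - attacks   # non-attacking pairs

print(objective([0, 4, 7, 5, 2, 6, 1, 3]))  # 28 -> a solved 8-queens board
```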