Questions tagged [optimization]

For questions about implementing and improving optimization algorithms used in creating AI programs, or optimization in general.

In mathematics, computer science, and operations research, mathematical optimization (alternatively spelled optimisation) or mathematical programming is the selection of a best element (with regard to some criterion) from some set of available alternatives.

In the simplest case, an optimization problem consists of maximizing or minimizing a real function by systematically choosing input values from within an allowed set and computing the value of the function. The generalization of optimization theory and techniques to other formulations constitutes a large area of applied mathematics. More generally, optimization includes finding "best available" values of some objective function given a defined domain (or input), including a variety of different types of objective functions and different types of domains.
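To make the "simplest case" concrete, here is a minimal sketch (the objective and the allowed set are invented for illustration): minimize a real function by systematically trying input values from an allowed set and comparing the resulting function values.

```python
def f(x):
    return (x - 3) ** 2 + 1  # toy objective; its minimum is at x = 3

candidates = [i / 10 for i in range(-100, 101)]  # the allowed set of inputs
best = min(candidates, key=f)                    # systematic comparison
print(best, f(best))                             # -> 3.0 1.0
```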

211 questions
14 votes, 1 answer

What are the implications of the "No Free Lunch" theorem for machine learning?

The No Free Lunch (NFL) theorem states (see the paper Coevolutionary Free Lunches by David H. Wolpert and William G. Macready) that any two algorithms are equivalent when their performance is averaged across all possible problems. Is the "No Free Lunch"…
user9947
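A hedged illustration of the statement (the domain size and both search orders are invented): for deterministic, non-repeating search, enumerating every possible objective on a tiny domain shows two different visit orders achieving identical average performance.

```python
from itertools import product

X = [0, 1, 2]                     # a tiny search space
orders = {"A": [0, 1, 2],         # algorithm A: left-to-right search
          "B": [2, 0, 1]}         # algorithm B: a different fixed order
m = 2                             # number of evaluations allowed

for name, order in orders.items():
    total = 0
    for f in product([0, 1], repeat=len(X)):   # every possible objective
        total += max(f[x] for x in order[:m])  # best value found in m steps
    # averaged over all 2^3 objectives, both algorithms score the same
    print(name, total / 2 ** len(X))           # -> 0.75 for A and for B
```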
12 votes, 1 answer

What are hyper-heuristics, and how are they different from meta-heuristics?

I wanted to know what the differences between hyper-heuristics and meta-heuristics are, and what their main applications are. Which problems are suited to be solved by hyper-heuristics?
bmwalide
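A hedged sketch of the usual distinction (the problem and both operators are invented): a meta-heuristic searches the space of solutions directly, while a selection hyper-heuristic searches the space of low-level heuristics, crediting whichever operator has recently improved the solution.

```python
import random

random.seed(0)

def flip_one(s):                  # low-level heuristic 1: flip one bit
    i = random.randrange(len(s))
    return s[:i] + [1 - s[i]] + s[i + 1:]

def flip_two(s):                  # low-level heuristic 2: flip two bits
    return flip_one(flip_one(s))

def fitness(s):                   # toy objective: number of ones
    return sum(s)

heuristics = [flip_one, flip_two]
scores = [1.0, 1.0]               # running credit per heuristic
sol = [0] * 20

for _ in range(300):
    k = random.choices(range(2), weights=scores)[0]  # pick a heuristic
    cand = heuristics[k](sol)
    if fitness(cand) >= fitness(sol):
        sol, scores[k] = cand, scores[k] + 1.0       # reward improvement
    else:
        scores[k] = max(0.5, scores[k] - 0.1)        # penalise failure

print(fitness(sol))               # climbs toward the all-ones optimum (20)
```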
10 votes, 2 answers

What are the limitations of the hill climbing algorithm and how to overcome them?

What are the limitations of the hill climbing algorithm? How can we overcome these limitations?
Abbas Ali
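A hedged sketch of both halves of the question (the two-peaked objective is invented): plain hill climbing stops at whatever peak is uphill from its starting point, and random restarts, one standard remedy, recover the global peak.

```python
import math, random

def f(x):
    # two peaks: a local optimum near x = -1, the global one near x = 2
    return 3 * math.exp(-(x - 2) ** 2) + math.exp(-(x + 1) ** 2)

def hill_climb(x, step=0.05, iters=10000):
    for _ in range(iters):
        best = max((x, x - step, x + step), key=f)
        if best == x:              # no uphill neighbour: a (possibly local) peak
            return x
        x = best
    return x

print(round(hill_climb(-3.0), 2))  # stuck at the local peak near -1
restarts = [hill_climb(random.uniform(-5, 5)) for _ in range(10)]
print(round(max(restarts, key=f), 2))  # random restarts find the peak near 2
```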
10 votes, 1 answer

Loss jumps abruptly when I decay the learning rate with Adam optimizer in PyTorch

I'm training an auto-encoder network with the Adam optimizer (with amsgrad=True) and MSE loss for a single-channel audio source separation task. Whenever I decay the learning rate by a factor, the network loss jumps abruptly and then decreases until the…
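For readers unfamiliar with the setup described, a hedged sketch (the architecture, sizes, and decay schedule are invented; only Adam with amsgrad=True, MSE loss, and a stepwise learning-rate decay come from the question):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 128))
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, amsgrad=True)
# decay the learning rate by a factor of 0.5 every 20 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)

for epoch in range(60):
    x = torch.randn(32, 128)       # stand-in batch; real audio features in practice
    loss = criterion(model(x), x)  # an auto-encoder reconstructs its input
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()               # the per-epoch decay step the question refers to
```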
10 votes, 1 answer

When should you not use the bias in a layer?

I'm not really that experienced with deep learning, and I've been looking at research code (mostly PyTorch) for deep neural networks, specifically GANs, and, in many cases, I see the authors setting bias=False in some layers without much…
Nikos Tsakas
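One common pattern behind the observation, shown as a hedged sketch (layer sizes invented): when a layer is immediately followed by batch normalization, its bias is redundant, because BatchNorm subtracts the batch mean and adds its own learned shift.

```python
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),  # bias dropped
    nn.BatchNorm2d(64),  # its learnable shift (beta) replaces the conv bias
    nn.ReLU(),
)
```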
10 votes, 1 answer

What is the difference between reinforcement learning and evolutionary algorithms?

What is the difference between reinforcement learning (RL) and evolutionary algorithms (EA)? I am trying to understand the basics of RL, but I do not yet have practical experience with RL. I know slightly more about EAs, but not enough to understand…
10 votes, 2 answers

Can artificial intelligence be thought of as optimization?

In this video an expert says, "One way of thinking about what intelligence is [specifically with regard to artificial intelligence], is as an optimization process." Can intelligence always be thought of as an optimization process, and can artificial…
dynrepsys
9 votes, 1 answer

How does weight normalization work?

I was reading the paper Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks about improving the learning of an ANN using weight normalization. They consider standard artificial neural networks where the…
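The paper's reparameterization writes each weight vector as $w = \frac{g}{\lVert v \rVert} v$, learning the direction $v$ and the scale $g$ separately; a hedged sketch (tensor shapes invented):

```python
import torch

v = torch.randn(64, 128, requires_grad=True)  # direction parameters
g = torch.ones(64, 1, requires_grad=True)     # per-output-unit scale
w = g * v / v.norm(dim=1, keepdim=True)       # effective weights, w = g*v/||v||
# PyTorch also ships this reparameterization as a wrapper:
# torch.nn.utils.weight_norm(torch.nn.Linear(128, 64))
```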
9 votes, 1 answer

Given a list of integers $\{c_1, \dots, c_N \}$, how do I find an integer $D$ that minimizes the sum of remainders $\sum_i c_i \text{ mod } D$?

I have a set of fixed integers $S = \{c_1, \dots, c_N \}$. I want to find a single integer $D$, greater than a certain threshold $T$, i.e. $D > T \geq 0$, that, when divided into each $c_i$, leaves a remainder $r_i \geq 0$, i.e. $r_i$ can be written as $r_i =…
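Since the excerpt is cut off, only a hedged brute-force baseline (the bounds and the example are invented): any $D$ dividing every $c_i$ makes the remainder sum zero, and scanning $T < D \le \max_i c_i$ suffices, because any larger $D$ leaves each $c_i$ unchanged.

```python
def best_divisor(c, T):
    # assumes T < max(c); any D > max(c) gives remainder sum equal to sum(c)
    return min(range(T + 1, max(c) + 1),
               key=lambda D: sum(x % D for x in c))

print(best_divisor([12, 18, 30], T=5))  # -> 6, with remainder sum 0
```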
9 votes, 3 answers

How is it possible that the MSE used to train neural networks with gradient descent has multiple local minima?

We often train neural networks by optimizing the mean squared error (MSE), which, as a function of the prediction, is a parabola $y=x^2$, with gradient descent. We also say that weight adjustment in a neural network by the gradient descent algorithm can hit a local…
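A hedged sketch of the resolution (the tiny network and data are invented): MSE is a parabola in the prediction, but the prediction is a non-linear function of the weights, and hidden-unit symmetry alone already produces multiple separated minima.

```python
import math

def net(w1, w2, x):               # two hidden tanh units, weights w1, w2
    return math.tanh(w1 * x) + math.tanh(w2 * x)

xs = [-2.0, -1.0, 1.0, 2.0]
ys = [net(1.0, 2.0, x) for x in xs]        # data generated with w = (1, 2)

def mse(w1, w2):
    return sum((net(w1, w2, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

print(mse(1.0, 2.0))   # 0.0   -> a global minimum
print(mse(2.0, 1.0))   # 0.0   -> its mirror (swap the hidden units)
print(mse(1.5, 1.5))   # > 0   -> the point between them is worse, so the
                       #          loss over the weights is not convex
```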
8 votes, 2 answers

Why does a one-hidden-layer network become more robust to poor initialization as the number of hidden neurons grows?

In a nutshell: I want to understand why a one-hidden-layer neural network converges to a good minimum more reliably when a larger number of hidden neurons is used. Below is a more detailed explanation of my experiment: I am working on a simple 2D…
Chrigi
8 votes, 2 answers

Why should the number of neurons in a hidden layer be a power of 2?

I have read somewhere on the web (I lost the reference) that the number of units (or neurons) in a hidden layer should be a power of 2 because it helps the learning algorithm to converge faster. Is this a fact? If it is, why is this true? Does it…
8 votes, 2 answers

Why is the perceptron criterion function differentiable?

I'm reading chapter one of the book Neural Networks and Deep Learning by Aggarwal. In section 1.2.1.1 of the book, I'm learning about the perceptron. One thing the book says is that, if we use the sign function in the following loss function:…
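Since the excerpt is cut off, a hedged sketch of the contrast the chapter draws (the vectors are invented): the 0/1 loss built from the sign function is piecewise constant with zero gradient almost everywhere, whereas the perceptron criterion $\max(0, -y \, w^\top x)$ is piecewise linear and admits a usable (sub)gradient.

```python
def perceptron_loss(w, x, y):     # y in {-1, +1}
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return max(0.0, -margin)      # zero once the point is classified correctly

def perceptron_grad(w, x, y):     # subgradient of the loss w.r.t. w
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return [0.0] * len(w) if margin > 0 else [-y * xi for xi in x]

print(perceptron_loss([0.5, -0.2], [1.0, 2.0], +1))  # 0.0: correct side
print(perceptron_grad([0.5, -0.2], [1.0, 2.0], -1))  # [1.0, 2.0]: a usable update
```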
8 votes, 1 answer

Why is the learning rate generally beneath 1?

In all examples I've ever seen, the learning rate of an optimisation method is always less than $1$. However, I've never found an explanation as to why this is. In addition to that, there are some cases where having a learning rate bigger than 1 is…
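A hedged toy illustration of why large steps fail (the quadratic is invented): on $f(x) = x^2$ the gradient-descent update is $x \leftarrow (1 - 2\eta)x$, which contracts only when $\eta < 1$; beyond that, every step overshoots the minimum.

```python
def run(lr, x=1.0, steps=10):
    for _ in range(steps):
        x -= lr * 2 * x            # gradient of x**2 is 2x
    return x

print(run(0.1))    # ~0.107: shrinks toward the minimum at 0
print(run(1.5))    # 1024.0: each step overshoots and the iterates blow up
```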
7 votes, 1 answer

What is an objective function?

Local search algorithms are useful for solving pure optimization problems, in which the aim is to find the best state according to an objective function. My question is what is the objective function?
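A hedged example of what such a function looks like in a local-search setting (the 8-queens encoding is a standard textbook choice, not from the question): the objective assigns each candidate state a score, here the number of non-attacking queen pairs, which local search then tries to maximize.

```python
def objective(queens):                 # queens[i] = row of the queen in column i
    n, attacks = len(queens), 0
    for i in range(n):
        for j in range(i + 1, n):
            if queens[i] == queens[j] or abs(queens[i] - queens[j]) == j - i:
                attacks += 1           # same row or same diagonal
    return (n * (n - 1)) // 2 - attacks   # non-attacking pairs

print(objective([0, 4, 7, 5, 2, 6, 1, 3]))  # 28 -> a solved 8-queens board
```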