
I want to train my neural network by evolution, that is, I want to recombine the weights of the best-performing neural networks in each evolution cycle, or generation. My initial instinct was to represent the weights as they are, as variables of type double, and either

  1. swap the weights between the two parent networks, or

  2. generate a random number between the two weights

However, what I need is to represent the weights as a binary string, and then, as usual, carry out the crossover on the string.

So, how can I take my array of weights (of floating-point values or doubles) and convert it into a binary string? Should the chromosome contain an array of strings, where each string represents a single weight?

Raed
  • Ask yourself: do I really need to convert it to bits? Modern implementations just modify the _values_ of the weights. Using bits to do this is highly inefficient. During combining (aka crossover), you could average the weights of the two parents, or select them uniformly. Have a look at [this paper](http://nn.cs.utexas.edu/downloads/papers/stanley.ec02.pdf) – Thomas Wagenaar Jun 16 '17 at 12:09
  • From experience, crossover between two parent NNs just gives unusable offspring; it works much better if you skip crossover altogether and use mutation as the main offspring generator. Another piece of advice: keep your weights' data type as is. – Aus Jun 17 '17 at 18:57
  • Are you still using backprop? If so, you need to preserve a differentiable gradient across your network to backpropagate the error, which I believe any swapping of data (weights) in/out will disrupt. If not, it seems to me that training a NN using evolution instead of backprop will be counterproductive. This is because: 1) in theory, it's bad to replace a continuous learning method like backprop with discontinuous methods like crossover and mutation, and 2) in practice, evolution will prove to be a circuitous and *very* slow way to train a NN's many weights. – Randy Jun 19 '17 at 20:14

1 Answer


I would first echo the advice of Thomas W. in the comments above: think about whether you really need to discretize your variables. I'd also question the wisdom of training a reasonably sized network with a GA instead of a dedicated neural-network training algorithm, which is very likely to perform much better. However, assuming you really do need to do this, the basic approach is generally the following.

  1. Decide on your range and precision. For the sake of simplicity, I'm going to assume you use, say, $L=16$ bits to represent every weight. You can encode some variables at a higher precision than others -- you're in charge of writing the encode/decode functions, so it's entirely up to you -- but here I'm assuming you don't. Your encoding is then a vector of length $16n$, where $n$ is the number of weights in the network. You could use a string, but that's probably a net loss of efficiency: mostly what you'll be doing is converting those bits to integers to do the arithmetic necessary to decode the encoded weights, so I'd just make it a vector of $16n$ integers.

For the range, you just need the minimum and maximum values you're going to allow. Again, for simplicity, I'm going to say the values range between $-10$ and $10$, denoted as $R_{\min}=-10$ and $R_{\max}=10$.
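To make this concrete, here is a minimal sketch in Python; the constant names, the weight count `N`, and the choice to store the chromosome as a plain list of ints are mine, purely for illustration:

```python
import random

L = 16                      # bits per weight
N = 10                      # number of weights in the network (assumed here)
R_MIN, R_MAX = -10.0, 10.0  # allowed weight range

def random_chromosome():
    """A chromosome is a flat vector of L * N bits, stored as ints."""
    return [random.randint(0, 1) for _ in range(L * N)]
```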

  2. Your GA then just manipulates the $0$s and $1$s. To evaluate fitness, you have to decode each of those $16$-bit blocks into a double between $-10$ and $10$. With $16$ bits, you have $2^{16}=65536$ possible values, and the total range of output values is $10-(-10)=20$, so each increment is worth $20/65536 \approx 0.0003$. Let $0000000000000000$ be the smallest possible string, and map that onto the value $-10$. Now, every increment to that binary string just adds about $0.0003$ to the output value.

Putting all this together: if $x_i$ is the integer value of the $i$-th block of $16$ bits, then $w_i = x_i \cdot (R_{\max} - R_{\min})/2^L + R_{\min}$. Do that for each block, giving you each weight in the network. Then run your network on your training data and use whatever accuracy metric you like as the fitness function for the GA.
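Continuing the sketch above (same constants), a decode function along these lines recovers the weights; wiring the result into your network and accuracy metric is left out, since that depends entirely on your NN code:

```python
def decode(chromosome):
    """Map each block of L bits to a double in [R_MIN, R_MAX]."""
    weights = []
    for i in range(0, len(chromosome), L):
        block = chromosome[i:i + L]
        x = int("".join(map(str, block)), 2)  # x_i: integer value of the block
        weights.append(x * (R_MAX - R_MIN) / 2**L + R_MIN)
    return weights
```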

  3. For the GA itself, you don't really have to do anything special. The GA just sees an arbitrary string of $16n$ bits, and you can do crossover and mutation on that apparently meaningless string of bits.
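For instance, here is a sketch of the two standard operators, again reusing the definitions above (one-point crossover and per-bit flip mutation are common defaults, not the only options):

```python
def one_point_crossover(a, b):
    """Swap the tails of two parent chromosomes at a random cut point."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(chromosome, rate=1.0 / (L * N)):
    """Flip each bit independently; a rate of ~1/length is a common default."""
    return [bit ^ 1 if random.random() < rate else bit for bit in chromosome]
```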

Your question was asking about going from a vector of weights to a vector of strings, which is the opposite conversion from that described above. Generally, for a GA, you don't need a phenotype->genotype conversion. You just generate genotypes using genetic operators and use the genotype->phenotype conversion to evaluate candidate solutions. However, there are some reasons why you might want to go the other way. For instance, you might want to seed the population with a known set of weights, and so you'd need the ability to convert those weights to binary.

Basically, in that case, just invert the conversion function. The idea is to map a floating-point weight between $R_{\min}$ and $R_{\max}$ onto the integer that sits the same relative distance into the range $0$ to $2^{L}-1$. For example, if the weight is $-10$, then it's the smallest possible value, so it maps onto the smallest possible integer, namely $0$. Similarly, a weight of $0$ is halfway between the min and max, so it maps onto $2^{L-1}=32768$, halfway through the integer range. If you play around with it a bit, you should see that the expression here is $x_i = \big\lfloor \frac{w_i - R_\min}{R_\max - R_\min} 2^L \big\rfloor$. That gives you an integer in the range $0$ to $2^L$; the one out-of-range case, $w_i = R_\max$, should be clamped to $2^L - 1$. From there, just do a standard binary encoding (or Gray code, whatever) to get a bit string you can manipulate.
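A sketch of that inverse, matching the decode function above (the `min` handles the $w_i = R_\max$ edge case):

```python
def encode(weights):
    """Inverse of decode: map each weight back to a block of L bits."""
    chromosome = []
    for w in weights:
        x = int((w - R_MIN) / (R_MAX - R_MIN) * 2**L)  # floor, since w >= R_MIN
        x = min(x, 2**L - 1)                           # clamp the w == R_MAX case
        chromosome.extend(int(b) for b in format(x, f"0{L}b"))
    return chromosome
```

With this, seeding the population from known weights is just `encode(weights)`, and `decode(encode(weights))` recovers each weight to within one quantization step ($\approx 0.0003$ here).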

deong