Is it possible to classify data using a genetic algorithm? For example, would it be possible to sort this database?

Is there an example in MATLAB?

nbro
  • Yes. You may want to read this article: http://www.genetic-programming.com/jkpdf/tr1314.pdf and/or this book: http://dl.acm.org/citation.cfm?id=138936 . – user31264 Mar 05 '17 at 23:09

2 Answers

It is possible, but it is a pretty terrible idea.

There are a few options. One is to not use the GA as a direct classifier, but instead use a GA to learn the parameters of another classification model like a neural network. The basic idea of a GA is that it (very roughly speaking) forms a black-box method for searching an arbitrary space for solutions that minimize or maximize some function.

Here, you would be searching the space of possible neural network topologies and/or weights to find one that minimizes the misclassification rate.
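To make the black-box framing concrete, here is a minimal sketch in Python (not MATLAB) of the objective a GA would be minimizing. The toy data, the tiny 2-2-1 network, the flat weight layout, and all function names are assumptions made up for illustration. The point is only that the GA sees "candidate weights in, misclassification rate out" and nothing else.

```python
import math
import random

def predict(weights, x):
    """Tiny 2-2-1 network; `weights` is a flat list of 9 numbers (hypothetical layout)."""
    w = weights
    h1 = math.tanh(w[0] * x[0] + w[1] * x[1] + w[2])
    h2 = math.tanh(w[3] * x[0] + w[4] * x[1] + w[5])
    return 1 if w[6] * h1 + w[7] * h2 + w[8] > 0 else 0

def misclassification_rate(weights, data):
    """The black-box objective: fraction of examples labeled incorrectly."""
    errors = sum(1 for x, y in data if predict(weights, x) != y)
    return errors / len(data)

# Toy data (made up): the label is 1 when the two features sum to more than 1.
rng = random.Random(0)
data = []
for _ in range(200):
    x = (rng.random(), rng.random())
    data.append((x, 1 if x[0] + x[1] > 1 else 0))

# A GA only ever sees candidate -> score; here two random candidates are scored.
candidates = [[rng.uniform(-1, 1) for _ in range(9)] for _ in range(2)]
scores = [misclassification_rate(c, data) for c in candidates]
print(scores)
```

Searching over topologies as well would mean letting the candidate encode the network shape too, which this sketch does not attempt.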

Another approach is the one taken by what are sometimes called Learning Classifier Systems (LCS) or Genetics-Based Machine Learning (GBML). This approach uses evolutionary mechanisms to evolve rule sets of the form "if X condition is true, then do/classify Y". That's a more direct method of solving this sort of problem. You define some features on your dataset, and the algorithm tries to learn rules based on those features.
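As a rough illustration of evolving "if X then classify Y" rules, the following Python sketch evolves a single threshold rule on made-up data. It is a deliberate simplification and not a real LCS: there is one rule per individual instead of a rule set, no credit assignment, and the features, data, and function names are all hypothetical.

```python
import random

rng = random.Random(1)

# Made-up data: two features in [0, 1]; "spam" (1) when feature 0 exceeds 0.6.
data = []
for _ in range(200):
    x = (rng.random(), rng.random())
    data.append((x, 1 if x[0] > 0.6 else 0))

def classify(rule, x):
    """Rule = (feature index, threshold): 'if x[feature] > threshold then spam'."""
    feature, threshold = rule
    return 1 if x[feature] > threshold else 0

def accuracy(rule):
    return sum(classify(rule, x) == y for x, y in data) / len(data)

def random_rule():
    return (rng.randrange(2), rng.random())

def mutate(rule):
    feature, threshold = rule
    if rng.random() < 0.1:  # occasionally switch to the other feature
        feature = 1 - feature
    threshold = min(1.0, max(0.0, threshold + rng.gauss(0, 0.05)))
    return (feature, threshold)

population = [random_rule() for _ in range(20)]
initial_best = max(accuracy(r) for r in population)

for _ in range(30):
    population.sort(key=accuracy, reverse=True)
    survivors = population[:10]  # keep the better half unchanged
    children = [mutate(rng.choice(survivors)) for _ in range(10)]
    population = survivors + children

best_rule = max(population, key=accuracy)
print(best_rule, accuracy(best_rule))
```

Because the better half survives unchanged each generation, the best accuracy in the population can never get worse from one generation to the next.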

The problem with any of these approaches is just that there are so many better ways to solve the problem. Remember, a GA is basically a black box that's supposed to work acceptably well for a huge range of unknown problems. But I'm not solving a huge range of unknown problems. I'm trying to separate ham from spam on one dataset. I can come up with methods that simply do that job better and more quickly than a GA has any real hope of doing.

deong

You must understand that a genetic algorithm is an optimization algorithm. You can't feed it e-mails and make it classify spam. Instead, a genetic algorithm is used to train something else that classifies spam. That something could be a neural network.

What you need is a genetic algorithm that optimizes neural networks (neuroevolution), which might roughly work as follows:

  1. Start with a pool of neural networks
  2. Feed them e-mails, let them classify, and calculate each network's fitness as the percentage classified correctly
  3. Select neural networks for crossover
  4. Crossover
  5. Mutate
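The five steps above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a reference implementation: the "e-mails" are replaced by made-up two-feature data, the network is a tiny fixed 2-2-1 topology, the steps are assumed to repeat from step 2 every generation, and keeping the best network unchanged (elitism) is an extra choice not listed in the steps. All names are hypothetical.

```python
import math
import random

rng = random.Random(42)

# Made-up stand-in for e-mails: two numeric features, label 1 ("spam") if their sum > 1.
data = []
for _ in range(200):
    x = (rng.random(), rng.random())
    data.append((x, 1 if x[0] + x[1] > 1 else 0))

N_WEIGHTS = 9  # 2 inputs -> 2 hidden tanh units -> 1 output, biases included

def classify(w, x):
    h1 = math.tanh(w[0] * x[0] + w[1] * x[1] + w[2])
    h2 = math.tanh(w[3] * x[0] + w[4] * x[1] + w[5])
    return 1 if w[6] * h1 + w[7] * h2 + w[8] > 0 else 0

def fitness(w):
    # Step 2: fraction of examples classified correctly.
    return sum(classify(w, x) == y for x, y in data) / len(data)

def tournament(scored, k=3):
    # Step 3: pick the fittest of k randomly chosen networks.
    return max(rng.sample(scored, k), key=lambda pair: pair[0])[1]

def crossover(a, b):
    # Step 4: uniform crossover, each weight comes from either parent.
    return [rng.choice(pair) for pair in zip(a, b)]

def mutate(w, rate=0.2, scale=0.5):
    # Step 5: add Gaussian noise to a random subset of weights.
    return [wi + rng.gauss(0, scale) if rng.random() < rate else wi for wi in w]

# Step 1: start with a pool of random networks.
pool = [[rng.uniform(-1, 1) for _ in range(N_WEIGHTS)] for _ in range(30)]
initial_best = max(fitness(w) for w in pool)

for generation in range(40):
    scored = [(fitness(w), w) for w in pool]
    elite = max(scored, key=lambda pair: pair[0])[1]  # keep the best network unchanged
    pool = [elite] + [
        mutate(crossover(tournament(scored), tournament(scored)))
        for _ in range(len(pool) - 1)
    ]

best = max(pool, key=fitness)
print(f"best accuracy: {fitness(best):.2f}")
```

Fitness here is measured on the same data used for selection, so a real experiment would also hold out a test set before trusting the reported accuracy.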

However, there are better ways for classifying e-mails (e.g. an algorithm that looks for certain "spam words").
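For comparison, a "spam words" check can be as small as the Python sketch below. The word list, the threshold of two hits, and the function name are invented for illustration; a real filter would learn its word weights from data.

```python
SPAM_WORDS = {"free", "winner", "prize", "viagra", "urgent"}  # hypothetical list

def is_spam(text, threshold=2):
    """Flag a message when it contains at least `threshold` spam words."""
    words = [w.strip(".,!?:;\"'()").lower() for w in text.split()]
    hits = sum(1 for w in words if w in SPAM_WORDS)
    return hits >= threshold

print(is_spam("URGENT: you are a winner, claim your FREE prize!"))  # True
print(is_spam("Meeting moved to 3pm, see agenda attached."))         # False
```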

But it is definitely possible. I have a JavaScript library set up for neuroevolution, if you're interested.

Thomas Wagenaar