5

Let's suppose that we have a legacy system for which we don't have the source code, and that this system runs on a mainframe and is written in COBOL. Is there any way to use machine learning to learn, from the inputs and outputs, how the executables work? In my opinion, this analysis could lead to developing a REST/SOAP web service that can replace the legacy system.

jcromanu
  • 153
  • 3

1 Answer

9

Let's assume from the outset that the space of inputs is too large to allow exhaustive tabulation.

The essential issues when applying ML are that the program being modelled is likely in general to:

  • Be discrete, i.e. operate (at least in part) on integer, boolean or categorical variables.
  • Contain various conditional/looping constructs (if/while/for etc).
  • Have side-effects that affect other parts of the program (e.g. non-local variables) or world state (e.g. writing to a file).

These pose obstacles for ML methods such as artificial neural networks (ANNs). The ML approach that is most immediately compatible with these issues is Genetic Programming (GP), which searches directly over candidate programs. A recent specialisation of GP that is specifically concerned with the transformation of existing software systems is Genetic Improvement (GI).
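To make the idea of searching over programs concrete, here is a minimal sketch of a mutation-only evolutionary search over expression trees, fitted to recorded input/output pairs. It is a simplified illustration rather than a full GP system (there is no crossover), and `legacy_black_box`, the operator set, and all parameters are assumptions chosen for the example. In practice the pairs would be captured from the real executable.

```python
import operator
import random

# Hypothetical stand-in for the black box. In practice these pairs would be
# recorded from the legacy executable, not computed from known source.
def legacy_black_box(x: int) -> int:
    return x * x + x

EXAMPLES = [(x, legacy_black_box(x)) for x in range(-5, 6)]

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
TERMINALS = ["x", 0, 1, 2]


def random_tree(rng, depth=3):
    """Grow a random expression tree: a terminal or (op, left, right)."""
    if depth <= 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, x):
    """Evaluate an expression tree for a given integer input x."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree


def fitness(tree):
    """Total absolute error over the recorded examples (0 = all match)."""
    return sum(abs(evaluate(tree, x) - y) for x, y in EXAMPLES)


def mutate(rng, tree, depth=3):
    """Replace one randomly chosen subtree with a fresh random subtree."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return random_tree(rng, depth)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(rng, left, depth - 1), right)
    return (op, left, mutate(rng, right, depth - 1))


def evolve(seed=0, pop_size=200, generations=30):
    """Keep the best quarter each generation and refill with mutants."""
    rng = random.Random(seed)
    population = [random_tree(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        if fitness(population[0]) == 0:
            break
        survivors = population[: pop_size // 4]
        children = [mutate(rng, rng.choice(survivors))
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return min(population, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print(best, fitness(best))
```

Note that a fitness of 0 only means the candidate agrees with the recorded examples. It says nothing about inputs that were never recorded.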

However, neither GP nor GI (nor any other current ML technique) is a 'silver bullet' here:

  • Despite decades of research, GP still works best at synthesizing relatively small functions, certainly not entire legacy programs.
  • Because it is only possible to train on a very small subset of a program's input space, there is little guarantee that the synthesized program will generalize to inputs it hasn't been trained on.
  • How will the success of side effects be formally determined for training purposes?
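The generalization problem in particular is easy to demonstrate. The sketch below uses a made-up pricing routine (`legacy_price` is purely hypothetical) with a branch that the recorded inputs never exercise. A model fitted to the observed pairs reproduces all of them exactly and is still wrong on an unseen input.

```python
# Hypothetical black box: a pricing routine with a branch that the
# recorded inputs never trigger.
def legacy_price(quantity: int) -> int:
    total = quantity * 12
    if quantity >= 100:          # bulk discount, never seen in the records
        total -= total // 10
    return total


# Recorded input/output pairs, all from the common small-order range.
observed = [(q, legacy_price(q)) for q in range(1, 50)]

# "Learn" y = a*x + b from the first and last observed points.
(x0, y0), (x1, y1) = observed[0], observed[-1]
a = (y1 - y0) // (x1 - x0)
b = y0 - a * x0


def learned_price(quantity: int) -> int:
    return a * quantity + b


# The learned model matches every recorded case...
train_errors = [abs(learned_price(q) - y) for q, y in observed]
# ...but disagrees with the black box once the hidden branch is taken.
unseen_error = abs(learned_price(200) - legacy_price(200))

print(max(train_errors), unseen_error)
```

No amount of additional sampling below 100 would reveal the discount branch, which is why perfect agreement on recorded data gives little assurance about the rest of the input space.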

Some of these issues could be addressed to some degree if the program has a comprehensive test suite, but replacing an entire nontrivial program is not likely anytime soon. Replacing smaller parts of the program that have good unit tests is a more realistic goal.
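As a rough sketch of how an existing test suite (or a set of recorded cases) can act as the acceptance criterion for a replacement component, the helper below scores a candidate by the fraction of cases it reproduces. The name `pass_rate` and the integer-to-integer signature are assumptions made for illustration. A real legacy routine would need its inputs and outputs serialized in some comparable form.

```python
from typing import Callable, Iterable, Tuple


def pass_rate(candidate: Callable[[int], int],
              recorded: Iterable[Tuple[int, int]]) -> float:
    """Fraction of recorded (input, output) cases the candidate reproduces."""
    cases = list(recorded)
    if not cases:
        raise ValueError("no recorded cases to check against")
    passed = 0
    for given, expected in cases:
        try:
            if candidate(given) == expected:
                passed += 1
        except Exception:
            pass  # a crash on this input counts as a failed case
    return passed / len(cases)


if __name__ == "__main__":
    # Hypothetical recorded behaviour of one small legacy routine.
    recorded = [(0, 0), (1, 1), (2, 4), (3, 9)]
    print(pass_rate(lambda x: x * x, recorded))
    print(pass_rate(lambda x: x + x, recorded))
```

The same score can serve as a fitness function during search and as a regression check before a replacement is deployed, but it is only as trustworthy as the coverage of the recorded cases.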

Here is a case study showing how GI was successfully used to fix errors in the implementation of Apache Hadoop, by operating only on the program binary.

NietzscheanAI
  • 7,206
  • 22
  • 36