When a neural network learns something from a data set, we are left with a bunch of weights which represent some approximation of knowledge about the world. Although different data sets or even different runs of the same NN might yield completely different sets of weights, the functions those weights implement must be mathematically related (equivalent up to things like linear combinations, rotations, or permutations of units). Since we usually build NNs to perform a particular concrete task (identify cats, pedestrians, tumors, etc.), it seems that we are generally satisfied to let the network continue to act as a black box.
Now, I understand that there is a push for interpretability ("explainable AI") of NNs and other ML techniques. But this is not quite what I'm getting at. It seems to me that, given a bunch of data points recording the behavior of charged particles, one could effectively recover Maxwell's equations using a sufficiently advanced NN. Perhaps that requires NNs which are much more sophisticated than what we have today, but it illustrates the thing I am interested in: NNs could, in my mind, be teaching us general truths about the world if we took the time to analyze and simplify the formulae that they give us¹.
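As a toy analogue of what I mean (no NN involved, just a plain log-log fit, and all the constants are made up for the example), here is roughly the "recover the formula from measurements" step I have in mind:

```python
# Toy illustration: simulate noisy Coulomb-style force measurements and
# recover the inverse-square law from the data alone. This is ordinary
# regression, not a NN; it just shows that a compact law can be read back
# out of structured measurements.
import numpy as np

rng = np.random.default_rng(42)
k, q1, q2 = 8.99e9, 1e-6, 2e-6              # Coulomb constant, two test charges
r = rng.uniform(0.1, 2.0, size=500)          # separations in meters
F = k * q1 * q2 / r**2                       # "measured" forces
F *= rng.normal(1.0, 0.01, size=r.shape)     # 1% multiplicative noise

# If F = C * r^n, then log F = log C + n * log r, i.e. a straight line.
slope, intercept = np.polyfit(np.log(r), np.log(F), deg=1)
print(f"recovered exponent n ~ {slope:.3f} (true value: -2)")
print(f"recovered constant C ~ {np.exp(intercept):.3e} (true value: {k * q1 * q2:.3e})")
```

A NN-based version would presumably fit the network first and then need a separate analysis/simplification pass to get back to something this compact, and that second pass is the part that seems under-explored.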
For instance, there must be hundreds, if not thousands, of NNs that have been trained on visual recognition tasks, and they presumably end up learning many of the same sub-skills, to put it a bit anthropomorphically. I recently read about gauge CNNs, but that approach goes the other way: we start with what we know and then bake it into the network.
Has anyone attempted to go the opposite way? Either:
- Take a bunch of similar NNs and analyze what they have in common to extract general formulae about the domain they model²
- Carefully inspect the structure of a well-trained NN to directly extract the "Maxwell's equations" which might be hiding in it?
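To make the second option a bit more concrete, here is the kind of crude inspection I have in mind. It is only a sketch: it assumes torchvision >= 0.13 and an ImageNet-pretrained ResNet-18 (downloaded on first use), and the hand-built edge kernels and the 0.5 threshold are arbitrary choices of mine, not an established procedure.

```python
# Pull the first-layer filters out of a trained vision model and check how
# "edge-detector-like" they are, as a very crude stand-in for extracting a
# known formula from the weights.
import numpy as np
import torch
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")
filters = model.conv1.weight.detach().numpy()   # shape (64, 3, 7, 7)
gray = filters.mean(axis=1)                     # collapse RGB -> (64, 7, 7)

# Hand-built vertical/horizontal edge kernels as the "known formula" to test against.
k = 7
edge_v = np.tile(np.sign(np.arange(k) - k // 2), (k, 1)).astype(float)
edge_h = edge_v.T

def corr(a, b):
    # Absolute normalized correlation between two filters.
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return float(np.abs((a * b).mean()))

scores = [max(corr(f, edge_v), corr(f, edge_h)) for f in gray]
print(f"filters resembling simple edge detectors: {sum(s > 0.5 for s in scores)} / {len(scores)}")
```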
¹ Imagine if we built a NN to learn Newtonian mechanics just to compute a simple ballistic trajectory. It could surely be done, but it would also be a massive waste of resources. We have nice, neat equations for ballistic motion, given to us by the "original neural networks", so we use those.
² E.g., surely the set of all visual NNs has collectively discovered near-optimal algorithms for edge/line/orientation detection, etc. This could perhaps be done with the assistance of a meta-ML algorithm (clustering over NN weight matrices, perhaps?).
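Something like the following is what I imagine for the clustering idea. It is a sketch with placeholder data: `collected_filters` would really be first-layer weight tensors pulled from several independently trained models, not the random arrays used here to keep the example self-contained.

```python
# Pool first-layer filters from several trained vision networks and cluster
# them, hoping the clusters expose shared "sub-skills" (edges, orientations,
# color blobs) that every network rediscovers.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Placeholder: pretend we extracted (num_filters, k, k) arrays from 5 models.
collected_filters = [rng.normal(size=(64, 7, 7)) for _ in range(5)]

# Flatten every filter from every model into one row of a shared matrix,
# normalizing each row so clustering compares shape rather than scale.
rows = []
for filt in collected_filters:
    flat = filt.reshape(len(filt), -1)
    flat = (flat - flat.mean(axis=1, keepdims=True)) / (flat.std(axis=1, keepdims=True) + 1e-8)
    rows.append(flat)
X = np.vstack(rows)

kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)
counts = np.bincount(kmeans.labels_, minlength=10)
print("filters per cluster:", counts)
```

With real weights, one would then look at what the filters in each large cluster have in common across models; with the random placeholders above, the clusters are of course meaningless.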