I was wondering if it's possible to get the inverse of a neural network. If we view a NN as a function, can we obtain its inverse?
I tried building a simple MNIST model with an input of shape (784,) and an output of shape (10,), training it to good accuracy, and then inverting the predicted value to try to recover the input, but the results were nowhere near what I started with. (I used the pseudo-inverse of the W matrix.)
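Roughly, the forward model looks like this (a minimal Keras sketch; the optimizer, loss, and training details here are assumptions rather than my exact setup):

    from tensorflow import keras

    # Single dense layer with a sigmoid, mapping a flattened 28x28 image (784,)
    # to 10 outputs: this is the f(x) = theta(xW + b) written out below.
    model = keras.Sequential([
        keras.layers.Input(shape=(784,)),
        keras.layers.Dense(10, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

    # x_train: flattened MNIST images scaled to [0, 1]; y_train: one-hot labels
    # model.fit(x_train, y_train, epochs=10)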
My NN is basically the following function:
$$ f(x) = \theta(xW + b), \;\;\;\;\; \theta(z) = \frac{1}{1+e^{-z}} $$
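In other words, the inverse I am trying to compute, using the pseudo-inverse $W^{+}$ since $W$ is not square, is:
$$ f^{-1}(y) = \left(\theta^{-1}(y) - b\right) W^{+}, \;\;\;\;\; \theta^{-1}(y) = \log\frac{y}{1-y} $$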
I.e., in code:
    import numpy as np
    import matplotlib.pyplot as plt

    def rev_sigmoid(y):
        # invert theta(z) = 1 / (1 + exp(-z))  ->  z = log(y / (1 - y))
        return np.log(y / (1 - y))

    def rev_linear(z, W, b):
        # undo z = x @ W + b using the pseudo-inverse of W
        return (z - b) @ np.linalg.pinv(W)

    # W, b are the trained layer's weights, e.g. model.layers[-1].get_weights() in Keras
    y = model.predict(x_train[0:1])
    z = rev_sigmoid(y)
    x = rev_linear(z, W, b)
    x = x.reshape(28, 28)
    plt.imshow(x)
^ This should have been a 5, but the reconstructed image looks nothing like it.
Is there a reason why it failed? And is it ever possible to get the inverse of a NN?
EDIT: it is also worth noting that doing the opposite does yield good results, i.e. starting with the y's (a one-hot encoding of the digits) and using them to predict the image (an array of 784 bytes) with the same kind of architecture: input (10,) and output (784,) with a sigmoid. This is not exactly equivalent to an inverse, as here you first apply the linear transformation and then the non-linearity, whereas in an inverse you would first undo the non-linearity and then undo the linear transformation. Still, the claim that the 784x10 matrix collapses too much information seems a bit odd to me, as there does exist a 10x784 matrix that can reproduce enough of that information.
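For reference, a minimal sketch of that reverse model (same caveats as the sketch above: Keras is assumed, and the loss/optimizer are assumptions):

    from tensorflow import keras

    # "Opposite" direction: one-hot digit (10,) in, 784 pixel values out,
    # again a single dense layer with a sigmoid output.
    reverse_model = keras.Sequential([
        keras.layers.Input(shape=(10,)),
        keras.layers.Dense(784, activation="sigmoid"),
    ])
    reverse_model.compile(optimizer="adam", loss="binary_crossentropy")

    # Trained with inputs and targets swapped relative to the forward model
    # (pixel values scaled to [0, 1]):
    # reverse_model.fit(y_train, x_train, epochs=10)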