
I'm using tf.Keras to build a deep, fully-connected autoencoder. My input dataset is a dataframe with shape (19947,), and the purpose of the autoencoder is to predict normalized gene expression values. These are continuous values ranging over [0, ~620000].

I've tried different architectures, and I'm using relu activation for all layers. For optimization I'm using adam with mae loss.

The problem is that the network trains successfully (although the training loss is still terrible), but when I predict I notice that, although the predictions make sense for some nodes, there is always a certain number of nodes that output only 0. I've tried changing the number of nodes in my bottleneck layer (the encoder output), and it always happens, even when I decrease that number.

Any ideas on what I'm doing wrong?

tf.Keras code:

from tensorflow import keras

input_layer = keras.Input(shape=(19947,))

# Encoder: 19947 -> 512 -> 128 -> 16 (bottleneck)
simple_encoder = keras.models.Sequential([
    input_layer,
    keras.layers.Dense(512, activation='relu'),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(16, activation='relu')
])

# Decoder: 16 -> 128 -> 512 -> 19947 (reconstruction)
simple_decoder = keras.models.Sequential([
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(512, activation='relu'),
    keras.layers.Dense(19947, activation='relu')
])

simple_ae = keras.models.Sequential([simple_encoder, simple_decoder])
simple_ae.compile(optimizer='adam', loss='mae')

# X_train, X_valid and the early_stopping callback are defined elsewhere.
simple_ae.fit(X_train, X_train,
              epochs=1000,
              validation_data=(X_valid, X_valid),
              callbacks=[early_stopping])

Output of encoder.predict with 16 nodes in the bottleneck layer: 7 nodes predict only 0's and 8 nodes predict "correctly" (screenshot not reproduced here).

beerzy
  • Have you done any normalization of the data? The weights and biases will have a hard time influencing a number with a magnitude like 620000. – David Hoelzer Nov 29 '20 at 22:48
  • The data is already normalized using a specific algorithm for genomic expression data. I could normalize on top of that using min-max or a standard scaler, but I'm not sure that would be accurate. – beerzy Nov 29 '20 at 23:42 (a small scaling sketch follows these comments)
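
As a minimal sketch of the extra scaling discussed in these comments (assuming X_train and X_valid are numeric arrays or dataframes; the log1p and MinMaxScaler choices are illustrative, not the asker's actual pipeline), either option compresses the [0, ~620000] range into something the Dense layers can handle more easily:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Option A: compress the dynamic range with log1p (keeps values >= 0).
X_train_log = np.log1p(X_train)
X_valid_log = np.log1p(X_valid)

# Option B: min-max scale to [0, 1], fitting the scaler on the training set only.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_valid_scaled = scaler.transform(X_valid)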

1 Answer


My best guess is that this happens because you are using the ReLU activation function, which forces every output to be non-negative. Given the nature of your data, the autoencoder relies on negative values (at the bottleneck and in the hidden layers) to reconstruct your data, but the closest ReLU can get to a negative number is zero.

In the context of artificial neural networks, the rectifier is an activation function defined as the positive part of its argument: $$f(x) = x^+ = \max(0, x)$$ https://en.wikipedia.org/wiki/Rectifier_(neural_networks)
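
One hedged way to test this, keeping the asker's architecture but letting the bottleneck (and the reconstruction) take negative values, is to swap in a linear activation at the bottleneck and LeakyReLU in the decoder. This is only a sketch under the assumption that the rest of the training setup stays the same; the layer sizes and the LeakyReLU slope are illustrative:

from tensorflow import keras

simple_encoder = keras.models.Sequential([
    keras.Input(shape=(19947,)),
    keras.layers.Dense(512, activation='relu'),
    keras.layers.Dense(128, activation='relu'),
    # Linear bottleneck: the latent code is free to go negative.
    keras.layers.Dense(16, activation='linear')
])
simple_decoder = keras.models.Sequential([
    keras.layers.Dense(128),
    keras.layers.LeakyReLU(0.1),   # keeps a small gradient for negative pre-activations
    keras.layers.Dense(512),
    keras.layers.LeakyReLU(0.1),
    # Linear output: reconstructions are not clamped at zero either.
    keras.layers.Dense(19947, activation='linear')
])
simple_ae = keras.models.Sequential([simple_encoder, simple_decoder])
simple_ae.compile(optimizer='adam', loss='mae')

If the targets stay in [0, ~620000], a relu output layer is still defensible; the linear output above mainly matters if the inputs are log-transformed and can go negative.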

  • I'm not sure I fully understand your answer. I do understand that ReLU pushes any negative number to zero; however, my data contains no negative numbers, so what's the logic behind the bottleneck layer predicting negative numbers? Anyway, I will try to use LeakyReLU as the activation, which should in theory fix the problem, right? Do you suggest I try this change only on the bottleneck layer or on all layers? – beerzy Nov 30 '20 at 15:30
  • I've tried the following: 1) Convert my dataset to ]0,~620000]; 2) Apply log2 transformation converting it to [-23.25, 19.24]. My hope is that changing to activations that accept negative numbers will fix it, but I still have the same issue. – beerzy Nov 30 '20 at 16:08
  • I mean, the autoencoder probably wants to compute negative outputs in the hidden layers, but ReLU won't let it. Have you tried using a linear activation? – kiarash_kiani Nov 30 '20 at 18:05
  • Try changing the activation function in all layers. – kiarash_kiani Nov 30 '20 at 18:14
  • Does decreasing the bottleneck size affect the reconstruction error? – kiarash_kiani Nov 30 '20 at 18:20
  • Indeed you were right: I changed all the activations to linear, and the nodes that were predicting all zeroes are now predicting negative numbers. I don't understand, however, why it predicts negative values when my dataset contains none, but maybe I shouldn't worry about that as long as I can minimize my error. Regarding decreasing the bottleneck size, it worsens the reconstruction error in general. I will accept your answer, but will leave it open a bit longer in case someone wants to jump in and explain why this is happening. – beerzy Nov 30 '20 at 18:36
  • I'm glad to hear that. When you use an autoencoder, you are doing dimensionality reduction; you can see it as a linear or non-linear PCA. This means you are transforming data from one vector space (the input layer) to another vector space (the bottleneck layer), and the new representation is learned by gradient descent. So it is acceptable to get negative features at the bottleneck (after the transformation, in the new vector space) even though you don't have any negative numbers in the input layer, which is your original vector space. – kiarash_kiani Dec 01 '20 at 03:23 (a small PCA example follows this thread)
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/116839/discussion-between-kiarash-kiani-and-beerzy). – kiarash_kiani Dec 01 '20 at 03:30
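
To make that last point concrete: even when every input value is non-negative, a learned linear projection can still place the data at negative coordinates in the new space. A small illustrative sketch (PCA as a stand-in for the bottleneck; the data here is random, not the asker's expression matrix):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.uniform(0, 620000, size=(100, 50))   # strictly non-negative "expression-like" data

codes = PCA(n_components=16).fit_transform(X)  # project onto 16 learned directions
print(codes.min())  # typically negative: the projected coordinates need not stay non-negative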