I'm trying to train a neural network (a VAE) with TensorFlow, and I'm getting different results depending on the type of input I pass to model.fit.
When I pass arrays, the gap between the training loss and the validation loss looks normal. When I pass a dataset built from the same data, the training loss still looks normal but the validation loss is extremely small.
I haven't changed the model. The only thing that changes is the input format.
This is the code for array input. train_slices has shape (2627, 138, 138, 1), and the batch size is set in the model.fit call:
train_slices = preprocess_data(CropTumor, file_array[train_dataset])
val_slices = preprocess_data(CropTumor, file_array[val_dataset])
# reset model weights before training
VAE.set_weights(initial_weights)
# fit model
fit_results = VAE.fit(
    train_slices, train_slices,
    epochs=1000,
    validation_data=(val_slices, val_slices),
    callbacks=[early_stopping_kfold, tensorboard_callback],
    batch_size=batch_sz,
    verbose=2
)
The output is:
Epoch 1/1000
2022-08-01 11:56:35.683852: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8401
2022-08-01 11:56:36.371780: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-08-01 11:56:36.461054: I tensorflow/stream_executor/cuda/cuda_blas.cc:1786] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
672/672 - 7s - loss: 537.2896 - val_loss: 213.7070 - 7s/epoch - 11ms/step
Epoch 2/1000
672/672 - 5s - loss: 248.5211 - val_loss: 161.9758 - 5s/epoch - 7ms/step
Epoch 3/1000
672/672 - 4s - loss: 192.8771 - val_loss: 125.9349 - 4s/epoch - 6ms/step
Epoch 4/1000
672/672 - 4s - loss: 153.1647 - val_loss: 99.4395 - 4s/epoch - 6ms/step
Epoch 5/1000
672/672 - 5s - loss: 132.0143 - val_loss: 88.9975 - 5s/epoch - 7ms/step
Epoch 6/1000
672/672 - 4s - loss: 118.5642 - val_loss: 81.1653 - 4s/epoch - 6ms/step
Epoch 7/1000
672/672 - 5s - loss: 108.6678 - val_loss: 76.6315 - 5s/epoch - 7ms/step
Epoch 8/1000
672/672 - 4s - loss: 100.9759 - val_loss: 73.8963 - 4s/epoch - 6ms/step
When, on the other hand, I pass the same data as a dataset, I get a really small validation loss:
train_dset = tf.keras.preprocessing.image_dataset_from_directory(
    directory="./Data/09_TrainingSet_VAE1",
    labels=None,
    label_mode=None,
    image_size=(138, 138),
    color_mode="grayscale",
    batch_size=None,
    shuffle=True
)
val_dset = tf.data.Dataset.from_tensor_slices(val_slices)
train_dset = (train_dset.map(preprocess_dataset).batch(batch_sz).shuffle(1))
val_dset = (val_dset.map(preprocess_dataset).batch(batch_sz).shuffle(1))
# reset model weights before training
VAE.set_weights(initial_weights)
# fit model
fit_results = VAE.fit(
    train_dset,
    epochs=10,
    validation_data=val_dset,
    callbacks=[early_stopping_kfold, tensorboard_callback],
    verbose=2
)
And my output is
Epoch 1/10
2022-08-01 12:04:08.656012: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8401
2022-08-01 12:04:09.335957: I tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2022-08-01 12:04:09.431082: I tensorflow/stream_executor/cuda/cuda_blas.cc:1786] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
613/613 - 7s - loss: 466.5601 - val_loss: 17.3872 - 7s/epoch - 12ms/step
Epoch 2/10
613/613 - 5s - loss: 217.4277 - val_loss: 7.7309 - 5s/epoch - 8ms/step
Epoch 3/10
613/613 - 5s - loss: 167.2855 - val_loss: 6.0742 - 5s/epoch - 9ms/step
Epoch 4/10
613/613 - 6s - loss: 130.8230 - val_loss: 1.9557 - 6s/epoch - 10ms/step
Epoch 5/10
613/613 - 6s - loss: 112.1165 - val_loss: 1.1561 - 6s/epoch - 10ms/step
Epoch 6/10
613/613 - 5s - loss: 101.3152 - val_loss: 0.6442 - 5s/epoch - 8ms/step
Epoch 7/10
613/613 - 5s - loss: 93.3648 - val_loss: 0.4150 - 5s/epoch - 8ms/step
Epoch 8/10
613/613 - 5s - loss: 87.1542 - val_loss: 0.2232 - 5s/epoch - 8ms/step
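One thing worth ruling out is that the two validation inputs are on different scales: in the dataset version, val_slices has already been through preprocess_data before preprocess_dataset is mapped over it. The sketch below is a NumPy-only illustration, not my actual pipeline. The random data, the helper names `summarize` and `summed_squared_error`, and the 1/255 rescaling are all assumptions made for the example. It shows how a second rescaling pass would shrink a summed squared error of the kind used in the reconstruction loss.

```python
import numpy as np


def summarize(name, batch):
    """Print and return basic statistics of one input batch."""
    batch = np.asarray(batch, dtype=np.float64)
    stats = {
        "shape": batch.shape,
        "min": float(batch.min()),
        "max": float(batch.max()),
        "mean": float(batch.mean()),
    }
    print(name, stats)
    return stats


def summed_squared_error(y_true, y_pred):
    """Per-sample sum of squared errors over height, width and channel."""
    return np.sum(np.square(y_true - y_pred), axis=(1, 2, 3))


# Hypothetical stand-in data: 4 grayscale slices of 138x138 with values in [0, 1).
rng = np.random.default_rng(0)
once = rng.random((4, 138, 138, 1))

# Assumption for illustration only: a second preprocessing pass divides by 255.
twice = once / 255.0

stats_once = summarize("preprocessed once", once)
stats_twice = summarize("preprocessed twice", twice)

# Use the same poor "reconstruction" (all zeros) for both inputs.
err_once = summed_squared_error(once, np.zeros_like(once))
err_twice = summed_squared_error(twice, np.zeros_like(twice))

# The error shrinks by the square of the rescaling factor for every sample.
ratio = err_once / err_twice
print("error ratio per sample:", ratio)
```

On the real pipeline, the same statistics could be compared between val_slices and a batch drawn from val_dset, for instance through val_dset.as_numpy_iterator().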
The loss function is the same in both cases:
def loss_func(encoder_mu, encoder_log_variance):
    def vae_reconstruction_loss(y_true, y_predict):
        reconstruction_loss = tf.math.reduce_sum(tf.math.square(y_true - y_predict), axis=[1, 2, 3])
        return reconstruction_loss

    def vae_kl_loss(encoder_mu, encoder_log_variance):
        kl_loss = -0.5 * tf.math.reduce_sum(
            1.0 + encoder_log_variance - tf.math.square(encoder_mu) - tf.math.exp(encoder_log_variance),
            axis=1)
        return kl_loss

    def vae_loss(y_true, y_predict):
        reconstruction_loss = vae_reconstruction_loss(y_true, y_predict)
        kl_loss = vae_kl_loss(y_true, y_predict)
        loss = reconstruction_weight*reconstruction_loss + kl_weight*kl_loss
        return loss

    return vae_loss
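Written out per sample, the two terms as defined by vae_reconstruction_loss and vae_kl_loss are shown below. The symbols are my own shorthand and do not appear in the code: $y$ and $\hat{y}$ stand for y_true and y_predict, $\mu_j$ and $\log\sigma_j^2$ for the two arguments of vae_kl_loss, and $w_{\text{rec}}$, $w_{\text{KL}}$ for reconstruction_weight and kl_weight.

```latex
\[
\begin{aligned}
L_{\text{rec}} &= \sum_{h,w,c} \left( y_{hwc} - \hat{y}_{hwc} \right)^2, \\
L_{\text{KL}}  &= -\tfrac{1}{2} \sum_{j} \left( 1 + \log\sigma_j^2 - \mu_j^2 - \exp\!\left(\log\sigma_j^2\right) \right), \\
L              &= w_{\text{rec}}\, L_{\text{rec}} + w_{\text{KL}}\, L_{\text{KL}}.
\end{aligned}
\]
```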
The model is compiled with:
VAE.compile(optimizer=tfk.optimizers.Adam(learning_rate=learning_rate),
            loss=loss_func(encoder_mu_layer, encoder_log_variance_layer))
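To make the expected shapes of the loss terms concrete, here is a minimal NumPy sketch of the same arithmetic. It is an illustration under stated assumptions, not the compiled Keras loss: the function names, the batch size of 4, the latent size of 8 and the default weights of 1.0 are all hypothetical. It assumes the KL term is meant to receive the encoder mean and log variance. Note that vae_loss in the code above calls vae_kl_loss(y_true, y_predict) instead, so the last lines show what shape that call produces.

```python
import numpy as np


def reconstruction_loss(y_true, y_pred):
    """Per-sample sum of squared errors over height, width and channel."""
    return np.sum(np.square(y_true - y_pred), axis=(1, 2, 3))


def kl_loss(mu, log_var):
    """Per-sample KL term, summed over axis 1 as in vae_kl_loss."""
    return -0.5 * np.sum(1.0 + log_var - np.square(mu) - np.exp(log_var), axis=1)


def vae_loss(y_true, y_pred, mu, log_var,
             reconstruction_weight=1.0, kl_weight=1.0):
    """Weighted sum of both terms; the default weights are assumptions."""
    return (reconstruction_weight * reconstruction_loss(y_true, y_pred)
            + kl_weight * kl_loss(mu, log_var))


rng = np.random.default_rng(0)
batch, latent_dim = 4, 8  # hypothetical sizes
y_true = rng.random((batch, 138, 138, 1))
y_pred = rng.random((batch, 138, 138, 1))

# With mu = 0 and log variance = 0 the KL term is exactly zero.
mu = np.zeros((batch, latent_dim))
log_var = np.zeros((batch, latent_dim))

loss = vae_loss(y_true, y_pred, mu, log_var)
print("loss shape:", loss.shape)

# Passing images instead of encoder outputs still runs, but the result
# is no longer one value per sample.
mismatched = kl_loss(y_true, y_pred)
print("KL shape when fed images:", mismatched.shape)
```

This does not by itself explain the difference between the two runs, since both use the same loss, but it makes it easier to check which tensors each term actually receives.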