
I am making a custom PyTorch model that, at some point, clusters a latent space created by another, earlier routine of the model (an autoencoder).

In a bit more detail, my model is a regular autoencoder, but at every training step I want to cluster the representations in the latent space Z and save the clustering algorithm inside the model for use after training, ONLY for inference (to make a certain kind of prediction).

I thought about defining an sklearn model as an attribute of the nn.Module, as in

from sklearn.cluster import KMeans  # e.g. KMeans; any sklearn clustering estimator
from torch.nn import Module

class Model(Module):
    def __init__(self, hparams):
        super().__init__()
        self.clustering_model = KMeans(n_clusters=hparams["n_clusters"])

This way, I could call the clustering model to fit the latent space Z after computing the reconstruction loss, as in a regular autoencoder.

After training, model.clustering_model would then hold a trained clustering model that I could use for inference.

However, this doesn't seem to work, probably because of incompatibilities between sklearn and PyTorch. Basically, it seems that training the clustering_model and saving it inside the PyTorch model didn't actually save any trained weights.

I'm currently unsure whether no weights were saved because the model actually wasn't trained (due to an error on my side), or because PyTorch simply can't save a trained sklearn model as an attribute of an nn.Module.
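A quick way to see what is going on (a minimal sketch, assuming KMeans as the clustering model and an illustrative one-layer encoder): an sklearn estimator assigned as a plain attribute is kept on the Python object, but it is not a registered parameter or buffer, so it never appears in the module's state_dict and is lost if you save only the state_dict.

```python
import torch
from torch import nn
from sklearn.cluster import KMeans

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(8, 2)
        # Plain Python attribute: PyTorch does not track it
        self.clustering_model = KMeans(n_clusters=3, n_init=10)

m = Model()
m.clustering_model.fit(torch.randn(50, 2).numpy())

# Only registered parameters/buffers end up in the state_dict;
# the fitted sklearn estimator is silently left out
print(list(m.state_dict().keys()))  # ['encoder.weight', 'encoder.bias']
```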

Has anyone tried embedding sklearn inside an nn.Module before?

  • I've not used Pytorch before, but [skorch](https://github.com/skorch-dev/skorch) might be a helpful library for you. – LittleLulatsch Apr 13 '23 at 15:44
  • @LittleLulatsch I've checked! Sadly the library is made to transport Pytorch to sklearn, and not the other way around. – puradrogasincortar Apr 13 '23 at 21:27
  • It is indeed possible, yet it may be trickier than just adding the clustering code to the forward method of the autoencoder. Are there any particular reasons not to do it? The forward method also allows returning several values. – Ciodar Apr 14 '23 at 06:31
  • @Ciodar Do you mean that they should be saved if they are added to the forward method instead of just the class? I'm unsure about what criteria PyTorch uses to save information about a model instance. It seems that PyTorch will only save weights that were trained (i.e. that were part of the loss), but my clustering algorithm wouldn't enter into the PyTorch loss calculation. – puradrogasincortar Apr 14 '23 at 08:03
  • Maybe I misunderstood the first time I read that. Do you want to compute a latent representation $z_k$ for each of your input samples, and then cluster all latent representations into a latent subspace? In that case, I think the simpler solution is to split these two problems: store all latent representations and perform the clustering on those. – Ciodar Apr 14 '23 at 08:17
  • Once you have computed the centroids, you can save your clustering params and restore them. I'm kinda wary of introducing K-means parameters as module parameters, since they should not be added to the computational graph (K-means is not typically trained with gradient descent). – Ciodar Apr 14 '23 at 08:23
  • 1
    One solution to have them in your custom Pytorch model is to register a [buffer](https://stackoverflow.com/questions/59620431/what-is-a-buffer-in-pytorch) where you store your clustering parameters, and define a procedure to save/restore the clustering weights from the buffer. Then you can call the clustering algorithm (maybe only during prediction) using a custom method or also in your forward. (I personally would call a custom method) – Ciodar Apr 14 '23 at 08:36
  • @Ciodar Indeed an nn.Parameter is what I need to tell PyTorch to save those weights. Thanks! – puradrogasincortar Apr 14 '23 at 11:17
  • 1
    I would use buffers instead, as specified in the link above and also [here](https://stackoverflow.com/questions/57540745/what-is-the-difference-between-register-parameter-and-register-buffer-in-pytorch/57546078#57546078). Anyways, I will transform these comments in an answer if they fit your needs. – Ciodar Apr 14 '23 at 12:02
  • Programming questions (which includes questions related to specific libraries) are off-topic here. Please, ask them at Stack Overflow or Data Science SE. See our on-topic page. We focus on the theoretical aspects of AI. – nbro Apr 17 '23 at 12:03

1 Answer


Yes, you can define a fully custom model, with a clustering method that you call after forward, only during inference. Clustering parameters (e.g. the centroids in K-means) can be stored inside a buffer (see here and here).

Clustering is not typically optimized with gradient descent, so clustering parameters should not require gradients, and you need to take care not to include the clustering step in your computational graph.
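For instance, a minimal sketch of keeping the clustering step out of the graph (assuming scikit-learn's KMeans and an illustrative latent batch): detach the latent codes before handing them to sklearn, so autograd never sees the clustering.

```python
import torch
from sklearn.cluster import KMeans

# Illustrative latent batch from an encoder (requires grad during training)
z = torch.randn(128, 16, requires_grad=True)

# Detach before converting to numpy, so the clustering step
# never becomes part of the autograd graph
z_np = z.detach().cpu().numpy()

kmeans = KMeans(n_clusters=5, n_init=10).fit(z_np)
centroids = torch.from_numpy(kmeans.cluster_centers_)  # shape (5, 16), no grad
```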

I would implement this pipeline:

  1. Autoencoder training: Train your autoencoder as usual and save the model's checkpoint.
  2. Feature extraction: Extract all latent codes with the trained autoencoder, then save them to disk as a numpy array or a tensor.
  3. Clustering: Load the latent codes, apply a clustering algorithm (e.g. K-means), and store the relevant parameters (the centroids) in a buffer.
  4. Inference: Load the trained autoencoder and the centroids. Both parameters and buffers live in the model's state_dict, so they can be restored with the load_state_dict method. Pass new data through the encoder, then assign each latent representation to its closest centroid.
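Steps 3 and 4 can be sketched as follows (a minimal illustration, not the asker's actual model: the `LatentClusterer` name, the 16-dimensional latent size, and the file name are assumptions):

```python
import torch
from torch import nn
from sklearn.cluster import KMeans

class LatentClusterer(nn.Module):
    """Illustrative module that keeps K-means centroids in a buffer."""
    def __init__(self, n_clusters=5, latent_dim=16):
        super().__init__()
        # Buffer: saved in the state_dict, but never seen by the optimizer
        self.register_buffer("centroids", torch.zeros(n_clusters, latent_dim))

    def fit(self, latent_codes):
        # Run sklearn's K-means on precomputed latent codes, store centroids
        km = KMeans(n_clusters=self.centroids.shape[0], n_init=10)
        km.fit(latent_codes.numpy())
        self.centroids.copy_(torch.from_numpy(km.cluster_centers_))

    def forward(self, z):
        # Assign each latent code to its nearest centroid
        return torch.cdist(z, self.centroids).argmin(dim=1)

# Step 3: cluster saved latent codes and persist the centroids
model = LatentClusterer()
model.fit(torch.randn(200, 16))
torch.save(model.state_dict(), "clusterer.pt")

# Step 4: restore buffers via load_state_dict and predict on new data
restored = LatentClusterer()
restored.load_state_dict(torch.load("clusterer.pt"))
labels = restored(torch.randn(8, 16))  # one cluster index per sample
```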

By keeping the approach modular, you can experiment with different clustering methods without recomputing the features (saving additional training time), and put everything together once you're satisfied with the model.

Ciodar