
I'm trying to build a simple autoencoder, but with a variable latent length (the network produces latents of different lengths depending on the complexity of the input). I haven't seen any related work to get ideas from. Have you seen any related work? Do you have any ideas on how to do this?

Actually, I want to use this autoencoder for transmitting data over a noisy channel, so a variable-length latent may help.

  • What do you mean by "complexity of the input"? What is your input? Do you mean something like texts, where the input complexity is the number of words? – ddaedalus Oct 18 '20 at 13:46
  • It can be text, and in that case the number of words is a good measure of complexity. I've seen an example of doing such a thing in [this](https://web.stanford.edu/~milind/papers/variablesc_spawc.pdf) paper, but I don't think it's a good way to do it. I want a more general idea, where the model can assess the complexity itself and determine the latent length accordingly. – amin Oct 18 '20 at 17:05

2 Answers


If you use RNNs, then I think the solution is zero-padding up to the maximum sequence length (that is, the maximum number of words in a text), together with a mask that tells your model to skip the zeros where possible. That way, your model will try to learn a good fixed-size representation of your input. If you do not know a good value for this dimension, one solution is to grid-search it as a hyperparameter.
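As a minimal sketch of that padding idea (NumPy only; the mask marks which positions are real data so downstream code can skip the zeros):

```python
import numpy as np

def pad_sequences(seqs, pad_value=0):
    """Zero-pad variable-length sequences to the batch's max length."""
    max_len = max(len(s) for s in seqs)
    out = np.full((len(seqs), max_len), pad_value, dtype=float)
    for i, s in enumerate(seqs):
        out[i, :len(s)] = s
    return out

def make_mask(seqs, max_len):
    """1 for real positions, 0 for padding, so the model can ignore zeros."""
    return np.array([[1] * len(s) + [0] * (max_len - len(s)) for s in seqs])
```

In a framework like Keras or PyTorch you would pass the mask (or packed sequences) to the RNN so padded steps do not contribute to the representation.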

If you still want to exploit the dimensionality difference, you could instead train several models, each with a fixed representation size chosen according to the dimension of the input: for example, one for small, one for medium, and one for large inputs. This, however, would surely require a large and reasonably balanced initial dataset.
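A rough sketch of that routing idea (the size thresholds and latent dimensions here are made-up examples; in practice you would tune them and train one autoencoder per bucket):

```python
def pick_latent_dim(input_len, boundaries=(16, 64), dims=(8, 32, 128)):
    """Route an input to a small/medium/large model by its length.

    boundaries and dims are hypothetical values for illustration only.
    """
    if input_len <= boundaries[0]:
        return dims[0]   # "small" model
    if input_len <= boundaries[1]:
        return dims[1]   # "medium" model
    return dims[2]       # "large" model
```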

Another idea could be to use an autoencoder with a fixed latent dimension, then do effective clustering on your samples using their latent representations, on the assumption that similar representations have similar dimensionality requirements. After that, you could train k models on your initial dataset, one per cluster, so that there are k different latent spaces; the goal is then to match each instance to the correct model. At first, you would train all of them on each instance, but as training progresses, you could perhaps switch to a binary search over the models for each instance, assuming there is a total order on the dimensionality requirements. Of course, this is just an idea; I don't know whether it would really help at all.
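A sketch of the clustering step, using a minimal k-means over latent vectors (NumPy only, with deterministic farthest-point initialization; in practice you would likely use a library implementation such as scikit-learn's KMeans):

```python
import numpy as np

def kmeans(latents, k, iters=20):
    """Minimal k-means over latent vectors; returns a cluster label per row."""
    latents = np.asarray(latents, dtype=float)
    # Farthest-point initialization: start from the first vector, then
    # repeatedly add the vector farthest from all chosen centers.
    centers = [latents[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(latents - c, axis=1) for c in centers], axis=0)
        centers.append(latents[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        # Assign each latent to its nearest center, then recompute centers.
        d = np.linalg.norm(latents[:, None] - centers[None], axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            pts = latents[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return labels
```

Each resulting cluster would then get its own autoencoder with its own latent dimension.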

  • Unfortunately I can't understand your first paragraph. Do you mean zero-padding the latent? Can you explain more? – amin Oct 18 '20 at 19:42
  • No! In the input space, in order to deal with the inconsistent dimensionality. – ddaedalus Oct 18 '20 at 19:46
  • OK, but how does that help with a variable-length latent vector? What I understood from this paragraph would help with variable-length inputs (which RNNs can handle with no problem at all). – amin Oct 18 '20 at 20:00
  • Your second paragraph is what they did in the paper I mentioned before (this one). But training a model for each latent length is not what I want, because I want the network (not me) to measure the complexity of the input (useful when the input is an image) and be able to produce any latent length (not just those I predetermined). – amin Oct 18 '20 at 20:00
  • Check my post. I fixed it. – ddaedalus Oct 18 '20 at 20:31

You might want to look at an encoder-decoder sequence-to-sequence model. This model allows you to input and output data of variable length.
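A minimal, untrained sketch of that idea (NumPy only, with made-up sizes), just to show the mechanics: the encoder consumes an input of any length into one fixed-size state, and the decoder can unroll from that state for any number of output steps:

```python
import numpy as np

rng = np.random.default_rng(0)
H, D = 8, 4  # hidden and feature sizes (arbitrary for this sketch)
Wxh = rng.normal(size=(D, H))  # input-to-hidden weights
Whh = rng.normal(size=(H, H))  # hidden-to-hidden weights
Why = rng.normal(size=(H, D))  # hidden-to-output weights
Wyh = rng.normal(size=(D, H))  # fed-back-output-to-hidden weights

def encode(xs):
    """Run a simple RNN over a variable-length input; return the final state."""
    h = np.zeros(H)
    for x in xs:
        h = np.tanh(x @ Wxh + h @ Whh)
    return h

def decode(h, steps):
    """Unroll the decoder for any number of steps from the encoder state."""
    ys, y = [], np.zeros(D)
    for _ in range(steps):
        h = np.tanh(y @ Wyh + h @ Whh)
        y = h @ Why
        ys.append(y)
    return np.array(ys)
```

Note that the latent (the encoder state) is still fixed-size here; only the input and output lengths vary, which is the limitation the question's author points out below.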

  • Maybe I asked my question badly. The length of the input or output doesn't matter to me (they can both be fixed), but I want the network to produce a variable-length latent. For example, if it sees a complex image, it should use more latent units to reconstruct it; but if an image of the same size is simple (e.g. mostly black), the network should decide to use fewer latent units while achieving almost the same reconstruction error. – amin Oct 18 '20 at 17:12