I generated a bunch of simulation data from a complex physical simulation that spits out patterns. I am trying to apply unsupervised learning to analyze the patterns and ideally classify them into whatever categories the learning technique identifies. Using PCA or manifold techniques such as t-SNE for this problem is rather straightforward, but applying neural networks (autoencoders, specifically) is non-trivial, because I am not sure that splitting my dataset into training and test data is the right approach here.
Naively, I was thinking of the following approaches:
1. Train an autoencoder with all the data as training data, for a large number of epochs (I would think overfitting is not a problem per se in this case). Keras offers a `model.predict` method, so I can construct just the encoder section of the autoencoder and obtain the bottleneck values.
2. Carry out some data augmentation, split the data into training and test sets as one normally would, and carry out the workflow as normal. (This approach makes me a little uncomfortable, as I am not attempting to generalize a neural network. Or should I be?)
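For concreteness, the first approach might look roughly like the sketch below. The random array `X`, the layer sizes, the 2-dimensional bottleneck, and the epoch count are all placeholder assumptions for illustration, not my actual data or architecture.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Placeholder data (assumption): 256 samples with 32 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 32)).astype("float32")

# Encoder half: input -> hidden layer -> 2-D bottleneck.
inputs = keras.Input(shape=(32,))
hidden = layers.Dense(16, activation="relu")(inputs)
code = layers.Dense(2, name="bottleneck")(hidden)

# Decoder half: bottleneck -> hidden layer -> reconstruction.
hidden = layers.Dense(16, activation="relu")(code)
outputs = layers.Dense(32)(hidden)

# Two models sharing the same layers: the full autoencoder is trained,
# the encoder is only used afterwards to read out the bottleneck.
autoencoder = keras.Model(inputs, outputs)
encoder = keras.Model(inputs, code)

# Train on all the data, with the input as its own reconstruction target.
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=5, batch_size=32, verbose=0)

# Bottleneck values for every sample, one row per pattern.
Z = encoder.predict(X, verbose=0)
```

Here `Z` holds one low-dimensional code per pattern, which could then be clustered or plotted. For the second approach, `fit` also accepts a `validation_split` argument that holds out a fraction of the data and reports the reconstruction loss on it.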
I would appreciate any guidance on how to proceed, or on whether my understanding of how autoencoders apply in this context is flawed.