Recently a friend asked me a question: given two input matrices X and Y (each of size NxD, where D >> N) and a ground-truth matrix Z of size DxD, what deep architecture should I use to learn this mapping?
- N is on the order of tens
- D is on the order of tens of thousands
The problem comes from bioinformatics, but it is really an architectural question. All matrices contain floats.
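To make the setup concrete, this is roughly how I lay out a toy version of the data in NumPy (the sizes here are placeholders, not the real ones):

```python
import numpy as np

n_examples, N, D = 100, 20, 1000       # hypothetical toy sizes, real D is in the tens of thousands
X = np.random.rand(n_examples, N, D).astype(np.float32)
Y = np.random.rand(n_examples, N, D).astype(np.float32)
Z = np.random.rand(n_examples, D, D).astype(np.float32)   # ground truth to predict
```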
I first tried a simple CNN model in Keras. I stacked inputs X and Y into an input tensor of shape (number of training examples, N, D, 2); outputs have shape (number of training examples, D, D, 1). The architecture is (a rough Keras sketch follows the list):
- Conv2D layer
- leaky ReLU
- Conv2D layer
- leaky ReLU
- Dropout
- Flattening layer
- Dense (fully connected) of size D
- leaky ReLU
- Dropout
- Dense (fully connected) of size D**2 (D squared)
- leaky ReLU
- Dropout
- Reshape the output into (D, D, 1) (per training example)
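Here is a minimal sketch of what I mean, using the Keras functional API; the filter counts, kernel sizes, dropout rate, and the small N and D are placeholders, not the values I actually used:

```python
from tensorflow import keras
from tensorflow.keras import layers

N, D = 20, 64            # hypothetical sizes, just to make the sketch runnable
n_filters = 16           # assumed filter count

inputs = keras.Input(shape=(N, D, 2))        # X and Y stacked along the channel axis

x = layers.Conv2D(n_filters, (3, 3), padding="same")(inputs)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(n_filters, (3, 3), padding="same")(x)
x = layers.LeakyReLU()(x)
x = layers.Dropout(0.25)(x)

x = layers.Flatten()(x)
x = layers.Dense(D)(x)
x = layers.LeakyReLU()(x)
x = layers.Dropout(0.25)(x)
x = layers.Dense(D * D)(x)                   # this layer alone has roughly D**3 weights
x = layers.LeakyReLU()(x)
x = layers.Dropout(0.25)(x)

outputs = layers.Reshape((D, D, 1))(x)
model = keras.Model(inputs, outputs)
model.summary()
```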
However, this model is untrainable: it has over a billion parameters even on emulated data (exactly 1,321,005,944 for my randomly emulated dataset).
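Almost all of those parameters come from the two Dense layers: after flattening, the first one has about (N * D * n_filters) * D weights, and the second about D**3. A quick back-of-the-envelope check (with placeholder sizes, not my actual emulated dataset):

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

N, D, n_filters = 20, 64, 16            # hypothetical sizes
flat = N * D * n_filters                # units after Flatten
total_dense = dense_params(flat, D) + dense_params(D, D * D)
print(total_dense)                      # grows like D**3, so it explodes quickly as D increases
```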
Do you find this problem solvable? What other architectures might I try?
Best.