Recently a friend asked me a question: given two input matrices X and Y (each of size NxD, where D >> N) and a ground-truth matrix Z of size DxD, what deep architecture should I use to learn this mapping?
- N is on the order of tens
- D is on the order of tens of thousands
The problem comes from bioinformatics, but it is really an architectural question. All matrices contain floats.
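To make the setup concrete, this is roughly how I lay out a toy version of the data in NumPy (the sizes here are placeholders, not the real ones):

```python
import numpy as np

n_examples, N, D = 100, 20, 1000       # hypothetical toy sizes, real D is in the tens of thousands
X = np.random.rand(n_examples, N, D).astype(np.float32)
Y = np.random.rand(n_examples, N, D).astype(np.float32)
Z = np.random.rand(n_examples, D, D).astype(np.float32)   # ground truth to predict
```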
I first tried a simple CNN model in Keras. I stacked inputs X and Y into an input tensor of shape (number of training examples, N, D, 2); outputs have shape (number of training examples, D, D, 1). The architecture is (a rough Keras sketch follows the list):
- Conv2D layer
- leaky ReLU
- Conv2D layer
- leaky ReLU
- Dropout
- Flattening layer
- Dense (fully connected) of size D
- leaky ReLU
- Dropout
- Dense (fully connected) of size D**2 (D squared)
- leaky ReLU
- Dropout
- Reshape the output into (D, D, 1) (per training example)
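Here is a minimal sketch of what I mean, using the Keras functional API; the filter counts, kernel sizes, dropout rate, and the small N and D are placeholders, not the values I actually used:

```python
from tensorflow import keras
from tensorflow.keras import layers

N, D = 20, 64            # hypothetical sizes, just to make the sketch runnable
n_filters = 16           # assumed filter count

inputs = keras.Input(shape=(N, D, 2))        # X and Y stacked along the channel axis

x = layers.Conv2D(n_filters, (3, 3), padding="same")(inputs)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(n_filters, (3, 3), padding="same")(x)
x = layers.LeakyReLU()(x)
x = layers.Dropout(0.25)(x)

x = layers.Flatten()(x)
x = layers.Dense(D)(x)
x = layers.LeakyReLU()(x)
x = layers.Dropout(0.25)(x)
x = layers.Dense(D * D)(x)                   # this layer alone has roughly D**3 weights
x = layers.LeakyReLU()(x)
x = layers.Dropout(0.25)(x)

outputs = layers.Reshape((D, D, 1))(x)
model = keras.Model(inputs, outputs)
model.summary()
```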
However, this model is untrainable: it has over a billion parameters even on emulated data (exactly 1,321,005,944 for my randomly emulated dataset).
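Almost all of those parameters come from the two Dense layers: after flattening, the first one has about (N * D * n_filters) * D weights, and the second about D**3. A quick back-of-the-envelope check (with placeholder sizes, not my actual emulated dataset):

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

N, D, n_filters = 20, 64, 16            # hypothetical sizes
flat = N * D * n_filters                # units after Flatten
total_dense = dense_params(flat, D) + dense_params(D, D * D)
print(total_dense)                      # grows like D**3, so it explodes quickly as D increases
```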
Do you find this problem solvable? What other architectures might I try?
Best.