
I have a task I want to solve with neural networks: predicting a certain vector of dimension K. The problem is that the inputs to the network are sparse.

The input is a vector of size N, where N is huge (> 1M), and in most cases the vast majority of the entries (> 99%) in the input vector are 0. A few rare examples, however, have almost fully populated inputs. It's clear that the model is hard to train, as there can be a huge imbalance between examples, while the target vectors need not have such an imbalance.

I do have a working model, but could you point me to some papers / relevant literature about training a network whose inputs are so sparse? Perhaps there are some techniques that could be useful (maybe some preprocessing steps on the inputs, or something along those lines).

Any hint is appreciated!

Ant
  • Can you please put your **specific** question in the title? "Neural networks with sparse inputs" is not a question and it's also not specific enough. – nbro Apr 06 '22 at 10:37
What is the percentage of fully populated inputs? Can they be thrown away? Did you try usual dimensionality reduction techniques? Did you try embeddings like in language processing? What else did you try? What kind of network is your working model? – Vladislav Gladkikh Apr 22 '22 at 07:12
  • I am not an expert in sparse computation, but [this Pytorch tutorial](https://pytorch.org/docs/stable/sparse.html) might be a great start. – Minh-Long Luu Jan 25 '23 at 16:40

3 Answers


What I did when I encountered sparse inputs was to preprocess the data to keep only the vectors with more than n (a threshold) nonzero values.

It's not an elegant solution (since you give up a large part of your data), but you can try tweaking the threshold for the best results.
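A minimal sketch of this kind of filtering, assuming the inputs are stored as a SciPy CSR matrix (the function name and threshold are illustrative, not from the question):

```python
import numpy as np
from scipy.sparse import csr_matrix

def filter_by_nonzeros(X, y, threshold):
    """Keep only examples whose input has more than `threshold` nonzero entries.

    X: scipy.sparse CSR matrix of shape (n_examples, N)
    y: dense target array of shape (n_examples, K)
    """
    nnz_per_row = X.getnnz(axis=1)      # nonzero count for each example
    mask = nnz_per_row > threshold
    return X[mask], y[mask]

# Toy usage: only the rows with more than 1 nonzero entry survive.
X = csr_matrix(np.array([[0, 0, 1], [2, 3, 0], [0, 0, 0], [1, 1, 1]]))
y = np.arange(4)
X_f, y_f = filter_by_nonzeros(X, y, threshold=1)
print(X_f.shape, y_f)                   # (2, 3) [1 3]
```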

DK

If you’re writing your own training code, then you can optimize for sparse inputs at two points in the algorithm:

  1. Forward-propagation from the input layer to any other layer;
  2. Computing the error gradients of weights between the input layer and any other layer.

Both of these steps involve sums over all input entries, which can be restricted to sums over just the nonzero entries. Of course, this requires that you store the input matrix in a sparse representation, like a list of (input #, sample #, value) triples.
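A minimal NumPy sketch of both points, assuming each example's input is given as parallel arrays of its nonzero indices and values (all names and sizes here are illustrative):

```python
import numpy as np

def sparse_forward(W, b, nz_idx, nz_val):
    """First-layer pre-activation h = W @ x + b, where x is sparse.

    W: (hidden, N) weight matrix; b: (hidden,) bias;
    nz_idx / nz_val: indices and values of the nonzero entries of x.
    Only the columns of W at the nonzero indices are ever read.
    """
    return W[:, nz_idx] @ nz_val + b

def sparse_weight_grad(grad_h, nz_idx, nz_val):
    """Gradient of the loss w.r.t. W, given grad_h = dL/dh.

    dL/dW = outer(grad_h, x) is zero everywhere except the columns
    at nz_idx, so only a (hidden, nnz) block needs to be computed.
    """
    return nz_idx, np.outer(grad_h, nz_val)

# Toy usage: N inputs, but only 3 are nonzero for this example.
rng = np.random.default_rng(0)
hidden, N = 8, 10_000
W = rng.normal(size=(hidden, N)).astype(np.float32)
b = np.zeros(hidden, dtype=np.float32)
nz_idx = np.array([7, 1_234, 9_999])              # assumed unique indices
nz_val = np.array([1.0, -2.0, 0.5], dtype=np.float32)

h = sparse_forward(W, b, nz_idx, nz_val)          # shape (hidden,)
cols, g = sparse_weight_grad(np.ones(hidden, dtype=np.float32), nz_idx, nz_val)
W[:, cols] -= 0.01 * g                            # SGD update touches only 3 columns
```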

Unfortunately, even if the input layer is sparse, the hidden/output layers generally are not sparse when the layers are fully connected, so I think that's all the computational speed-up that's possible.

C deeply

The default solution for handling sparse or one-hot encoded input data is to use embeddings (PyTorch Embeddings). The binary, N-dimensional input is then transformed into a low-dimensional (e.g., 4-128 dim) vector containing real values.
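A minimal PyTorch sketch: for a binary sparse input, summing the embeddings of the active indices with `nn.EmbeddingBag` is mathematically equivalent to multiplying the 0/1 vector by a dense N x d matrix, but only the active rows are ever touched (the sizes below are assumed for illustration):

```python
import torch
import torch.nn as nn

N, d = 1_000_000, 64               # input dimension and embedding size (assumed)
emb = nn.EmbeddingBag(N, d, mode="sum")

# Batch of two examples, each given only by its nonzero (active) indices,
# flattened into one tensor with per-example offsets.
indices = torch.tensor([3, 17, 42, 5, 999_999])  # ex. 1: {3, 17, 42}; ex. 2: {5, 999999}
offsets = torch.tensor([0, 3])

dense = emb(indices, offsets)      # shape (2, 64): dense vectors for the rest of the network
```

If the nonzero entries are real-valued rather than binary, the values can be passed as `per_sample_weights` to `nn.EmbeddingBag` with `mode="sum"`.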

a1o