I have a fully connected network that takes in a variable-length input padded with 0.
However, the network doesn't seem to be learning and I am guessing that the high number of zeros in the input might have something to do with that.
Are there solutions for dealing with padded input in fully connected layers or should I consider a different architecture?
UPDATE (to provide more details):
The goal of the network is to clean up full file paths, e.g.:
/My document/some folder/a file name.txt > a file name
/Hard drive/book/deeplearning/1.txt > deeplearning
The constraint is that the training labels have been generated by running a regex on the file name itself, so they are not very accurate.
I am hoping that by treating every word equally (without sequential information), the network will be able to generalize as to which types of words are usually kept and which are usually discarded.
The network takes in a sequence of word embeddings (trained on the path data) and outputs logits corresponding to the probability that each word should be kept or discarded.
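For what it's worth, one common way to stop the zero padding from dominating training is to mask the padded positions out of the loss, so gradients only flow from real words. Below is a minimal NumPy sketch of that idea for the per-word keep/discard setup described above; the function name `masked_bce_loss` and the toy shapes are my own assumptions, not anything from the actual network:

```python
import numpy as np

def masked_bce_loss(logits, labels, mask):
    """Binary cross-entropy averaged over real (non-pad) words only.

    logits: (batch, max_len) raw per-word scores
    labels: (batch, max_len) 1 = keep the word, 0 = discard it
    mask:   (batch, max_len) 1 for real words, 0 for padding
    """
    probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
    eps = 1e-9                             # avoid log(0)
    per_word = -(labels * np.log(probs + eps)
                 + (1 - labels) * np.log(1 - probs + eps))
    # Zero out padded positions, then average over real words only.
    return (per_word * mask).sum() / mask.sum()

# Toy batch of two paths padded to 3 words; the second path has
# only 2 real words, so its last slot is padding (mask = 0).
logits = np.array([[2.0, -1.0, 0.5],
                   [1.5, -2.0, 0.0]])
labels = np.array([[1.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0]])
mask = np.array([[1.0, 1.0, 1.0],
                 [1.0, 1.0, 0.0]])

loss = masked_bce_loss(logits, labels, mask)
```

With this kind of masking, whatever value sits in a padded slot has no effect on the loss or the gradients, which sidesteps the "lots of zeros" problem at least on the output side (the input-side embedding for the pad token still reaches the hidden layers, which is one argument for a sequence model instead).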