I have a mixture of real-valued (float) and categorical features to use as input to a neural network. I encode the categorical features using one-hot / multi-hot encoding.
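For concreteness, here is a minimal sketch of the two encodings in plain Python. The vocabularies (`colors`, `tags`) and helper names are hypothetical and only illustrate the shape of the vectors. One-hot sets a single position per feature; multi-hot sets one position per active category.

```python
# Minimal sketch of one-hot and multi-hot encoding with plain Python lists.
# The vocabularies below are hypothetical examples, not part of the original data.

def one_hot(value, vocab):
    """Return a list with a 1 at the index of `value` in `vocab`, 0 elsewhere."""
    vec = [0] * len(vocab)
    vec[vocab.index(value)] = 1
    return vec

def multi_hot(values, vocab):
    """Return a list with a 1 at the index of every element of `values`."""
    vec = [0] * len(vocab)
    for v in values:
        vec[vocab.index(v)] = 1
    return vec

colors = ["red", "green", "blue"]          # single-valued categorical feature
tags = ["sale", "new", "popular", "eco"]   # multi-valued categorical feature

print(one_hot("green", colors))            # [0, 1, 0]
print(multi_hot(["new", "eco"], tags))     # [0, 1, 0, 1]
```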
If I want to use all the features as input, what is usually (empirically) the best practice?
1. Concatenate all the features (sparse one-hot/multi-hot vectors and float-valued features) into one vector that is part dense, part sparse, and use it as input; or
2. Split the sparse one-hot/multi-hot vectors from the dense features and add a separate layer that makes the sparse features dense before concatenating them with the already-dense features; or
3. Same as 2, but also use a separate layer for the dense features, so that we concatenate "embeddings" only, rather than raw features mixed with embeddings.
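A small sketch of the three layouts listed above may help pin down what each one feeds to the network. It uses plain Python lists and hypothetical hand-picked weights (`E`, `D`, `W`), with no training and no nonlinearity, so it only shows the shapes and the arithmetic, not which option performs better. It also checks one identity that holds exactly: multiplying a multi-hot vector by a weight matrix is the same as summing the matrix rows of the active categories, which is what an embedding lookup computes.

```python
# Sketch of the three input layouts, using plain Python lists.
# All weights are hypothetical hand-picked numbers (exact in binary floating point).

def matvec(W, x):
    """Compute x @ W, where W is a list of len(x) rows with d columns each."""
    d = len(W[0])
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(d)]

def embed_lookup(active_indices, E):
    """Sum the rows of E for the active categories (an embedding-bag style lookup)."""
    d = len(E[0])
    return [sum(E[i][j] for i in active_indices) for j in range(d)]

sparse = [0, 1, 0, 1]   # multi-hot categorical part (4 categories)
dense = [0.5, -2.0]     # real-valued part

# Option 1: the concatenated, part-sparse / part-dense vector is the input.
option1_input = sparse + dense

# Option 2: project the sparse part to a small dense vector, then concatenate.
E = [[0.5, 1.0],
     [0.25, -1.0],
     [2.0, 0.0],
     [1.5, 0.5]]        # hypothetical 4 x 2 projection (embedding) matrix
sparse_dense = matvec(E, sparse)
option2_input = sparse_dense + dense

# The projection of a multi-hot vector equals summing the selected rows of E.
active = [i for i, v in enumerate(sparse) if v]
assert sparse_dense == embed_lookup(active, E)

# Option 3: also project the dense part before concatenating.
D = [[1.0, 0.0],
     [0.0, 0.5]]        # hypothetical 2 x 2 projection for the dense features
option3_input = sparse_dense + matvec(D, dense)

# In option 1, a first linear layer W over the concatenated vector splits into a
# sparse block (rows 0-3, applied as a row-sum lookup) plus a dense block (rows 4-5).
W = E + D               # hypothetical 6 x 2 weight matrix reusing the numbers above
full = matvec(W, option1_input)
split = [a + b for a, b in zip(embed_lookup(active, W[:4]), matvec(W[4:], dense))]
assert full == split

print(option1_input)    # [0, 1, 0, 1, 0.5, -2.0]
print(option2_input)    # [1.75, -0.5, 0.5, -2.0]
print(option3_input)    # [1.75, -0.5, 0.5, -1.0]
```

The last check shows that the first linear layer in option 1 already applies a row-sum lookup to the multi-hot block. The separate layer in option 2 makes that step explicit, which allows a cheap lookup instead of a wide sparse matrix product and lets you choose the size of the intermediate representation.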
Trial and error aside, what should I do, in your experience or opinion?