We can say that matrix factorization of a matrix $R$ is, in general, finding two matrices $P$ and $Q$ such that $R \approx PQ^{T}$, with some constraints on $P$ and $Q$. Looking at matrix factorization algorithms such as Scikit-Learn's Non-Negative Matrix Factorization, I started to wonder how this works for recommender systems. With recommender systems we generally have a user-item ratings matrix, let's denote it $R$, which is very sparse, so in real datasets we find many missing values ($NaN$). In the examples of matrix factorization for recommender systems that I have seen, the missing values are replaced with $0$. My question is: how do we get actual predictions for the items not rated by users, when the product $PQ^{T}$ is supposed to converge to $R$, zeros included?
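To make the objective concrete, my understanding is that (for NMF, say) the fit solves something like

$$\min_{P \ge 0,\; Q \ge 0} \; \|R - PQ^{T}\|_{F}^{2},$$

where $\|\cdot\|_{F}$ is the Frobenius norm, so every entry of $R$, including the filled-in $0$s, contributes to the loss.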
I have tried this with a simple matrix that I found here:
import numpy as np

# Toy user-item ratings matrix; 0 marks a missing rating
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
])
The algorithm I used is Scikit-Learn's, and no matter how I change the parameters, I can't seem to get actual values in place of the $0$s; it always finds a really good approximation of $R$, zeros included. Maybe all the hyperparameter tuning I'm doing is leading to overfitting. But suppose there is a combination of parameters for which the reconstruction doesn't have $0$s and still minimizes $\|R - PQ^{T}\|$ with respect to some norm to a decent level: how can we then be sure that the predictions are accurate? There must be many different combinations of parameters that both predict different values for the $0$s and minimize $\|R - PQ^{T}\|$ to a decent level.
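For reference, here is a minimal sketch of the kind of code I'm running (the choice of n_components=2 is just one of the settings I tried, not a claim about the right rank):

import numpy as np
from sklearn.decomposition import NMF

# Same toy ratings matrix as above; 0 marks a missing rating
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
])

# Factor R into two non-negative matrices; rank k=2 is just one guess
model = NMF(n_components=2, init='random', random_state=0, max_iter=500)
P = model.fit_transform(R)   # corresponds to P in the notation above
Qt = model.components_       # corresponds to Q^T

R_hat = P @ Qt               # reconstruction that approximates R
print(np.round(R_hat, 2))    # entries at the original 0s stay close to 0

Because the $0$s are treated as observed ratings, the reconstruction drives those entries back toward $0$ rather than toward a plausible rating, which is exactly the behavior I'm asking about.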
Thank you!