And that's all: we can infer P(x|y=c) and P(c) from the data. I don't see where MLE comes into play.
Maximum likelihood estimation is used for exactly this purpose, i.e. to estimate the conditional probabilities $p(x_j \mid y)$ and the marginal probability $p(y)$.
In the Naive Bayes algorithm, using the properties of conditional probability, we can write the joint probability as
$$
p(y, x_1, x_2, \dots, x_n) = p(x_1, x_2, \dots, x_n \mid y) \; p(y)
$$
and by Bayes' theorem the posterior is proportional to this joint probability:
$$
p(y=c \mid x) \propto p(x \mid y=c)\, p(y=c)
$$
What we need to estimate here are the conditional probabilities $p(x_j \mid y)$ and the marginal probability $p(y)$, and maximum likelihood is exactly what we use to estimate them from the training data.
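For instance, if the label and the features are categorical (an assumption on my part, just for concreteness), writing down the likelihood of a training set of $N$ points $\{(x^{(i)}, y^{(i)})\}$ and maximizing it gives the familiar count-based estimates
$$
\hat p(y=c) = \frac{\#\{i : y^{(i)} = c\}}{N},
\qquad
\hat p(x_j = v \mid y=c) = \frac{\#\{i : x_j^{(i)} = v,\ y^{(i)} = c\}}{\#\{i : y^{(i)} = c\}},
$$
i.e. the maximum likelihood estimates are just the observed relative frequencies.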
To be more precise: assume that $x$ is a feature vector $(x_1, x_2, \dots, x_n)$, which means nothing more than that each data point is a vector of features. In Naive Bayes classification the assumption is that these features are independent of each other conditional on the class. In that case the term
$$p(x \mid y=c)$$ is replaced by $$\prod_{i=1}^{n} p(x_i \mid y=c)$$ -- the conditional independence assumption is what lets us simply multiply the per-feature probabilities.
The problem of deciding which class a data point belongs to then reduces to maximizing the likelihood $$\prod_{i=1}^{n} p(x_i \mid y=c)$$ weighted by the prior $p(y=c)$, as the posterior above shows -- i.e. assigning the data point $(x_1, x_2, \dots, x_n)$ to the class $c$ for which this quantity is highest. This maximization has the same form as maximum likelihood estimation, which maximizes $$\prod_{i=1}^{n} p(x_i \mid \theta)$$ over the parameters $\theta$ of an assumed distribution -- and it is exactly this kind of maximization that produced the estimates of $p(x_j \mid y)$ and $p(y)$ from the training data in the first place.
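To tie the two steps together, here is a minimal Python sketch (my own illustration; the function names, the `eps` fallback and the toy data are all made up for the example). It assumes categorical features, fits $\hat p(y=c)$ and $\hat p(x_j \mid y=c)$ by the count-based maximum likelihood estimates, and then classifies a new point by maximizing $p(y=c)\prod_i p(x_i \mid y=c)$, working in log space so the product does not underflow:

```python
from collections import Counter, defaultdict
from math import log

def fit_naive_bayes(X, y):
    """MLE fit for categorical Naive Bayes: plain relative frequencies of the training data."""
    n = len(y)
    class_counts = Counter(y)                      # count(y = c)
    feature_counts = defaultdict(Counter)          # feature_counts[(j, c)][v] = count(x_j = v, y = c)
    for xi, c in zip(X, y):
        for j, v in enumerate(xi):
            feature_counts[(j, c)][v] += 1
    prior = {c: cnt / n for c, cnt in class_counts.items()}   # MLE of p(y = c)
    likelihood = {
        (j, c): {v: cnt / class_counts[c] for v, cnt in counter.items()}  # MLE of p(x_j = v | y = c)
        for (j, c), counter in feature_counts.items()
    }
    return prior, likelihood

def predict(x, prior, likelihood, eps=1e-9):
    """Assign x to the class maximizing log p(y=c) + sum_j log p(x_j | y=c)."""
    best_class, best_score = None, float("-inf")
    for c, p_c in prior.items():
        score = log(p_c)
        for j, v in enumerate(x):
            # eps stands in for feature values never seen with class c;
            # a real implementation would use Laplace smoothing instead of the raw MLE.
            score += log(likelihood.get((j, c), {}).get(v, eps))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Tiny usage example with made-up categorical data.
X = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
y = ["no", "no", "yes", "yes"]
prior, likelihood = fit_naive_bayes(X, y)
print(predict(("rainy", "mild"), prior, likelihood))   # -> "yes"
```

In practice one would add smoothing rather than rely on the raw maximum likelihood estimates, since a single feature value never seen with a class would otherwise zero out that class's whole product.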