There are many elements to consider. To argue the answer, we can reason with mathematical logic.
Let us assume a propositional logic (PL) context without loss of generality.
First, we rephrase automated theorem proving (ATP) as the Boolean satisfiability (SAT) problem: is there an interpretation (or, more broadly, a model) $M$ for a PL formula $\varphi$ such that $M$ satisfies $\varphi$, namely, $M \models \varphi$? The answer to such a question can be yes or no. Intuitively, $\varphi$ represents the statement, and $M$ is its proof. Observe that this is a top-down approach. Turing Award winner Stephen Cook proved SAT to be $\mathbf{NP}$-complete, a remarkable result in theoretical computer science. Therefore, ATP is a hard problem, and this is its mathematical beauty.
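To make the SAT framing concrete, here is a minimal brute-force sketch in Python (the function name, the DIMACS-like CNF encoding, and the toy formula are mine, for illustration): it enumerates every interpretation $M$ of a CNF formula $\varphi$ and returns the first one with $M \models \varphi$.

```python
from itertools import product

def sat(clauses, n_vars):
    """Brute-force SAT check for a CNF formula.

    `clauses` is a list of clauses in a DIMACS-like encoding:
    the literal k means "variable k is true" and -k means
    "variable k is false". Returns a satisfying interpretation M
    as a dict {variable: bool} if one exists, else None.
    """
    for bits in product([False, True], repeat=n_vars):
        model = {v + 1: bits[v] for v in range(n_vars)}
        # M |= phi iff every clause contains at least one true literal.
        if all(any(model[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return model
    return None  # phi is unsatisfiable

# phi = (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
phi = [[1, 2], [-1, 3], [-2, -3]]
print(sat(phi, 3))  # {1: False, 2: True, 3: False}
```

The enumeration over $2^n$ assignments is the point: solvers based on DPLL/CDCL prune this search aggressively in practice, but Cook's result means no known algorithm escapes exponential worst-case time unless $\mathbf{P} = \mathbf{NP}$.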
Second, machine learning (ML) approaches approximate complex problems; they are not exact, and, in fact, exactness is not what an ML expert would want. Why? From a philosophical and practical point of view, having a 100% accurate ML model for a specific problem implies that it has overfitted the original data, eventually leading to catastrophic scenarios in production (e.g., data distribution shift issues). Note that this is a bottom-up approach. We can use an argument similar to the previous one as follows. Let $\mathcal{M}$ represent the data, that is, a set of instances.$^1$ Then, ML tries to statistically infer a PL theory $\Phi$ from $\mathcal{M}$, which is a set of PL formulas.$^2$ The quintessential representative of such a problem is decision tree learning. To draw a duality with the first argument, we can write $\mathcal{M} \models_\theta \Phi$, where $\theta$ is a set of parameters (e.g., how to statistically assess the truth of multiple models against multiple formulas).
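To sketch the bottom-up direction, the snippet below (a toy example with scikit-learn; the data set and variable names are invented for illustration) learns a decision tree from Boolean instances $\mathcal{M}$ and prints its root-to-leaf paths, each of which reads as a PL formula of the induced theory $\Phi$:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data set M (invented for illustration): each row is an
# instance; the columns are the propositional variables x1, x2.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]  # target concept: x1 XOR x2

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Each root-to-leaf path reads as a PL formula in the induced
# theory Phi, e.g. (not x1 and x2) -> class 1.
print(export_text(clf, feature_names=["x1", "x2"]))
```

Of course, on such a tiny, noise-free data set the tree fits perfectly; on real data, the statistical nature of $\models_\theta$ is exactly what prevents, and should prevent, 100% accuracy.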
All in all, it is plausible to believe that an ML model will perform some form of mathematical reasoning but, at the time of writing, not in terms of ATP, because a proof (of a theorem), once established, lasts forever, whereas ML outputs are approximate and may vary. We should embrace technologies like ChatGPT with caution because they produce different answers to the same input. However, a theorem can have multiple proofs, an argument for accepting the "randomness" in ChatGPT's answers. Nevertheless, IMHO, we are still far from having an ML-based ATP.
$^1$ Note the different notation, $\mathcal{M}$, with respect to the previous one, $M$.
$^2$ Similarly, note the different notation, $\Phi$, with respect to the previous one, $\varphi$.