What could happen if we wrongly assume that the POMDP is an MDP and do reinforcement learning with this assumption over the MDP?
It depends on a few things. The theoretical basis of reinforcement learning requires state descriptions to have the Markov property in order to guarantee convergence to optimal or approximately optimal solutions. The Markov property requires that the state, together with the action, captures everything that systematically influences the reward and the next state; whatever variation remains must be purely stochastic.
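Written out formally (this is the standard textbook statement of the property, e.g. in Sutton & Barto's notation, not anything specific to your setup):

$$\Pr\{S_{t+1}=s',\, R_{t+1}=r \mid S_t, A_t\} \;=\; \Pr\{S_{t+1}=s',\, R_{t+1}=r \mid S_0, A_0, R_1, \ldots, R_t, S_t, A_t\}$$

i.e. conditioning on the full history adds nothing once the current state and action are known.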
An environment can be "nearly Markov", and a lot of real-world physical systems are like that. For instance, pole-balancing and Acrobot tasks can be implemented as physical systems using motors, wheels, joints etc. In those real systems there are limits to how accurately the state can be measured, and many hidden variables, such as variable temperature (affecting the length of components), friction effects and air turbulence. Strictly by the formal definition, those hidden variables make the system a POMDP. However, their influence is small compared to the key state variables, and in some cases effectively random from the perspective of the agent. In practice RL works well on such physical systems, despite the state data being technically incomplete.
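As a toy illustration of what "nearly Markov" means here, the sketch below (dynamics and constants are invented for illustration, not taken from any real benchmark) has an unobserved friction term that drifts over time; from the agent's point of view its effect is indistinguishable from a small amount of transition noise:

```python
import random

class NearlyMarkovPole:
    """Toy 1-D balancing task with a hidden, slowly drifting friction term.

    The agent only observes (angle, velocity). The hidden friction makes the
    system formally a POMDP, but its effect is tiny relative to the action's
    effect, so treating the observation as the state still works well.
    """

    def __init__(self):
        self.angle = 0.0
        self.velocity = 0.0
        self.hidden_friction = 0.02   # unobserved and drifting (e.g. temperature)

    def step(self, action):
        """action in {-1, 0, +1}; returns (observation, reward)."""
        self.hidden_friction += random.gauss(0.0, 0.001)        # slow hidden drift
        self.velocity += 0.10 * action + 0.05 * self.angle      # dominant, observable dynamics
        self.velocity -= self.hidden_friction * self.velocity   # small hidden perturbation
        self.angle += self.velocity
        reward = 1.0 if abs(self.angle) < 0.5 else 0.0
        return (self.angle, self.velocity), reward               # friction is not exposed
```

To the agent, the drifting friction just looks like mild stochasticity in the transitions, which is why value-based RL on the observation alone can still converge to a good policy.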
In Atari games, even using a stack of several frame images as the state, those states are non-Markovian to varying degrees. In general a computer game's internal state may include many features that are not displayed on the screen. Enemies may have health totals or other hidden state, there can be timers controlling the appearance of hazards, and in many games the screen only shows a relatively small window into the total play area. However, the DeepMind DQN agent did well on a variety of scrolling combat and platform games.
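The standard trick used by DQN to recover an approximately Markov state from raw frames is to stack the last few frames, so that motion information appears in the observation. Here is a minimal sketch of that idea, written against a simple made-up env interface (not DeepMind's exact preprocessing or any particular library's wrapper):

```python
from collections import deque
import numpy as np

class FrameStack:
    """Presents the last k raw frames, stacked, as the observation.

    Stacking frames restores motion information (such as ball velocity) that a
    single frame cannot provide, making the observation closer to Markov.
    Assumes an env where reset() returns a frame and step(action) returns
    (frame, reward, done).
    """

    def __init__(self, env, k=4):
        self.env = env
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self):
        frame = self.env.reset()
        for _ in range(self.k):                  # pad the stack with the first frame
            self.frames.append(frame)
        return np.stack(self.frames, axis=0)

    def step(self, action):
        frame, reward, done = self.env.step(action)
        self.frames.append(frame)
        return np.stack(self.frames, axis=0), reward, done
```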
One game where DQN did notably badly - no better than a random-action player - was Montezuma's Revenge. Not only does that platform puzzle game have a large map to traverse, but it includes mechanics where the state of one screen affects outcomes on another.
It is hard to make a general statement about when an environment with missing useful state information would benefit from being treated more formally as a POMDP. Your question is essentially the same thing expressed in reverse.
The true answer for any non-trivial environment would be to run an experiment. It is also possible to make some educated guesses. The basis for those guesses might be the question: "If the agent could know hidden feature x from the state, how different would the expected reward and the policy be?"
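One cheap way to turn that question into an experiment is to build a small test problem where you can switch the hidden feature on and off in the observation, and compare the returns. The sketch below is entirely illustrative (the corridor task, the hidden flag and the hyperparameters are all invented): tabular Q-learning on a short corridor where a hidden flag, drawn each episode, decides which end pays +1.

```python
import random
from collections import defaultdict

ACTIONS = (-1, +1)   # step left / right along a short corridor

def run_episode(q, observe_flag, epsilon=0.1, alpha=0.1, gamma=0.9):
    """One episode of tabular Q-learning on a toy corridor task.

    A hidden flag, drawn at the start of each episode, decides which end of
    the corridor pays +1 (the other pays -1). If observe_flag is False the
    observation omits the flag, i.e. we pretend the POMDP is an MDP.
    """
    flag = random.randint(0, 1)            # the hidden feature "x"
    pos, total = 2, 0.0                    # positions 0..4, start in the middle
    while True:
        obs = (pos, flag) if observe_flag else (pos,)
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(obs, a)])
        next_pos = pos + action
        done = next_pos in (0, 4)
        good_end = 0 if flag == 0 else 4
        reward = (1.0 if next_pos == good_end else -1.0) if done else 0.0
        next_obs = (next_pos, flag) if observe_flag else (next_pos,)
        best_next = max(q[(next_obs, a)] for a in ACTIONS)
        target = reward + (0.0 if done else gamma * best_next)
        q[(obs, action)] += alpha * (target - q[(obs, action)])
        total += reward
        pos = next_pos
        if done:
            return total

for observe_flag in (True, False):
    q = defaultdict(float)
    returns = [run_episode(q, observe_flag) for _ in range(5000)]
    print(observe_flag, sum(returns[-1000:]) / 1000)
```

A large gap between the two printed averages means the hidden feature matters, and a more formal POMDP treatment (memory, belief states, a recurrent policy) is probably worth the effort; a small gap means treating the observation as if it were the full state is likely fine.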
For the Breakout example using each single frame as the state representation, I would expect the following to hold:
* Value estimates become much harder, because a single frame does not show the ball's direction of travel: seeing the ball next to a brick - compared to seeing the ball progressively get closer to that brick over 4 frames - gives much less confidence that it is about to hit the brick and score points.
* It should still be possible for the agent to optimise play, as one working strategy is to position the "bat" under the ball at all times (as sketched below). This will mean less precise control over the angle of bounces, so I would expect it to perform worse than the four-frame version. However, it should still be significantly better than a random-action agent. A key driver for this observation is that seeing the ball close to the bottom of the screen, and not close to the bat, would still be a good predictor of low expected future reward (even averaged over the chance of the ball going up vs going down), hence the controller should learn to act to prevent such states from occurring.
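To make the "bat under the ball" strategy concrete, here is a minimal sketch of a purely reactive controller that needs only the current frame. Everything in it is illustrative: the row ranges, the brightness threshold and the action names are guesses, not values taken from the real Atari environment.

```python
import numpy as np

def column_of_bright_pixels(region, threshold=80):
    """Return the column with the most bright pixels in a 2-D frame slice, or None."""
    counts = (region > threshold).sum(axis=0)
    return int(counts.argmax()) if counts.max() > 0 else None

def reactive_breakout_policy(frame):
    """Memoryless policy: move the paddle toward the ball's current column.

    `frame` is assumed to be a 2-D grayscale array. The row ranges below are
    rough guesses meant to land below the bricks and at paddle height; they
    are illustrative, not real screen coordinates. Because the policy only
    sees positions (no velocity), it cannot aim bounces, but it avoids most
    misses, which already beats random play by a wide margin.
    """
    ball_x = column_of_bright_pixels(frame[100:185])     # search below the brick area
    paddle_x = column_of_bright_pixels(frame[188:194])   # search at paddle height
    if ball_x is None or paddle_x is None:               # e.g. ball hidden among the bricks
        return "NOOP"
    if ball_x < paddle_x - 2:
        return "LEFT"
    if ball_x > paddle_x + 2:
        return "RIGHT"
    return "NOOP"
```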