This is quite a specific task and there is a solution: CycleGAN.
Of course, DALL-E, GPT-4 and similar systems can also do it, but if you are looking for the "how?", CycleGAN is specifically designed to solve this task and is more straightforward to understand.
How does CycleGAN work?
CycleGAN is trained on two specific domains, e.g. photos and Van Gogh paintings. It consists of four neural networks. The first transforms images of the first domain (e.g. photos) into images of the second domain (e.g. Van Gogh paintings). The second does the backward transformation (e.g. Van Gogh --> photo).
To ensure that a transformed image still represents the same content, CycleGAN has a cycle-consistency objective: an image that is transformed forward and then backward (e.g. photo --> Van Gogh --> photo) must be close to the original photo. The same should hold for backward-forward transformations (Van Gogh --> photo --> Van Gogh).
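The cycle objective can be sketched in a few lines of Python. This is a toy illustration, not the paper's implementation: the "generators" G, F_exact and F_lossy are hypothetical stand-in functions on flat lists of pixel values rather than trained networks. The only part taken from CycleGAN itself is that the reconstruction is compared to the original with an L1 (absolute difference) distance.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equally sized pixel lists."""
    assert len(a) == len(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(G, F, photo, painting):
    """Forward cycle (photo -> painting -> photo) plus backward cycle."""
    forward = l1_distance(F(G(photo)), photo)
    backward = l1_distance(G(F(painting)), painting)
    return forward + backward


# Hypothetical stand-ins for the two generators (assumption, not real networks).
def G(img):
    """'Photo -> painting': brightens every pixel by 0.25."""
    return [p + 0.25 for p in img]


def F_exact(img):
    """'Painting -> photo' that undoes G exactly."""
    return [p - 0.25 for p in img]


def F_lossy(img):
    """'Painting -> photo' that overshoots, so the cycle does not close."""
    return [p - 0.5 for p in img]


photo = [0.0, 0.5, 1.0]
painting = [0.25, 0.75, 0.5]

loss_exact = cycle_consistency_loss(G, F_exact, photo, painting)  # 0.0
loss_lossy = cycle_consistency_loss(G, F_lossy, photo, painting)  # 0.5
```

When the backward network really inverts the forward one, the loss is zero. Any content that gets lost or changed on the round trip shows up as a positive loss, which is what pushes the networks to keep the content of the image intact.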
The remaining two networks serve as discriminators, one per domain. They ensure that a transformed image looks like an image of the other domain (e.g. a transformed photo looks like a Van Gogh).
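How the discriminators enter the training signal can be sketched as follows. The CycleGAN paper uses a least-squares form of the adversarial loss and weights the cycle term with a factor of 10; everything else here is an assumption for illustration: the function names are hypothetical, the 0.5 factor in the discriminator loss is a common convention rather than a requirement, and the discriminator scores are plain numbers instead of network outputs.

```python
def mean(values):
    return sum(values) / len(values)


def discriminator_loss(scores_real, scores_fake):
    """Least-squares loss: push scores for real images toward 1, fakes toward 0."""
    real_term = mean([(s - 1.0) ** 2 for s in scores_real])
    fake_term = mean([s ** 2 for s in scores_fake])
    return 0.5 * (real_term + fake_term)


def generator_adversarial_loss(scores_fake):
    """The generator wants the discriminator to score its fakes as 1."""
    return mean([(s - 1.0) ** 2 for s in scores_fake])


def generator_total_loss(adv_forward, adv_backward, cycle_loss, lam=10.0):
    """Both adversarial terms plus the weighted cycle-consistency term."""
    return adv_forward + adv_backward + lam * cycle_loss


# A discriminator that separates real from fake perfectly has zero loss.
d_perfect = discriminator_loss([1.0, 1.0], [0.0, 0.0])

# An undecided discriminator (score 0.5) leaves the generator with loss 0.25.
g_adv = generator_adversarial_loss([0.5, 0.5])

# Combined generator objective for an example cycle loss of 0.5.
g_total = generator_total_loss(g_adv, g_adv, cycle_loss=0.5)
```

The two kinds of terms pull in different directions: the adversarial terms reward output that looks like the target domain, while the cycle term penalises output that no longer matches the input's content.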
For further reading:
How good is CycleGAN?
These examples are taken from the paper:
