I'm looking for a good summary of the state of tools for making AI-generated videos of people talking. These can be no-code, low-code, or tools that involve running and tweaking actual code.

Here are some use-cases. I would like to know, for each of these, what tools exist, what tools might exist soon, etc. I'm using Donald Trump as the example, but it could be any person.

  1. Start with a video of me talking and making facial expressions, and create a video with Donald Trump's face making the same expressions and mouth movements, and still with my voice. (It looks like https://github.com/deepfakes/faceswap does this well.)

  2. Like #1, but with Donald Trump's voice instead of mine. (I looked at voice.ai for this, but the quality wasn't great, and it required you to do a fairly good impression of the person to start with. Its UI also seems flaky/buggy. Plus I'd like an open-source tool. I also looked at fakeyou.com and wasn't able to get an example without paying.)

  3. Write a script, perhaps including some direction for facial or hand expressions, and create a video where Donald Trump speaks the script in his voice, with his cadence/speaking style, and ideally with his corresponding facial and hand mannerisms / body language.

I did a bunch of googling for this stuff and got buried in an avalanche of marketing speak and hucksterism. I'm looking for a reliable index of tools that actually work. Naively, based on how well visual training works, I would think that a model could be trained on lots of videos of Donald Trump talking, and then you could give it a script and it would do the voice, cadence, mannerisms, and expressions. I did read that voice is just harder than video, because of the many overlaid frequencies. Does that make #3 impossible currently or in the near future?

M Katz