Tacotron TTS models were trained on relatively small datasets: Tacotron 2 on roughly 25h of speech and Parallel Tacotron 2 on roughly 405h. By comparison, more recent TTS systems are trained on >50,000h of speech data. Why were Tacotron models trained on such comparatively small volumes of data?

Nik
  • Could the answer just be when they were trained? Bigger models and input data sets require more resources. So this may follow a pattern analogous to Moore's law. – Bruce Adams Jul 11 '23 at 16:26

0 Answers