Tacotron TTS models (e.g. Tacotron 2 and Parallel Tacotron 2) were trained on 25h and 405h of speech data respectively. By comparison, more recent TTS systems are trained on >50,000h of speech data. Why were Tacotron models trained on such a relatively small volume of data?
Could the answer simply be when they were trained? Bigger models and larger input datasets require more resources, so this may follow a pattern analogous to Moore's law. – Bruce Adams Jul 11 '23 at 16:26