Tacotron TTS models (e.g. Tacotron 2 and Parallel Tacotron 2) were trained on 25h and 405h of speech data respectively. By comparison, more recent TTS systems are trained on >50,000h of speech data. Why were Tacotron models trained on such a relatively small volume of data?
Could the answer simply be when they were trained? Bigger models and larger input datasets require more resources, so this may follow a pattern analogous to Moore's law. – Bruce Adams Jul 11 '23 at 16:26