There are many tools available for text-to-speech synthesis (and for the reverse, speech-to-text), for example Google's WaveNet.
These tools do not offer the finer control that may be required over how much inflection and tonality is retained in the output. For example, one might want to change the vocal characteristics (implied ethnicity or sex, for instance) of a dubbed voice-over while retaining the tonality of the original voice (shouting vs. calm).
A round trip through text (speech-to-text, then text-to-speech) seems a suboptimal approach, because information such as tonality is lost before reconstruction.
Re-encoding audio to audio directly might allow characteristics to be altered in ways not available through standard audio processing methods, while retaining more of the desired tonality.
Is AI able to distinguish between characteristics and tonality as described above, and is such a speech-to-speech re-encoding tool available, ideally open source?
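To make the distinction I am asking about a little more concrete: as a rough assumption on my part, "tonality" could be approximated by the frame-by-frame pitch contour and loudness of the speech, which are the features a re-encoding tool would need to carry over unchanged while altering everything else. The sketch below is only an illustration of that idea using NumPy on a synthetic tone (quiet first half, loud second half); the helper names `frame_signal`, `rms_energy` and `estimate_f0` are hypothetical, and the autocorrelation pitch estimate is a simple textbook method, not what any particular tool uses.

```python
import numpy as np

def frame_signal(signal, frame_len=1024, hop=512):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])

def rms_energy(frames):
    """Root-mean-square level per frame, a crude proxy for loudness."""
    return np.sqrt(np.mean(frames ** 2, axis=1))

def estimate_f0(frame, sr, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency of one frame via autocorrelation."""
    frame = frame - np.mean(frame)
    # Keep only non-negative lags of the autocorrelation.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sr / fmax)  # shortest period considered
    lag_max = int(sr / fmin)  # longest period considered
    lag = lag_min + np.argmax(ac[lag_min : lag_max + 1])
    return sr / lag

# Synthetic stand-in for a voice: a 220 Hz tone, "calm" then "shouting".
sr = 16000
t = np.arange(sr) / sr
amplitude = np.where(t < 0.5, 0.1, 0.8)
voice = amplitude * np.sin(2 * np.pi * 220.0 * t)

frames = frame_signal(voice)
energy = rms_energy(frames)                              # loudness contour
pitch = np.array([estimate_f0(f, sr) for f in frames])   # pitch contour
```

Here `energy` and `pitch` together are the kind of "tonality" track I would want preserved, whereas the speaker's characteristics would presumably live in whatever is left over (timbre, spectral shape). Whether a learned model can separate the two cleanly on real speech is exactly what I am unsure about.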