This paper describes a Chinese text-to-visual-speech synthesis system based on a data-driven (sample-based) approach, which generates new visual speech by concatenating short video segments.
To make synthesized speech sound natural, a text-to-speech synthesis system must accurately predict the prosodic structure of the input text.