Repository: blue-fish/Real-Time-Voice-Cloning (v1.0)
Description: RTVC-7 uses the same training approach as RTVC-4, which allows voice similarity to be compared between Tacotron 1 and Tacotron 2. After training on LibriSpeech, the synthesizer is finetuned for 10k steps on LibriTTS so that the model responds appropriately to punctuation. The samples here still use a vocoder trained on ground-truth-aligned (GTA) mels from the RTVC-1 synthesizer.
| | Name | Model | Steps | Batch Size | Datasets Used | Speakers | Audio Duration |
|---|---|---|---|---|---|---|---|
| Speaker Encoder: | Pretrained | GE2E | 1,564,501 | 64 | LibriSpeech train-other-500, VoxCeleb1 Dev A-D, VoxCeleb2 Dev A-H | 8371 | 3201 hours |
| Synthesizer: | LS_Taco1_295k | Tacotron 1 | 285,000 | 12 | LibriSpeech train-clean-100, LibriSpeech train-clean-360 | 1172 | 436 hours |
| | | | 10,000 | 12 | LibriTTS train-clean-100, LibriTTS train-clean-360 | 1172 | 245 hours |
| Vocoder: | LS_GTA_1159k | WaveRNN | 1,159,000 | 50 | LibriSpeech train-clean-100, LibriSpeech train-clean-360 | 1172 | 436 hours |
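
The synthesizer entry describes a two-stage schedule: a long run on LibriSpeech followed by a short finetune on LibriTTS. As a rough sketch, the stages can be written down and totaled. The `Stage` class and its field names are hypothetical and not part of the repository; reading the `295k` suffix of `LS_Taco1_295k` as the combined step count is an inference from the table, not something the page states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    # Hypothetical container for one training stage; values copied from the table.
    dataset: str
    steps: int
    batch_size: int


synthesizer_schedule = [
    Stage("LibriSpeech train-clean-100 + train-clean-360", steps=285_000, batch_size=12),
    Stage("LibriTTS train-clean-100 + train-clean-360", steps=10_000, batch_size=12),
]

# Combined number of optimizer steps across both stages.
total_steps = sum(stage.steps for stage in synthesizer_schedule)

# Batch elements processed per stage (steps * batch size),
# assuming one utterance per batch element.
utterances_seen = [stage.steps * stage.batch_size for stage in synthesizer_schedule]

print(total_steps)
print(utterances_seen)
```

Under these assumptions the finetuning stage accounts for only a small share of the total training steps.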
All speakers are unseen during training. The first row is the reference audio used to compute the speaker embedding. The rows below that are synthesized using that speaker embedding.
| | VCTK p240 | VCTK p260 | LibriSpeech 1320 | LibriSpeech 3575 | LibriSpeech 6829 | LibriSpeech 8230 |
|---|---|---|---|---|---|---|
Reference: | ||||||
Synthesized: | ||||||
0: Take a look at these pages for crooked creek drive. | ||||||
Google: | ||||||
RTVC-4: | ||||||
RTVC-7: | ||||||
1: There are several listings for gas station. | ||||||
Google: | ||||||
RTVC-4: | ||||||
RTVC-7: | ||||||
2: Here's the forecast for the next four days. | ||||||
Google: | ||||||
RTVC-4: | ||||||
RTVC-7: | ||||||
3: Here is some information about the Gospel of John. | ||||||
Google: | ||||||
RTVC-4: | ||||||
RTVC-7: | ||||||
4: His motives were more pragmatic and political. | ||||||
Google: | ||||||
RTVC-4: | ||||||
RTVC-7: | ||||||
5: She had three brothers and two sisters. | ||||||
Google: | ||||||
RTVC-4: | ||||||
RTVC-7: | ||||||
6: This work reflects a quest for lost identity, a recuperation of an unknown past. | ||||||
Google: | ||||||
RTVC-4: | ||||||
RTVC-7: | ||||||
7: There were many editions of these works still being used in the nineteenth century. | ||||||
Google: | ||||||
RTVC-4: | ||||||
RTVC-7: | ||||||
8: Modern birds are classified as coelurosaurs by nearly all palaeontologists. | ||||||
Google: | ||||||
RTVC-4: | ||||||
RTVC-7: | ||||||
9: He was being fitted for ruling the state, in the words of his biographer. | ||||||
Google: | ||||||
RTVC-4: | ||||||
RTVC-7: |