Repository: CorentinJ/Real-Time-Voice-Cloning @5425557
Description: RTVC-5 uses a synthesizer trained on the mic1 recordings from the VCTK dataset. Speakers p240 and p260 are held out of the training set. As in RTVC-4, silence in the raw recordings was removed using voice activity detection (VAD). The vocoder is trained on ground-truth mel spectrograms from the mic1 data.
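The speaker hold-out described above can be illustrated with a minimal sketch. It is not the repository's preprocessing code: the function name `split_by_speaker` and the file-name pattern `<speaker>_<utterance>_<mic>.<ext>` (for example `p240_001_mic1.flac`) are assumptions made for illustration, and only the speaker IDs and the mic1 restriction come from the description.

```python
from pathlib import PurePosixPath

# Speaker IDs held out of training, taken from the description above.
HELD_OUT_SPEAKERS = {"p240", "p260"}


def split_by_speaker(paths, held_out=HELD_OUT_SPEAKERS, mic="mic1"):
    """Split utterance paths into (train, unseen) lists by speaker ID.

    Hypothetical helper. Assumes file names of the form
    '<speaker>_<utterance>_<mic>.<ext>'; files that do not match this
    pattern or come from another microphone are skipped.
    """
    train, unseen = [], []
    for path in paths:
        parts = PurePosixPath(path).stem.split("_")
        if len(parts) != 3 or parts[2] != mic:
            continue  # wrong naming scheme or not a mic1 recording
        speaker = parts[0]
        if speaker in held_out:
            unseen.append(path)
        else:
            train.append(path)
    return train, unseen
```

Splitting on speaker ID rather than on individual utterances keeps every recording of p240 and p260 out of training, so those two voices remain unseen when they are later used as references.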
| | Name | Model | Steps | Batch Size | Datasets Used | Speakers | Audio Duration |
|---|---|---|---|---|---|---|---|
| Speaker Encoder: | Pretrained | GE2E | 1,564,501 | 64 | LibriSpeech train-other-500, VoxCeleb1 Dev A-D, VoxCeleb2 Dev A-H | 8371 | 3201 hours |
| Synthesizer: | VCTK_Taco2_242k | Tacotron 2 | 242,000 | 12 | VCTK | 109 | 44 hours |
| Vocoder: | VCTK_GT_733k | WaveRNN | 733,000 | 80 | VCTK | 109 | 44 hours |
All speakers are unseen during training. The first row is the reference audio used to compute the speaker embedding. The rows below that are synthesized using that speaker embedding.
| | VCTK p240 | VCTK p260 | LibriSpeech 1320 | LibriSpeech 3575 | LibriSpeech 6829 | LibriSpeech 8230 |
|---|---|---|---|---|---|---|
| Reference: | ||||||
| Synthesized: | ||||||
| 0: Take a look at these pages for crooked creek drive. | ||||||
| Google: | ||||||
| RTVC-4: | ||||||
| RTVC-5: | ||||||
| 1: There are several listings for gas station. | ||||||
| Google: | ||||||
| RTVC-4: | ||||||
| RTVC-5: | ||||||
| 2: Here's the forecast for the next four days. | ||||||
| Google: | ||||||
| RTVC-4: | ||||||
| RTVC-5: | ||||||
| 3: Here is some information about the Gospel of John. | ||||||
| Google: | ||||||
| RTVC-4: | ||||||
| RTVC-5: | ||||||
| 4: His motives were more pragmatic and political. | ||||||
| Google: | ||||||
| RTVC-4: | ||||||
| RTVC-5: | ||||||
| 5: She had three brothers and two sisters. | ||||||
| Google: | ||||||
| RTVC-4: | ||||||
| RTVC-5: | ||||||
| 6: This work reflects a quest for lost identity, a recuperation of an unknown past. | ||||||
| Google: | ||||||
| RTVC-4: | ||||||
| RTVC-5: | ||||||
| 7: There were many editions of these works still being used in the nineteenth century. | ||||||
| Google: | ||||||
| RTVC-4: | ||||||
| RTVC-5: | ||||||
| 8: Modern birds are classified as coelurosaurs by nearly all palaeontologists. | ||||||
| Google: | ||||||
| RTVC-4: | ||||||
| RTVC-5: | ||||||
| 9: He was being fitted for ruling the state, in the words of his biographer. | ||||||
| Google: | ||||||
| RTVC-4: | ||||||
| RTVC-5: | ||||||