Audio Samples for RTVC-6 Voice Cloning Model

Repository: blue-fish/Real-Time-Voice-Cloning (v1.0)

Description: RTVC-6 and subsequent models use a synthesizer based on the original Tacotron architecture, described in the paper "Tacotron: Towards End-to-End Speech Synthesis", with modifications for voice cloning. Like RTVC-5, the model is trained on the mic1 recordings from the VCTK dataset. To follow the setup of the Google paper more closely, all speakers whose IDs end in zero are held out of the training set.
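The hold-out rule can be illustrated with a minimal sketch. The function name `split_speakers` and the short speaker list are assumptions made for illustration only; this is not the repository's preprocessing code, and the list is a small sample rather than the full VCTK speaker set.

```python
def split_speakers(speaker_ids):
    """Split speaker IDs into (train, held_out).

    Hypothetical helper: any ID whose last character is '0' is held out,
    mirroring the rule described above.
    """
    train, held_out = [], []
    for sid in speaker_ids:
        if sid.endswith("0"):
            held_out.append(sid)
        else:
            train.append(sid)
    return train, held_out


# Illustrative subset of VCTK-style speaker IDs (not the full list).
speakers = ["p225", "p230", "p239", "p240", "p259", "p260"]
train, held_out = split_speakers(speakers)
print("train:", train)
print("held out:", held_out)
```

Under this rule, p240 and p260, the two VCTK speakers used in the results below, fall in the held-out group.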


RTVC-6 Model Overview

Name                         | Model      | Steps     | Batch Size | Datasets Used                                                     | Speakers | Audio Duration
Speaker Encoder: Pretrained  | GE2E       | 1,564,501 | 64         | LibriSpeech train-other-500, VoxCeleb1 Dev A-D, VoxCeleb2 Dev A-H | 8371     | 3201 hours
Synthesizer: VCTK_Taco1_254k | Tacotron 1 | 254,000   | 12         | VCTK                                                              | 109      | 44 hours
Vocoder: VCTK_GT_733k        | WaveRNN    | 733,000   | 80         | VCTK                                                              | 109      | 44 hours

Voice Cloning Results

All speakers are unseen during training. The first row is the reference audio used to compute the speaker embedding. The rows below that are synthesized using that speaker embedding.

VCTK p240 | VCTK p260 | LibriSpeech 1320 | LibriSpeech 3575 | LibriSpeech 6829 | LibriSpeech 8230
Reference:
 
Synthesized:
0: Take a look at these pages for crooked creek drive.
Google:
RTVC-5:
RTVC-6:
 
1: There are several listings for gas station.
Google:
RTVC-5:
RTVC-6:
 
2: Here's the forecast for the next four days.
Google:
RTVC-5:
RTVC-6:
 
3: Here is some information about the Gospel of John.
Google:
RTVC-5:
RTVC-6:
 
4: His motives were more pragmatic and political.
Google:
RTVC-5:
RTVC-6:
 
5: She had three brothers and two sisters.
Google:
RTVC-5:
RTVC-6:
 
6: This work reflects a quest for lost identity, a recuperation of an unknown past.
Google:
RTVC-5:
RTVC-6:
 
7: There were many editions of these works still being used in the nineteenth century.
Google:
RTVC-5:
RTVC-6:
 
8: Modern birds are classified as coelurosaurs by nearly all palaeontologists.
Google:
RTVC-5:
RTVC-6:
 
9: He was being fitted for ruling the state, in the words of his biographer.
Google:
RTVC-5:
RTVC-6: