I found this really fascinating and wanted to share.
TorToiSe TTS
TorToiSe is a zero-shot, multi-voice text-to-speech engine developed by neonbjb. The GitHub repository covers installation and the various ways to tune its performance and output: https://github.com/neonbjb/tortoise-tts
"Zero-shot" means that TorToiSe doesn't need to be trained on a voice before imitating it. Just give it some text and a few wav files of the target speaker and it's ready to go. The results are quite good for how modest the requirements are.
How does it work?
Provide the text you want spoken.
Provide some samples of voiced dialogue (wav files).
TorToiSe synthesizes a voice from the samples and reads the text aloud in that mimicked voice.
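The three steps above can be sketched in Python roughly as follows. This is a minimal sketch modelled on the usage shown in the TorToiSe README, not a tested recipe. It assumes the tortoise-tts package and torchaudio are installed. The function names collect_reference_clips and synthesize, the folder layout, and the output filename are my own placeholders, not part of TorToiSe.

```python
from pathlib import Path


def collect_reference_clips(voice_dir):
    """Return the .wav files found directly inside voice_dir, sorted by name."""
    clips = sorted(p for p in Path(voice_dir).iterdir() if p.suffix.lower() == ".wav")
    if not clips:
        raise ValueError(f"no .wav files found in {voice_dir}")
    return clips


def synthesize(text, voice_dir, out_path="generated.wav", preset="fast"):
    """Read `text` aloud in a voice imitated from the clips in voice_dir.

    Hypothetical helper: the TorToiSe calls below follow the README example.
    """
    # Imported lazily so collect_reference_clips works without TorToiSe installed.
    import torchaudio
    from tortoise.api import TextToSpeech
    from tortoise.utils.audio import load_audio

    # Load every sample clip; the README example loads them at 22,050 Hz.
    samples = [load_audio(str(p), 22050) for p in collect_reference_clips(voice_dir)]

    # Constructing TextToSpeech loads the models (downloaded on first use).
    tts = TextToSpeech()

    # Generate speech conditioned on the sample clips.
    audio = tts.tts_with_preset(text, voice_samples=samples, preset=preset)

    # The README example saves the result as 24 kHz audio.
    torchaudio.save(out_path, audio.squeeze(0).cpu(), 24000)
    return out_path
```

With a folder of clips at, say, voices/amy (a made-up path), a call would look like synthesize("What's up, Klub Exile!", "voices/amy"). The preset argument trades speed for quality; "fast" is the one used in the README example.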
From a TK17 perspective, I see some interesting potential here for creating voiced dialogue that can be added to videos in post-production.
Examples
The GitHub repository itself contains some example outputs, but I also generated a few of my own:
Amy (Soul Calibur IV) "What's up, Klub Exile! I hope everybody is having a wonderful day."
Ivy (Soul Calibur IV) "What an absolutely despicable collection of perverts. Exactly what I was looking for."
Rachel (Dead or Alive 5) "I wonder if any of you fellas could show me a good time? If you can even manage to keep up with me that is."
Makoto Niijima (Persona 5) "It's important to eat three square meals a day and to get plenty of rest and exercise. Let's do our best to stay healthy."
Haughty Elf Voiceset (Skyrim) An excerpt from the Navy Seal Copypasta
I'm using SndUp to host these so they embed; I hope that's okay. If anyone knows of a better host for sharing small audio files, I'm all ears.
Notes
The name TorToiSe is appropriate: this thing is quite slow at generating outputs. I switched to the following fork, which is much faster at the cost of being a pain to set up: https://github.com/152334H/tortoise-tts-fast
Using the fork, the above examples each took me under 10 seconds to generate (the big one took about a minute). My GPU is a GTX 1080. I provided between 3 and 10 wav files as dialogue samples for each character, using content I downloaded from The Sounds Resource. It's a great place to find collections of ripped game audio.