Serbian TTS

Implementation Time:
months
Solution Provider: Act Digital Serbia

 

Welcome to vsk.ai, your gateway to open-source Text-to-Speech technology tailored for the rich and diverse Serbian language. Join us in bringing Vuk Stefanović Karadžić’s legacy to the digital age as we empower Serbian speakers to give voice to their words, naturally and authentically.

https://actserbia.github.io/vsk.ai/

 

VITS (Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech) is an end-to-end TTS model: the encoder and vocoder are trained together as a single network. It combines several state-of-the-art deep learning techniques, including generative adversarial networks (GANs), variational autoencoders (VAEs), and normalizing flows. It does not require external alignment annotations; instead, it learns the text-to-audio alignment using Monotonic Alignment Search (MAS), as explained in the VITS paper. The architecture combines the GlowTTS encoder with the HiFiGAN vocoder. It is a feed-forward model with a real-time factor of x67.12 on a GPU, that is, it synthesizes speech roughly 67 times faster than real time.
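To make the alignment step more concrete, here is a minimal sketch of the dynamic-programming idea behind Monotonic Alignment Search. It is a simplified pure-Python illustration, not the implementation used by VITS or vsk.ai: the function name `monotonic_alignment_search` and the `log_p` score matrix are assumptions made for this example, and it assumes that every audio frame maps to exactly one text token, that the path starts at the first token and ends at the last, and that tokens are never skipped.

```python
from typing import List

NEG_INF = float("-inf")


def monotonic_alignment_search(log_p: List[List[float]]) -> List[int]:
    """Return, for each audio frame, the index of the text token it aligns to.

    log_p[i][j] is an (assumed) log-likelihood of frame j under token i.
    """
    n_tokens = len(log_p)
    n_frames = len(log_p[0])
    if n_frames < n_tokens:
        raise ValueError("need at least one frame per token")

    # q[i][j]: best cumulative score of a monotonic path ending at (token i, frame j).
    q = [[NEG_INF] * n_frames for _ in range(n_tokens)]
    q[0][0] = log_p[0][0]
    for j in range(1, n_frames):
        # Token i cannot be reached before frame i, hence the upper bound.
        for i in range(min(j + 1, n_tokens)):
            stay = q[i][j - 1]                                # same token as previous frame
            advance = q[i - 1][j - 1] if i > 0 else NEG_INF   # move to the next token
            q[i][j] = log_p[i][j] + max(stay, advance)

    # Backtrack from the last token at the last frame to recover the path.
    alignment = [0] * n_frames
    i = n_tokens - 1
    for j in range(n_frames - 1, -1, -1):
        alignment[j] = i
        if j > 0 and i > 0 and q[i - 1][j - 1] >= q[i][j - 1]:
            i -= 1
    return alignment


# Toy scores: the first two frames fit token 0, the last two fit token 1.
scores = [
    [-0.1, -0.2, -3.0, -3.0],
    [-3.0, -3.0, -0.1, -0.2],
]
print(monotonic_alignment_search(scores))  # [0, 0, 1, 1]
```

In a real model the scores would come from the network during training, and the search would typically run on batched tensors rather than Python lists.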

YourTTS is a multi-speaker, multilingual TTS model that can perform voice conversion and zero-shot speaker adaptation. It can also learn a new language or voice from an audio clip roughly 1 minute long, which opens the door to training TTS models for low-resource languages. YourTTS uses VITS as its backbone architecture, coupled with a speaker encoder model.

