A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models

Jul 22, 2025 • 20 min • Ep. 988

Episode description

🤗 Upvotes: 42 | cs.CL, cs.SD, eess.AS

Authors:
Kirill Borodin, Nikita Vasiliev, Vasiliy Kudryavtsev, Maxim Maslov, Mikhail Gorodnichev, Oleg Rogov, Grach Mkrtchian

Title:
A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models

Arxiv:
http://arxiv.org/abs/2507.13563v1

Abstract:
Russian speech synthesis presents distinctive challenges, including vowel reduction, consonant devoicing, variable stress patterns, homograph ambiguity, and unnatural intonation. This paper introduces Balalaika, a novel dataset comprising more than 2,000 hours of studio-quality Russian speech with comprehensive textual annotations, including punctuation and stress markings. Experimental results show that models trained on Balalaika significantly outperform those trained on existing datasets in both speech synthesis and enhancement tasks. We detail the dataset construction pipeline, annotation methodology, and results of comparative evaluations.
