CB-25 Enhancing Text-to-Speech Synthesis with Topological Data Analysis and Deep Neural Networks
SCURS Disciplines
Computer Sciences
Document Type
Poster Presentation
Abstract
Text-to-speech (TTS) systems have advanced significantly with deep neural networks (DNNs), yet challenges remain in achieving natural, high-quality speech synthesis. This paper presents a novel DNN-based TTS model that incorporates topological data analysis (TDA) to enhance output quality. Traditional TTS models often struggle with prosody, pronunciation variability, and smooth transitions between phonemes, leading to unnatural-sounding synthesis. Our approach leverages topological structures to better capture the underlying patterns of speech waveforms, improving both clarity and expressiveness.
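As a concrete illustration of what "topological structure of a speech waveform" can mean, the sketch below computes persistent homology on a time-delay (Takens) embedding of a voiced, speech-like frame. The abstract does not specify the authors' actual pipeline; the synthetic frame, the embedding parameters, and the use of the ripser library here are illustrative assumptions only.

```python
# Minimal sketch: time-delay (Takens) embedding of a speech-like frame,
# followed by persistent homology. Parameters, the synthetic frame, and the
# choice of the `ripser` library are assumptions, not the authors' pipeline.
import numpy as np
from ripser import ripser  # pip install ripser


def delay_embed(signal, dim=3, delay=25):
    """Embed a 1-D signal into R^dim using lagged copies of itself."""
    n = len(signal) - (dim - 1) * delay
    return np.stack([signal[i * delay: i * delay + n] for i in range(dim)], axis=1)


# A toy "voiced" frame: a 150 Hz fundamental with two harmonics plus light
# noise, roughly 30 ms at a 16 kHz sampling rate.
t = np.arange(480) / 16000.0
frame = (np.sin(2 * np.pi * 150 * t)
         + 0.5 * np.sin(2 * np.pi * 300 * t)
         + 0.25 * np.sin(2 * np.pi * 450 * t)
         + 0.01 * np.random.randn(t.size))

points = delay_embed(frame)

# H0 tracks connected components, H1 tracks loops. A (quasi-)periodic voiced
# frame traces a closed curve in the embedding, so a long-lived H1 feature is
# expected, whereas unvoiced noise yields only short-lived ones.
diagrams = ripser(points, maxdim=1)["dgms"]
h1 = diagrams[1]
lifetimes = h1[:, 1] - h1[:, 0] if len(h1) else np.array([0.0])
print("most persistent H1 lifetime:", lifetimes.max())
```

Persistence lifetimes such as these can serve as per-frame descriptors of how "voiced" or structured a segment is, which is one plausible route to the phonetic and prosodic features discussed next.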
By integrating TDA into the feature extraction and synthesis stages, our model identifies persistent homological features within speech representations, allowing a more structured account of phonetic and prosodic variation. The architecture combines convolutional and recurrent neural networks to model both short-term and long-term speech dependencies. Experimental evaluations on benchmark datasets demonstrate that our topology-enhanced TTS model outperforms conventional architectures in speech intelligibility, naturalness, and smoothness. Subjective listening tests and objective metrics indicate reduced artifacts, improved tonal variation, and better generalization across speakers and linguistic styles.
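The abstract names a convolutional-plus-recurrent architecture with TDA-derived features but does not describe how the two are fused. The sketch below is one plausible reading, assuming per-frame persistence summary statistics (e.g., the top lifetimes from diagrams like the one above) are concatenated with CNN-encoded mel-spectrogram frames before a bidirectional GRU; the layer sizes, the fusion point, and the module name TopologyAwareAcousticEncoder are hypothetical.

```python
# Hypothetical PyTorch sketch of a convolutional + recurrent acoustic encoder
# that fuses spectral and topological features: 1-D convolutions capture
# short-term (frame-local) structure, and a bidirectional GRU models
# long-term dependencies over the fused sequence. All sizes and the fusion
# point are assumptions for illustration only.
import torch
import torch.nn as nn


class TopologyAwareAcousticEncoder(nn.Module):
    def __init__(self, n_mels=80, n_tda=8, hidden=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_mels, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        self.rnn = nn.GRU(hidden + n_tda, hidden,
                          batch_first=True, bidirectional=True)

    def forward(self, mels, tda_feats):
        # mels: (batch, frames, n_mels); tda_feats: (batch, frames, n_tda)
        x = self.conv(mels.transpose(1, 2)).transpose(1, 2)  # (batch, frames, hidden)
        x = torch.cat([x, tda_feats], dim=-1)                # append TDA statistics
        out, _ = self.rnn(x)                                 # (batch, frames, 2 * hidden)
        return out


# Shape check with random tensors standing in for real features.
enc = TopologyAwareAcousticEncoder()
mels = torch.randn(2, 120, 80)   # 2 utterances, 120 frames, 80 mel bins
tda = torch.randn(2, 120, 8)     # 8 persistence statistics per frame
print(enc(mels, tda).shape)      # torch.Size([2, 120, 512])
```

A decoder or vocoder would sit on top of such an encoder; that stage is omitted here because the abstract gives no detail about it.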
Our findings highlight the potential of incorporating topology into TTS frameworks, providing a novel perspective for improving speech synthesis quality. This research paves the way for more expressive, human-like synthetic speech and opens new directions for enhancing neural TTS systems with advanced mathematical techniques.
Keywords
text-to-speech, machine learning, deep neural network
Start Date
11-4-2025 9:30 AM
Location
University Readiness Center Greatroom
End Date
11-4-2025 11:30 AM