Ganesh Sivaraman and Carol Espy-Wilson

A paper by Professor Carol Espy-Wilson (ECE/ISR) and her former student Ganesh Sivaraman (EE Ph.D. 2017) aims to improve speech inversion, the process of mapping acoustic signals into articulatory parameters. "Unsupervised speaker adaptation for speaker independent acoustic to articulatory speech inversion" has just been published in the Journal of the Acoustical Society of America (Vol. 146, No. 1).

Much work remains to be done in developing a robust speech inversion system, but if it could be made significantly more accurate, such a system could make a real impact on speech accent conversion, speech therapy, language learning, automatic speech recognition, and the detection of depression from speech.

Differences among speakers typically make speech inversion even harder. Normalizing these differences is essential to effectively using multi-speaker articulatory data to train a speech inversion system that is speaker independent and can accurately estimate articulatory features for any speaker.

For acoustic-to-articulatory inversion, Espy-Wilson and Sivaraman aim to minimize the speaker variability in the acoustic space that arises from vocal tract length differences between speakers. Their goal is to normalize acoustic data from multiple speakers toward the acoustic space of a target speaker.

The researchers explored a vocal tract length normalization (VTLN) technique that transforms the acoustic features of different speakers toward a target speaker's acoustic space, minimizing speaker-specific details. The speaker-normalized features were then used to train a deep, feed-forward, neural-network-based speech inversion system.
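The full modeling details are in the JASA paper; as a rough illustration only, the sketch below shows the general two-stage idea described above on toy data: warp each speaker's spectral features toward a reference speaker's frequency axis with a simple linear VTLN warp chosen by grid search, then train a small feed-forward network to map the normalized features to articulatory parameters. The feature dimensions, warp-factor grid, network size, and use of NumPy/PyTorch are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' code) of VTLN normalization followed by a
# feed-forward acoustic-to-articulatory inversion network. All sizes are toy values.
import numpy as np
import torch
import torch.nn as nn

def vtln_warp(frames, alpha):
    """Warp the frequency axis of spectral frames by factor alpha (crude linear VTLN)."""
    n_bins = frames.shape[1]
    src = np.clip(np.arange(n_bins) * alpha, 0, n_bins - 1)   # warped bin positions
    lo = np.floor(src).astype(int)
    hi = np.minimum(lo + 1, n_bins - 1)
    frac = src - lo
    return (1 - frac) * frames[:, lo] + frac * frames[:, hi]  # linear interpolation

def best_warp_factor(frames, reference_mean, alphas=np.arange(0.88, 1.13, 0.02)):
    """Pick the warp factor that brings a speaker's mean spectrum closest to the reference."""
    errors = [np.mean((vtln_warp(frames, a).mean(0) - reference_mean) ** 2) for a in alphas]
    return alphas[int(np.argmin(errors))]

# Toy multi-speaker data: 40-dim spectral frames and 6-dim articulatory targets.
rng = np.random.default_rng(0)
speakers = {s: rng.standard_normal((200, 40)).astype(np.float32) for s in range(3)}
reference_mean = speakers[0].mean(0)          # speaker 0 stands in for the target speaker space

normalized = np.vstack([vtln_warp(x, best_warp_factor(x, reference_mean))
                        for x in speakers.values()]).astype(np.float32)
targets = rng.standard_normal((normalized.shape[0], 6)).astype(np.float32)

# Feed-forward inversion network: normalized acoustic features -> articulatory parameters.
model = nn.Sequential(nn.Linear(40, 128), nn.ReLU(),
                      nn.Linear(128, 128), nn.ReLU(),
                      nn.Linear(128, 6))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
X, Y = torch.from_numpy(normalized), torch.from_numpy(targets)
for _ in range(5):                            # brief training loop on the toy data
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(X), Y)
    loss.backward()
    optimizer.step()
```

Because the warp factor is chosen per speaker against a shared reference, data from many speakers can be pooled into one normalized training set, which is the property the paper exploits.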

The paper shows that data from multiple speakers can be normalized and combined to create better speaker-independent speech inversion systems. This approach can be extended to combine data from different articulatory datasets to create a single improved speech inversion system.

Sivaraman is currently a research scientist at Pindrop in Atlanta. Pindrop develops voice-based authentication solutions that let people and companies verify each other, increasing security, identity assurance, and trust for call centers and Internet of Things devices.



July 25, 2019

