Professor Carol Espy-Wilson (ECE/ISR) is one of the principal investigators of a three-year, $1.8 million National Science Foundation collaborative research grant for "Landmark-based Robust Speech Recognition Using Prosody-Guided Models of Speech Variability." Mary Harper, an affiliate research professor in the University of Maryland Computer Science Department and professor at Purdue University, is co-principal investigator of the University of Maryland portion of the grant.

This collaborative project includes research at four other locations: UCLA (Abeer Alwan, PI); University of Illinois Urbana-Champaign (Jennifer Cole, PI, and Mark Hasegawa-Johnson, co-PI); Yale University (Louis Goldstein, PI); and Boston University (Elliot Saltzman, PI).

The research will develop a system with performance comparable to humans in automatically transcribing unrestricted conversational speech, representing many speakers and dialects, and embedded in adverse acoustic environments.

Espy-Wilson's approach will apply new high-dimensional machine learning techniques, constrained by empirical and theoretical studies of speech production and perception, to learn from data the information structures that human listeners extract from speech. She will develop large-vocabulary, psychologically realistic models of speech acoustics, pronunciation variability, prosody, and syntax. The models will be built by deriving knowledge representations that reflect those proposed for human speech production and speech perception, then using machine learning techniques to adjust the parameters of all knowledge representations simultaneously in order to minimize the structural risk of the recognizer.

The team will develop nonlinear acoustic landmark detectors and pattern classifiers that integrate auditory-based signal processing and acoustic phonetic processing; are invariant to noise, changes in speaker characteristics, and reverberation; and can be learned in a semi-supervised fashion from labeled and unlabeled data. In addition, they will use variable frame rate analysis, which will allow for multi-resolution analysis, and will implement lexical access based on gesture, using a variety of training data.

The work will improve communication and collaboration between people and machines and also improve understanding of how humans produce and perceive speech. It brings together a team of experts in speech processing, acoustic phonetics, prosody, gestural phonology, statistical pattern matching, language modeling, and speech perception, with faculty across engineering, computer science, and linguistics.



May 17, 2007



 

 
