
NSF Org: | IIS Division of Information & Intelligent Systems |
Recipient: |
|
Initial Amendment Date: | October 31, 2007 |
Latest Amendment Date: | May 18, 2010 |
Award Number: | 0754089 |
Award Instrument: | Continuing Grant |
Program Manager: | Tatiana Korelsky; IIS Division of Information & Intelligent Systems; CSE Directorate for Computer and Information Science and Engineering |
Start Date: | August 1, 2007 |
End Date: | December 31, 2011 (Estimated) |
Total Intended Award Amount: | $369,267.00 |
Total Awarded Amount to Date: | $385,267.00 |
Funds Obligated to Date: | FY 2008 = $100,000.00; FY 2009 = $108,000.00; FY 2010 = $108,000.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: |
5200 N LAKE RD MERCED CA US 95343-5001 (209)201-2039 |
Sponsor Congressional District: |
|
Primary Place of Performance: |
5200 N LAKE RD MERCED CA US 95343-5001 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Robust Intelligence |
Primary Program Source: | 01000809DB NSF RESEARCH & RELATED ACTIVIT; 01000910DB NSF RESEARCH & RELATED ACTIVIT; 01001011DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Articulatory inversion is the problem of recovering the sequence of
vocal tract shapes that produce a given acoustic utterance.
Articulatory representations are useful for automatic speech
recognition, speech production research, language therapy, and
language learning. Articulatory inversion is a hard problem because
different vocal tract shapes can produce the same acoustics, yet the
articulatory trajectory must obey the mechanical constraints of the
human vocal tract. Other examples of inversion problems over a
sequence, which share the multivalued nature of the mappings and the
existence of constraints, are: the recovery of facial gestures
associated with a speech utterance; the inverse kinematics of a robot
arm; and the recovery of 3D motion from video.
This project approaches articulatory inversion from a machine learning
standpoint, based on a framework introduced by the PI. The
low-dimensional manifold in articulatory-acoustic space is represented
in a probabilistic way by a density model estimated from data
(recorded using a microphone and electromagnetic articulography).
Multivalued mappings are explicitly represented by the modes of
conditional distributions of this density, and the articulatory
trajectory is disambiguated using a continuity constraint.
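As a rough illustration of the approach described above, the sketch
below replaces real articulatory-acoustic data with a toy forward
mapping y = x^2, so that every y has two inverses. It is not the PI's
implementation: the Gaussian kernel density estimate, the bandwidths,
the mean-shift style mode search, the squared-jump continuity cost and
the names conditional_modes and smoothest_path are all assumptions
made here for illustration.

```python
import numpy as np

def conditional_modes(X, Y, y, hx=0.05, hy=0.05, n_starts=24, merge_tol=0.02):
    """Modes of p(x | y) under a Gaussian kernel density estimate of (X, Y)."""
    w = np.exp(-0.5 * ((Y - y) / hy) ** 2)            # how well each sample matches y
    found = []
    for x in np.linspace(X.min(), X.max(), n_starts):
        for _ in range(500):                          # fixed-point (mean-shift) updates
            k = w * np.exp(-0.5 * ((X - x) / hx) ** 2)
            if k.sum() == 0.0:
                break
            x_new = (k * X).sum() / k.sum()
            converged = abs(x_new - x) < 1e-10
            x = x_new
            if converged:
                break
        found.append(x)
    found = np.array(found)
    dens = np.array([(w * np.exp(-0.5 * ((X - m) / hx) ** 2)).sum() for m in found])
    found = np.sort(found[dens >= 0.1 * dens.max()])  # drop low-density stationary points
    modes = [found[0]]
    for m in found[1:]:                               # merge duplicates of the same mode
        if m - modes[-1] > merge_tol:
            modes.append(m)
    return np.array(modes)

def smoothest_path(candidates):
    """Pick one candidate per frame so that the summed squared jumps are minimal."""
    cost = np.zeros(len(candidates[0]))
    back = []
    for prev, cur in zip(candidates[:-1], candidates[1:]):
        total = (cur[:, None] - prev[None, :]) ** 2 + cost[None, :]
        back.append(total.argmin(axis=1))             # best predecessor of each candidate
        cost = total.min(axis=1)
    idx = int(cost.argmin())
    path = [candidates[-1][idx]]
    for t in range(len(back) - 1, -1, -1):            # backtrack through the frames
        idx = int(back[t][idx])
        path.append(candidates[t][idx])
    return np.array(path[::-1])

# Toy "training data": noise-free samples of the forward mapping y = x^2.
X = np.linspace(-1.2, 1.2, 601)
Y = X ** 2

# Toy "utterance": the outputs produced by a smooth hidden trajectory.
x_true = np.linspace(0.5, 0.9, 9)
candidates = [conditional_modes(X, Y, y) for y in x_true ** 2]
x_hat = smoothest_path(candidates)
```

Continuity alone cannot choose between the two mirror-image branches
of this symmetric toy problem; it only ensures that the recovered
trajectory stays on one of them instead of jumping between them.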
The project introduces new problems in dimensionality reduction,
density estimation and regularization (such as multivalued regression
and graph-learning from noisy data), and new models and algorithms.
The expected outcomes of this work are: basic research in machine
learning and the introduction of mapping inversion problems to
research and education; improved articulatory inversion (for which
code will be made freely available); and the advocacy of data-driven
approaches in speech production research and education.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
The practical motivation for this project was the solution of difficult inverse problems such as articulatory inversion in speech processing, where we want to recover the vocal tract shape that produced a given utterance; people tracking in computer vision, where we want to recover the 3D pose of a person from a video; or inverse kinematics in robotics, where we want to determine the joint angles that will position a robot arm along a desired trajectory in workspace. The PI developed machine learning algorithms and theory for problems inspired by these applications.
One specific area of research concerned algorithms to reduce the dimensionality of data. The PI developed several new algorithms, as well as numerical optimization methods to accelerate their training, and extended some of them to the case where part of the training or testing data is missing. One of these algorithms, the Laplacian Eigenmaps Latent Variable Model, was used in a people tracking application. Other applications of these algorithms involved the 2D or 3D visualization of high-dimensional data.
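For readers unfamiliar with this family of methods, the following is a minimal sketch of plain Laplacian eigenmaps, the spectral embedding that the Laplacian Eigenmaps Latent Variable Model takes its name from. It is not the PI's model or code: the dense Gaussian affinity, the bandwidth `sigma`, the toy arc data and the function name `laplacian_eigenmaps` are assumptions made here for illustration, and the sketch does not provide the mappings between latent and data space that a latent variable model would add.

```python
import numpy as np

def laplacian_eigenmaps(Y, n_components=1, sigma=0.3):
    """Embed the rows of Y using the bottom nontrivial eigenvectors of a graph Laplacian."""
    d2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    W = np.exp(-0.5 * d2 / sigma ** 2)                    # Gaussian affinities
    np.fill_diagonal(W, 0.0)
    inv_sqrt_deg = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    L = np.eye(len(Y)) - inv_sqrt_deg[:, None] * W * inv_sqrt_deg[None, :]
    vals, vecs = np.linalg.eigh(L)                        # eigenvalues in ascending order
    # Skip the trivial first eigenvector; rescale to solve (D - W) u = lambda D u.
    return inv_sqrt_deg[:, None] * vecs[:, 1:n_components + 1]

# Toy data: 100 points along a half circle in 2D, parameterized by the angle t.
t = np.linspace(0.0, np.pi, 100)
Y = np.column_stack([np.cos(t), np.sin(t)])
U = laplacian_eigenmaps(Y, n_components=1)
```

On this toy curve the single embedding coordinate in `U` should order the points along the arc (up to an arbitrary sign), which is the sense in which the method recovers a one-dimensional structure from two-dimensional data.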
Another specific area of research concerned mean-shift algorithms, which have traditionally been used for clustering problems such as segmenting an image into meaningful objects. The PI has contributed to the theory of mean-shift algorithms by proving their convergence, establishing their order of convergence, and relating them to expectation-maximization (EM) algorithms; and to their practical application by developing fast numerical optimization methods for them. The PI has also extended their applicability beyond clustering: to denoising point clouds that have a low-dimensional structure (such as the surface of a 3D object as measured with a 3D laser scanner, or the manifold defined by a collection of images of handwritten digits that vary in slant, thickness, style, etc.); and to reconstructing missing entries of a matrix, as in recommender systems.
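The basic Gaussian mean-shift iteration used for clustering can be sketched as follows. This is a textbook-style illustration under stated assumptions, not one of the PI's accelerated or extended algorithms: the bandwidth `h`, the merge tolerance, the synthetic two-blob data and the names `mean_shift` and `label_modes` are all hypothetical choices made for this example.

```python
import numpy as np

def mean_shift(X, h=0.7, n_iter=200, tol=1e-8):
    """Move a copy of every point uphill to a mode of the Gaussian kernel density of X."""
    Z = X.copy()
    for _ in range(n_iter):
        d2 = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        K = np.exp(-0.5 * d2 / h ** 2)                 # kernel weights to the data points
        Z_new = (K @ X) / K.sum(axis=1, keepdims=True) # weighted mean of the data
        done = np.abs(Z_new - Z).max() < tol
        Z = Z_new
        if done:
            break
    return Z

def label_modes(Z, merge_tol=0.1):
    """Group converged points that ended up at (numerically) the same mode."""
    modes, labels = [], np.empty(len(Z), dtype=int)
    for i, z in enumerate(Z):
        for j, m in enumerate(modes):
            if np.linalg.norm(z - m) < merge_tol:
                labels[i] = j
                break
        else:                                          # no existing mode is close enough
            modes.append(z)
            labels[i] = len(modes) - 1
    return np.array(modes), labels

# Synthetic data: two well-separated blobs of 40 points each.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(40, 2))
blob_b = rng.normal(loc=[4.0, 4.0], scale=0.3, size=(40, 2))
X = np.vstack([blob_a, blob_b])

modes, labels = label_modes(mean_shift(X))
```

Each point is assigned to the density mode it converges to, so the number of clusters is determined by the bandwidth `h` instead of being fixed in advance.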
The PI has also contributed to the problem of articulatory inversion of speech. Through the application of machine learning techniques to databases of acoustic and articulatory speech, he has quantified the frequency with which speakers produce a given, fixed speech sound using more than one vocal tract shape. He has developed an articulatory inversion algorithm that explicitly estimates these different vocal tract shapes, and he has also applied this algorithm to the inverse kinematics problem in robotics.
This research has contributed to the theoretical and computational understanding of various existing and new unsupervised learning algorithms, with particular emphasis on their optimization, and has illustrated their application in the areas mentioned above.
Datasets, as well as Matlab implementations for most of the algorithms resulting from this research, are available for free from the PI's web page.
Last Modified: 11/16/2012
Modified by: Miguel A Carreira-Perpinan