Award Abstract # 0754089
CAREER: Machine learning approaches for articulatory inversion

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: UNIVERSITY OF CALIFORNIA, MERCED
Initial Amendment Date: October 31, 2007
Latest Amendment Date: May 18, 2010
Award Number: 0754089
Award Instrument: Continuing Grant
Program Manager: Tatiana Korelsky
IIS Division of Information & Intelligent Systems
CSE Directorate for Computer and Information Science and Engineering
Start Date: August 1, 2007
End Date: December 31, 2011 (Estimated)
Total Intended Award Amount: $369,267.00
Total Awarded Amount to Date: $385,267.00
Funds Obligated to Date: FY 2007 = $69,267.00
FY 2008 = $100,000.00
FY 2009 = $108,000.00
FY 2010 = $108,000.00
History of Investigator:
  • Miguel Carreira-Perpinan (Principal Investigator)
    mcarreira-perpinan@ucmerced.edu
Recipient Sponsored Research Office: University of California - Merced
5200 N LAKE RD
MERCED
CA  US  95343-5001
(209)201-2039
Sponsor Congressional District: 13
Primary Place of Performance: University of California - Merced
5200 N LAKE RD
MERCED
CA  US  95343-5001
Primary Place of Performance Congressional District: 13
Unique Entity Identifier (UEI): FFM7VPAG8P92
Parent UEI:
NSF Program(s): Robust Intelligence
Primary Program Source: app-0107
01000809DB NSF RESEARCH & RELATED ACTIVIT
01000910DB NSF RESEARCH & RELATED ACTIVIT
01001011DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): HPCC, 9251, OTHR, 0000, 1045, 9218
Program Element Code(s): 749500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Articulatory inversion is the problem of recovering the sequence of
vocal tract shapes that produce a given acoustic utterance.
Articulatory representations are useful for automatic speech
recognition, speech production research, language therapy, and
language learning. Articulatory inversion is a hard problem because
different vocal tract shapes can produce the same acoustics, yet the
articulatory trajectory must obey the mechanical constraints of the
human vocal tract. Other examples of inversion problems over a
sequence, which share the multivalued nature of the mappings and the
existence of constraints, are: the recovery of facial gestures
associated with a speech utterance; the inverse kinematics of a robot
arm; and the recovery of 3D motion from video.

This project approaches articulatory inversion from a machine learning
standpoint, based on a framework introduced by the PI. The
low-dimensional manifold in articulatory-acoustic space is represented
in a probabilistic way by a density model estimated from data
(recorded using a microphone and electromagnetic articulography).
Multivalued mappings are explicitly represented by the modes of
conditional distributions of this density, and the articulatory
trajectory is disambiguated using a continuity constraint.
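
The disambiguation step can be illustrated with a minimal sketch. It assumes that each frame already comes with a set of candidate articulatory values (standing in for the modes of the conditional distribution) and selects one candidate per frame so that the trajectory is as continuous as possible, using dynamic programming. The one-dimensional forward map x^3 - x, the function names and the squared-jump cost are illustrative assumptions, not the PI's model or released code.

```python
import numpy as np

def inverse_candidates(y):
    """Real solutions x of x**3 - x = y (toy stand-in for conditional modes)."""
    roots = np.roots([1.0, 0.0, -1.0, -y])
    return np.sort(roots[np.abs(roots.imag) < 1e-8].real)

def smoothest_path(candidate_sets):
    """Pick one candidate per frame, minimising the summed squared jump
    between consecutive frames (Viterbi-style dynamic programming)."""
    cands = [np.asarray(c, dtype=float) for c in candidate_sets]
    cost = np.zeros(len(cands[0]))   # best cost of a path ending at each candidate
    back = []                        # back-pointers, one array per transition
    for prev, cur in zip(cands[:-1], cands[1:]):
        # total[j, i]: cost of reaching candidate j of this frame via candidate i
        total = (cur[:, None] - prev[None, :]) ** 2 + cost[None, :]
        back.append(total.argmin(axis=1))
        cost = total.min(axis=1)
    j = int(cost.argmin())           # best final candidate
    path = [cands[-1][j]]
    for t in range(len(back) - 1, -1, -1):   # trace the back-pointers
        j = int(back[t][j])
        path.append(cands[t][j])
    return np.array(path[::-1])

# Hypothetical "articulatory" trajectory and the "acoustics" it produces.
x_true = np.linspace(-1.3, -0.9, 9)
y_obs = x_true ** 3 - x_true
path = smoothest_path([inverse_candidates(y) for y in y_obs])
```

In this toy setting the first frames have a single inverse and the later frames have three, so the continuity cost keeps the recovered path on the branch that connects to the unambiguous frames.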

The project introduces new problems in dimensionality reduction,
density estimation and regularization (such as multivalued regression
and graph-learning from noisy data), and new models and algorithms.
The expected results of this work are: performing basic research in
machine learning, and introducing mapping inversion problems to
research and education; improving articulatory inversion (for which
code will be made freely available); and advocating data-driven
approaches in speech production research and education.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 26)
A. Myronenko, X. Song and M. A. Carreira-Perpinan "Non-rigid point set registration: Coherent Point Drift" Advances in Neural Information Processing Systems 19 (NIPS'2006), v.19, 2007, p.1009
Carreira-Perpinan, M. A. "Generalised blurring mean-shift algorithms for nonparametric clustering" IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008), 2008
Carreira-Perpinan, M. A. "The elastic embedding algorithm for dimensionality reduction" 27th International Conference on Machine Learning (ICML), 2010, p.167
Carreira-Perpinan, M. A. and Lu, Z. "Dimensionality reduction by unsupervised regression" IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008), 2008
Carreira-Perpinan, M. A.; Lu, Z. "Manifold learning and missing data recovery through unsupervised regression" 12th IEEE Int. Conf. Data Mining (ICDM 2011), 2011, p.1014, doi:10.1109/ICDM.2011.97
Carreira-Perpinan, M. A.; Lu, Z. "Parametric dimensionality reduction by unsupervised regression" IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010, p.1895, doi:10.1109/CVPR.2010.5539862
C. Qin and M. A. Carreira-Perpinan "A comparison of acoustic features for articulatory inversion" Eurospeech'2007, 2007, p.2469
C. Qin and M. A. Carreira-Perpinan "An empirical investigation of the nonuniqueness in the acoustic-to-articulatory mapping" Eurospeech'2007, 2007, p.74
M. A. Carreira-Perpinan "Acceleration strategies for Gaussian mean-shift image segmentation" IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2006), 2006, p.1160, doi:10.1109/CVPR.2006.44

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

The practical motivation for this project was the solution of difficult inverse problems such as articulatory inversion in speech processing, where we want to recover the vocal tract shape that produced a given utterance; people tracking in computer vision, where we want to recover the 3D pose of a person from a video; or inverse kinematics in robotics, where we want to determine the joint angles that will position a robot arm along a desired trajectory in workspace. The PI developed machine learning algorithms and theory for problems inspired by these applications.

One specific area of research concerned algorithms to reduce the dimensionality of data. The PI developed several new algorithms, as well as numerical optimization methods to accelerate their training, and extended some of them to the case where part of the training or testing data is missing. One of these algorithms, the Laplacian Eigenmaps Latent Variable Model, was used in a people tracking application. Other applications of these algorithms involved the 2D or 3D visualization of high-dimensional data.

Another specific area of research concerned mean-shift algorithms, which have traditionally been used for clustering problems, such as segmenting an image into meaningful objects. The PI has contributed to the theory of mean-shift algorithms, by proving their convergence and their order of convergence, and relating them to expectation-maximisation (EM) algorithms; and to their practical application, by developing fast numerical optimization methods for them. The PI has also extended their applicability to problems beyond clustering: to denoising point clouds that have a low-dimensional structure (such as the surface of a 3D object as measured with a 3D laser scanner, or the manifold defined by a collection of images of handwritten digits that vary in slant, thickness, style, etc.); and to reconstructing missing entries of a matrix, as in recommender systems.
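
As an illustration of the basic idea, the sketch below runs plain Gaussian mean-shift: every point is repeatedly moved to a kernel-weighted average of the data until it settles at a mode of the kernel density estimate, and points that reach the same mode form one cluster. It is the textbook iteration rather than the accelerated or blurring variants developed in the project; the function names, bandwidth, tolerance and toy data are assumptions made for the example.

```python
import numpy as np

def gaussian_mean_shift(X, bandwidth, n_iter=200):
    """Move a copy of every point uphill on a Gaussian kernel density estimate."""
    Z = X.copy()
    for _ in range(n_iter):
        # Squared distances between current iterates Z and the fixed data X.
        d2 = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        W = np.exp(-0.5 * d2 / bandwidth ** 2)
        # Each iterate becomes the kernel-weighted mean of the data.
        Z = (W @ X) / W.sum(axis=1, keepdims=True)
    return Z

def label_by_mode(Z, tol):
    """Give the same label to points whose iterates ended within tol of each other."""
    modes, labels = [], np.empty(len(Z), dtype=int)
    for i, z in enumerate(Z):
        for k, m in enumerate(modes):
            if np.linalg.norm(z - m) < tol:
                labels[i] = k
                break
        else:  # no existing mode is close enough: start a new cluster
            modes.append(z)
            labels[i] = len(modes) - 1
    return np.array(modes), labels

# Toy data: two small 5x5 grids of points, centred at (0, 0) and (5, 5).
g = np.linspace(-0.5, 0.5, 5)
blob = np.array([(a, b) for a in g for b in g])
X = np.vstack([blob, blob + 5.0])

Z = gaussian_mean_shift(X, bandwidth=1.0)
modes, labels = label_by_mode(Z, tol=0.1)
```

The bandwidth controls how many modes, and therefore how many clusters, the density estimate has; no number of clusters is specified in advance.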

The PI has also contributed to the problem of articulatory inversion of speech. Through the application of machine learning techniques to databases of acoustic and articulatory speech, he has quantified the frequency with which speakers produce a given, fixed speech sound using more than one vocal tract shape. He has developed an articulatory inversion algorithm that explicitly estimates these different vocal tract shapes, and he has also applied this algorithm to the inverse kinematics problem in robotics.

This research has contributed to the theoretical and computational understanding of various existing and new unsupervised learning algorithms, with particular emphasis on their optimisation, and has illustrated their application in the areas mentioned above.

Datasets, as well as Matlab implementations for most of the algorithms resulting from this research, are available for free from the PI's web page.


Last Modified: 11/16/2012
Modified by: Miguel A Carreira-Perpinan