Award Abstract # 1007594
Simultaneous Confidence Regions for Functional Data Analysis: Theory and Methods

NSF Org: DMS
Division Of Mathematical Sciences
Recipient: MICHIGAN STATE UNIVERSITY
Initial Amendment Date: August 24, 2010
Latest Amendment Date: August 24, 2010
Award Number: 1007594
Award Instrument: Standard Grant
Program Manager: Gabor Szekely
DMS
 Division Of Mathematical Sciences
MPS
 Directorate for Mathematical and Physical Sciences
Start Date: September 1, 2010
End Date: August 31, 2013 (Estimated)
Total Intended Award Amount: $159,986.00
Total Awarded Amount to Date: $159,986.00
Funds Obligated to Date: FY 2010 = $159,986.00
History of Investigator:
  • Lijian Yang (Principal Investigator)
    yang@stt.msu.edu
Recipient Sponsored Research Office: Michigan State University
426 AUDITORIUM RD RM 2
EAST LANSING
MI  US  48824-2600
(517)355-5040
Sponsor Congressional District: 07
Primary Place of Performance: Michigan State University
426 AUDITORIUM RD RM 2
EAST LANSING
MI  US  48824-2600
Primary Place of Performance Congressional District: 07
Unique Entity Identifier (UEI): R28EKN92ZTZ9
Parent UEI: VJKZC4D1JN36
NSF Program(s): STATISTICS
Primary Program Source: 01001011DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):
Program Element Code(s): 126900
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.049

ABSTRACT

This research project provides simultaneous confidence regions for various functional features in functional data analysis (FDA), together with asymptotic theory and a guide to practical implementation. Specifically, asymptotically correct confidence regions will be constructed for (1) the mean function of functional data and the coefficient function in the varying coefficient longitudinal regression model; and (2) the covariance function of functional data and the regression function in the functional linear model. For the simpler functions in (1), the investigator will employ both regression spline and local polynomial methods in order to establish rigorous asymptotic theory for both sparse and dense functional data. Results on strong approximation of partial sums by Brownian motions and advanced extreme value theory for sequences of non-stationary Gaussian processes will be applied to obtain distributional properties of the maximal deviation processes. For the more complicated functions in (2), the investigator will propose two-step estimators and show that they are asymptotically as efficient as some "infeasible" analogs. Asymptotic distributions for maximal deviations are established for the "infeasible" estimators and are then inherited by the two-step estimators.
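
To make "asymptotically correct" concrete, the display below states the usual definition of a simultaneous confidence band next to its pointwise counterpart. The notation is assumed for illustration and is not taken from the award text: m is the unknown function on an interval [a,b], \hat m its estimator from n curves, \ell_n(x) the half-width of the band, and 1-\alpha the confidence level. The snippet assumes the amsmath package.

```latex
% Pointwise interval: coverage holds separately at each fixed x.
\[
  \lim_{n\to\infty} P\bigl\{ |\hat m(x) - m(x)| \le \ell_n(x) \bigr\} = 1-\alpha
  \quad \text{for each fixed } x \in [a,b].
\]
% Simultaneous band: coverage holds for all x at once.
\[
  \lim_{n\to\infty} P\bigl\{ |\hat m(x) - m(x)| \le \ell_n(x)
  \ \text{for all } x \in [a,b] \bigr\} = 1-\alpha .
\]
% Equivalent form in terms of the maximal deviation.
\[
  \lim_{n\to\infty} P\Bigl\{ \sup_{x\in[a,b]}
  \frac{|\hat m(x) - m(x)|}{\ell_n(x)} \le 1 \Bigr\} = 1-\alpha .
\]
```

The third form shows why the distribution of the maximal deviation is the central object: the half-width \ell_n(x) has to be calibrated against a supremum over x rather than against a single point.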

Functional data, also known as curve data, consist of collections of digitally recorded curves or surfaces, often with random errors. Such data abound in virtually all scientific disciplines, including but not limited to climatology, clinical studies, epidemiology, evolutionary biology and food engineering/science. The need to draw information out of a sample of curves, coupled with the unleashing of modern computing power, has made functional data analysis (FDA) one of the most active areas of contemporary statistics research. While multivariate statistics is about unknown vectors and matrices, FDA concerns unknown curves and surfaces, inference on which is most naturally done with confidence regions. The methods developed by the investigator fill a major gap in the current FDA methodology, which lacks procedures to make conclusions on an entire curve with quantifiable uncertainty. Code written in common software packages such as Matlab or R will be freely distributed so that practitioners from academia and industry can analyze functional data in real time, with significance levels of their own choosing. Completing this project depends crucially on several capable Ph.D. students working under the investigator's supervision, so state-of-the-art research is integrated with the training of graduate students as future researchers, consistent with NSF's education goal.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 17)
Cao, G., Todem, D., Yang, L. and Fine, J. "Evaluating statistical hypotheses for non-identifiable models using estimating functions" Scandinavian Journal of Statistics , v.40 , 2013 , p.256 DOI: 10.1111/j.1467-9469.2012.00811.x
Cao, G., Wang, J., Wang, L. and Todem, D. "Spline confidence bands for functional derivatives" Journal of Statistical Planning and Inference , v.142 , 2012 , p.1557
Cao, G., Yang, L. and Todem, D. "Simultaneous inference for the mean function based on dense functional data" Journal of Nonparametric Statistics , v.24 , 2012 , p.359 http://dx.doi.org/10.1080/10485252.2011.638071
Liu, R., Yang, L. and Härdle, W. "Oracally efficient two-step estimation of generalized additive model" Journal of the American Statistical Association , v.108 , 2013 , p.619 DOI: 10.1080/01621459.2013.763726
Ma, S. and Yang, L. "A jump-detecting procedure based on spline estimation" Journal of Nonparametric Statistics , v.23 , 2011 , p.67
Ma, S. and Yang, L. "Spline-backfitted kernel smoothing of partially linear additive model" Journal of Statistical Planning and Inference , v.141 , 2011 , p.204
Ma, S., Yang, L. and Carroll, R. "A simultaneous confidence band for sparse longitudinal regression" Statistica Sinica , v.22 , 2012 DOI: 10.5705/ss.2010.034
Ma, S., Yang, L., Romero, R. and Cui, Y. "Varying coefficient model for gene-environment interaction: a non-linear look" Bioinformatics , v.27 , 2011 , p.21
Mishra, D. K., Dolan, K. D. and Yang, L. "Bootstrap confidence intervals for the kinetic parameters for degradation of anthocyanins in grape pomace" Journal of Food Process Engineering , v.34 , 2011 , p.122
Qiu, D., Shao, Q. and Yang, L. "Efficient inference for autoregressive coefficients in the presence of trends" Journal of Multivariate Analysis , v.114 , 2013 , p.40 http://dx.doi.org/10.1016/j.jmva.2012.07.016

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Functional data, also figuratively called curve data, consist of random collections of digitally recorded sample curves or surfaces, often contaminated with measurement errors. Since 2000, the study of functional data has been a focal point in mainstream statistics research, as such data pour in from virtually all scientific disciplines, including but not limited to climatology, clinical studies, epidemiology, evolutionary biology and food engineering/science.

While there has been a massive amount of research in functional data analysis (FDA), a mere fraction of it addresses the critical issue of statistical inference, namely, drawing conclusions about an entire curve or surface of interest with quantifiable uncertainty. While classic mathematical statistics provides data analysts with confidence intervals for single parameters and joint confidence regions for multiple parameters, analogous constructs in the context of FDA almost did not exist prior to this project.

The most natural tools for drawing intelligent conclusions on unknown curves/surfaces are confidence bands/envelopes, which are simply two-/three-dimensional regions enclosed by an upper confidence curve/surface and a lower one, both constructed from the data. At the completion of this project, several types of simultaneous confidence bands have been made available for the mean of functional data of both sparse and dense type (i.e., each sample curve may have been recorded over a small or large number of points). A simultaneous confidence envelope has also been provided for the covariance surface of functional data, which is not affected by the mean function.
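
As a rough illustration of a band "enclosed by an upper confidence curve and a lower one, both constructed from the data", the sketch below builds a simultaneous band for the mean of densely observed curves. It is not the spline or local polynomial procedure developed in this project: it swaps in a plain nonparametric bootstrap of the maximal standardized deviation, and the function names, the simulated data, and all parameter values are hypothetical.

```python
import math
import random


def simulate_curves(n=50, m=30, seed=1):
    """Simulate n noisy curves on a common grid of m points (illustrative data only)."""
    rng = random.Random(seed)
    grid = [j / (m - 1) for j in range(m)]
    curves = []
    for _ in range(n):
        slope = rng.gauss(0, 1)  # curve-level random effect
        curves.append([math.sin(2 * math.pi * x) + slope * x + rng.gauss(0, 0.3)
                       for x in grid])
    return grid, curves


def simultaneous_band(curves, alpha=0.05, n_boot=500, seed=0):
    """Bootstrap simultaneous band for the mean curve.

    Assumes every curve is observed on the same grid and that the pointwise
    standard error is positive at each grid point.
    Returns (mean, lower, upper, critical_value).
    """
    rng = random.Random(seed)
    n, m = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(m)]
    # Pointwise standard error of the sample mean at each grid point.
    se = [math.sqrt(sum((c[j] - mean[j]) ** 2 for c in curves) / (n - 1) / n)
          for j in range(m)]
    max_devs = []
    for _ in range(n_boot):
        # Resample whole curves so within-curve dependence is preserved.
        sample = [curves[rng.randrange(n)] for _ in range(n)]
        boot_mean = [sum(c[j] for c in sample) / n for j in range(m)]
        # Largest standardized deviation over the whole grid.
        max_devs.append(max(abs(boot_mean[j] - mean[j]) / se[j] for j in range(m)))
    max_devs.sort()
    # Empirical (1 - alpha) quantile of the maximal deviation.
    crit = max_devs[min(n_boot - 1, math.ceil((1 - alpha) * n_boot) - 1)]
    lower = [mean[j] - crit * se[j] for j in range(m)]
    upper = [mean[j] + crit * se[j] for j in range(m)]
    return mean, lower, upper, crit


grid, curves = simulate_curves()
mean, lower, upper, crit = simultaneous_band(curves)
print("critical value for the whole curve:", round(crit, 3))
```

Because the critical value is calibrated against the maximum over the grid, it is typically larger than the pointwise normal quantile of about 1.96 at the 95% level, so the band is wider than a collection of pointwise intervals. The significance level is set through the `alpha` argument.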

Code written in the popular software package R to compute confidence bands for sparse functional data will be available on the internet so that practitioners from academia and industry can use it freely to analyze functional data in real time, with significance levels of their own choosing. Three Ph.D. students worked on the project and were trained to become capable researchers in FDA, consistent with NSF's education goal.


Last Modified: 11/29/2013
Modified by: Lijian Yang

Please report errors in award information by writing to: awardsearch@nsf.gov.
