Award Abstract # 1613295
Integrative Multivariate Analysis of Multi-View Data

NSF Org: DMS
Division Of Mathematical Sciences
Recipient: UNIVERSITY OF CONNECTICUT
Initial Amendment Date: August 4, 2016
Latest Amendment Date: August 4, 2016
Award Number: 1613295
Award Instrument: Standard Grant
Program Manager: Gabor Szekely
DMS
 Division Of Mathematical Sciences
MPS
 Directorate for Mathematical and Physical Sciences
Start Date: August 15, 2016
End Date: July 31, 2020 (Estimated)
Total Intended Award Amount: $150,000.00
Total Awarded Amount to Date: $150,000.00
Funds Obligated to Date: FY 2016 = $150,000.00
History of Investigator:
  • Kun Chen (Principal Investigator)
    kun.chen@uconn.edu
Recipient Sponsored Research Office: University of Connecticut
438 WHITNEY RD EXTENSION UNIT 1133
STORRS
CT  US  06269-9018
(860)486-3622
Sponsor Congressional District: 02
Primary Place of Performance: University of Connecticut
215 Glenbrook Road
Storrs
CT  US  06269-4120
Primary Place of Performance Congressional District: 02
Unique Entity Identifier (UEI): WNTPS995QBM7
Parent UEI:
NSF Program(s): STATISTICS
Primary Program Source: 01001617DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):
Program Element Code(s): 126900
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.049

ABSTRACT

Multi-view data, in which several distinct yet interrelated sets of characteristics are measured on a single set of subjects, possibly from an array of sources, have become increasingly common in engineering and scientific research. This project develops new methodologies, statistical theories, and scalable computational tools to tackle a range of statistical learning problems with multi-view data. The integrated statistical analysis of multi-view data generation mechanisms enabled by this project will yield deeper insight into real-world phenomena by combining information obtained through different lenses and from different angles.

The PI will develop several generalizations of the reduced-rank matrix structure, to enable a spectrum of multivariate statistical methods for multi-view learning. The general methodology of reduced-rank estimation is one of the most critical ingredients in modern multivariate analysis. However, for handling multi-view data, the potential of the reduced-rank methodology is far from being fully realized or understood. This project presents the following overarching objectives: (1) develop integrative multivariate regression for joint learning, which entails the exploitation of multiple sets of features to build an integrated predictive model of multivariate response; (2) develop integrative canonical correlation analysis for shared learning, by combining the exploration of shared low-dimensional association structures between multiple sets of features and the development of coherent predictive models for multivariate response; (3) develop integrative dimension reduction for multi-scale learning, by utilizing both the global and local low-dimensional structures among sub-matrices of a high-dimensional matrix object; (4) develop diagnostic measures for robust learning, which would enable reliable multi-view data integration and data quality assessment.
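For readers unfamiliar with the reduced-rank structure that these objectives generalize, the classical single-view case can be sketched as follows. This is a minimal illustration, assuming NumPy, an identity weighting of the responses, and simulated data; the function name `reduced_rank_regression` and all dimensions are hypothetical and are not taken from the project or its software.

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Classical reduced-rank regression (identity weighting, assumed here).

    Fits Y ~ X C subject to rank(C) <= rank by projecting the least-squares
    coefficients onto the leading right singular vectors of the fitted values.
    """
    # Unrestricted least-squares coefficient matrix (p x q).
    C_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # Right singular vectors of the fitted values X @ C_ols.
    _, _, Vt = np.linalg.svd(X @ C_ols, full_matrices=False)
    V_r = Vt[:rank].T
    # Rank-restricted coefficients.
    return C_ols @ V_r @ V_r.T

# Simulated example: 8 predictors, 6 responses, true coefficient rank 2.
rng = np.random.default_rng(0)
n, p, q, r = 200, 8, 6, 2
X = rng.standard_normal((n, p))
C_true = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))
Y = X @ C_true + 0.1 * rng.standard_normal((n, q))

C_hat = reduced_rank_regression(X, Y, rank=r)
rel_err = np.linalg.norm(C_hat - C_true) / np.linalg.norm(C_true)
print("estimated rank:", np.linalg.matrix_rank(C_hat, tol=1e-8))
print("relative error:", rel_err)
```

The rank constraint lets all responses share a small number of latent predictor combinations. The objectives above extend this idea to several feature sets at once.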

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 21)
A. Mishra, D. K. Dey, and K. Chen "Sequential co-sparse factor regression" Journal of Computational & Graphical Statistics , v.26 , 2017 , p.814-825 10.1080/10618600.2017.1340891
C. Yu, W. Yao, and K. Chen "A new method for robust mixture regression and outlier detection" Canadian Journal of Statistics , v.45 , 2017 , p.77-94
Gen, L. and Liu, X. "Integrative multi-view regression: Bridging group-sparse and low-rank models" Biometrics , 2018
G. Goh, D. K. Dey, and K. Chen "Bayesian sparse reduced rank multivariate regression" Journal of Multivariate Analysis , v.157 , 2017 , p.14-28
G. Vaughan, R. Aseltine, K. Chen, and J. Yan "Efficient interaction selection via stagewise generalized estimation equations" Statistics in Medicine , 2020 10.1002/sim.8574
G. Vaughan, R. Aseltine, K. Chen, and J. Yan "Stagewise generalized estimation equations with grouped variables" Biometrics , v.73 , 2017 , p.1332-1342
He, L., Chen, K., Xu, W., Zhou, J., and Wang, F. "Boosted sparse and low-rank tensor regression" Advances in Neural Information Processing Systems (NeurIPS) , v.31 , 2018 , p.1009
K. Chen and Y. Ma "Analysis of double single index models" Scandinavian Journal of Statistics , v.44 , 2017 , p.1-20
K. Chen, E. A. Hoffman, I. Seetharaman, C.-L. Lin, and K.-S. Chan "Linking lung airway structure to pulmonary function via composite bridge regression" Annals of Applied Statistics , v.10 , 2016 , p.1880-1906
K. Chen, N. Mishra, J. Smith, H. Bar, E. Schifano, L. Kuo, and M.-H. Chen "A tailored multivariate mixture model for detecting proteins of concordant change in the pathogenesis of Necrotic Enteritis" Journal of the American Statistical Association , 2018 10.1080/01621459.2017.1356314

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This project has developed a range of new methodologies, theories, and scalable computational tools to advance statistical learning with multivariate and multi-view data. Multi-view data, in which several distinct yet interrelated sets of characteristics are measured on a single set of subjects, possibly from an array of sources, have become increasingly common in engineering and scientific research. Integrative learning using the tools developed in this project has yielded important insights in a variety of real-world applications, including genetics, finance, and population health.

Throughout the project, we have pursued a comprehensive investigation and generalization of the so-called reduced-rank methodology, one of the most critical ingredients in modern multivariate statistical techniques, in order to advance it for large-scale multivariate and multi-view learning. We have made progress on three fronts. First, we investigated the fundamental properties of reduced-rank estimation, including its complexity measure (degrees of freedom) and unbiased risk estimation, model selection and diagnostics, robustification and outlier detection, nested or multi-scale reduced-rank structure, adaptive nuclear-norm penalization for improving the bias-variance tradeoff, and composite nuclear-norm penalization for dimension reduction with multi-view feature sets. Second, we investigated the integration of reduced-rank structure with other indispensable data attributes and modeling elements, such as sparsity, feature grouping, dynamic association, missing data, and data heterogeneity. For example, we have developed a series of sparse and low-rank methods for simultaneous dimension reduction and variable selection, including sparse and orthogonal factor regression for association network learning, Bayesian sparse and low-rank models for inference, generalized sparse and low-rank models with mixed-type responses, and divide-and-conquer and stagewise learning approaches for scalable computation. Third, we investigated the integration of disparate but interrelated learning objectives with multi-view data, such as simultaneous feature construction and predictive modeling. Through this project, the potential of the reduced-rank methodology has been better realized and understood for handling multivariate and multi-view data in joint learning, shared learning, multi-scale learning, and robust learning.
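As a rough illustration of the nuclear-norm penalization mentioned above, the sketch below soft-thresholds the singular values of a matrix. With `gamma = 0` this is the standard proximal operator of the nuclear norm. With `gamma > 0` the thresholds are weighted so that larger singular values are shrunk less, which is one assumed form of adaptive weighting and not necessarily the one used in the project. The function name and the toy matrix are hypothetical.

```python
import numpy as np

def singular_value_threshold(Y, lam, gamma=0.0):
    """Soft-threshold the singular values of Y.

    gamma = 0 applies the same threshold lam to every singular value.
    gamma > 0 uses thresholds lam * d_i**(-gamma), an assumed adaptive
    weighting that penalizes large singular values less.
    """
    U, d, Vt = np.linalg.svd(Y, full_matrices=False)
    # Guard against division by zero for vanishing singular values.
    weights = np.power(np.maximum(d, 1e-12), -gamma)
    d_shrunk = np.maximum(d - lam * weights, 0.0)
    # Scale the columns of U by the shrunken singular values.
    return (U * d_shrunk) @ Vt

# Toy matrix with singular values 5, 2, and 0.5.
Y = np.diag([5.0, 2.0, 0.5])
plain = singular_value_threshold(Y, lam=1.0)
adaptive = singular_value_threshold(Y, lam=1.0, gamma=1.0)
print(np.linalg.svd(plain, compute_uv=False))
print(np.linalg.svd(adaptive, compute_uv=False))
```

In both cases the smallest singular value is set to zero, so the result has reduced rank, while the adaptive version leaves the leading singular values closer to their original sizes.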

The project has involved the training of several Ph.D. students. More than 20 papers have been published in leading statistical and machine learning journals, and several R packages have been developed and distributed on CRAN.

 


Last Modified: 11/11/2020
Modified by: Kun Chen

Please report errors in award information by writing to: awardsearch@nsf.gov.
