
NSF Org: | DMS Division Of Mathematical Sciences |
Initial Amendment Date: | August 4, 2016 |
Latest Amendment Date: | August 4, 2016 |
Award Number: | 1613295 |
Award Instrument: | Standard Grant |
Program Manager: | Gabor Szekely, DMS Division Of Mathematical Sciences, MPS Directorate for Mathematical and Physical Sciences |
Start Date: | August 15, 2016 |
End Date: | July 31, 2020 (Estimated) |
Total Intended Award Amount: | $150,000.00 |
Total Awarded Amount to Date: | $150,000.00 |
Recipient Sponsored Research Office: | 438 WHITNEY RD EXTENSION UNIT 1133 STORRS CT US 06269-9018 (860)486-3622 |
Primary Place of Performance: | 215 Glenbrook Road Storrs CT US 06269-4120 |
NSF Program(s): | STATISTICS |
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.049 |
ABSTRACT
Multi-view data, that is, measurements of several distinct yet interrelated sets of characteristics on a single set of subjects, possibly collected from an array of sources, have become increasingly common in engineering and scientific research. This project develops new methodologies, statistical theory, and scalable computational tools to tackle a range of statistical learning problems with multi-view data. An integrated statistical analysis of the mechanisms that generate multi-view data, enabled by this project, will yield deeper insight into real-world phenomena by combining information obtained through different lenses and from different angles.
The PI will develop several generalizations of the reduced-rank matrix structure to enable a spectrum of multivariate statistical methods for multi-view learning. Reduced-rank estimation is one of the most critical ingredients in modern multivariate analysis. For multi-view data, however, its potential is far from fully realized or understood. The project has four overarching objectives: (1) develop integrative multivariate regression for joint learning, which exploits multiple sets of features to build an integrated predictive model of a multivariate response; (2) develop integrative canonical correlation analysis for shared learning, which combines the exploration of shared low-dimensional association structures between multiple sets of features with the development of coherent predictive models for a multivariate response; (3) develop integrative dimension reduction for multi-scale learning, which uses both the global and local low-dimensional structures among sub-matrices of a high-dimensional matrix object; (4) develop diagnostic measures for robust learning, which would enable reliable multi-view data integration and data quality assessment.
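For readers unfamiliar with the reduced-rank methodology mentioned above, the following is a minimal sketch of classical reduced-rank regression with identity weighting, fitted to synthetic data. It is an illustration only and does not reproduce any of the project's own methods or R packages; the function name `reduced_rank_regression`, the simulated dimensions, and the noise level are all assumptions made for this example.

```python
import numpy as np


def reduced_rank_regression(X, Y, rank):
    """Minimize ||Y - X C||_F^2 subject to rank(C) <= rank.

    Hypothetical helper for illustration: the classical solution projects
    the least-squares estimate onto the leading right singular vectors of
    the least-squares fitted values.
    """
    # Ordinary least-squares coefficient matrix, shape (p, q).
    C_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # Right singular vectors of the fitted values X @ C_ols.
    _, _, Vt = np.linalg.svd(X @ C_ols, full_matrices=False)
    V_r = Vt[:rank].T  # shape (q, rank)
    # Keep only the top `rank` response directions.
    return C_ols @ V_r @ V_r.T


# Synthetic example (all sizes and the noise level are assumed values).
rng = np.random.default_rng(0)
n, p, q, true_rank = 200, 8, 6, 2
X = rng.standard_normal((n, p))
C_true = rng.standard_normal((p, true_rank)) @ rng.standard_normal((true_rank, q))
Y = X @ C_true + 0.1 * rng.standard_normal((n, q))

C_hat = reduced_rank_regression(X, Y, rank=true_rank)
rel_err = np.linalg.norm(C_hat - C_true) / np.linalg.norm(C_true)
print("estimated rank:", np.linalg.matrix_rank(C_hat))
print("relative estimation error:", rel_err)
```

The rank constraint ties the q response columns together through a small number of latent factors, which is why the structure is a natural starting point for the joint and shared learning objectives listed above.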
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
This project has developed a range of new methodologies, theories, and scalable computational tools to advance statistical learning with multivariate and multi-view data. Multi-view data, that is, measurements of several distinct yet interrelated sets of characteristics on a single set of subjects, possibly collected from an array of sources, have become increasingly common in engineering and scientific research. Integrative learning using the tools developed in this project has allowed us to gain important insights in a variety of real-world applications, including genetics, finance, and population health.
Throughout the project, we have pursued a comprehensive investigation and generalization of the so-called reduced-rank methodology, one of the most critical ingredients in modern multivariate statistical techniques, in order to advance it for large-scale multivariate and multi-view learning. We have made progress on three fronts. First, we investigated the fundamental properties of reduced-rank estimation, including its complexity measure (degrees of freedom) and unbiased risk estimation, model selection and diagnostics, robustification and outlier detection, nested or multi-scale reduced-rank structure, adaptive nuclear-norm penalization for improving the bias-variance tradeoff, and composite nuclear-norm penalization for dimension reduction with multi-view feature sets. Second, we investigated the integration of reduced-rank structure with other indispensable data attributes and modeling elements, such as sparsity, feature grouping, dynamic association, missing data, and data heterogeneity. For example, we have developed a series of sparse and low-rank methods for simultaneous dimension reduction and variable selection, including sparse and orthogonal factor regression for association network learning, Bayesian sparse and low-rank models for inference, generalized sparse and low-rank models with mixed-type responses, and divide-and-conquer and stagewise learning approaches for scalable computation. Third, we investigated the integration of disparate but interrelated learning objectives with multi-view data, such as simultaneous feature construction and predictive modeling. As a result of this project, the potential of the reduced-rank methodology is better realized and understood for handling multivariate and multi-view data in joint learning, shared learning, multi-scale learning, and robust learning.
The project has involved training of several Ph.D. students. More than 20 papers have been published in leading statistical and machine learning journals, and several R packages have been developed and distributed on CRAN.
Last Modified: 11/11/2020
Modified by: Kun Chen