Award Abstract # 2310955
Knockoff Feature Selection Techniques for Robust Inference in Supervised and Unsupervised Learning

NSF Org: DMS, Division Of Mathematical Sciences
Recipient: WEILL MEDICAL COLLEGE OF CORNELL UNIVERSITY
Initial Amendment Date: July 18, 2023
Latest Amendment Date: July 18, 2023
Award Number: 2310955
Award Instrument: Standard Grant
Program Manager: Yong Zeng
  yzeng@nsf.gov
  (703) 292-7299
  DMS, Division Of Mathematical Sciences
  MPS, Directorate for Mathematical and Physical Sciences
Start Date: August 1, 2023
End Date: July 31, 2026 (Estimated)
Total Intended Award Amount: $200,000.00
Total Awarded Amount to Date: $200,000.00
Funds Obligated to Date: FY 2023 = $200,000.00
History of Investigator:
  • Yushu Shi (Principal Investigator)
    yus4011@med.cornell.edu
Recipient Sponsored Research Office: Joan and Sanford I. Weill Medical College of Cornell University
575 LEXINGTON AVE FL 9
NEW YORK
NY  US  10022-6145
(646)962-8290
Sponsor Congressional District: 12
Primary Place of Performance: Joan and Sanford I. Weill Medical College of Cornell University
1300 YORK AVE
NEW YORK
NY  US  10065-4805
Primary Place of Performance Congressional District: 12
Unique Entity Identifier (UEI): YNT8TCJH8FQ8
Parent UEI: QV1RJ11H58C4
NSF Program(s): STATISTICS, MATHEMATICAL BIOLOGY
Primary Program Source: 01002324DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7334
Program Element Code(s): 126900, 733400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.049

ABSTRACT

This project aims to develop new methodology for selecting, from a large pool of candidate variables, the key features that predict the outcomes of interest. In the biomedical field, these methods will enable the discovery of determinants of patient health, thereby improving the prevention, treatment, and management of diseases. In fields such as engineering, psychology, sociology, economics, and environmental sciences, the methods can improve manufacturing processes, social programs focused on diversity and equity, the care and management of mental health, and the preservation of the environment and natural resources. The new methods will also help generate high-quality synthetic data while preserving the confidentiality of the original information, thereby spurring new scientific discoveries and providing a valuable educational tool. The project will offer several unique interdisciplinary training initiatives for future cohorts of data scientists working at the interface of statistics, machine learning, and biomedical sciences.

The research agenda builds on the knockoff method, which identifies key features predictive of the outcomes while controlling the false discovery rate. The methods incorporate the phylogenetic structure of the microbiome into feature selection, accommodate missing values, use multiple knockoffs to increase robustness, employ nonparametric Bayesian models for complex data structures, and introduce a new knockoff statistic based on a conditional prediction function. The proposed statistics can be paired with state-of-the-art machine learning models to detect nonlinear relationships while accounting for correlation among features. Furthermore, by applying knockoff filtering to unsupervised learning models, this research can identify the key determinants within the feature space and provide insights into unsupervised clustering and learning.
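
For readers unfamiliar with knockoffs, the final selection step shared by knockoff methods can be illustrated with the generic knockoff+ filter of Barber and Candès (2015). This is a minimal sketch, not the project's own statistics or procedures. It assumes each feature already has an importance score from a model fit on the original features and on their knockoff copies together. The function names `difference_statistics`, `knockoff_threshold`, and `knockoff_select` are hypothetical. A statistic W_j = Z_j - Z~_j that is large and positive suggests a real signal; for null features, the sign of W_j is a fair coin flip. This symmetry is what lets the count of negative statistics estimate the number of false discoveries.

```python
from typing import List, Sequence


def difference_statistics(imp_orig: Sequence[float],
                          imp_knock: Sequence[float]) -> List[float]:
    """W_j = Z_j - Z~_j from importance scores of original and knockoff features.

    The importance scores are assumed inputs (hypothetical), e.g. absolute
    coefficients from any model fit on the original and knockoff columns together.
    """
    return [z - zk for z, zk in zip(imp_orig, imp_knock)]


def knockoff_threshold(w: Sequence[float], q: float = 0.1, plus: bool = True) -> float:
    """Smallest t > 0 whose estimated false discovery proportion is at most q.

    Estimated FDP(t) = (offset + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}),
    where offset = 1 gives the knockoff+ filter and offset = 0 the plain filter.
    """
    offset = 1 if plus else 0
    # Candidate thresholds are the distinct nonzero magnitudes |W_j|, ascending.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        est_false = offset + sum(1 for x in w if x <= -t)
        n_selected = max(1, sum(1 for x in w if x >= t))
        if est_false / n_selected <= q:
            return t  # first (smallest) threshold meeting the target level
    return float("inf")  # no threshold works, so nothing is selected


def knockoff_select(w: Sequence[float], q: float = 0.1, plus: bool = True) -> List[int]:
    """Indices of features whose statistic clears the knockoff threshold."""
    t = knockoff_threshold(w, q, plus)
    return [j for j, x in enumerate(w) if x >= t]


if __name__ == "__main__":
    # Hypothetical statistics for eight features, target FDR level q = 0.2.
    w = [5.0, 4.0, 3.0, 2.5, -1.0, 0.5, -0.3, 2.0]
    print(knockoff_threshold(w, q=0.2))  # 2.0
    print(knockoff_select(w, q=0.2))     # [0, 1, 2, 3, 7]
```

The knockoff+ variant (`plus=True`) adds one to the numerator, which gives exact finite-sample FDR control. It is more conservative than the plain filter: on the same statistics, the plain filter stops at t = 0.5 and would also admit feature 5.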

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

