Award Abstract # 1652943
CAREER: Robust Brain Imaging Genomics Data Mining Framework for Improved Cognitive Health

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: TRUSTEES OF THE COLORADO SCHOOL OF MINES
Initial Amendment Date: February 16, 2017
Latest Amendment Date: June 14, 2023
Award Number: 1652943
Award Instrument: Continuing Grant
Program Manager: Sylvia Spengler
sspengle@nsf.gov
(703)292-7347
IIS Division of Information & Intelligent Systems
CSE Directorate for Computer and Information Science and Engineering
Start Date: February 15, 2017
End Date: January 31, 2024 (Estimated)
Total Intended Award Amount: $409,641.00
Total Awarded Amount to Date: $529,641.00
Funds Obligated to Date: FY 2017 = $83,014.00
FY 2018 = $93,873.00
FY 2019 = $99,021.00
FY 2020 = $117,290.00
FY 2021 = $104,443.00
FY 2022 = $16,000.00
FY 2023 = $16,000.00
History of Investigator:
  • Hua Wang (Principal Investigator)
    huawang@mines.edu
Recipient Sponsored Research Office: Colorado School of Mines
1500 ILLINOIS ST
GOLDEN
CO  US  80401-1887
(303)273-3000
Sponsor Congressional District: 07
Primary Place of Performance: Colorado School of Mines
1500 Illinois St
Golden
CO  US  80401-1887
Primary Place of Performance Congressional District: 07
Unique Entity Identifier (UEI): JW2NGMP4NMA3
Parent UEI: JW2NGMP4NMA3
NSF Program(s): Info Integration & Informatics
Primary Program Source: 01002223DB NSF RESEARCH & RELATED ACTIVIT
01002324DB NSF RESEARCH & RELATED ACTIVIT
01001718DB NSF RESEARCH & RELATED ACTIVIT
01001819DB NSF RESEARCH & RELATED ACTIVIT
01001920DB NSF RESEARCH & RELATED ACTIVIT
01002021DB NSF RESEARCH & RELATED ACTIVIT
01002122DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 1045, 7364, 8089, 8091, 9251
Program Element Code(s): 736400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

The goal of this CAREER project is to establish a new robust data mining framework for better modeling, understanding and analyzing brain imaging genomics data. The framework combines sparsity-induced learning models with new and more efficient computational algorithms. The proposed research is innovative and crucial not only to facilitating the development of new data mining techniques, but also to addressing emerging scientific questions in brain imaging genomics and to supporting the BRAIN Initiative, which has recently been unveiled by the U.S. Government and has become a national goal. Integrated with the research in this project are the educational goals to create and broadly disseminate new curricular and K-12 outreach materials that focus both on the challenges of large-scale, heterogeneous-modal and high-dimensional data processing and on the principles behind the robust data mining techniques for alleviating them.

This project focuses on designing principled data mining algorithms for analyzing multi-modal brain imaging genomics data to yield mechanistic understanding from gene to brain function and to phenotypic outcomes. Of particular interest are (1) large-scale non-convex sparse learning models with linear convergence algorithms, (2) multi-task multi-dimensional data integration algorithms with linear computational cost, and (3) evaluation and validation in large-scale brain imaging genomics studies. The research in this project will enable new computational applications in a large number of research areas. The educational materials developed as part of this project will give K-12 students a taste of some of the many fascinating topics in the machine learning and data mining fields while communicating to students the relevance of their mathematics and science classes to futures in engineering.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Note:  When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

(Showing: 1 - 10 of 36)
Brand, L and Nichols, K and Wang, H and Huang, H and Shen, L "Predicting Longitudinal Outcomes of Alzheimer's Disease via a Tensor-Based Joint Classification and Regression Model" Pacific Symposium on Biocomputing 2020, v.2020, 2019 10.1142/9789811215636_0002
Brand L., Nichols K. "Predicting Longitudinal Outcomes of Alzheimer's Disease via a Tensor-Based Joint Classification and Regression Model" The Proceedings of the 25th Pacific Symposium on Biocomputing (PSB 2020), 2020
Brand, Lodewijk and Baker, Lauren Zoe and Ellefsen, Carla and Sargent, Jackson and Wang, Hua "A Linear Primal-Dual Multi-Instance SVM for Big Data Classifications" 2021 IEEE International Conference on Data Mining (ICDM), 2021 https://doi.org/10.1109/ICDM51629.2021.00012
Brand, Lodewijk and Nichols, Kai and Wang, Hua and Shen, Li and Huang, Heng "Joint Multi-Modal Longitudinal Regression and Classification for Alzheimer's Disease Prediction" IEEE Transactions on Medical Imaging, 2020 https://doi.org/10.1109/TMI.2019.2958943
Brand, Lodewijk and O'Callaghan, Braedon and Sun, Anthony and Wang, Hua "Task Balanced Multimodal Feature Selection to Predict the Progression of Alzheimer's Disease" 2020 IEEE 20th International Conference on Bioinformatics and Bioengineering (BIBE), v.1, 2020 https://doi.org/10.1109/BIBE50027.2020.00040
Brand, Lodewijk and Seo, Hoon and Baker, Lauren Zoe and Ellefsen, Carla and Sargent, Jackson and Wang, Hua "A linear primal-dual multi-instance SVM for big data classifications" Knowledge and Information Systems, 2023 https://doi.org/10.1007/s10115-023-01961-z
Brand, Lodewijk and Wang, Hua and Huang, Heng and Risacher, Shannon and Saykin, Andrew and Shen, Li "Joint High-Order Multi-Task Feature Learning to Predict the Progression of Alzheimer's Disease" The Proceedings of the 21st International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2018), 2018 10.1007/978-3-030-00928-1_63
Brand, Lodewijk and Yang, Xue and Liu, Kai and Elbeleidy, Saad and Wang, Hua and Zhang, Hao "Learning Robust Multi-label Sample Specific Distances for Identifying HIV-1 Drug Resistance" The Proceedings of the 23rd Annual International Conference on Research in Computational Molecular Biology (RECOMB 2019), 2019 10.1007/978-3-030-17083-7_4
Brand, Lodewijk and Yang, Xue and Liu, Kai and Elbeleidy, Saad and Wang, Hua and Zhang, Hao and Nie, Feiping "Learning Robust Multilabel Sample Specific Distances for Identifying HIV-1 Drug Resistance" Journal of Computational Biology, 2019 10.1089/cmb.2019.0329
Fei Han, Hua Wang "Learning Integrated Holism-Landmark Representations for Long-Term Loop Closure Detection" The Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI 2018), 2018
Han, Fei and Beleidy, Saad El and Wang, Hua and Ye, Cang and Zhang, Hao "Learning of Holism-Landmark Graph Embedding for Place Recognition in Long-Term Autonomy" IEEE Robotics and Automation Letters, v.3, 2018 10.1109/LRA.2018.2856274

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

Scientific outcomes for intellectual merits:

1. We developed multiple sparse multi-view learning algorithms for identifying biomarkers for early detection of Alzheimer’s disease (AD).

AD is a degenerative brain disease that affects millions of people around the world. As populations in the United States and worldwide age, the prevalence of Alzheimer’s disease will only increase. In turn, the social and financial costs of AD will create a difficult environment for many families and caregivers across the globe. By combining genetic information, brain scans, and clinical data gathered over time through the Alzheimer’s Disease Neuroimaging Initiative (ADNI), we developed a new joint regression and classification model that performs well in identifying relevant genetic and phenotypic biomarkers in patients with AD. As shown in Fig. 1, our newly proposed method consists of three major components. First, we use the L2,1-norm regularization to associate input features over time and generate a sparse solution. Second, we use a group L1-norm regularization proposed in our previous work to globally associate the weights of the input imaging and genetic modalities, where a modality is a single data grouping (e.g., brain imaging data, genetic data, or diagnostic data). The group L1-norm regularization can determine which input modality is most effective at predicting a particular output. Third, we incorporate the trace norm regularization to determine relationships that occur within modalities.
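The three regularizers named above can be illustrated with a minimal NumPy sketch. It assumes commonly used definitions (L2,1-norm as the sum of row norms, group L1-norm as the sum of per-modality, per-output block norms, trace norm as the sum of singular values); the exact formulations in the project's papers may differ. The weight matrix `W`, the `groups` index lists, and all function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def l21_norm(W):
    """L2,1-norm: sum of the Euclidean norms of the rows of W.
    Penalizing it tends to zero out whole rows (features)."""
    return float(np.sum(np.linalg.norm(W, axis=1)))

def group_l1_norm(W, groups):
    """Group L1-norm (assumed definition): for every modality (a list of
    row indices) and every output column, take the Euclidean norm of that
    block of weights, then sum all of these block norms."""
    total = 0.0
    for rows in groups:
        block = W[rows, :]                      # weights of one modality
        total += np.sum(np.linalg.norm(block, axis=0))
    return float(total)

def trace_norm(W):
    """Trace (nuclear) norm: sum of the singular values of W."""
    return float(np.sum(np.linalg.svd(W, compute_uv=False)))

# Hypothetical weights: 4 features (rows) x 2 outputs (columns).
W = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [0.0, 5.0],
              [0.0, 12.0]])
# Hypothetical modalities: rows 0-1 are one modality, rows 2-3 another.
groups = [[0, 1], [2, 3]]

print(l21_norm(W))               # row norms 5 + 0 + 5 + 12
print(group_l1_norm(W, groups))  # block norms 3 + 4 + 0 + 13
print(trace_norm(W))
```

In this toy matrix the second modality contributes nothing to the first output (its block norm is 0), which is the kind of modality-level pattern a group penalty is meant to expose.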

2. We developed several data representation and compression methods and applied them to the analysis of multimodal imaging, biomarker, genomics and transcriptomics data sets.

To aid automatic AD diagnosis, many longitudinal learning models have been proposed to predict clinical outcomes and/or disease status. These models, however, often fail to account for missing temporal phenotypic records of the patients, which can convey valuable information about AD progression. Another challenge in AD studies is how to integrate heterogeneous genotypic and phenotypic biomarkers to improve diagnosis prediction. To cope with these challenges, as illustrated in Fig. 2, we proposed a longitudinal multi-modal method that learns enriched genotypic and phenotypic biomarker representations as fixed-length vectors. These vectors simultaneously capture the baseline neuroimaging measurements of the entire dataset and, for every participant and biomarker source, the progressive variations across a varying number of follow-up measurements over time. The learned global and local projections are aligned by a soft constraint, and a structured-sparsity norm is used to uncover the multi-modal structure of the heterogeneous biomarker measurements. We conducted extensive experiments on the ADNI data using one genotypic and two phenotypic biomarkers. Empirical results demonstrated that the learned enriched biomarker representations are more effective in predicting the outcomes of various cognitive assessments. Moreover, our model identified disease-relevant biomarkers that are supported by existing medical findings, which further supports the validity of our method from the clinical perspective.
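The idea of turning a varying number of follow-up records into a fixed-length vector can be shown with a deliberately simple stand-in. The sketch below is not the learned projection described above: it concatenates the baseline measurements with the mean change from baseline, so that participants with different visit counts map to vectors of the same length. The function name `fixed_length_summary` and the toy data are hypothetical.

```python
import numpy as np

def fixed_length_summary(visits):
    """Summarize one participant's records as a fixed-length vector.

    visits: array-like of shape (n_visits, n_features); row 0 is baseline.
    Returns 2 * n_features values: the baseline measurements followed by
    the mean change from baseline over the follow-up visits (zeros when
    the participant has no follow-up visits).
    """
    visits = np.asarray(visits, dtype=float)
    baseline = visits[0]
    if visits.shape[0] > 1:
        mean_change = (visits[1:] - baseline).mean(axis=0)
    else:
        mean_change = np.zeros_like(baseline)
    return np.concatenate([baseline, mean_change])

# Hypothetical participants with different numbers of visits.
participant_a = [[1.0, 2.0], [1.5, 2.0], [2.5, 1.0]]  # baseline + 2 follow-ups
participant_b = [[3.0, 4.0]]                          # baseline only

vec_a = fixed_length_summary(participant_a)
vec_b = fixed_length_summary(participant_b)
print(vec_a.shape, vec_b.shape)  # both vectors have the same length
```

A learned representation would replace the hand-picked mean-change summary with projections fitted to the data, but the input and output shapes follow the same pattern.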

3. We enhanced several models that form part of the theoretical foundations of machine learning.

(1) Principal Component Analysis (PCA) is one of the most broadly used methods to analyze high-dimensional data. However, most existing studies on PCA aim to minimize the reconstruction error measured by the Euclidean distance, although in some fields, such as text analysis in information retrieval, analysis using the angle distance is known to be more effective. To this end, we proposed a novel PCA formulation that adds a constraint on the factors to unify the Euclidean distance and the angle distance. (2) The traditional linear discriminant analysis (LDA) objective minimizes a ratio of squared Euclidean distances, which may not perform optimally on noisy datasets. One limitation is that the mean calculations use the squared ℓ2-norm distance to center the data, which is not valid when the objective depends on other distance functions. A second limitation is that there is no generalized optimization algorithm to solve different robust LDA objectives. In addition, most existing algorithms can only guarantee a locally optimal solution rather than a globally optimal one. With these limitations in mind, we reviewed multiple robust loss functions and proposed a new, generalized robust objective for LDA.
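The difference between the Euclidean distance and the angle distance mentioned in item (1) can be seen in a small NumPy sketch. It only compares the two distances on toy term-count vectors and does not implement the proposed PCA formulation; the vectors and function names are hypothetical.

```python
import numpy as np

def euclidean_distance(x, y):
    """Length of the difference vector; sensitive to vector magnitude."""
    return float(np.linalg.norm(x - y))

def angle_distance(x, y):
    """Angle in radians between x and y; ignores vector magnitude."""
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    # Clip to guard against values slightly outside [-1, 1] from rounding.
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

# Hypothetical term-count vectors for three documents.
short_doc = np.array([1.0, 2.0, 0.0])
long_doc = 10 * short_doc               # same word proportions, 10x longer
other_doc = np.array([2.0, 1.0, 0.0])   # different word proportions

# Euclidean distance rates the longer copy as far away from short_doc,
# while the angle distance rates it as essentially identical.
print(euclidean_distance(short_doc, long_doc), euclidean_distance(short_doc, other_doc))
print(angle_distance(short_doc, long_doc), angle_distance(short_doc, other_doc))
```

This is why the angle distance is often preferred for text: two documents with the same word proportions but different lengths point in the same direction.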

Other outcomes for broader impacts:

We have published about 37 full-length papers related to this project in peer-reviewed conference proceedings and journals.

This project supported three Ph.D. students at Colorado School of Mines. Two of them have graduated; the third is currently a fourth-year Ph.D. student in the Department of Computer Science who is expected to graduate next year and plans to seek an academic position.

This project also supported sixteen undergraduate REU students. The work of these students, together with their graduate student mentor supported by this project, has led to more than 10 manuscripts published in or submitted to top-tier peer-reviewed journals.

The research materials produced in this project are used in teaching several undergraduate and graduate courses at Colorado School of Mines.


Last Modified: 05/20/2024
Modified by: Hua Wang

Please report errors in award information by writing to: awardsearch@nsf.gov.
