NSF Award Search: Award # 2007903

Collaborative Research: FET: Small: Machine Learning Models for Function-on-Function Regression

NSF Org:	CCF Division of Computing and Communication Foundations
Recipient:	TEXAS TECH UNIVERSITY SYSTEM
Initial Amendment Date:	August 4, 2020
Latest Amendment Date:	August 4, 2020
Award Number:	2007903
Award Instrument:	Standard Grant
Program Manager:	Stephanie Gage, sgage@nsf.gov, (703) 292-4748, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering
Start Date:	October 1, 2020
End Date:	September 30, 2025 (Estimated)
Total Intended Award Amount:	$219,983.00
Total Awarded Amount to Date:	$219,983.00
Funds Obligated to Date:	FY 2020 = $219,983.00
History of Investigator:	Ranadip Pal (Principal Investigator) ranadip.pal@ttu.edu
Recipient Sponsored Research Office:	Texas Tech University 2500 BROADWAY LUBBOCK TX US 79409 (806)742-3884
Sponsor Congressional District:	19
Primary Place of Performance:	Texas Tech University 1012 Boston Avenue, Electrical a Lubbock TX US 79409-3102
Primary Place of Performance Congressional District:	19
Unique Entity Identifier (UEI):	EGLKRQ5JBCZ7
Parent UEI:
NSF Program(s):	FET-Fndtns of Emerging Tech
Primary Program Source:	01002021DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s):	7931, 7923
Program Element Code(s):	089Y00
Award Agency Code:	4900
Fund Agency Code:	4900
Assistance Listing Number(s):	47.070

ABSTRACT

Large heterogeneous feature sets are common in biological studies, for example genetic, transcriptomic, proteomic and metabolomic data, and in electronic health records. The goal of personalized medicine is often to link this information to therapeutic responses. More accurate prediction can assist in selecting the most desirable therapy for each individual patient. Some of the latest machine-learning tools, such as deep learning based on convolutional neural networks, have shown great promise in various areas of image-based predictive modeling but are often unsuitable for the large non-image feature sets that appear frequently in biological scenarios. The project develops a novel framework termed REFINED (REpresentation of Features as Images with NEighborhood Dependencies) that represents high-dimensional vectors as compact images. This representation increases the accuracy of machine-learning models trained on such datasets and can also handle heterogeneous feature sets. Successful implementation of the innovation will assist in the goal of higher-accuracy predictive modeling from biological datasets. The developed algorithms will be made available online in a user-friendly manner. Investigators are deeply involved in educating and training the next generation of students at all levels, with attention to minority and underrepresented groups.

The project involves the design of a novel regression framework that converts scalar and functional predictors into mathematically justifiable image objects that can be processed by deep-learning methods based on convolutional neural networks. Preliminary results on biological datasets show that the framework achieves higher prediction accuracy than existing methodologies while maintaining desirable properties in terms of bias. The specific project contributions are (a) an innovative design for representing high-dimensional scalar features as images with neighborhood dependencies, which enables high-accuracy predictive modeling using deep learning based on convolutional neural networks, and (b) an extension of the image-based representation to incorporate functional changes in predictors and outputs. The project also explores the theoretical underpinnings of this new predictive-modeling framework for biological scenarios. The framework can be applied to any biological-prediction problem where the predictors have scalar, functional and/or image attributes. The successful completion of this project will result in a new, effective tool for feature representation and function-on-function regression and will provide a significant methodology for object regression.
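The idea of placing related scalar features in neighboring pixels can be illustrated with a small sketch. The abstract does not specify how REFINED builds its images, so the code below is only an assumption-laden stand-in: it embeds the features in two dimensions with classical multidimensional scaling on a correlation-based distance, then snaps them to a square grid by sorting. The function names `feature_to_pixel_map` and `to_images` are hypothetical and do not come from the project's software.

```python
import numpy as np

def feature_to_pixel_map(X, side):
    """Assign each feature (column of X) to a unique pixel of a side x side grid.

    Illustrative simplification, not the project's actual algorithm:
    features are embedded in 2-D with classical MDS and then snapped
    to the grid by sorting along each embedding axis.
    """
    n = X.shape[1]
    if n != side * side:
        raise ValueError("number of features must equal side * side")

    # Distance between features: strongly correlated features are close.
    corr = np.corrcoef(X, rowvar=False)
    dist = 1.0 - np.abs(corr)

    # Classical MDS: double-center the squared distances, keep the top 2 eigenpairs.
    J = np.eye(n) - 1.0 / n
    B = -0.5 * J @ (dist ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:2]
    coords = vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

    # Grid snapping: sort by the first coordinate to form rows,
    # then sort within each row by the second coordinate.
    order = np.argsort(coords[:, 0], kind="stable")
    pixel_of_feature = np.empty(n, dtype=int)
    for r in range(side):
        row = order[r * side:(r + 1) * side]
        row = row[np.argsort(coords[row, 1], kind="stable")]
        pixel_of_feature[row] = r * side + np.arange(side)
    return pixel_of_feature

def to_images(X, pixel_of_feature, side):
    """Rearrange every sample (row of X) into a side x side image."""
    flat = np.zeros((X.shape[0], side * side))
    flat[:, pixel_of_feature] = X  # feature j is written to its assigned pixel
    return flat.reshape(-1, side, side)
```

The resulting array of shape (samples, side, side) could then be passed to any convolutional regression model. The mapping is computed once from training data and reused for new samples, so each pixel always carries the same feature.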

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Bazgir, Omid and Ghosh, Souparno and Pal, Ranadip "Investigation of REFINED CNN ensemble learning for anti-cancer drug sensitivity prediction" Bioinformatics, v.37, 2021 https://doi.org/10.1093/bioinformatics/btab336

Nolte, Daniel and Bazgir, Omid and Ghosh, Souparno and Pal, Ranadip; Rattray, Magnus (ed.) "Federated learning framework integrating REFINED CNN and Deep Regression Forests" Bioinformatics Advances, v.3, 2023 https://doi.org/10.1093/bioadv/vbad036

Zhang, Ruibo and Ghosh, Souparno and Pal, Ranadip "Predicting binding affinities of emerging variants of SARS-CoV-2 using spike protein sequencing data: observations, caveats and recommendations" Briefings in Bioinformatics, v.23, 2022 https://doi.org/10.1093/bib/bbac128

Zhang, Ruibo and Nolte, Daniel and Sanchez-Villalobos, Cesar and Ghosh, Souparno and Pal, Ranadip "Topological regression as an interpretable and efficient tool for quantitative structure-activity relationship modeling" Nature Communications, v.15, 2024 https://doi.org/10.1038/s41467-024-49372-0
