
Award Abstract # 2125218
III: Small: Integrated prediction of intrinsic disorder and disorder functions with modular multi-label deep learning

NSF Org: IIS
Division of Information & Intelligent Systems
Recipient: VIRGINIA COMMONWEALTH UNIVERSITY
Initial Amendment Date: August 31, 2021
Latest Amendment Date: August 31, 2021
Award Number: 2125218
Award Instrument: Standard Grant
Program Manager: Sylvia Spengler
sspengle@nsf.gov
(703)292-7347
IIS, Division of Information & Intelligent Systems
CSE, Directorate for Computer and Information Science and Engineering
Start Date: October 1, 2021
End Date: September 30, 2025 (Estimated)
Total Intended Award Amount: $500,000.00
Total Awarded Amount to Date: $500,000.00
Funds Obligated to Date: FY 2021 = $500,000.00
History of Investigator:
  • Lukasz Kurgan (Principal Investigator)
    lkurgan@vcu.edu
Recipient Sponsored Research Office: Virginia Commonwealth University
910 WEST FRANKLIN ST
RICHMOND
VA  US  23284-9005
(804)828-6772
Sponsor Congressional District: 04
Primary Place of Performance: Virginia Commonwealth University
401 West Main Street
Richmond
VA  US  23298-0568
Primary Place of Performance Congressional District: 04
Unique Entity Identifier (UEI): MLQFL4JSSAA9
Parent UEI: WXQLZ1PA6XP3
NSF Program(s): Info Integration & Informatics
Primary Program Source: 01002122DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 062Z, 7364, 7923
Program Element Code(s): 736400
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Proteins are remarkable biological machines. Hundreds of millions of protein sequences have been decoded over the last two decades, creating a significant knowledge gap: we do not know what most of these proteins do. A common way to decipher protein function relies on the sequence-to-structure-to-function paradigm, in which function is inferred from the protein structure, which is in turn derived from the sequence. However, recent research has identified a large family of intrinsically disordered proteins that lack a stable structure under physiological conditions and therefore cannot be characterized using structure-based approaches. These proteins are particularly abundant in eukaryotes and are involved in the pathogenesis of numerous human diseases. The discovery of intrinsically disordered proteins has prompted the development of a new generation of computational methods that predict the presence of intrinsic disorder directly from protein sequences. The recently completed Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment has shown that these methods are fast and provide accurate results. However, while intrinsic disorder can be identified readily and accurately in protein sequences, its functions remain largely unknown. This project will conceptualize, design, implement, test and deploy an innovative machine learning method that provides highly accurate, integrated predictions of disorder and disorder functions directly from protein sequences. The team will use this method to produce functional annotations of disorder on an unprecedented scale of tens of millions of proteins, addressing the knowledge gap for this protein family. In the long run, this project will advance understanding of fundamental biological processes and related human health issues in the context of intrinsically disordered proteins.
This project will also train STEM students and researchers via high-school outreach and multidisciplinary teaching and mentoring of undergraduate and graduate students and postdoctoral researchers, producing highly skilled researchers who are sought after by industry and academia.

The team will address the interdisciplinary and challenging problem of predicting intrinsic disorder and its functions, which lies at the intersection of bioinformatics and machine learning. Building on expertise in the computational analysis of intrinsic disorder and with a focus on technical innovation, this project will deliver a novel deep sequential multi-label transformer architecture that provides accurate predictions of disorder and disorder functions. The solution will be designed to accommodate the biological underpinnings of protein data, such as inherently multi-label outcomes, imbalanced labels and the sequential nature of protein data. Moreover, the architecture will feature a modular design to facilitate transfer to other areas of protein and nucleic acid bioinformatics. The resulting method will be extensively benchmarked and disseminated to maximize impact. The code will be deposited into relevant public repositories, and pre-computed functional annotations of intrinsic disorder will be made available through modern online resources, such as data repositories and webservers, to meet the needs of a broad spectrum of users including biologists, biochemists, biophysicists and bioinformaticians.
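The multi-label, imbalanced setting described above can be illustrated with a small sketch. Assuming each residue carries several binary labels at once (for example, a residue may be both disordered and protein-binding), one common way to counter label imbalance is a binary cross-entropy loss whose positive terms are up-weighted by each label's negative-to-positive ratio. The label names and the functions `positive_weights` and `weighted_multilabel_bce` are hypothetical illustrations, not the project's published architecture or loss, and the sequential transformer encoder that would produce the per-residue probabilities is not shown.

```python
import math
from typing import List, Sequence

# Hypothetical label set for illustration only; not the project's actual outputs.
LABELS = ["disorder", "protein_binding", "nucleic_acid_binding", "linker"]


def positive_weights(targets: Sequence[Sequence[int]]) -> List[float]:
    """Per-label weight = #negatives / #positives, which up-weights rare labels.

    `targets` is a residues x labels matrix of 0/1 values.
    """
    n_labels = len(targets[0])
    weights = []
    for j in range(n_labels):
        pos = sum(row[j] for row in targets)
        neg = len(targets) - pos
        # Fall back to weight 1.0 when a label has no positives at all.
        weights.append(neg / pos if pos > 0 else 1.0)
    return weights


def weighted_multilabel_bce(
    probs: Sequence[Sequence[float]],
    targets: Sequence[Sequence[int]],
    pos_weight: Sequence[float],
    eps: float = 1e-7,
) -> float:
    """Mean binary cross-entropy over all residue/label pairs.

    Each label is scored independently (multi-label, not multi-class),
    and the positive term of label j is scaled by pos_weight[j].
    """
    total = 0.0
    count = 0
    for p_row, y_row in zip(probs, targets):
        for j, (p, y) in enumerate(zip(p_row, y_row)):
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            total += -(pos_weight[j] * y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count


if __name__ == "__main__":
    # Toy per-residue annotations for a 4-residue fragment (made-up data).
    targets = [[1, 1, 0, 0], [1, 0, 0, 1], [0, 0, 0, 0], [1, 0, 1, 0]]
    weights = positive_weights(targets)
    print(dict(zip(LABELS, weights)))
    uniform = [[0.5] * len(LABELS) for _ in targets]
    print(weighted_multilabel_bce(uniform, targets, weights))
```

Scoring each label with its own sigmoid-style probability, rather than a single softmax over labels, is what lets one residue be positive for several functions simultaneously; the ratio-based weights are just one simple choice among several imbalance strategies (resampling and focal-style losses are others).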

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH


(Showing: 1 - 10 of 19)
Basu, Sushmita and Gsponer, Jörg and Kurgan, Lukasz "DEPICTER2: a comprehensive webserver for intrinsic disorder and disorder function prediction" Nucleic Acids Research, v.51, 2023. https://doi.org/10.1093/nar/gkad330
Basu, Sushmita and Hegedűs, Tamás and Kurgan, Lukasz "CoMemMoRFPred: Sequence-based Prediction of MemMoRFs by Combining Predictors of Intrinsic Disorder, MoRFs and Disordered Lipid-binding Regions" Journal of Molecular Biology, v.435, 2023. https://doi.org/10.1016/j.jmb.2023.168272
Basu, Sushmita and Kihara, Daisuke and Kurgan, Lukasz "Computational prediction of disordered binding regions" Computational and Structural Biotechnology Journal, v.21, 2023. https://doi.org/10.1016/j.csbj.2023.02.018
Basu, Sushmita and Kurgan, Lukasz "Taxonomy-specific assessment of intrinsic disorder predictions at residue and region levels in higher eukaryotes, protists, archaea, bacteria and viruses" Computational and Structural Biotechnology Journal, v.23, 2024. https://doi.org/10.1016/j.csbj.2024.04.059
Basu, Sushmita and Yu, Jing and Kihara, Daisuke and Kurgan, Lukasz "Twenty years of advances in prediction of nucleic acid-binding residues in protein sequences" Briefings in Bioinformatics, v.26, 2025. https://doi.org/10.1093/bib/bbaf016
Basu, Sushmita and Zhao, Bi and Biró, Bálint and Faraggi, Eshel and Gsponer, Jörg and Hu, Gang and Kloczkowski, Andrzej and Malhis, Nawar and Mirdita, Milot and Söding, Johannes and Steinegger, Martin and Wang, Duolin and Wang, Kui and Xu, Dong and Zhang, "DescribePROT in 2023: more, higher-quality and experimental annotations and improved data download options" Nucleic Acids Research, v.52, 2023. https://doi.org/10.1093/nar/gkad985
Biró, Bálint and Zhao, Bi and Kurgan, Lukasz "Complementarity of the residue-level protein function and structure predictions in human proteins" Computational and Structural Biotechnology Journal, v.20, 2022. https://doi.org/10.1016/j.csbj.2022.05.003
Kurgan, Lukasz "Resources for computational prediction of intrinsic disorder in proteins" Methods, v.204, 2022. https://doi.org/10.1016/j.ymeth.2022.03.018
Kurgan, Lukasz and Hu, Gang and Wang, Kui and Ghadermarzi, Sina and Zhao, Bi and Malhis, Nawar and Erdős, Gábor and Gsponer, Jörg and Uversky, Vladimir N. and Dosztányi, Zsuzsanna "Tutorial: a guide for the selection of fast and accurate computational tools for the prediction of intrinsic disorder in proteins" Nature Protocols, v.18, 2023. https://doi.org/10.1038/s41596-023-00876-x
Song, Jiangning and Kurgan, Lukasz "Two decades of advances in sequence-based prediction of MoRFs, disorder-to-order transitioning binding regions" Expert Review of Proteomics, v.22, 2025. https://doi.org/10.1080/14789450.2025.2451715
Uversky, Vladimir N. and Kurgan, Lukasz "Overview Update: Computational Prediction of Intrinsic Disorder in Proteins" Current Protocols, v.3, 2023. https://doi.org/10.1002/cpz1.802