
NSF Org: | CCF Division of Computing and Communication Foundations |
Recipient: |
|
Initial Amendment Date: | March 3, 2015 |
Latest Amendment Date: | February 11, 2019 |
Award Number: | 1452795 |
Award Instrument: | Continuing Grant |
Program Manager: | Mitra Basu, mbasu@nsf.gov, (703)292-8649, CCF Division of Computing and Communication Foundations, CSE Directorate for Computer and Information Science and Engineering |
Start Date: | March 1, 2015 |
End Date: | February 28, 2022 (Estimated) |
Total Intended Award Amount: | $540,000.00 |
Total Awarded Amount to Date: | $540,000.00 |
Funds Obligated to Date: | FY 2016 = $96,572.00; FY 2017 = $128,528.00; FY 2018 = $130,607.00; FY 2019 = $126,726.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: | 940 GRACE HALL, NOTRE DAME, IN, US 46556-5708, (574)631-7432 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 940 Grace Hall, NOTRE DAME, IN, US 46556-5708 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Algorithmic Foundations, Computational Biology |
Primary Program Source: | 01001617DB NSF RESEARCH & RELATED ACTIVIT; 01001718DB NSF RESEARCH & RELATED ACTIVIT; 01001819DB NSF RESEARCH & RELATED ACTIVIT; 01001920DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Broader significance and importance. Proteins are major macromolecules of life. Thus, understanding how proteins function in the cell is critical. Genomic sequence research has revolutionized understanding of cellular functioning. However, as recognized in the post-genomic era, genes (proteins) do not function in isolation. Instead, they carry out cellular processes by interacting with each other. This is exactly what biological networks model. Unlike genomic sequence data, biological network data enable the study of complex cellular processes that emerge from the collective behavior of the proteins. Thus, biological network research promises to give new insights into principles of life, evolution, disease, and therapeutics. However, current network research deals with static representations of biological data, even though cellular functioning is dynamic. This is in part due to the unavailability of experimentally derived dynamic biological network data, owing to limitations of biotechnologies for data collection. Efficient computational strategies for both inference and analysis of dynamic biological networks are needed to advance understanding of cellular functioning beyond what static biological network research can offer. This is exactly the focus of this project. Dynamic biological network research has biological applications of societal importance, such as studying cellular changes with disease progression, drug treatment, or age, which will be explored as a part of this project. Thus, the project could contribute to global health. It may impact other domains as well, e.g., social networks. Also, this project will result in educational activities that are intertwined with its research, such as training interdisciplinary scientists via novel curriculum development activities, or strengthening the computer science community via research supervision, career mentoring, and community outreach to K-12 and (under)graduate students, focusing on women.
Technical description. This proposal will result in new computational directions for dynamic biological network research. New algorithms will be developed for inference of systems-level biological networks underlying a dynamic biological process, by combining the static network topology with other data types, such as measurements of gene expression or protein abundance at different times. Then, novel methods for analyzing the dynamic network data will be developed to gain insights into the underlying cellular changes. For example, the idea of graphlets (small subgraphs), which has been well established in static biological network research, will be extended to allow for graphlet-based analyses of dynamic biological networks. Also, novel computational strategies will be designed to allow for dynamic network clustering. The proposed methods will be used in collaborative applications that encompass representative dynamic biological processes: early cancer detection and chemotherapy resistance, both in the context of pancreatic cancer, as well as studying human aging. These interdisciplinary applications will be used as concrete model systems to innovate fundamental computational research. Because network research spans many domains, open-source software implementing the new methods will be offered to researchers from diverse disciplines. The software will also serve as an educational tool. Integration of research and education will be promoted even further. Interdisciplinary student training will be offered via novel courses on network research. A literate approach to education will aim to advance students' communication skills. Proven pedagogical strategies will be used to improve student learning. Research supervision and career mentoring will be offered to K-12 and (under)graduate students, with a focus on minorities and women, thus integrating diversity into the project.
Interdisciplinary research and educational collaborations will allow for wide distribution of the proposed ideas and results. The results will also be disseminated through tutorial and workshop organization at renowned international conferences.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
Genes produce proteins that interact with each other to carry out cellular functioning. Thus, the field of network biology, which deals with inference and analysis of protein-protein interaction networks and other types of biological networks, has the potential to revolutionize our understanding of cellular functioning, evolution, disease, and therapeutics. Traditional biological network research has dealt with static representations of biological -omics data, even though cellular processes are dynamic. To address this gap, as its intellectual merit, this project introduced efficient and accurate computational approaches (or algorithms) for inference and analysis of dynamic biological networks. As an example of the project's broader impact, the new approaches were evaluated in an application of studying human aging as a representative dynamic biological process of importance to global health, because aging is known to be associated with many prevalent diseases. Because network research spans many domains, open-source software implementing the new approaches was provided as part of their publications, and the new approaches were also evaluated on other types of real-world networks. A prominent example of a non-biological application under this project was evaluating how well dynamic social interactions predict people's mental health.
In more detail, key novelties of this project regarding inference of a dynamic biological network are as follows. The aging process is highly influenced by genetic factors. Hence, it is important to identify human aging-related genes. Doing this via wet lab experiments is hard because of ethical constraints and the long human life span. So, computational identification (i.e., prediction) of aging-related genes has received significant attention. Gene expression-based methods for this purpose do account for aging-specific information (which genes are "active" at which age) but not for interactions between genes' protein products. On the other hand, protein-protein interaction (PPI) network-based methods for this purpose account for these interactions, but current PPI network data are context-unspecific, spanning different biological conditions (in this project, different ages). Instead, this project dealt with integration of aging-specific gene expression data and context-unspecific PPI network data into an aging-specific PPI subnetwork. Then, through a series of studies and methodological innovations for inference of such a subnetwork (first static then dynamic subnetwork, first unweighted then weighted subnetwork, first using older then newer gene expression and PPI data, etc.), it demonstrated that analyzing a weighted dynamic aging-specific subnetwork yields higher accuracy when predicting aging-related genes than using an unweighted or static aging-specific subnetwork, or the entire context-unspecific PPI network. So, this project's computational approaches for inference of a weighted dynamic aging-specific subnetwork could guide the discovery of novel aging-related gene candidates for future wet lab validation with higher confidence than the existing approaches. Importantly, beyond "just" aging, the approaches are generalizable to any dynamic biological process, including disease progression.
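The integration idea described above can be illustrated with a minimal sketch. It assumes the simplest possible rule, namely that the snapshot for a given age keeps a PPI edge only when both of its genes are "active" at that age; this rule, the function name age_specific_snapshots, and the toy data are illustrative assumptions, not the project's published inference methods (which also cover weighted subnetworks).

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def age_specific_snapshots(
    ppi_edges: Iterable[Edge],
    active_genes_by_age: Dict[int, Set[str]],
) -> Dict[int, Set[Edge]]:
    """Build one PPI subnetwork (snapshot) per age.

    Assumed rule (hypothetical): an edge of the static,
    context-unspecific PPI network is kept at a given age only if
    both of its endpoints are active at that age.
    """
    edges = list(ppi_edges)
    snapshots: Dict[int, Set[Edge]] = {}
    for age, active in sorted(active_genes_by_age.items()):
        snapshots[age] = {(u, v) for u, v in edges if u in active and v in active}
    return snapshots


# Toy, made-up data: a 4-gene static PPI path and two ages.
ppi = [("A", "B"), ("B", "C"), ("C", "D")]
active = {20: {"A", "B", "C"}, 60: {"B", "C", "D"}}

dynamic = age_specific_snapshots(ppi, active)
for age, snapshot in dynamic.items():
    print(age, sorted(snapshot))
```

The ordered sequence of snapshots is one simple way to represent a dynamic subnetwork: in this toy example the edge between B and C persists across both ages, while the other edges appear or disappear with age.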
Key novelties of this project regarding analysis of an (inferred or existing, biological or non-biological) dynamic network are as follows. The idea of graphlets (small subgraphs, basic building blocks of real-world networks), which had been well established in static biological network research, was extended to allow for graphlet-based analyses of dynamic biological networks. The dynamic graphlets were then used in several downstream computational tasks, including clustering of a dynamic network (as opposed to traditional static network clustering). Specifically, two types of clustering approaches were developed. One questioned the traditional assumption that it is densely interconnected nodes in a dynamic network that should form a cluster (or functional module) and instead argued that it is topologically similar nodes that should be clustered together. The other approach started from the observation that dynamic network clustering (or community detection) is traditionally approached by assuming either that each time point has a distinct community organization or that all time points share a single community organization, whereas the reality likely lies between these two extremes. To find the compromise, this approach simultaneously partitions a dynamic network into contiguous time segments with consistent community organization and finds this community organization for each segment. Additional methodological novelties were proposed when predicting human aging-related genes, including new approaches for supervised node classification in a weighted dynamic network, as traditional approaches for this task can handle only unweighted or static networks.
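The contrast between clustering densely interconnected nodes and clustering topologically similar nodes can be illustrated with a deliberately simplified sketch. It replaces dynamic graphlets with a much cruder stand-in, a node's degree in each snapshot, and groups nodes whose degree-over-time profiles are identical; the function names and the toy network are hypothetical and do not reproduce the project's actual graphlet-based methods.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]


def degree_profiles(
    snapshots: Sequence[Sequence[Edge]], nodes: Sequence[str]
) -> Dict[str, Tuple[int, ...]]:
    """Describe each node by its degree in every snapshot, in time order.

    This is a crude stand-in for dynamic graphlet counts (an assumption
    made only to keep the example short).
    """
    return {
        node: tuple(sum(1 for u, v in edges if node in (u, v)) for edges in snapshots)
        for node in nodes
    }


def group_by_profile(profiles: Dict[str, Tuple[int, ...]]) -> List[List[str]]:
    """Put nodes with identical profiles into the same group."""
    groups: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for node, profile in sorted(profiles.items()):
        groups[profile].append(node)
    return sorted(groups.values())


# Toy, made-up dynamic network with two snapshots.
snapshots = [
    [("a", "b"), ("c", "d")],
    [("a", "b"), ("a", "e"), ("c", "d"), ("c", "f")],
]
nodes = ["a", "b", "c", "d", "e", "f"]

clusters = group_by_profile(degree_profiles(snapshots, nodes))
print(clusters)
```

In this toy network, nodes a and c never interact with each other, yet they land in the same group because their positions evolve in the same way over time. A density-based method would instead place a with b and e, and c with d and f.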
As another example of this project's broader impact, research supervision and career mentoring were offered to 7 graduate students, 11 undergraduate students, and a high school student. Of the 19 students, 47% are women. Community outreach was organized for local middle school girls and high school students. Namely, workshops on network research were offered at the "Expanding Your Horizons in Science and Mathematics" career conference for middle school girls annually from 2015 until the pandemic hit in 2020, with about 40 girls participating each year. Also, a collaboration was carried out with a local high school biology teacher to actively engage their students, with the goal of attracting them to the field of computing.
Last Modified: 06/20/2022
Modified by: Tijana Milenkovic