
NSF Org: | OAC Office of Advanced Cyberinfrastructure (OAC) |
Recipient: |
|
Initial Amendment Date: | September 18, 2019 |
Latest Amendment Date: | October 15, 2020 |
Award Number: | 1939945 |
Award Instrument: | Continuing Grant |
Program Manager: | Reed Beaman, rsbeaman@nsf.gov, (703)292-7163, OAC Office of Advanced Cyberinfrastructure (OAC), CSE Directorate for Computer and Information Science and Engineering |
Start Date: | October 1, 2019 |
End Date: | September 30, 2023 (Estimated) |
Total Intended Award Amount: | $299,520.00 |
Total Awarded Amount to Date: | $299,520.00 |
Funds Obligated to Date: | FY 2020 = $149,715.00 |
History of Investigator: |
|
Recipient Sponsored Research Office: | 80 GEORGE ST MEDFORD MA US 02155-5519 (617)627-3696 |
Sponsor Congressional District: |
|
Primary Place of Performance: | 200 College Ave Medford MA US 02155-5530 |
Primary Place of Performance Congressional District: |
|
Unique Entity Identifier (UEI): |
|
Parent UEI: |
|
NSF Program(s): | Infrastructure Capacity for Bi, HDR-Harnessing the Data Revolu |
Primary Program Source: | 01002021DB NSF RESEARCH & RELATED ACTIVIT |
Program Reference Code(s): |
|
Program Element Code(s): |
|
Award Agency Code: | 4900 |
Fund Agency Code: | 4900 |
Assistance Listing Number(s): | 47.070 |
ABSTRACT
Mitigating the effects of climate change on public health and conservation calls for a better understanding of the dynamic interplay between biological processes and environmental effects. The state of the art, which has led to many important discoveries, relies on numerical or statistical models for making predictions or performing in silico experimentation, but these techniques struggle to capture the nonlinear response of natural systems. Machine learning (ML) methods are better able to cope with nonlinearity and have been used successfully in biological applications, but several barriers remain, including the opaque nature of the algorithm output and the absence of ML-ready data. This project seeks to significantly advance technologies in ML and create a new interdisciplinary field, computational ecogenomics. This will be accomplished by designing ML techniques for encoding heterogeneous genomic and environmental data and mapping them to multi-level phenotypic traits, reducing the amount of necessary training data, and then developing interactive visualizations to better interpret ML models and their outputs. These advances will responsibly and transparently inform policy to maximize resources during this crucial window for planetary health, while revealing underlying biological mechanisms of response to stress and evolutionary pressure.
The long-term vision for this project is to develop predictive analytics for organismal response to environmental perturbations using innovative data science approaches and change the way scientists think about gene expression and the environment. The goal for this two-year award is to develop a proof-of-concept for an institute focused on predicting emergent properties of complex systems; an institute that would itself foster the development of many new sub-disciplines. The core of this activity is developing a machine learning framework capable of predicting phenotypes based on multi-scale data about genes and environments. Available data, ranging from simple vectors to complex images to sequences, will be ingested into this framework by applying proven semantic data integration tools and algorithmic data transformation methods. The central hypothesis of this research is that deep learning algorithms and biological knowledge graphs will predict phenotypes more accurately across more taxa and more ecosystems than do current numerical and traditional statistical modeling methods. The rationale for this project is that a timely investment in data science will push through a bottleneck in life science, accelerating discovery of gene-phenotype-environment relationships, and catalyzing a new computational discipline to uncover the complex "rules of life."
This project is part of the National Science Foundation's Harnessing the Data Revolution (HDR) Big Idea activity, and is jointly supported by HDR and the Division of Biological Infrastructure within the NSF Directorate for Biological Sciences.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
PROJECT OUTCOMES REPORT
Disclaimer
This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.
This report provides a comprehensive overview of the significant progress and achievements made during the recent performance period in a collaborative research effort at Tufts University. The project involves two parallel activities: the development of visualizations and the utilization of knowledge graphs in modeling. These activities aim to contribute to the scientific community, particularly in biology research, by addressing challenges associated with multi-attribute graph data exploration and analysis. The report also highlights major activities, the impact of publications, contributions to the NEON Ecological Forecasting Challenge, and advancements in Graph Neural Network (GNN) visualization.
Tufts University initiated a collaboration with KnowledgeVis, LLC, exploring visualization tools that KnowledgeVis developed for the same datasets used in the Tufts research. The team also developed a visualization system that lets users explore and interact with data in a knowledgebase without manually writing complex SPARQL queries. Efforts during this performance period focused on three key activities: developing tools and infrastructure for data analysis with knowledge graphs, providing visualization support for the NEON Ecological Forecasting Challenge, and creating a user interface for Graph Neural Networks (GNNs).
- Data Analysis with Knowledge Graphs: Led by a PhD student, the team designed and developed a new visualization system for data analysis, integrating knowledge graphs to aid users in a manner similar to Tableau. Research activities included interviews with domain scientists, literature reviews, and the development of a browser-based prototype visual analytics system.
- Visualization Support for the NEON Ecological Forecasting Challenge: Another PhD student led efforts to provide visualization support for the NEON Ecological Forecasting Challenge. This included the design of Taylor Diagrams to assess different prediction models and the development of a website to host resulting visualizations and other prediction models.
- Visualization of Graph Neural Networks (GNNs): We collaborated with biologists to develop a visual analytics tool for creating and analyzing multiple GNN architectures interactively. Noteworthy outcomes include the RekomGNN tool for evaluating the quality of GNN-generated recommendations and the HyperNP technique for interactive parameter-tuning in 2D projections of high-dimensional data.
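The Taylor Diagrams mentioned in the list above summarize how closely each prediction model tracks observations using three linked statistics: the standard deviation of each series, their correlation, and the centered root-mean-square difference (CRMSD). The following is a minimal sketch of how those statistics could be computed for one model. It is not the project's code; the function name `taylor_statistics` and the sample numbers are hypothetical and chosen only for illustration.

```python
import math


def taylor_statistics(observed, predicted):
    """Compute the statistics a Taylor diagram plots for one model.

    Hypothetical helper for illustration only. Uses population
    (divide-by-n) standard deviations, as is conventional for Taylor diagrams.
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    # Anomalies: each series with its own mean removed.
    obs_anom = [o - mean_obs for o in observed]
    pred_anom = [p - mean_pred for p in predicted]

    std_obs = math.sqrt(sum(a * a for a in obs_anom) / n)
    std_pred = math.sqrt(sum(a * a for a in pred_anom) / n)
    if std_obs == 0 or std_pred == 0:
        raise ValueError("correlation is undefined for a constant series")

    # Pearson correlation between the two series.
    covariance = sum(a * b for a, b in zip(obs_anom, pred_anom)) / n
    correlation = covariance / (std_obs * std_pred)

    # Centered RMS difference: RMS error after removing each series' mean.
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(obs_anom, pred_anom)) / n)

    return {
        "std_obs": std_obs,
        "std_pred": std_pred,
        "correlation": correlation,
        "crmsd": crmsd,
    }


if __name__ == "__main__":
    # Made-up numbers standing in for observations and one model's forecast.
    observed = [2.0, 4.0, 6.0, 8.0]
    forecast = [1.0, 5.0, 5.0, 9.0]
    for name, value in taylor_statistics(observed, forecast).items():
        print(f"{name}: {value:.4f}")
```

The three quantities are tied together by the law of cosines, CRMSD^2 = std_obs^2 + std_pred^2 - 2 * std_obs * std_pred * correlation, which is what allows a single point on the diagram to encode all of them. A model that reproduces the observations exactly would sit on the reference point, with correlation 1 and CRMSD 0.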
The project has facilitated the participation of one postdoctoral researcher, four PhD students, and one undergraduate student in a large-scale, multi-disciplinary research effort. Notable achievements include the successful career trajectory of the postdoctoral researcher and the completion of two PhD students’ dissertations. The team produced twelve manuscripts during this period, contributing to diverse aspects of research, including knowledge graph practices, predictive analysis communication, multi-view visualizations, and theoretical frameworks for visualization system design.
The research efforts have led to promising directions for future research, particularly in knowledge-graph data and high-dimensional data visualization techniques. Techniques like HyperNP demonstrate the potential of neural network surrogate models for interactive manipulation of visualizations, while the RekomGNN tool illustrates the applicability of explainable GNNs in corporate analytics.
The collaborative research at Tufts University has made significant strides in advancing knowledge and tools for data analysis and visualization. The diverse contributions, impactful publications, and promising research directions underscore the project's significance in shaping the landscape of visualization research and addressing challenges in biology research.
Last Modified: 12/23/2023
Modified by: Remco Chang