Award Abstract # 1939945
Collaborative Research: Converging Genomics, Phenomics, and Environments Using Interpretable Machine Learning Models

NSF Org: OAC
Office of Advanced Cyberinfrastructure (OAC)
Recipient: TRUSTEES OF TUFTS COLLEGE
Initial Amendment Date: September 18, 2019
Latest Amendment Date: October 15, 2020
Award Number: 1939945
Award Instrument: Continuing Grant
Program Manager: Reed Beaman
rsbeaman@nsf.gov
(703) 292-7163
OAC: Office of Advanced Cyberinfrastructure (OAC)
CSE: Directorate for Computer and Information Science and Engineering
Start Date: October 1, 2019
End Date: September 30, 2023 (Estimated)
Total Intended Award Amount: $299,520.00
Total Awarded Amount to Date: $299,520.00
Funds Obligated to Date: FY 2019 = $149,805.00
FY 2020 = $149,715.00
History of Investigator:
  • Remco Chang (Principal Investigator)
    remco@cs.tufts.edu
Recipient Sponsored Research Office: Tufts University
80 GEORGE ST
MEDFORD
MA  US  02155-5519
(617)627-3696
Sponsor Congressional District: 05
Primary Place of Performance: Tufts University School of Engineering
200 College Ave
Medford
MA  US  02155-5530
Primary Place of Performance Congressional District: 05
Unique Entity Identifier (UEI): WL9FLBRVPJJ7
Parent UEI: WL9FLBRVPJJ7
NSF Program(s): Infrastructure Capacity for Bi,
HDR-Harnessing the Data Revolu
Primary Program Source: 01001920DB NSF RESEARCH & RELATED ACTIVIT
01002021DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 062Z, 1165
Program Element Code(s): 085Y00, 099Y00
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.070

ABSTRACT

Mitigating the effects of climate change on public health and conservation calls for a better understanding of the dynamic interplay between biological processes and environmental effects. The state of the art, which has led to many important discoveries, uses numerical or statistical models to make predictions or perform in silico experiments, but these techniques struggle to capture the nonlinear response of natural systems. Machine learning (ML) methods cope better with nonlinearity and have been used successfully in biological applications, but several barriers remain, including the opaque nature of the algorithms' output and the absence of ML-ready data. This project seeks to significantly advance ML technologies and create a new interdisciplinary field, computational ecogenomics. This will be accomplished by designing ML techniques that encode heterogeneous genomic and environmental data and map them to multi-level phenotypic traits, reducing the amount of training data required, and then developing interactive visualizations to better interpret ML models and their outputs. These advances will responsibly and transparently inform policy to maximize resources during this crucial window for planetary health, while revealing the underlying biological mechanisms of response to stress and evolutionary pressure.

The long-term vision for this project is to develop predictive analytics for organismal response to environmental perturbations using innovative data science approaches and change the way scientists think about gene expression and the environment. The goal for this two-year award is to develop a proof-of-concept for an institute focused on predicting emergent properties of complex systems; an institute that would itself foster the development of many new sub-disciplines. The core of this activity is developing a machine learning framework capable of predicting phenotypes based on multi-scale data about genes and environments. Available data, ranging from simple vectors to complex images to sequences, will be ingested into this framework by applying proven semantic data integration tools and algorithmic data transformation methods. The central hypothesis of this research is that deep learning algorithms and biological knowledge graphs will predict phenotypes more accurately across more taxa and more ecosystems than do current numerical and traditional statistical modeling methods. The rationale for this project is that a timely investment in data science will push through a bottleneck in life science, accelerating discovery of gene-phenotype-environment relationships, and catalyzing a new computational discipline to uncover the complex "rules of life."

This project is part of the National Science Foundation's Harnessing the Data Revolution (HDR) Big Idea activity, and is jointly supported by HDR and the Division of Biological Infrastructure within the NSF Directorate for Biological Sciences.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

Appleby, G. and Espadoto, M. and Chen, R. and Goree, S. and Telea, A. C. and Anderson, E. W. and Chang, R. "HyperNP: Interactive Visual Exploration of Multidimensional Projection Hyperparameters." Computer Graphics Forum, v.41, 2022. https://doi.org/10.1111/cgf.14531
Cashman, Dylan and Xu, Shenyu and Das, Subhajit and Heimerl, Florian and Liu, Cong and Humayoun, Shah Rukh and Gleicher, Michael and Endert, Alex and Chang, Remco. "CAVA: A Visual Analytics System for Exploratory Columnar Data Augmentation Using Knowledge Graphs." IEEE Transactions on Visualization and Computer Graphics, v.27, 2021. https://doi.org/10.1109/TVCG.2020.3030443
Chen, Xi and Zeng, Wei and Lin, Yanna and Al-maneea, Hayder Mahdi and Roberts, Jonathan and Chang, Remco. "Composition and Configuration Patterns in Multiple-View Visualizations." IEEE Transactions on Visualization and Computer Graphics, v.27, 2021. https://doi.org/10.1109/TVCG.2020.3030338
Fisher, Jacob and Chang, Remco and Wu, Eugene. "Automatic Y-axis Rescaling in Dynamic Visualizations." 2021 IEEE Visualization Conference (VIS), 2021. https://doi.org/10.1109/VIS49827.2021.9623319
Gleicher, Michael and Riveiro, Maria and von Landesberger, Tatiana and Deussen, Oliver and Chang, Remco and Gillman, Christina. "A Problem Space for Designing Visualizations." IEEE Computer Graphics and Applications, v.43, 2023. https://doi.org/10.1109/MCG.2023.3267213
He, Edward W and Tolessa, Daniel and Suh, Ashley and Chang, Remco. "Analysis Without Data: Teaching Students to Tackle the VAST Challenge." IEEE Workshop on Visualization Guidelines in Research, Design, and Education, 2022.
Li, Harry and Appleby, Gabriel and Brumar, Camelia Daniela and Chang, Remco and Suh, Ashley. "Knowledge Graphs in Practice: Characterizing their Users, Challenges, and Visualization Opportunities." IEEE Transactions on Visualization and Computer Graphics, 2023. https://doi.org/10.1109/TVCG.2023.3326904
Mosca, Ab and Ottley, Alvitta and Chang, Remco. "Does Interaction Improve Bayesian Reasoning with Visualization?" CHI '21: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 2021. https://doi.org/10.1145/3411764.3445176
Suh, Ashley and Mosca, Ab and Robinson, Shannon and Pham, Quinn and Cashman, Dylan and Ottley, Alvitta and Chang, Remco. "Inferential Tasks as an Evaluation Technique for Visualization." EuroVis 2022 Short Papers, 2022.
Wu, Yifan and Chang, Remco and Hellerstein, Joseph M. and Wu, Eugene. "Facilitating Exploration with Interaction Snapshots under High Latency." 2020 IEEE Visualization Conference (VIS), 2021. https://doi.org/10.1109/VIS47514.2020.00034

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

This report summarizes the progress and achievements of the recent performance period of a collaborative research effort at Tufts University. The project comprises two parallel activities: developing visualizations and using knowledge graphs in modeling. Both aim to benefit the scientific community, particularly biology research, by addressing the challenges of exploring and analyzing multi-attribute graph data. The report also covers major activities, the impact of publications, contributions to the NEON Ecological Forecasting Challenge, and advances in Graph Neural Network (GNN) visualization.

Tufts University initiated a collaboration with KnowledgeVis, LLC, to explore visualization tools that KnowledgeVis developed for the same datasets used in this project. The team also developed a visualization system that lets users explore and interact with data in a knowledge base without manually writing complex SPARQL queries. Efforts during this performance period focused on three key activities: developing tools and infrastructure for data analysis with knowledge graphs, providing visualization support for the NEON Ecological Forecasting Challenge, and creating a user interface for Graph Neural Networks (GNNs).

  • Data Analysis with Knowledge Graphs: Led by a PhD student, the team designed and developed a new Tableau-like visualization system for data analysis that integrates knowledge graphs to aid users. Research activities included interviews with domain scientists, literature reviews, and the development of a browser-based prototype visual analytics system.
  • Visualization Support for the NEON Ecological Forecasting Challenge: Another PhD student led efforts to provide visualization support for the NEON Ecological Forecasting Challenge. This work included designing Taylor Diagrams to assess different prediction models and developing a website to host the resulting visualizations and other prediction models.
  • Visualization of Graph Neural Networks (GNNs): The team collaborated with biologists to develop a visual analytics tool for interactively creating and analyzing multiple GNN architectures. Noteworthy outcomes include the RekomGNN tool for evaluating the quality of GNN-generated recommendations and the HyperNP technique for interactive hyperparameter tuning of 2D projections of high-dimensional data.
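A Taylor diagram compares several prediction models against the same observations using three related statistics: the standard deviation of each model's predictions (σ_f), its correlation R with the observations, and the centered root-mean-square (RMS) difference E′. These satisfy E′² = σ_f² + σ_r² − 2σ_fσ_rR, where σ_r is the observations' standard deviation. That identity is why all three fit on one polar plot. The sketch below is a minimal illustration that assumes equal-length prediction and observation series and population (divide-by-n) statistics. The function names are hypothetical and are not taken from the project's NEON visualization website.

```python
import math


def taylor_statistics(predicted, observed):
    """Return (sigma_f, sigma_r, R, E') for one model against observations.

    Hypothetical helper. Assumes equal-length series and population (divide-by-n)
    statistics. sigma_f and sigma_r are the standard deviations of the predictions
    and observations. R is the Pearson correlation. E' is the centered RMS
    difference, computed after each series' mean (bias) is removed.
    """
    if len(predicted) != len(observed) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(observed)
    mean_f = sum(predicted) / n
    mean_r = sum(observed) / n
    # Anomalies: deviations of each series from its own mean.
    df = [f - mean_f for f in predicted]
    dr = [r - mean_r for r in observed]
    sigma_f = math.sqrt(sum(d * d for d in df) / n)
    sigma_r = math.sqrt(sum(d * d for d in dr) / n)
    if sigma_f == 0 or sigma_r == 0:
        raise ValueError("correlation is undefined for a constant series")
    cov = sum(a * b for a, b in zip(df, dr)) / n
    corr = cov / (sigma_f * sigma_r)
    # Centered RMS difference between the two anomaly series.
    e_prime = math.sqrt(sum((a - b) ** 2 for a, b in zip(df, dr)) / n)
    return sigma_f, sigma_r, corr, e_prime


def taylor_point(sigma_f, corr):
    """Polar position of a model on the diagram: radius sigma_f, angle arccos(R)."""
    # Clamp to [-1, 1] to absorb floating-point round-off before acos.
    return sigma_f, math.acos(max(-1.0, min(1.0, corr)))


observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.8, 3.4, 3.9]
sf, sr, r, e = taylor_statistics(predicted, observed)
print(f"sigma_f={sf:.3f} sigma_r={sr:.3f} R={r:.3f} E'={e:.3f}")
print("polar point:", taylor_point(sf, r))
```

The observation series sits on the diagram at radius σ_r and angle 0. The law-of-cosines identity means each model's straight-line distance from that reference point equals its E′. Because E′ is centered, a model that is off only by a constant bias plots exactly on the reference point. Bias therefore has to be reported separately.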

The project supported the participation of one postdoctoral researcher, four PhD students, and one undergraduate student in a large-scale, multidisciplinary research effort. Notable achievements include the postdoctoral researcher's successful career trajectory and the completion of two PhD dissertations. The team produced twelve manuscripts during this period, covering knowledge graph practices, communication of predictive analyses, multiple-view visualizations, and theoretical frameworks for visualization system design.

The research has opened promising directions for future work, particularly in visualization techniques for knowledge-graph data and high-dimensional data. Techniques such as HyperNP demonstrate the potential of neural network surrogate models for interactively manipulating visualizations, while the RekomGNN tool illustrates the applicability of explainable GNNs in corporate analytics.

The collaborative research at Tufts University has made significant strides in advancing knowledge and tools for data analysis and visualization. The diverse contributions, impactful publications, and promising research directions underscore the project's significance in shaping the landscape of visualization research and addressing challenges in biology research.


Last Modified: 12/23/2023
Modified by: Remco Chang