Award Abstract # 1546838
TOOLS-PGR: Computational Infrastructure to Enable High-throughput, High-quality Annotations of Compartmentalized Metabolic Networks for Plant Genomes

NSF Org: IOS
Division Of Integrative Organismal Systems
Recipient: CARNEGIE INSTITUTION OF WASHINGTON
Initial Amendment Date: August 19, 2016
Latest Amendment Date: August 30, 2018
Award Number: 1546838
Award Instrument: Continuing Grant
Program Manager: Gerald Schoenknecht
gschoenk@nsf.gov
(703)292-5076
IOS Division Of Integrative Organismal Systems
BIO Directorate for Biological Sciences
Start Date: August 15, 2016
End Date: July 31, 2022 (Estimated)
Total Intended Award Amount: $2,193,335.00
Total Awarded Amount to Date: $2,193,335.00
Funds Obligated to Date: FY 2016 = $542,516.00
FY 2017 = $1,104,329.00
FY 2018 = $546,490.00
History of Investigator:
  • Seung Rhee (Principal Investigator)
    rheeseu6@msu.edu
  • Peter Karp (Co-Principal Investigator)
Recipient Sponsored Research Office: Carnegie Institution of Washington
5241 BROAD BRANCH RD NW
WASHINGTON
DC  US  20015-1305
(202)387-6400
Sponsor Congressional District: 00
Primary Place of Performance: Carnegie Institution of Washington
260 Panama Street
Stanford
CA  US  94305-4101
Primary Place of Performance Congressional District: 16
Unique Entity Identifier (UEI): ZQ12LY4L5H39
Parent UEI:
NSF Program(s): Plant Genome Research Project,
Cross-BIO Activities
Primary Program Source: 01001617DB NSF RESEARCH & RELATED ACTIVIT
01001718DB NSF RESEARCH & RELATED ACTIVIT
01001819DB NSF RESEARCH & RELATED ACTIVIT
Program Reference Code(s): 7577, 9109, 9178, 9251, BIOT
Program Element Code(s): 132900, 727500
Award Agency Code: 4900
Fund Agency Code: 4900
Assistance Listing Number(s): 47.074

ABSTRACT

It has been estimated that agricultural productivity needs to be increased to meet the demands imposed by population growth and climate change. Changing the metabolism of crop species is one way to improve productivity. Thus, increasing our knowledge of plant metabolism can significantly accelerate crop improvement efforts. New DNA sequencing technologies have produced an enormous amount of data. However, it has been difficult to obtain useful metabolic information from those DNA sequences. The plant research community needs efficient tools that can extract information related to metabolism from those DNA sequences. This project will produce the tools and datasets that will be used to systematically characterize the components of metabolism: enzymes, transporters, and pathways. These tools will make it easy to compare the metabolic genetic potential of two or more species, and enable the identification of targets for crop improvement. This project will also offer training opportunities in biochemistry and computer sciences to postdoctoral associates and students. In addition, workshops will be offered at professional meetings to train members of the plant research community on the use of the tools developed by the project. Finally, the tools developed by this project will be made available to the scientific community through a web portal.

Accurate and rapid annotation of metabolic enzymes and transporters from sequenced genomes and their metabolic network reconstructions are essential resources for interpreting the results of 'omics' data systematically and enabling the generation of new hypotheses. This proposal aims to meet these needs by developing a computational pipeline to enable rapid and accurate prediction of genome-scale metabolic complements of any sequenced plant based on the large pool of experimentally characterized information. First, the team will improve the accuracy of enzyme function prediction by adding new classifiers and features to a redesigned machine-learning framework. Additions of new classifiers such as phylogenomics-based function prediction and new features such as conserved protein domain architecture and conserved residues would reduce false positive predictions of proteins that share high sequence similarity with known enzymes but catalyze distinct functions. The team will also develop a new learning-based algorithm to predict subcellular locations of enzymes and reactions for any plant species. The algorithm will combine the localization likelihoods of enzymes derived from the experimentally determined localization information of their orthologs and the localization information of the neighboring reactions in the metabolic network to propagate the localization likelihoods among all the reactions in the network. Another new algorithm will be developed to predict transporters and the substrates of transporters. All data generated from this project will be integrated into the PMN databases. In addition, a pipeline will be packaged to enable users to submit their genome sequences online and obtain the prediction results through a web server. Finally, innovative, integrated views of metabolic pathways with gene co-expression, transporters and subcellular compartments will be developed.
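
The localization idea described above (ortholog-derived likelihoods combined with information from neighboring reactions) can be illustrated with a small sketch. This is not the project's algorithm: the function names, the mixing weight `alpha`, the fixed iteration count, and the uniform fallback for reactions without evidence are all assumptions made for illustration.

```python
from typing import Dict, List

Scores = Dict[str, float]  # compartment name -> likelihood


def normalize(scores: Scores) -> Scores:
    """Scale scores to sum to 1; fall back to uniform when there is no evidence."""
    total = sum(scores.values())
    if total <= 0:
        return {c: 1.0 / len(scores) for c in scores}
    return {c: v / total for c, v in scores.items()}


def propagate_localization(
    prior: Dict[str, Scores],
    neighbors: Dict[str, List[str]],
    alpha: float = 0.5,      # assumed weight on a reaction's own ortholog evidence
    iterations: int = 20,    # assumed fixed number of propagation rounds
) -> Dict[str, Scores]:
    """Blend each reaction's prior with the average of its network neighbors."""
    compartments = sorted({c for s in prior.values() for c in s})
    base = {
        rxn: normalize({c: s.get(c, 0.0) for c in compartments})
        for rxn, s in prior.items()
    }
    current = dict(base)
    for _ in range(iterations):
        updated = {}
        for rxn in base:
            nbrs = [n for n in neighbors.get(rxn, []) if n in base]
            mixed = {}
            for c in compartments:
                if nbrs:
                    nbr_avg = sum(current[n][c] for n in nbrs) / len(nbrs)
                else:
                    nbr_avg = base[rxn][c]  # isolated reaction keeps its prior
                mixed[c] = alpha * base[rxn][c] + (1 - alpha) * nbr_avg
            updated[rxn] = normalize(mixed)
        current = updated
    return current


# Hypothetical three-reaction chain; r2 has no ortholog evidence of its own.
prior = {
    "r1": {"plastid": 0.9, "cytosol": 0.1},
    "r2": {},
    "r3": {"plastid": 0.8, "cytosol": 0.2},
}
neighbors = {"r1": ["r2"], "r2": ["r1", "r3"], "r3": ["r2"]}
result = propagate_localization(prior, neighbors)
```

In this toy network, `r2` starts with no evidence and ends up leaning toward the plastid because both of its neighbors do, which is the kind of behavior the abstract describes.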

PUBLICATIONS PRODUCED AS A RESULT OF THIS RESEARCH

(Showing: 1 - 10 of 30)
Charles Hawkins, Daniel Ginzburg, Kangmei Zhao, William Dwyer, Bo Xue, Angela Xu, Selena Rice, Benjamin Cole, Suzanne Paley, Peter Karp, Seung Yon Rhee "Plant Metabolic Network: A multi-species resource of plant metabolic information" bioRxiv , 2021 10.1101
Cole, Benjamin and Bergmann, Dominique and Blaby-Haas, Crysten E and Blaby, Ian K and Bouchard, Kristofer E and Brady, Siobhan M and Ciobanu, Doina and Coleman-Derr, Devin and Leiboff, Samuel and Mortimer, Jenny C "Plant single-cell solutions for energy and the environment" Communications Biology , v.4 , 2021 , p.962 10.1038/s42003-021-02477-4
Daniel N Ginzburg, Flavia Bossi, Seung Y Rhee "Uncoupling differential water usage from drought resistance in a dwarf Arabidopsis mutant" Plant Physiology , 2022 10.1093/plphys/kiac411
Demirer, Gozde S and Silva, Tallyta N and Jackson, Christopher T and Thomas, Jason B and Ehrhardt, David W. and Rhee, Seung Y and Mortimer, Jenny C and Landry, Markita P "Nanotechnology to advance CRISPR-Cas genetic engineering of plants" Nature Nanotechnology , v.16 , 2021 , p.243-250 10.1038/s41565-021-00854-y
Dorone, Yanniv and Boeynaems, Steven and Jin, Benjamin and Bossi, Flavia and Flores, Eduardo and Lazarus, Elena and Michiels, Emiel and De Decker, Mathias and Baatsen, Pieter and Holehouse, Alex S. and Sukenik, Shahar and Gitler, Aaron D. and Rhee, Seung "Hydration-dependent phase separation of a prion-like protein regulates seed germination during water stress" bioRxiv , 2020 10.1101/2020.08.07.242172
Fan Lin, Elena Z Lazarus, Seung Y Rhee "QTG-Finder2: A Generalized Machine-Learning Algorithm for Prioritizing QTL Causal Genes in Plants" G3: Genes|Genomes|Genetics , v.10 , 2020 , p.2411 10.1534/g3.120.401122
Hawkins, Charles and Ginzburg, Daniel and Zhao, Kangmei and Dwyer, William and Xue, Bo and Xu, Angela and Rice, Selena and Cole, Benjamin and Paley, Suzanne and Karp, Peter and Rhee, Seung Y. "Plant Metabolic Network 15: A resource of genome-wide metabolism databases for 126 plants and algae" Journal of Integrative Plant Biology , v.63 , 2021 , p.1888-1905 https://doi.org/10.1111/jipb.13163
Hye-In Nam, Zaigham Shahzad, Yanniv Dorone, Sophie Clowez, Kangmei Zhao, Nadia Bouain, Katerina S. Lay-Pruitt, Huikyong Cho, Seung Y. Rhee & Hatem Rouached "Interdependent iron and phosphorus availability controls photosynthesis through retrograde signaling" Nature Communications , v.12 , 2021 10.1038/s41467-021-27548-2
Jesse R Walsh, Mary L Schaeffer, Peifen Zhang, Seung Y Rhee, Julie A Dickerson, Taner Z Sen "The quality of metabolic pathway resources depends on initial enzymatic function assignments: a case for maize" BMC Systems Biology , v.10 , 2016 , p.129 10.1186/s12918-016-0369-x
Kangmei Zhao, Seung Rhee "Epigenomic Landscape of Arabidopsis thaliana Metabolism Reveals Bivalent Chromatin on Specialized Metabolic Genes" bioRxiv , 2019 10.1101/589036
Kangmei Zhao, Seung Y. Rhee "Omics-guided metabolic pathway discovery in plants: Resources, approaches, and opportunities" Science Direct , v.67 , 2022 1369-5266

PROJECT OUTCOMES REPORT

Disclaimer

This Project Outcomes Report for the General Public is displayed verbatim as submitted by the Principal Investigator (PI) for this award. Any opinions, findings, and conclusions or recommendations expressed in this Report are those of the PI and do not necessarily reflect the views of the National Science Foundation; NSF has not approved or endorsed its content.

During the course of this grant, we accomplished a massive expansion of the Plant Metabolic Network (PMN, https://plantcyc.org), an online database of plant metabolism. We published six major releases, bringing the number of single-species databases from 21 to 126. Each database contains enzymes predicted from the genome using our in-house pipeline E2P2, and reactions, pathways, and compounds implied by that enzyme set, along with varying degrees of curated information from the literature. PMN has added 161 metabolic pathways, 3,654 reactions, 2,777 compounds, and 1,065,384 proteins since the start of the grant. We have also added 8,705 proteins to our reference protein sequence dataset (RPSD) used to make enzyme function predictions.

PMN has been an invaluable tool for many plant biologists, with more than 1,100 unique visitors per month, 1,200 users registered for full-database downloads, and 429 literature citations. Common uses include transforming omics data, interrogating hypotheses about the evolution of plant metabolism, and annotating new genomes using the PMN BLAST feature. Our own group has published 28 papers on plant metabolism with support from this grant, including two major PMN update papers. Twenty-nine people were trained on this grant, including 6 postdocs, 6 postbac research assistants, 3 biocurators, and 14 undergraduate interns. This cohort included 15 women (52%), 15 people of color (52%), and 2 URMs (7%). All are still in STEM fields and many have moved on to the next stage of their careers, including 4 in PhD programs, 2 in MS programs, 4 in industry, 1 in government, and 1 in an academic lab position.

We made significant improvements to our database-generation pipeline. The Ensemble Enzyme Prediction Pipeline (E2P2) and the associated RPSD that it uses to make predictions have been kept up to date with new information from the literature. Numerous classifiers have been tested for potential addition to E2P2, and two (the neural network-based DeepEC and the structure-based AlphaFold+TMAlign) have been selected for integration into E2P2 for future releases. We also developed a natural language processing (NLP) machine learning model to identify and assess papers with enzyme function information and to distinguish papers whose enzyme classifications are based on experimental data from those based on computational prediction. We used this tool to filter enzyme function information from the BRENDA database for inclusion in the RPSD, to prevent computational predictions from being used to make more computational predictions. The semi-automated validation infrastructure (SAVI) is software that lets biocurators enter rules for inclusion or exclusion of specific pathways from the plant databases based on their phylogenetic placement. We added rules for more than 300 pathways to SAVI, bringing the total number of pathways to 1,352.
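
To make the idea of curator-entered inclusion and exclusion rules concrete, here is a minimal sketch of how such rules might be applied to a species' predicted pathways. It does not reproduce SAVI itself: the `PathwayRule` structure, the "most specific matching clade wins" policy, the default of keeping pathways with no rule, and the example rules are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class PathwayRule:
    """Hypothetical curator rule: keep or remove a pathway within a clade."""
    pathway: str
    taxon: str     # clade the rule applies to
    include: bool  # True = keep the pathway in this clade, False = remove it


def apply_rules(
    predicted: Sequence[str],
    lineage: Sequence[str],  # clades from root to most specific, e.g. family
    rules: Sequence[PathwayRule],
) -> List[str]:
    """Return the predicted pathways that survive the curator rules."""
    kept = []
    for pathway in predicted:
        decision = True   # assumption: pathways without a matching rule are kept
        best_depth = -1
        for rule in rules:
            if rule.pathway == pathway and rule.taxon in lineage:
                depth = lineage.index(rule.taxon)
                if depth > best_depth:  # the most specific clade overrides
                    best_depth = depth
                    decision = rule.include
        if decision:
            kept.append(pathway)
    return kept


# Illustrative rules only; not taken from the PMN rule set.
rules = [
    PathwayRule("glucosinolate biosynthesis", "Viridiplantae", include=False),
    PathwayRule("glucosinolate biosynthesis", "Brassicaceae", include=True),
]
predicted = ["glucosinolate biosynthesis", "glycolysis"]
in_brassicaceae = apply_rules(predicted, ["Viridiplantae", "Brassicaceae"], rules)
in_poaceae = apply_rules(predicted, ["Viridiplantae", "Poaceae"], rules)
```

Under these assumed rules, the Brassicaceae species keeps both pathways while the Poaceae species keeps only the one with no rule against it.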

Several new website features have been implemented. Pathways that involve transport between cells or cellular compartments now show the membrane and compartments on the pathway display. Virtual PlantCyc is a new feature that allows users to select up to 10 PMN databases (genomes), for which the predicted enzymes will be pulled in, on the fly, and displayed on the PlantCyc pathway diagrams. Virtual PlantCyc also displays colored stack boxes next to each reaction of the pathway, with each box representing a genome, to indicate the presence (colored) or absence (white) of enzyme annotations from a given genome.
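
The presence/absence boxes described above amount to a small reaction-by-genome table. The sketch below shows one way such a table could be computed; the function name, the data layout, and the example identifiers are hypothetical, and only the limit of 10 selected databases comes from the description.

```python
from typing import Dict, List, Sequence, Set


def presence_boxes(
    pathway_reactions: Sequence[str],
    annotations: Dict[str, Set[str]],  # genome -> reactions with a predicted enzyme
    selected: Sequence[str],
    max_genomes: int = 10,             # limit stated for Virtual PlantCyc
) -> Dict[str, List[bool]]:
    """For each reaction, one flag per selected genome: True = colored, False = white."""
    if len(selected) > max_genomes:
        raise ValueError(f"select at most {max_genomes} genomes")
    return {
        rxn: [rxn in annotations.get(genome, set()) for genome in selected]
        for rxn in pathway_reactions
    }


# Hypothetical reaction and genome identifiers.
annotations = {
    "genome_A": {"RXN-1", "RXN-2"},
    "genome_B": {"RXN-2"},
}
boxes = presence_boxes(["RXN-1", "RXN-2", "RXN-3"], annotations, ["genome_A", "genome_B"])
```

Each list in `boxes` follows the order of the selected genomes, so a row of `[True, False]` would render as one colored box followed by one white box.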

A number of new web applications have also been developed and published. A Co-Expression Viewer has been created for viewing co-expression data for all the genes in a given PMN pathway, drawing data from the ATTED-II plant co-expression database. It supports all nine of the ATTED-II plant species and is accessible from PMN pathway views. Another new site shows the status of genome function annotation for several important model organisms and crops (https://genomeannotation.rheelab.org). A third new site presents plant metabolic clusters for 8 plant species computed using our PlantClusterFinder software (https://metabolicclusterviewer.dpb.carnegiescience.edu).

The project has conducted substantial outreach to the general public. We designed a program to introduce middle-school students to plant biology and implemented it with Thomas R. Pollicita Middle School in Daly City, CA, a school whose student body is 98% Black, Indigenous, and People of Color. More than 800 students across 6 classrooms learned about plant biology and the scientific process through hands-on experiments dissecting and regrowing vegetables. We have also partnered with the Canopy blog (https://canopy.org/blog) to publish 29 tree stories: posts about specific tree species, their history, and their importance to both humans and the environment. There have been 75,000 blog readers over the last 12 months and the Canopy TreEnews eblast is sent to 4,559 subscribers. The tree stories authored by Rhee Lab trainees have received 41,588 views since the partnership began in 2018.


Last Modified: 10/07/2022
Modified by: Seung Rhee

Please report errors in award information by writing to: awardsearch@nsf.gov.